2019 July 08 Reinforcement Learning, Math
Bellman Equation V(s) Proof
Why do we need to understand it?
The Bellman equation is a key point for understanding reinforcement learning; however, I didn't find any materials that write out its proof. In this post, I will show you how to prove it step by step.
For all $s \in S$:

$$v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s] = \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s] \tag{1}$$

which equals

$$\sum_a \pi(a \mid s) \sum_{s'} \sum_r p(s', r \mid s, a) \big[ r + \gamma \, \mathbb{E}_\pi[G_{t+1} \mid S_{t+1} = s'] \big] \tag{2}$$

That is:

$$v_\pi(s) = \sum_{a \in A} \pi(a \mid s) \Big( R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a \, v_\pi(s') \Big)$$

We need to prove that (1) equals (2).
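Before the proof, here is a numerical sanity check of the final form. This is only a sketch on a tiny made-up MDP (all transition probabilities, rewards, and the policy below are hypothetical): we build the policy-averaged quantities $R_\pi$ and $P_\pi$, solve the linear system $v_\pi = R_\pi + \gamma P_\pi v_\pi$, and confirm the solution also satisfies the sum-over-$(s', r)$ form in (2).

```python
# Toy 2-state, 2-action MDP (hypothetical numbers).
# p[s][a] = list of (next_state, reward, probability); probs sum to 1 per (s, a).
p = {
    0: {0: [(0, 1.0, 0.7), (1, 0.0, 0.3)],
        1: [(0, 0.0, 0.4), (1, 2.0, 0.6)]},
    1: {0: [(0, 5.0, 1.0)],
        1: [(1, -1.0, 0.5), (0, 3.0, 0.5)]},
}
pi = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}  # pi[s][a] = pi(a|s)
gamma = 0.9

# Policy-averaged transition matrix and reward vector:
# P_pi[s][s'] = sum_a pi(a|s) * P^a_{ss'},  R_pi[s] = sum_a pi(a|s) * R^a_s.
P = [[0.0, 0.0], [0.0, 0.0]]
R = [0.0, 0.0]
for s in (0, 1):
    for a in (0, 1):
        for s2, r, prob in p[s][a]:
            P[s][s2] += pi[s][a] * prob
            R[s] += pi[s][a] * prob * r

# Solve (I - gamma * P) v = R by hand for the 2x2 case (Cramer's rule).
a00, a01 = 1 - gamma * P[0][0], -gamma * P[0][1]
a10, a11 = -gamma * P[1][0], 1 - gamma * P[1][1]
det = a00 * a11 - a01 * a10
v = [(a11 * R[0] - a01 * R[1]) / det,
     (a00 * R[1] - a10 * R[0]) / det]

# Check equation (2): v(s) = sum_a pi(a|s) sum_{s',r} p(s',r|s,a) [r + gamma v(s')].
for s in (0, 1):
    rhs = sum(pi[s][a] * prob * (r + gamma * v[s2])
              for a in (0, 1)
              for s2, r, prob in p[s][a])
    assert abs(v[s] - rhs) < 1e-9
print("Bellman equation holds, v =", [round(x, 4) for x in v])
```

The check is deterministic: any $v$ that solves the linear system must satisfy the recursive form, and the code verifies this by computing the right-hand side through a separate loop over the joint distribution $p(s', r \mid s, a)$.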
In this example, I will only prove the reward term; the discounted-return term follows in the same way. In the general case, the expectation of a discrete random variable is

$$\mathbb{E}[X] = \sum_{x \in X} x \cdot p(x)$$
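As a quick numerical illustration of this identity (a sketch with a made-up distribution; the values are not from the post):

```python
# Expected value of a discrete random variable: E[X] = sum_x x * p(x).
xs = [1, 2, 3]          # possible values of X (hypothetical)
ps = [0.2, 0.5, 0.3]    # their probabilities, summing to 1 (hypothetical)

expectation = sum(x * q for x, q in zip(xs, ps))
print(expectation)  # 0.2*1 + 0.5*2 + 0.3*3 = 2.1
```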
For this question, the target is $\mathbb{E}_\pi[R_{t+1} \mid S_t = s]$, so $p(x)$ here means $p(r \mid s)$, and then:

$$\mathbb{E}_\pi[R_{t+1} \mid S_t = s] = \sum_{r \in R} r \cdot p(r \mid s) \tag{3}$$

which equals

$$\sum_{r \in R} r \cdot \sum_a \pi(a \mid s) \, p(r \mid s, a)$$
Why? Since $p(r, a) = p(r \mid a) \cdot p(a)$, conditioning everything on $s$ gives $p(r, a \mid s) = p(r \mid a, s) \cdot p(a \mid s)$. Marginalizing over actions then gives $p(r \mid s) = \sum_a p(r, a \mid s) = \sum_a p(a \mid s) \cdot p(r \mid a, s)$, and in the Bellman equation $p(a \mid s) = \pi(a \mid s)$.
Finally, since $p(r \mid s, a) = \sum_{s' \in S} p(s', r \mid s, a)$ (marginalizing over the next state), we have

$$(3) = \sum_a \pi(a \mid s) \sum_{r \in R} \sum_{s' \in S} p(s', r \mid s, a) \cdot r$$

which is exactly the reward term of (2).
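The chain of equalities above can be checked numerically on a tiny made-up MDP (all numbers below are hypothetical; `p[s][a]` lists `(next_state, reward, probability)` triples). We compute $\mathbb{E}[R_{t+1} \mid S_t = s]$ two ways: via the marginal reward distribution $p(r \mid s)$ as in (3), and via the triple sum over $(a, s', r)$ from the final line.

```python
# Check p(r|s) = sum_a pi(a|s) * p(r|a,s), with p(r|a,s) = sum_{s'} p(s',r|s,a),
# on a toy MDP (hypothetical numbers).
from collections import defaultdict

p = {
    0: {0: [(0, 1.0, 0.7), (1, 0.0, 0.3)],
        1: [(0, 0.0, 0.4), (1, 2.0, 0.6)]},
    1: {0: [(0, 5.0, 1.0)],
        1: [(1, -1.0, 0.5), (0, 3.0, 0.5)]},
}
pi = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}  # pi[s][a] = pi(a|s)

def reward_dist(s):
    """The marginal p(r|s), built by summing over actions and next states."""
    d = defaultdict(float)
    for a, transitions in p[s].items():
        for s2, r, prob in transitions:   # p(r|a,s) = sum_{s'} p(s',r|s,a)
            d[r] += pi[s][a] * prob       # p(r|s)  = sum_a pi(a|s) p(r|a,s)
    return d

for s in (0, 1):
    lhs = sum(r * q for r, q in reward_dist(s).items())   # equation (3)
    rhs = sum(pi[s][a] * prob * r                         # final triple sum
              for a in p[s] for s2, r, prob in p[s][a])
    assert abs(lhs - rhs) < 1e-12

print("E[R | s=0] =", sum(r * q for r, q in reward_dist(0).items()))  # ~0.95
```

Both code paths agree because they sum exactly the same terms, just grouped differently, which is the content of the law-of-total-probability step above.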
Feel free to share or comment on this post.