2019 July 08 Reinforcement Learning, Math
Bellman Equation V(s) Proof
Why do we need to understand it?
The Bellman equation is central to understanding reinforcement learning, yet I couldn't find any material that actually writes out its proof. In this post, I will show how to prove it step by step.
Simple Proof
For all $s \in \mathcal{S}$:

$$
\begin{aligned}
v_\pi(s) &\doteq \mathbb{E}_\pi[G_t \mid S_t = s] \\
&= \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s] \qquad (1) \\
&= \sum_a \pi(a \mid s) \sum_{s'} \sum_r p(s', r \mid s, a)\Big[r + \gamma\, \mathbb{E}_\pi[G_{t+1} \mid S_{t+1} = s']\Big] \qquad (2) \\
&= \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\big[r + \gamma\, v_\pi(s')\big]
\end{aligned}
$$

That is:

$$
v_\pi(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s) \Big( R_s^a + \gamma \sum_{s' \in \mathcal{S}} P_{ss'}^a \, v_\pi(s') \Big)
$$
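To make the equation concrete, here is a minimal Python sketch that evaluates $v_\pi$ on a tiny two-state, two-action MDP by iterating the Bellman equation until it converges. All the numbers (`P`, `R`, `pi`, `gamma`) are made up purely for illustration:

```python
import numpy as np

# A made-up two-state, two-action MDP, purely for illustration.
# P[a, s, s2] = P_ss'^a, R[a, s] = R_s^a, pi[s, a] = pi(a|s).
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.7, 0.3]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
pi = np.array([[0.6, 0.4],
               [0.3, 0.7]])
gamma = 0.9

# Iterate the Bellman equation as an update rule:
# v(s) <- sum_a pi(a|s) * (R_s^a + gamma * sum_s' P_ss'^a * v(s'))
v = np.zeros(2)
for _ in range(1000):
    v = np.array([sum(pi[s, a] * (R[a, s] + gamma * P[a, s] @ v)
                      for a in range(2))
                  for s in range(2)])

# Cross-check against the exact solution of the linear system
# v = R_pi + gamma * P_pi v, i.e. v = (I - gamma * P_pi)^{-1} R_pi.
P_pi = np.einsum('sa,ast->st', pi, P)
R_pi = np.einsum('sa,as->s', pi, R)
v_exact = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)
print(v, v_exact)  # the two agree: v_pi is a fixed point of the equation
```

The cross-check works because, for a fixed policy, the Bellman equation is a linear system in $v_\pi$, so the fixed point of the iteration and the direct linear solve must coincide.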
We need to prove the step from (1) to (2).
To keep things simple, I will only prove the expansion of the reward term

$$
\mathbb{E}_\pi[R_{t+1} \mid S_t = s]
$$

the $\gamma\, G_{t+1}$ term expands in exactly the same way.
Relationships
In general, the expectation of a discrete random variable is

$$
\mathbb{E}[X] = \sum_{x \in X} x \cdot p(x)
$$
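For example, for a fair six-sided die:

$$
\mathbb{E}[X] = \sum_{x=1}^{6} x \cdot \tfrac{1}{6} = 3.5
$$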
For our problem, the target is $\mathbb{E}_\pi[R_{t+1} \mid S_t = s]$, so $p(x)$ plays the role of $p(r \mid s)$, and then:

$$
\mathbb{E}_\pi[R_{t+1} \mid S_t = s] = \sum_{r \in \mathcal{R}} r \cdot p(r \mid s)
$$

which equals

$$
\sum_a \pi(a \mid s) \sum_{r \in \mathcal{R}} p(r \mid s, a) \cdot r \qquad (3)
$$
Why? By the chain rule, $p(r, a) = p(r \mid a) \cdot p(a)$, and conditioning everything on $s$ gives $p(r, a \mid s) = p(r \mid a, s) \cdot p(a \mid s)$. Then, by the law of total probability, $p(r \mid s) = \sum_a p(r, a \mid s) = \sum_a p(a \mid s) \cdot p(r \mid a, s)$, and in the Bellman equation $p(a \mid s) = \pi(a \mid s)$.
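A quick numerical sanity check of this decomposition, with made-up distributions for one fixed state (the arrays `pi` and `p_r_sa` are just illustrative numbers):

```python
import numpy as np

# For a fixed state s: 2 actions, 3 possible reward values.
pi = np.array([0.3, 0.7])              # pi(a|s)
p_r_sa = np.array([[0.5, 0.4, 0.1],    # p(r|s, a=0)
                   [0.2, 0.2, 0.6]])   # p(r|s, a=1)

# p(r|s) = sum_a pi(a|s) * p(r|s,a)  (law of total probability)
p_r_s = pi @ p_r_sa
print(p_r_s)         # [0.29 0.26 0.45]
print(p_r_s.sum())   # 1.0, so it is a valid distribution over r
```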
Since

$$
p(r \mid s, a) = \sum_{s' \in \mathcal{S}} p(s', r \mid s, a)
$$

we have

$$
(3) = \sum_a \pi(a \mid s) \sum_{r \in \mathcal{R}} \sum_{s' \in \mathcal{S}} p(s', r \mid s, a) \cdot r
$$

which is exactly the reward term of (2). This completes the step from (1) to (2).
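Putting the two identities together, a short sketch (again with a made-up joint model `p_sr_sa` for one fixed state) confirms that expanding the expectation this way leaves its value unchanged:

```python
import numpy as np

# Made-up joint model p(s',r|s,a) for a fixed state s:
# 2 actions, 2 next states, 3 reward values. Each p_sr_sa[a] sums to 1.
rewards = np.array([0.0, 1.0, 2.0])
p_sr_sa = np.array([[[0.3, 0.1, 0.1],   # a=0: rows are s', columns are r
                     [0.2, 0.2, 0.1]],
                    [[0.1, 0.3, 0.2],   # a=1
                     [0.1, 0.1, 0.2]]])
pi = np.array([0.4, 0.6])               # pi(a|s)

# Direct: E[R|s] = sum_r r * p(r|s), with p(r|s) = sum_a pi(a|s) sum_s' p(s',r|s,a)
p_r_s = np.einsum('a,ast->t', pi, p_sr_sa)
direct = rewards @ p_r_s

# Expanded form (3): sum_a pi(a|s) sum_{s',r} p(s',r|s,a) * r
expanded = np.einsum('a,ast,t->', pi, p_sr_sa, rewards)

print(direct, expanded)  # identical, as the derivation claims
```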