2019 July 08 Reinforcement Learning, Math

Bellman Equation V(s) Proof

Why we need to understand it?

Bellman equation is a key point for understanding reinforcement learning, however, I didn’t find any materials that write the proof for it. In this post, I will show you how to prove it easily.

Simple Proof

For all $s \in \mathcal{S}:$

$\begin{aligned}v_{\pi}(s)\doteq\mathbb{E}_{\pi}\left[G_{t}|S_{t}=s\right] \end{aligned}$

$\begin{aligned} \tag{1} =\mathbb{E}_{\pi}\left[R_{t+1}+\gamma G_{t+1} | S_{t}=s\right] \end{aligned}$

equals to $\begin{aligned} \sum_{a} \pi(a | s) \sum_{s^{\prime}} \sum_{r} p\left(s^{\prime}, r | s, a\right)\left[r+\gamma \mathbb{E}{\pi}\left[G_{t+1} | S_{t+1}=s^{\prime}\right]\right] \tag{2} \end{aligned}$

$\begin{aligned} =\sum_{a} \pi(a | s) \sum_{s^{\prime}, r} p\left(s^{\prime}, r | s, a\right)\left[r+\gamma v_{\pi}\left(s^{\prime}\right)\right] \end{aligned}$

That is: $v_{\pi}(s)=\sum_{a \in \mathcal{A}} \pi(a | s)\left(\mathcal{R}_{s}^{a}+\gamma \sum_{s^{\prime} \in \mathcal{S}} \mathcal{P}_{s s^{\prime}}^{a} v_{\pi}\left(s^{\prime}\right)\right)$

We need to prove $(1)$ to $(2)$

In this example, I just improve this aligned:

$\begin{aligned} E_{\pi}\left[R_{t+1} | S_{k}=s\right] \end{aligned}$

Relationships

For general situation, $\begin{aligned} E=\sum_{x \in X} x \cdot p(x)\end{aligned}$

For this question, the target is to get $\begin{aligned} \mathbb{E}_{\pi}\left[R_{t+1} | S_{t}=s\right]\end{aligned}$ , thus $p(x)$ means $p(r|s)$ , and then:

$\begin{aligned} E_{\pi}=\sum_{r \in R} r \cdot p(r | s) \end{aligned}$

which equals to

$\begin{aligned} \sum_{a} \pi(a | s) \cdot \sum_{r \in R} p(r | s, a) \cdot r \tag{3} \end{aligned}$

Why? $p(r,a)=p(r|a)\cdot p(a)$ , thus $p(r,a|s) = p(r|a,s)\cdot p(a|s)$ . And $p(r|s) = \sum_{a}p(r,a|s) = \sum_{a}p(a|s)\cdot p(r|a,s)$ , in bellman function, $p(a|s) = \pi(a|s)$

Since

$\begin{aligned} p(r | s, a)=\sum_{s^{\prime} \in S} p\left(s^{\prime}, r | s, a\right) \end{aligned}$

Thus we have $\begin{aligned} (3)=\sum_{a} \pi(a | s) \cdot \sum_{r \in R} \cdot \sum_{s^{\prime} \in S} p\left(s^{\prime}, r | s, a\right) \cdot r \end{aligned}$

Bellman Equation V(s) Proof

Why we need to understand it?

Simple Proof

You May Also Enjoy

A Least Squares Approach --Math

Linear Modeling - A Maximum Likelihood Approach --Math

The Bayesian Approach To Machine Learning --Math

An Introduction To ESN --Math

Modeling With NMF And SVD --Math

Bounding Box Regression --Math