# Bellman Equation V(s) Proof

## Why we need to understand it?

Bellman equation is a key point  for understanding reinforcement learning, however, I didn’t find any materials that write the proof for it. In this post, I will show you how to prove it easily.

## Simple Proof

For all $s \in \mathcal{S}:$

\begin{aligned}v_{\pi}(s)\doteq\mathbb{E}_{\pi}\left[G_{t}|S_{t}=s\right] \end{aligned}

\begin{aligned} \tag{1} =\mathbb{E}_{\pi}\left[R_{t+1}+\gamma G_{t+1} | S_{t}=s\right] \end{aligned}

equals to \begin{aligned} \sum_{a} \pi(a | s) \sum_{s^{\prime}} \sum_{r} p\left(s^{\prime}, r | s, a\right)\left[r+\gamma \mathbb{E}{\pi}\left[G_{t+1} | S_{t+1}=s^{\prime}\right]\right] \tag{2} \end{aligned}

\begin{aligned} =\sum_{a} \pi(a | s) \sum_{s^{\prime}, r} p\left(s^{\prime}, r | s, a\right)\left[r+\gamma v_{\pi}\left(s^{\prime}\right)\right] \end{aligned}

That is: $v_{\pi}(s)=\sum_{a \in \mathcal{A}} \pi(a | s)\left(\mathcal{R}_{s}^{a}+\gamma \sum_{s^{\prime} \in \mathcal{S}} \mathcal{P}_{s s^{\prime}}^{a} v_{\pi}\left(s^{\prime}\right)\right)$

We need to prove $(1)$ to $(2)$

In this example, I just improve this aligned:

\begin{aligned} E_{\pi}\left[R_{t+1} | S_{k}=s\right] \end{aligned}

Relationships

For general situation, \begin{aligned} E=\sum_{x \in X} x \cdot p(x)\end{aligned}

For this question, the target is to get \begin{aligned} \mathbb{E}_{\pi}\left[R_{t+1} | S_{t}=s\right]\end{aligned}, thus $p(x)$ means $p(r|s)$, and then:

\begin{aligned} E_{\pi}=\sum_{r \in R} r \cdot p(r | s) \end{aligned}

which equals to

\begin{aligned} \sum_{a} \pi(a | s) \cdot \sum_{r \in R} p(r | s, a) \cdot r \tag{3} \end{aligned}

Why? $p(r,a)=p(r|a)\cdot p(a)$, thus $p(r,a|s) = p(r|a,s)\cdot p(a|s)$. And $p(r|s) = \sum_{a}p(r,a|s) = \sum_{a}p(a|s)\cdot p(r|a,s)$, in bellman function, $p(a|s) = \pi(a|s)$

Since

\begin{aligned} p(r | s, a)=\sum_{s^{\prime} \in S} p\left(s^{\prime}, r | s, a\right) \end{aligned}

Thus we have \begin{aligned} (3)=\sum_{a} \pi(a | s) \cdot \sum_{r \in R} \cdot \sum_{s^{\prime} \in S} p\left(s^{\prime}, r | s, a\right) \cdot r \end{aligned}

Welcome to share or comment on this post: