Houjun Liu

blind lower bound

To evaluate the lower bound:

\begin{equation} \alpha_{a}^{k+1}(s) = R(s,a) + \gamma \sum_{s'} T(s' \mid s,a) \alpha_{a}^{k}(s') \end{equation}

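Here we essentially commit to a single action and perform conditional plan evaluation of the policy that takes that one action at every step into the future, regardless of what is observed. Iterating the update gives one alpha vector per action.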
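The update above can be sketched in NumPy as follows. This is a minimal illustration, not a definitive implementation: the function name `blind_lower_bound`, the array layouts `R[s, a]` and `T[s, a, s2]` (holding T(s2 | s, a)), the zero initialization of the alpha vectors, and the fixed iteration count `k_max` are all assumptions made for the example.

```python
import numpy as np

def blind_lower_bound(R, T, gamma, k_max=100):
    """Iterate the blind lower bound update for every action at once.

    R: array of shape (S, A), with R[s, a] the reward R(s, a).
    T: array of shape (S, A, S), with T[s, a, s2] = T(s2 | s, a).
    Returns an array of shape (A, S); row a is the alpha vector for action a.
    """
    n_states, n_actions = R.shape
    # Assumption: start from alpha_a^0(s) = 0 for every action and state.
    alphas = np.zeros((n_actions, n_states))
    for _ in range(k_max):
        # alpha_a^{k+1}(s) = R(s,a) + gamma * sum_{s2} T(s2|s,a) * alpha_a^k(s2)
        expected_next = np.einsum("sap,ap->as", T, alphas)
        alphas = R.T + gamma * expected_next
    return alphas
```

For instance, in a hypothetical two-state problem where every action leaves the state unchanged, each entry should approach R(s,a) / (1 - gamma), since the same reward is collected at every step.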