Key Sequence
Notation
New Concepts
Important Results / Claims
Questions
Interesting Factoids
229 MDP notation
\(S\) (state), \(A\) (actions), \(P_{(s,a)}\qty(s’) = T\qty(s’ | s,a)\) , \(\gamma\) (discount), \(R\qty(s,a)\).
FUN FACT: discount factors \(< 1\) makes value iteration converge.
\begin{equation} V^{\pi}\qty(s) = \mathbb{E}\qty [R\qty(s_{0},a_{0}) + \gamma R\qty(s_{1}, a_{1}) + \gamma^{2} \dots] \end{equation}
\begin{equation} V^{\pi} \qty(s) = R\qty(s) + \gamma \sum_{s’}^{} P_{s,\pi\qty(s)}\qty(s’) V^{\pi}\qty(s’) \end{equation}
\begin{equation} V^{*}\qty(s) = \max_{\pi} V^{\pi}\qty(s) \end{equation}
What if we don’t know the transitions? Just learn the transitions! exportation exploitation.
