SU-CS229 NOV062025

Key Sequence

Notation

New Concepts

Important Results / Claims

Questions

Interesting Factoids

229 MDP notation

\(S\) (states), \(A\) (actions), \(P_{s,a}\qty(s') = T\qty(s' | s,a)\) (transition probabilities), \(\gamma\) (discount), \(R\qty(s,a)\) (reward).

FUN FACT: a discount factor \(\gamma < 1\) guarantees that value iteration converges.
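A sketch of why this holds, using notation that is not in the lecture notes: write \(B\) for the Bellman backup operator (an assumed symbol) and \(U, V\) for any two value functions over a finite state space with bounded rewards. The backup is then a \(\gamma\)-contraction in the max norm, so repeated backups shrink the distance to the fixed point by a factor of \(\gamma\) each step.

\begin{equation} \qty(BV)\qty(s) = \max_{a \in A} \qty[R\qty(s,a) + \gamma \sum_{s'} P_{s,a}\qty(s') V\qty(s')], \qquad \left\| BU - BV \right\|_{\infty} \leq \gamma \left\| U - V \right\|_{\infty} \end{equation}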

\begin{equation} V^{\pi}\qty(s) = \mathbb{E}\qty[R\qty(s_{0},a_{0}) + \gamma R\qty(s_{1}, a_{1}) + \gamma^{2} R\qty(s_{2}, a_{2}) + \dots \mid s_{0} = s, \pi] \end{equation}

\begin{equation} V^{\pi}\qty(s) = R\qty(s, \pi\qty(s)) + \gamma \sum_{s'} P_{s,\pi\qty(s)}\qty(s') V^{\pi}\qty(s') \end{equation}
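The recursive form above can be turned into a fixed-point iteration: start from \(V = 0\) and repeatedly apply the right-hand side until the values stop changing. The following is a minimal sketch on a made-up two-state MDP; the states, actions, probabilities, rewards, and the helper name `evaluate_policy` are all hypothetical and chosen only for illustration.

```python
# Iterative policy evaluation on a tiny, hypothetical 2-state MDP.
GAMMA = 0.9
STATES = [0, 1]

# P[(s, a)] maps each next state s' to its probability P_{s,a}(s').
P = {
    (0, "stay"): {0: 1.0},
    (0, "go"): {0: 0.2, 1: 0.8},
    (1, "stay"): {1: 1.0},
    (1, "go"): {0: 0.8, 1: 0.2},
}
# R[(s, a)] is the reward R(s, a).
R = {(0, "stay"): 0.0, (0, "go"): -1.0, (1, "stay"): 1.0, (1, "go"): -1.0}


def evaluate_policy(policy, P, R, gamma, states, tol=1e-10, max_iters=100_000):
    """Apply V(s) <- R(s, pi(s)) + gamma * sum_s' P(s') V(s') until convergence."""
    V = {s: 0.0 for s in states}
    for _ in range(max_iters):
        new_V = {}
        for s in states:
            a = policy[s]
            expected_next = sum(p * V[s2] for s2, p in P[(s, a)].items())
            new_V[s] = R[(s, a)] + gamma * expected_next
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            break
    return V


always_stay = {0: "stay", 1: "stay"}
V_stay = evaluate_policy(always_stay, P, R, GAMMA, STATES)
```

Under the "always stay" policy, state 1 collects reward 1 forever, so its value should approach \(1 / (1 - \gamma) = 10\), while state 0 stays at 0.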

\begin{equation} V^{*}\qty(s) = \max_{\pi} V^{\pi}\qty(s) \end{equation}
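Maximizing over every policy directly is impractical, and value iteration is one standard way to approximate \(V^{*}\): repeatedly replace \(V\qty(s)\) with the best one-step lookahead over actions. The sketch below uses a made-up two-state MDP; the numbers and the names `q_value` and `value_iteration` are hypothetical, not taken from the lecture.

```python
# Value iteration on a tiny, hypothetical 2-state MDP.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = ["stay", "go"]

# P[(s, a)] maps each next state s' to its probability P_{s,a}(s').
P = {
    (0, "stay"): {0: 1.0},
    (0, "go"): {0: 0.2, 1: 0.8},
    (1, "stay"): {1: 1.0},
    (1, "go"): {0: 0.8, 1: 0.2},
}
# R[(s, a)] is the reward R(s, a).
R = {(0, "stay"): 0.0, (0, "go"): -1.0, (1, "stay"): 1.0, (1, "go"): -1.0}


def q_value(s, a, V, P, R, gamma):
    """One-step lookahead: R(s, a) + gamma * sum_s' P_{s,a}(s') V(s')."""
    return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())


def value_iteration(P, R, gamma, states, actions, tol=1e-10, max_iters=100_000):
    """Iterate V(s) <- max_a q_value(s, a) and return V plus a greedy policy."""
    V = {s: 0.0 for s in states}
    for _ in range(max_iters):
        new_V = {
            s: max(q_value(s, a, V, P, R, gamma) for a in actions) for s in states
        }
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            break
    greedy = {
        s: max(actions, key=lambda a: q_value(s, a, V, P, R, gamma)) for s in states
    }
    return V, greedy


V_star, pi_star = value_iteration(P, R, GAMMA, STATES, ACTIONS)
```

In this toy problem the greedy policy pays the cost of "go" in state 0 to reach state 1, then stays there to collect reward 1 at every step.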

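What if we don’t know the transitions? Just learn the transitions! Estimate \(P_{s,a}\) from the transitions observed while acting in the environment. Choosing which actions to try while learning is the exploration vs. exploitation tradeoff.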
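One simple way to learn the transitions is a count-based (maximum likelihood) estimate: for each state-action pair, divide the number of times each next state was observed by the number of times the pair was tried. The sketch below assumes experience is logged as `(s, a, s_next)` tuples; that log format, the sample data, and the name `estimate_transitions` are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(experience, states):
    """Estimate P_{s,a}(s') as count(s, a, s') / count(s, a).

    Pairs (s, a) that never appear in the log are left out of the result;
    how to handle them (e.g. a uniform guess) is a separate design choice.
    """
    counts = defaultdict(Counter)
    for s, a, s_next in experience:
        counts[(s, a)][s_next] += 1

    P_hat = {}
    for (s, a), next_counts in counts.items():
        total = sum(next_counts.values())
        P_hat[(s, a)] = {s2: next_counts[s2] / total for s2 in states}
    return P_hat


# Hypothetical logged experience: (state, action, next state).
experience = [
    (0, "go", 1),
    (0, "go", 1),
    (0, "go", 0),
    (0, "go", 1),
    (1, "stay", 1),
]
P_hat = estimate_transitions(experience, states=[0, 1])
```

The estimate only covers pairs that were actually tried, which is where exploration comes in: an agent that always exploits its current best guess may never gather data for the other actions.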