\begin{equation} \tau = s_1 \dots s_{n} \end{equation}
\begin{equation} a \sim \pi_{\text{lm}}\qty(\tau) \end{equation}
\begin{equation} a_{i}, \rho_{i} \sim \pi_{\text{lm}}\qty(\tau \mid \rho_{i-1} \dots \rho_{1}) \end{equation}
“action selection”
Tracking in \(\rho\)
- Things that can go into \(\rho\) (explanation “why did we do this”)
- Running summary of the entire \(\tau\) (good chance this avoids having to pass the entire \(\tau\))
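A minimal sketch of the action-selection loop from the equations above, assuming \(\pi_{\text{lm}}\) can be treated as a callable that takes \(\tau\) and the earlier \(\rho\) values and returns one \((a_i, \rho_i)\) pair. The names `rollout` and `toy_policy` are hypothetical, and `toy_policy` is a deterministic stand-in for sampling from a real language model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature for pi_lm: (tau, earlier rhos) -> (a_i, rho_i).
Policy = Callable[[Sequence[str], Sequence[str]], Tuple[str, str]]


def rollout(policy: Policy, tau: Sequence[str], steps: int) -> Tuple[List[str], List[str]]:
    """Select `steps` actions, conditioning each on rho_1 .. rho_{i-1}."""
    actions: List[str] = []
    rhos: List[str] = []
    for _ in range(steps):
        # a_i, rho_i ~ pi_lm(tau | rho_{i-1} ... rho_1)
        a_i, rho_i = policy(tau, rhos)
        actions.append(a_i)
        rhos.append(rho_i)
    return actions, rhos


def toy_policy(tau: Sequence[str], rhos: Sequence[str]) -> Tuple[str, str]:
    """Stand-in for the LM (assumption): rho records what was seen at this step."""
    i = len(rhos) + 1
    action = f"a{i}"
    rho = f"step {i}: saw {len(tau)} states, {len(rhos)} prior notes"
    return action, rho


actions, rhos = rollout(toy_policy, ["s1", "s2", "s3"], steps=2)
```

If \(\rho_i\) carried a running summary of \(\tau\), the policy could in principle be called with the summary alone instead of the full \(\tau\); that variant is not shown here.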
why don’t se
Threads
- embeddings being bad seems weird
