Posts

active listening

Last edited: August 8, 2025

active recall

Last edited: August 8, 2025

Actor-Critic

Last edited: August 8, 2025

Create an approximation of the value function \(U_{\phi}\) using Approximate Value Function, and use Policy Gradient to optimize a monte-carlo tree search policy
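
A minimal sketch of a one-step actor-critic update under these assumptions: the critic fits \(U_{\phi}\) with a TD(0) step, and the actor takes a policy-gradient step using the TD error as the advantage. The helpers `U`, `grad_U`, and `grad_log_pi` are hypothetical placeholders, not from the note.

```python
def actor_critic_step(theta, phi, s, a, r, s_next, done,
                      U, grad_U, grad_log_pi,
                      alpha_actor=1e-3, alpha_critic=1e-2, gamma=0.95):
    """One TD(0) actor-critic update (all helpers are assumed interfaces)."""
    # Critic: TD target and error use the approximate value function U_phi.
    target = r if done else r + gamma * U(phi, s_next)
    delta = target - U(phi, s)                      # TD error ~ advantage
    phi = phi + alpha_critic * delta * grad_U(phi, s)

    # Actor: policy-gradient ascent, weighting grad log pi by the advantage.
    theta = theta + alpha_actor * delta * grad_log_pi(theta, s, a)
    return theta, phi
```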

AdaOPS

Last edited: August 8, 2025

How do you sample a particle filter? This doesn’t work for a continuous action space.

Contributions

  • Uses KLD sampling: adaptive sampling of particle filters
  • “Belief packing”: pack similar beliefs together, making the observation tree smaller

KLD Sampling

KLD Sampling uses the KL divergence between the sampled and true distributions to decide how many particles \(N\) to draw:

\begin{equation} N \approx \frac{k-1}{2\xi} \qty(1- \frac{2}{9(k-1)} + \sqrt{\frac{2}{9(k-1)}} z_{1-\eta})^{3} \end{equation}
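
A small sketch of the particle count this bound gives, assuming \(k\) is the number of occupied histogram bins, \(\xi\) the KL error bound, and \(\eta\) the allowed failure probability; the function name and the SciPy quantile call are illustrative, not from the note.

```python
import math
from scipy.stats import norm

def kld_sample_count(k, xi, eta):
    """Particles N needed so the KL divergence between the sampled and
    true (binned) distributions stays below xi with probability 1 - eta."""
    if k < 2:
        return 1
    z = norm.ppf(1.0 - eta)            # standard normal quantile z_{1-eta}
    c = 2.0 / (9.0 * (k - 1))
    n = (k - 1) / (2.0 * xi) * (1.0 - c + math.sqrt(c) * z) ** 3
    return max(1, math.ceil(n))

# e.g. kld_sample_count(k=50, xi=0.05, eta=0.05) is roughly 660 particles
```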

“Propagation”

We want to get a set of sampled observations from belief + action.
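
A rough sketch of one way to do this with a particle belief, assuming a generative POMDP model with the hypothetical interface `gen_model(s, a) -> (s_next, o, r)`.

```python
import random

def propagate(belief_particles, action, gen_model, n_obs):
    """Sample a set of observations reachable from a belief under an action.
    belief_particles: list of states approximating the current belief."""
    observations = []
    for _ in range(n_obs):
        s = random.choice(belief_particles)   # sample a state from the belief
        _, o, _ = gen_model(s, action)        # step the generative model
        observations.append(o)
    return observations
```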

Belief Packing

Compute the L1 norm between beliefs. If it’s below a threshold, consider them the same belief.
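
A minimal sketch of packing, assuming beliefs are probability vectors over the same discretized state space; `delta` and the function name are illustrative.

```python
import numpy as np

def pack_belief(new_belief, existing_beliefs, delta=0.1):
    """Merge a new belief into an existing node if the L1 distance is small."""
    for i, b in enumerate(existing_beliefs):
        if np.abs(np.asarray(new_belief) - np.asarray(b)).sum() < delta:
            return i                      # reuse the existing observation branch
    existing_beliefs.append(new_belief)   # otherwise add a new belief node
    return len(existing_beliefs) - 1
```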

adaptive importance sampling

Last edited: August 8, 2025

Some more improvements to Importance Sampling.

Cross Entropy Method

  1. draw initial samples from the proposal distribution
  2. fit a new distribution to the subset of samples that failed: weight each sample by the expression below (a code sketch follows the equation)

\begin{equation} w\qty(\tau) = \frac{p\qty(\tau)\, \mathbb{1}\qty{\tau \not\in \psi}}{q\qty(\tau)} \end{equation}
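
A sketch of one cross-entropy iteration for failure sampling. `sample_q`, `logpdf_p`, `logpdf_q`, `is_failure`, and `fit` are hypothetical callables standing in for the proposal, the nominal density, the failure check \(\tau \not\in \psi\), and the refit step.

```python
import numpy as np

def cross_entropy_iteration(sample_q, logpdf_p, logpdf_q, is_failure, fit, n=1000):
    """One cross-entropy refit for failure sampling (illustrative interfaces)."""
    taus = [sample_q() for _ in range(n)]
    # w(tau) = p(tau) * 1{tau not in psi} / q(tau), computed in log space
    weights = np.array([
        np.exp(logpdf_p(t) - logpdf_q(t)) if is_failure(t) else 0.0
        for t in taus
    ])
    if weights.sum() == 0.0:
        # the degenerate case discussed below: no failures, all weights zero
        raise ValueError("no failures sampled; every weight is zero")
    return fit(taus, weights)
```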

Problem: what if, on the very first proposal, we never get any failures? Then every sample has zero weight and the refit breaks down.

adaptive cross entropy method with adaptive importance sampling

Pick a notion of “distance to failure” \(f\qty(\tau)\)