Posts

active listening

Last edited: August 8, 2025

active recall

Last edited: August 8, 2025

Actor-Critic

Last edited: August 8, 2025

Create an approximation of the value function \(U_{\phi}\) using Approximate Value Function, and use Policy Gradient to optimize a monte-carlo tree search policy
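
A minimal sketch of a one-step actor-critic update under these assumptions: the critic fits \(U_{\phi}\) with a TD(0) step, and the actor takes a policy-gradient step using the TD error as the advantage. The helpers `U`, `grad_U`, and `grad_log_pi` are hypothetical placeholders, not from the note.

```python
def actor_critic_step(theta, phi, s, a, r, s_next, done,
                      U, grad_U, grad_log_pi,
                      alpha_actor=1e-3, alpha_critic=1e-2, gamma=0.95):
    """One TD(0) actor-critic update (all helpers are assumed interfaces)."""
    # Critic: TD target and error use the approximate value function U_phi.
    target = r if done else r + gamma * U(phi, s_next)
    delta = target - U(phi, s)                      # TD error ~ advantage
    phi = phi + alpha_critic * delta * grad_U(phi, s)

    # Actor: policy-gradient ascent, weighting grad log pi by the advantage.
    theta = theta + alpha_actor * delta * grad_log_pi(theta, s, a)
    return theta, phi
```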

AdaOPS

Last edited: August 8, 2025

How do you sample a particle filter? This doesn’t work for a continuous action space.

Contributions

  • Uses KLD sampling: adaptive sampling of particle filters
  • “Belief packing”: pack similar beliefs together, making the observation tree smaller

KLD Sampling

KLD Sampling uses the KL divergence between the sampled and true distributions to decide how many particles \(N\) to draw:

\begin{equation} N \approx \frac{k-1}{2\xi} \qty(1- \frac{2}{9(k-1)} + \sqrt{\frac{2}{9(k-1)}} z_{1-\eta})^{3} \end{equation}
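
A small sketch of the particle count this bound gives, assuming \(k\) is the number of occupied histogram bins, \(\xi\) the KL error bound, and \(\eta\) the allowed failure probability; the function name and the SciPy quantile call are illustrative, not from the note.

```python
import math
from scipy.stats import norm

def kld_sample_count(k, xi, eta):
    """Particles N needed so the KL divergence between the sampled and
    true (binned) distributions stays below xi with probability 1 - eta."""
    if k < 2:
        return 1
    z = norm.ppf(1.0 - eta)            # standard normal quantile z_{1-eta}
    c = 2.0 / (9.0 * (k - 1))
    n = (k - 1) / (2.0 * xi) * (1.0 - c + math.sqrt(c) * z) ** 3
    return max(1, math.ceil(n))

# e.g. kld_sample_count(k=50, xi=0.05, eta=0.05) is roughly 660 particles
```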

“Propagation”

We want to get a set of sampled observations from belief + action.
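
A rough sketch of one way to do this with a particle belief, assuming a generative POMDP model with the hypothetical interface `gen_model(s, a) -> (s_next, o, r)`.

```python
import random

def propagate(belief_particles, action, gen_model, n_obs):
    """Sample a set of observations reachable from a belief under an action.
    belief_particles: list of states approximating the current belief."""
    observations = []
    for _ in range(n_obs):
        s = random.choice(belief_particles)   # sample a state from the belief
        _, o, _ = gen_model(s, action)        # step the generative model
        observations.append(o)
    return observations
```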

Belief Packing

Compute the L1 norm between beliefs. If it’s below a threshold, consider them the same belief.
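
A minimal sketch of packing, assuming beliefs are probability vectors over the same discretized state space; `delta` and the function name are illustrative.

```python
import numpy as np

def pack_belief(new_belief, existing_beliefs, delta=0.1):
    """Merge a new belief into an existing node if the L1 distance is small."""
    for i, b in enumerate(existing_beliefs):
        if np.abs(np.asarray(new_belief) - np.asarray(b)).sum() < delta:
            return i                      # reuse the existing observation branch
    existing_beliefs.append(new_belief)   # otherwise add a new belief node
    return len(existing_beliefs) - 1
```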

adaptive importance sampling

Last edited: August 8, 2025

Some more improvements to Importance Sampling.

Cross Entropy Method

  1. draw initial samples from the proposal distribution
  2. fit a new distribution to the subset of samples that failed: weight each sample by the expression below (a code sketch follows the equation)

\begin{equation} w\qty(\tau) = \frac{p\qty(\tau)\, \mathbb{1}\qty{\tau \not\in \psi}}{q\qty(\tau)} \end{equation}
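
A sketch of one cross-entropy iteration for failure sampling. `sample_q`, `logpdf_p`, `logpdf_q`, `is_failure`, and `fit` are hypothetical callables standing in for the proposal, the nominal density, the failure check \(\tau \not\in \psi\), and the refit step.

```python
import numpy as np

def cross_entropy_iteration(sample_q, logpdf_p, logpdf_q, is_failure, fit, n=1000):
    """One cross-entropy refit for failure sampling (illustrative interfaces)."""
    taus = [sample_q() for _ in range(n)]
    # w(tau) = p(tau) * 1{tau not in psi} / q(tau), computed in log space
    weights = np.array([
        np.exp(logpdf_p(t) - logpdf_q(t)) if is_failure(t) else 0.0
        for t in taus
    ])
    if weights.sum() == 0.0:
        # the degenerate case discussed below: no failures, all weights zero
        raise ValueError("no failures sampled; every weight is zero")
    return fit(taus, weights)
```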

Problem: what if, on the very first proposal, we never get any failures? Then every sample has zero weight and the refit breaks down.

adaptive cross entropy method with adaptive importance sampling

Pick a notion of “distance to failure” \(f\qty(\tau)\)