active listening
active recall
Actor-Critic
Create an approximation of the value function \(U_{\phi}\) using Approximate Value Function, and use Policy Gradient to optimize a monte-carlo tree search policy.
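A minimal sketch of the resulting gradient, assuming a one-step advantage built from the critic \(U_{\phi}\) (this particular form is an assumption, not taken from the note above):

\begin{equation} \nabla_{\theta} J \approx \mathbb{E} \qty[\nabla_{\theta} \log \pi_{\theta}\qty(a \mid s) \qty(r + \gamma U_{\phi}\qty(s') - U_{\phi}\qty(s))] \end{equation}

The bracketed term is the one-step temporal-difference error, which serves as the advantage estimate for the actor while also providing the critic's regression target.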
AdaOPS
How do you sample particle filters? This doesn’t work for a continuous action space.
Contributions
- Uses KLD sampling: adaptive sampling of particle filters
- “belief packing”: packs similar beliefs together, making the observation tree smaller
KLD Sampling
KLD Sampling uses the KL Divergence between the sample-based approximation and the true distribution to choose the number of particles \(N\): pick \(N\) so that, with probability \(1-\eta\), the divergence stays below an error bound \(\xi\), where \(k\) is the number of occupied histogram bins and \(z_{1-\eta}\) is the standard normal quantile:
\begin{equation} N \approx \frac{k-1}{2\xi} \qty(1- \frac{2}{9(k-1)} + \sqrt{\frac{2}{9(k-1)}} z_{1-\eta})^{3} \end{equation}
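A minimal Python sketch of the bound above; the function name and default values for \(\xi\) and \(\eta\) are illustrative assumptions:

```python
import math
from scipy.stats import norm

def kld_sample_size(k: int, xi: float = 0.05, eta: float = 0.01) -> int:
    """Number of particles N so that, with probability 1 - eta, the KL divergence
    between the k-bin particle approximation and the true distribution stays below xi."""
    if k <= 1:
        return 1
    z = norm.ppf(1.0 - eta)            # standard normal quantile z_{1-eta}
    a = 2.0 / (9.0 * (k - 1))
    return math.ceil((k - 1) / (2.0 * xi) * (1.0 - a + math.sqrt(a) * z) ** 3)
```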
“Propagation”
We want to get a set of sampled observations from a belief and an action.
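A minimal sketch of that propagation step, assuming generative transition and observation models with the interfaces shown (these names are assumptions, not AdaOPS's actual API):

```python
def propagate(belief, action, transition, observation):
    """Propagate a particle belief through one action and collect sampled observations.
    `belief` is a list of state particles; `transition(s, a)` samples s' and
    `observation(s_next, a)` samples o (both interfaces are assumed here)."""
    next_particles, observations = [], []
    for s in belief:
        s_next = transition(s, action)       # s' ~ T(. | s, a)
        o = observation(s_next, action)      # o  ~ Z(. | s', a)
        next_particles.append(s_next)
        observations.append(o)
    return next_particles, observations
```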
Belief Packing
Compute the L1 norm between beliefs. If it is below a threshold, consider them the same belief.
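A minimal sketch of that check, assuming beliefs are probability vectors over the same state ordering; the threshold value and the greedy packing scheme are illustrative assumptions:

```python
def l1_distance(b1, b2):
    """L1 distance between two belief vectors over the same state ordering."""
    return sum(abs(p - q) for p, q in zip(b1, b2))

def pack_beliefs(beliefs, delta=0.1):
    """Greedily keep one representative per group of beliefs within L1 distance
    delta of an existing representative (delta is an illustrative threshold)."""
    packed = []
    for b in beliefs:
        if all(l1_distance(b, rep) >= delta for rep in packed):
            packed.append(b)
    return packed
```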
adaptive importance sampling
Some more improvements to Importance Sampling.
Cross Entropy Method
- draw initial samples from the proposal distribution \(q\)
- fit a new distribution to the subset of samples that failed: weight each sample by
\begin{equation} w\qty(\tau) = \frac{p\qty(\tau)\, \mathbb{1}\qty{\tau \not\in \psi}}{q\qty(\tau)} \end{equation}
problem: what if, immediately on the first proposal, we never get any failures? Then every sample's weight is zero and we can't refit the distribution.
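A minimal end-to-end sketch of the loop above, assuming a standard-normal nominal model \(p\), a Gaussian proposal family for \(q\), and "failure" defined as \(\tau > 4\); all of these choices are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_logpdf(tau):
    """Nominal model: standard normal log-density."""
    return -0.5 * tau**2 - 0.5 * np.log(2 * np.pi)

def q_logpdf(tau, mu, sigma):
    """Gaussian proposal log-density."""
    return -0.5 * ((tau - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

def is_failure(tau):
    """Indicator 1{tau not in psi}: here, failure means tau > 4 (illustrative)."""
    return (tau > 4.0).astype(float)

mu, sigma = 0.0, 1.0                     # start the proposal at the nominal model
for _ in range(10):
    tau = rng.normal(mu, sigma, size=10_000)
    w = np.exp(p_logpdf(tau) - q_logpdf(tau, mu, sigma)) * is_failure(tau)
    if w.sum() == 0:
        continue                         # no failures yet: the degenerate case noted above
    # refit the proposal to the weighted failure samples
    mu = np.average(tau, weights=w)
    sigma = np.sqrt(np.average((tau - mu) ** 2, weights=w)) + 1e-6

print("estimated failure probability:", w.mean())
```

The final `w.mean()` is the importance-sampling estimate of the failure probability under \(p\), computed from samples drawn under the last fitted proposal.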
adaptive cross entropy method with adaptive importance sampling
Pick a notion of “distance to failure” \(f\qty(\tau)\)
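One common way to use such a distance (an assumption here, not stated in the note) is to relax the hard failure indicator: treat the fraction of samples closest to failure as failures, so the very first proposal already yields nonzero weights. A minimal sketch:

```python
import numpy as np

def relaxed_failure_indicator(f_values, rho=0.1):
    """Soft replacement for the hard indicator 1{tau not in psi}: mark the rho
    fraction of samples with the smallest distance-to-failure f(tau) as failures
    (rho and the quantile scheme are illustrative assumptions)."""
    threshold = np.quantile(f_values, rho)
    return (f_values <= threshold).astype(float)
```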