Motivation
Large crowd navigation with sudden changes: unlikely events rarely show up when we sample according to likelihood, even though they can dominate the outcome. So we want to sample from a different distribution, chosen by importance rather than likelihood.
Goals
DESPOT with Importance Sampling
- start from the initial belief \(b\)
- sample trajectories from the importance sampling distribution \(q\)
- compute the values of the sampled states
- obtain the value estimate as an importance-weighted average of these values
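The steps above can be illustrated with a minimal sketch. It makes simplifying assumptions: each trajectory is collapsed to its initial state, and the belief, importance distribution and state values (`P`, `Q`, `VALUE`) are hypothetical toy numbers. Samples are drawn from \(q\), and each value is weighted by \(w = p/q\) so that the average still estimates the expected value under the nominal belief \(p\).

```python
import random

# Hypothetical toy belief: a rare but costly "sudden_change" state.
# P is the nominal belief, Q an importance distribution that over-samples it.
P = {"calm": 0.99, "sudden_change": 0.01}
Q = {"calm": 0.5, "sudden_change": 0.5}
VALUE = {"calm": 10.0, "sudden_change": -100.0}  # illustrative values


def sample(dist, rng):
    """Draw one state from a discrete distribution given as {state: prob}."""
    states = list(dist)
    return rng.choices(states, weights=[dist[s] for s in states], k=1)[0]


def is_value_estimate(n, rng):
    """Plain importance-sampling estimate: mean of w(s) * V(s), s ~ Q."""
    total = 0.0
    for _ in range(n):
        s = sample(Q, rng)
        w = P[s] / Q[s]  # importance weight corrects for sampling from Q
        total += w * VALUE[s]
    return total / n


def exact_value():
    """Expected value under the nominal belief P, for comparison."""
    return sum(P[s] * VALUE[s] for s in P)


rng = random.Random(0)
print(exact_value())  # 0.99 * 10 + 0.01 * (-100)
print(is_value_estimate(10000, rng))
```

Under `Q` the rare state appears in about half of the samples instead of 1%, so its contribution is estimated from many samples rather than a handful. A self-normalized variant divides by the sum of the weights instead of `n`.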
Importance Sampling of trajectories
We define an importance distribution over a trajectory \(\xi\), sampled from belief \(b\) under policy \(\pi\):
\begin{equation} q(\xi | b,\pi) = q(s_0) \prod_{t=0}^{D} q(s_{t+1}, o_{t+1} | s_{t}, a_{t+1}) \end{equation}
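To make the factorization concrete, here is a small sketch that evaluates \(q(\xi | b,\pi)\) for a tabular model and forms the importance weight \(p(\xi | b,\pi) / q(\xi | b,\pi)\), where \(p\) is assumed to factor the same way using the model's nominal probabilities. Everything in it is hypothetical: the states, actions and observations, the tables `P0`, `Q0`, `P_STEP`, `Q_STEP`, and the assumption that the action \(a_{t+1}\) is fixed by \(\pi\), so it contributes no probability factor of its own. Products are taken in log space to avoid underflow on long trajectories.

```python
import math

# Hypothetical tabular model; all numbers are illustrative assumptions.
# Initial-state distributions: nominal p(s_0) and importance q(s_0).
P0 = {"calm": 0.99, "crowd_surge": 0.01}
Q0 = {"calm": 0.5, "crowd_surge": 0.5}

# Joint step tables: (s_t, a_{t+1}) -> {(s_{t+1}, o_{t+1}): prob}.
P_STEP = {
    ("calm", "go"): {("calm", "clear"): 0.95, ("crowd_surge", "blocked"): 0.05},
    ("crowd_surge", "go"): {("calm", "clear"): 0.3, ("crowd_surge", "blocked"): 0.7},
}
Q_STEP = {
    ("calm", "go"): {("calm", "clear"): 0.6, ("crowd_surge", "blocked"): 0.4},
    ("crowd_surge", "go"): {("calm", "clear"): 0.3, ("crowd_surge", "blocked"): 0.7},
}


def log_prob(s0, steps, init, step):
    """log of init(s_0) * prod_t step(s_{t+1}, o_{t+1} | s_t, a_{t+1}).

    `steps` is a list of (a_{t+1}, s_{t+1}, o_{t+1}) tuples.
    """
    logp = math.log(init[s0])
    s = s0
    for a, s_next, o in steps:
        logp += math.log(step[(s, a)][(s_next, o)])
        s = s_next
    return logp


def importance_weight(s0, steps):
    """w(xi) = p(xi | b, pi) / q(xi | b, pi), computed in log space."""
    return math.exp(log_prob(s0, steps, P0, P_STEP) - log_prob(s0, steps, Q0, Q_STEP))


xi = ("calm", [("go", "crowd_surge", "blocked"), ("go", "crowd_surge", "blocked")])
print(math.exp(log_prob(*xi, Q0, Q_STEP)))  # q(xi) = 0.5 * 0.4 * 0.7
print(importance_weight(*xi))               # p(xi) / q(xi) = 0.03465 / 0.14
```

The surge trajectory is far more probable under \(q\) than under \(p\), so it is sampled often but receives a weight below one, which keeps the weighted value estimate unbiased.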