## Motivation

Navigating a large crowd involves sudden, rare events: unlikely states fall outside the typical sample, so a plain Monte Carlo estimate rarely sees them. So, we want to bring in another distribution based on **importance** and not **likelihood**.

## Goals

## DESPOT with Importance Sampling

- take our initial belief
- sample trajectories according to Importance Sampling distribution
- calculate values of those states
- obtain value estimate based on weighted average of the values

### Importance Sampling of trajectories

We define an importance distribution of some trajectory \(\xi\):

\begin{equation} q(\xi | b,\pi) = q(s_0) \prod_{t=0}^{D-1} q(s_{t+1}, o_{t+1} | s_{t}, a_{t+1}) \end{equation}
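The four steps above, combined with this trajectory distribution, might be sketched as follows. All interfaces here (`sample_s0`, `step`, `policy`) are hypothetical stand-ins, not from the paper; a common trick is to have each sampler return its own per-step importance ratio \(p/q\) alongside the sample:

```python
def sample_trajectory(sample_s0, step, policy, depth, gamma=0.95):
    """Sample one trajectory under the importance distribution q.

    Hypothetical interfaces (assumptions, not the paper's API):
      sample_s0()  -> (s0, w0)        s0 drawn from q(s0), w0 = b(s0)/q(s0)
      step(s, a)   -> (s', o, r, w)   (s', o) drawn from q(.|s, a),
                                      w = p(s',o|s,a) / q(s',o|s,a)
      policy(s, o) -> action
    Returns the discounted return v(xi) and trajectory weight w(xi).
    """
    s, w = sample_s0()
    v, o = 0.0, None
    for t in range(depth):
        a = policy(s, o)
        s, o, r, w_t = step(s, a)
        v += (gamma ** t) * r   # accumulate discounted reward
        w *= w_t                # w(xi) = product of per-step ratios
    return v, w

def estimate_value(n, **kw):
    # weighted average of trajectory values: sum(v*w) / sum(w)
    samples = [sample_trajectory(**kw) for _ in range(n)]
    return sum(v * w for v, w in samples) / sum(w for _, w in samples)
```

With `w = 1` at every step this degenerates to the ordinary sample-mean estimate, which is a useful sanity check.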

## Background

### Importance Sampling

Suppose you have a function \(f(s)\) whose expectation is hard to compute analytically, yet you want:

\begin{equation} \mu = \mathbb{E}(f(s)) = \int_{S} f(s)p(s) \dd{s} \end{equation}

how would you sample values of \(s\) effectively such that you end up with an estimate \(\hat{\mu}\) that’s close enough?

Well, what if you have an importance distribution \(q(s)\), a probability density over \(S\), which tells you how “important” a particular state is to the expected value? Then, sampling from \(q\) instead of \(p\), we can correct each sample with an “importance weight”:

\begin{equation} w(s) = \frac{p(s)}{q(s)} \end{equation}

Therefore, drawing samples \(s_{n} \sim q\), this would make our (self-normalized) estimator:

\begin{equation} \hat{\mu} = \frac{\sum_{n} f(s_{n}) w(s_{n})}{\sum_{n} w(s_{n})} \end{equation}

#### Theoretical guarantees

So, there’s an importance distribution over states:

\begin{equation} q(s) = \frac{b(s)}{w_{\pi}(s)} \end{equation}

where

\begin{equation} w_{\pi}(s) = \frac{\mathbb{E}_{b} \qty( \sqrt{[\mathbb{E}(v|s, \pi )]^{2} + Var(v|s, \pi )})}{\sqrt{[\mathbb{E}(v|s, \pi )]^{2} + Var(v|s, \pi )}} \end{equation}

which measures how important a state is, where \(v\) is the total discounted reward obtained starting from \(s\) and following policy \(\pi\).
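To see the construction concretely, here is a toy discrete belief with made-up per-state value moments (the numbers are illustrative assumptions, not from any source). Writing \(g(s) = \sqrt{[\mathbb{E}(v|s,\pi)]^{2} + Var(v|s,\pi)}\), the weight is \(w_{\pi}(s) = \mathbb{E}_{b}(g)/g(s)\), and the resulting \(q(s) = b(s)/w_{\pi}(s)\) sums to 1 by construction:

```python
import math

# toy belief over three states, with assumed value moments:
# mean[s] = E(v | s, pi), var[s] = Var(v | s, pi)
b    = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
mean = {"s1": 1.0, "s2": 4.0, "s3": 0.5}
var  = {"s1": 0.5, "s2": 2.0, "s3": 0.1}

# g(s) = sqrt(E(v|s,pi)^2 + Var(v|s,pi)): per-state "importance"
g = {s: math.sqrt(mean[s] ** 2 + var[s]) for s in b}

Eb_g = sum(b[s] * g[s] for s in b)   # E_b[g(s)]
w = {s: Eb_g / g[s] for s in b}      # importance weight w_pi(s)
q = {s: b[s] / w[s] for s in b}      # importance distribution q(s)
```

States with large expected value or variance (here `"s2"`) get more probability mass under `q` than under `b`, which is exactly the point: the rare-but-important states get sampled more often, and the weights `w` undo the bias afterwards.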