## Motivation

Navigating a large crowd involves sudden, rare events: unlikely states fall outside the typical sample, so a plain Monte Carlo estimate rarely sees them. So, we want to bring in another distribution based on **importance** and not **likelihood**.

## Goals

## DESPOT with Importance Sampling

- take our initial belief
- sample trajectories according to Importance Sampling distribution
- calculate values of those states
- obtain value estimate based on weighted average of the values

### Importance Sampling of trajectories

We define an importance distribution of some trajectory \(\xi\):

\begin{equation} q(\xi | b,\pi) = q(s_0) \prod_{t=0}^{D-1} q(s_{t+1}, o_{t+1} | s_{t}, a_{t+1}) \end{equation}
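The four steps above, combined with this trajectory distribution, might be sketched as follows. All interfaces here (`sample_s0`, `step`, `policy`) are hypothetical stand-ins, not from the paper; a common trick is to have each sampler return its own per-step importance ratio \(p/q\) alongside the sample:

```python
def sample_trajectory(sample_s0, step, policy, depth, gamma=0.95):
    """Sample one trajectory under the importance distribution q.

    Hypothetical interfaces (assumptions, not the paper's API):
      sample_s0()  -> (s0, w0)        s0 drawn from q(s0), w0 = b(s0)/q(s0)
      step(s, a)   -> (s', o, r, w)   (s', o) drawn from q(.|s, a),
                                      w = p(s',o|s,a) / q(s',o|s,a)
      policy(s, o) -> action
    Returns the discounted return v(xi) and trajectory weight w(xi).
    """
    s, w = sample_s0()
    v, o = 0.0, None
    for t in range(depth):
        a = policy(s, o)
        s, o, r, w_t = step(s, a)
        v += (gamma ** t) * r   # accumulate discounted reward
        w *= w_t                # w(xi) = product of per-step ratios
    return v, w

def estimate_value(n, **kw):
    # weighted average of trajectory values: sum(v*w) / sum(w)
    samples = [sample_trajectory(**kw) for _ in range(n)]
    return sum(v * w for v, w in samples) / sum(w for _, w in samples)
```

With `w = 1` at every step this degenerates to the ordinary sample-mean estimate, which is a useful sanity check.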

## Background

### Importance Sampling

Suppose you have a function \(f(s)\) whose expectation is hard to compute analytically, yet you want:

\begin{equation} \mu = \mathbb{E}(f(s)) = \int_{S} f(s)p(s) \dd{s} \end{equation}

how would you sample values of \(s\) effectively such that you end up with an estimate \(\hat{\mu}\) that’s close enough?

Well, what if you have an importance distribution \(q(s)\), a probability density over \(S\), which tells you how “important” a particular state is to the expected value? Then, sampling from \(q\) instead of \(p\), we can correct each sample with an “importance weight”:

\begin{equation} w(s) = \frac{p(s)}{q(s)} \end{equation}

Therefore, drawing samples \(s_{n} \sim q\), this would make our (self-normalized) estimator:

\begin{equation} \hat{\mu} = \frac{\sum_{n} f(s_{n}) w(s_{n})}{\sum_{n} w(s_{n})} \end{equation}

#### Theoretical guarantees

So, there’s an importance distribution over states:

\begin{equation} q(s) = \frac{b(s)}{w_{\pi}(s)} \end{equation}

where

\begin{equation} w_{\pi}(s) = \frac{\mathbb{E}_{b} \qty( \sqrt{[\mathbb{E}(v|s, \pi )]^{2} + Var(v|s, \pi )})}{\sqrt{[\mathbb{E}(v|s, \pi )]^{2} + Var(v|s, \pi )}} \end{equation}

which measures how important a state is, where \(v\) is the total discounted reward obtained starting from \(s\) and following policy \(\pi\).
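To see the construction concretely, here is a toy discrete belief with made-up per-state value moments (the numbers are illustrative assumptions, not from any source). Writing \(g(s) = \sqrt{[\mathbb{E}(v|s,\pi)]^{2} + Var(v|s,\pi)}\), the weight is \(w_{\pi}(s) = \mathbb{E}_{b}(g)/g(s)\), and the resulting \(q(s) = b(s)/w_{\pi}(s)\) sums to 1 by construction:

```python
import math

# toy belief over three states, with assumed value moments:
# mean[s] = E(v | s, pi), var[s] = Var(v | s, pi)
b    = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
mean = {"s1": 1.0, "s2": 4.0, "s3": 0.5}
var  = {"s1": 0.5, "s2": 2.0, "s3": 0.1}

# g(s) = sqrt(E(v|s,pi)^2 + Var(v|s,pi)): per-state "importance"
g = {s: math.sqrt(mean[s] ** 2 + var[s]) for s in b}

Eb_g = sum(b[s] * g[s] for s in b)   # E_b[g(s)]
w = {s: Eb_g / g[s] for s in b}      # importance weight w_pi(s)
q = {s: b[s] / w[s] for s in b}      # importance distribution q(s)
```

States with large expected value or variance (here `"s2"`) get more probability mass under `q` than under `b`, which is exactly the point: the rare-but-important states get sampled more often, and the weights `w` undo the bias afterwards.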