Houjun Liu


POMDPs with continuous action (and observation) spaces are hard to solve. The standard approaches are POMCP, or a belief update combined with MCTS.

So instead, let's try improving that. Unlike plain POMCP, we keep not only \(B(h)\), the set of states sampled at history \(h\), but also \(W(h)\), the weight of each sampled state. Naively applying POMCP to continuous spaces produces an extremely wide, shallow tree, because each newly sampled state (and observation) is almost surely different from every previous one, so every node spawns a fresh child.
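A minimal sketch of a weighted particle collection in the spirit of \(B(h)\) and \(W(h)\); the class and method names here are my own illustration, not a real API:

```python
import random

class WeightedBelief:
    """Hypothetical sketch: B(h) is the list of sampled states,
    W(h) the matching list of weights."""

    def __init__(self):
        self.states = []   # B(h)
        self.weights = []  # W(h)

    def insert(self, state, weight):
        self.states.append(state)
        self.weights.append(weight)

    def sample(self):
        # Draw a state with probability proportional to its weight.
        return random.choices(self.states, weights=self.weights, k=1)[0]
```

Sampling from this collection stands in for sampling a state from the belief at history \(h\).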

Double Progressive Widening

We want to apply progressive widening to the observation space as well. Doing this naively eventually converges to the suboptimal QMDP policy: each observation node ends up holding only a single state particle, so the policy acts as if there were no state uncertainty.


  1. Get an action from the ActionProgressiveWiden function.
  2. Sample an observation; if the resulting node already has too many observation children, prune (do not add a new child).
  3. Discard the new observation and instead insert the next state into an existing observation child, weighted by the observation likelihood \(Z(o|s,a,s')\).
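The steps above can be sketched as a single expansion step. This is only an illustration under assumed names: `widen_action`, the generative model `gen`, and the likelihood `Z` are hypothetical helpers, and `k_o`, `alpha_o` are the observation-widening parameters:

```python
import random

class Node:
    """Minimal tree node: visit count n, and children[a][o] -> belief dict."""
    def __init__(self):
        self.n = 1
        self.children = {}

def simulate_step(node, state, k_o, alpha_o, widen_action, gen, Z):
    """One expansion step in the spirit of the notes (sketch, not a real API)."""
    a = widen_action(node)                   # 1. action via progressive widening
    s2, o, r = gen(state, a)                 # sample (s', o, r) from the model
    obs_children = node.children.setdefault(a, {})
    too_wide = len(obs_children) >= k_o * max(node.n, 1) ** alpha_o
    if too_wide and obs_children:
        # 2.-3. too many observation children: discard the new o and
        # reuse an existing observation node instead
        o = random.choice(list(obs_children))
    child = obs_children.setdefault(o, {"belief": [], "weights": []})
    # insert s' weighted by the observation likelihood Z(o | s, a, s')
    child["belief"].append(s2)
    child["weights"].append(Z(o, state, a, s2))
    return a, o, r
```

Because the inserted state carries weight \(Z(o|s,a,s')\), the reused observation node accumulates a weighted particle belief rather than a single particle.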

Hyperparameters: \(k, \alpha, C\). Here \(k\) and \(\alpha\) bound a node's number of children by \(k N^{\alpha}\) (with \(N\) the node's visit count), and \(C\) is the UCB exploration constant.
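A small sketch of how the three hyperparameters enter the algorithm, assuming the common \(k N^{\alpha}\) widening criterion and the standard UCB score (both assumptions, not taken from these notes):

```python
import math

def can_widen(num_children, visits, k, alpha):
    # A node may receive a new child only while |children| <= k * N^alpha.
    return num_children <= k * visits ** alpha

def ucb(q, n_parent, n_child, C):
    # Upper-confidence bound used for action selection in MCTS;
    # C trades off exploration against exploitation.
    return q + C * math.sqrt(math.log(n_parent) / n_child)

print(can_widen(2, 16, 1.0, 0.5))  # 2 <= 1.0 * 16^0.5 = 4 -> True
```

Larger \(k\) or \(\alpha\) lets the tree grow wider; larger \(C\) pushes the search toward less-visited actions.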


This approach combines:

  1. MCTS
  2. Particle filters
  3. Double Progressive Widening