Houjun Liu


Big problem: curse of dimensionality and the curse of history.

PBVI and HSVI tries to sample the belief simplex generally. But instead we should try to sample OPTIMAL REACHABLE SET.


Recall one-step lookahead in POMDP. The difficulty here is that the sum over all of the alpha-vectors is still very hard. So, in PBVI, we only do this to a small set of beliefs


  • sample \(R^{*}\)
  • backup
  • prune


choose an initial belief, action, and observation using “suitable heuristics”. Initialize a set of alpha vectors corresponding to this belief.


  • compute \(b’ = update(b,a,o)\)
  • add node \(b’\) to the tree

So far, this is just PBVI, HSVI. The point is that we only want to update the reachable set.

To do this, we now take the new \(b’\), we give an upper bound via FIB, and a lower bound with blind lower bound over the alpha vectors you already got.


where \(\mathcal{R}^{*}\) is a reachable space tree set from \(b_0\).


PBVI Backup on the beliefs you sampled to update your alpha vectors.


We can prune anything that’s suboptimal: every step, we perform alpha vector pruning at every step.


HSVI is better at handling systems with lower uncertainty.

  • Does not make an attempt at challenges of dimensionality
  • Make unproven theoretical claims
  • Don’t compare to domain contraction
  • Compare algorithm to a single alternative
  • Compared to continuous state spaces
  • Subsection headings