Houjun Liu

SARSOP

Big problem: curse of dimensionality and the curse of history.

PBVI and HSVI try to sample the belief simplex broadly. Instead, we should try to sample only the optimally reachable set.

Background

Recall one-step lookahead in a POMDP. The difficulty is that the sum over all of the alpha vectors is still very expensive to compute. So, in PBVI, we only perform this backup at a small set of beliefs.
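As a reminder, one common way to write the one-step lookahead is sketched below. The notation (\(U\) for the value function, \(\Gamma\) for the current set of alpha vectors, \(\gamma\) for the discount factor) is assumed here and does not come from these notes.

\[
U(b) = \max_{a} \left[ \sum_{s} R(s,a)\, b(s) + \gamma \sum_{o} P(o \mid b, a)\, \max_{\alpha \in \Gamma} \alpha^{\top} \text{update}(b,a,o) \right]
\]

The inner maximization over \(\Gamma\) has to be repeated for every action and observation, which is why point-based methods restrict the computation to a small set of beliefs.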

SARSOP

  • sample \(\mathcal{R}^{*}\)
  • backup
  • prune

Initialization

Choose an initial belief, action, and observation using “suitable heuristics”. Initialize a set of alpha vectors corresponding to this belief.

Sampling

  • compute \(b' = \text{update}(b,a,o)\)
  • add node \(b'\) to the tree
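The belief update in the first step can be sketched as a discrete Bayes filter. This is a minimal illustration, not SARSOP's actual implementation: the array layout (`T[s][a][s']`, `O[a][s'][o]`) and the toy two-state model are assumptions made for this example.

```python
def update(b, a, o, T, O):
    """Discrete Bayes filter: b'(s') is proportional to
    O(o | a, s') * sum_s T(s' | s, a) * b(s)."""
    n = len(b)
    unnormalized = [
        O[a][sp][o] * sum(T[s][a][sp] * b[s] for s in range(n))
        for sp in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under (b, a)")
    return [p / total for p in unnormalized]


# Hypothetical model: two states, one action, two observations.
T = [[[0.9, 0.1]], [[0.2, 0.8]]]  # T[s][a][s']
O = [[[0.8, 0.2], [0.3, 0.7]]]    # O[a][s'][o]

b_next = update([0.5, 0.5], 0, 0, T, O)
```

The returned `b_next` is the belief that would be attached to the new tree node.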

So far, this is just what PBVI and HSVI do. The point is that we only want to update beliefs in the optimally reachable set.

To do this, we take the new \(b'\) and bound its value: an upper bound via FIB (the fast informed bound), and a lower bound from the blind lower bound together with the alpha vectors we already have.
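The two bounds can be sketched as follows, assuming a small discrete model. The function names, array layout (`T[s][a][s']`, `O[a][s'][o]`, `R[s][a]`), fixed iteration count, and toy numbers are all assumptions for illustration. Both bounds are represented here as alpha vectors, which may differ from how an actual solver stores its upper bound.

```python
def blind_lower_bound(T, R, gamma, iters=200):
    """One alpha vector per action: the value of repeating that action forever."""
    nS, nA = len(R), len(R[0])
    worst = min(min(row) for row in R) / (1.0 - gamma)
    alphas = [[worst] * nS for _ in range(nA)]
    for _ in range(iters):
        alphas = [
            [R[s][a] + gamma * sum(T[s][a][sp] * alphas[a][sp] for sp in range(nS))
             for s in range(nS)]
            for a in range(nA)
        ]
    return alphas


def fast_informed_bound(T, O, R, gamma, iters=200):
    """FIB: one alpha vector per action, maximizing over next actions per observation."""
    nS, nA, nO = len(R), len(R[0]), len(O[0][0])
    best = max(max(row) for row in R) / (1.0 - gamma)
    alphas = [[best] * nS for _ in range(nA)]
    for _ in range(iters):
        alphas = [
            [R[s][a] + gamma * sum(
                max(sum(O[a][sp][o] * T[s][a][sp] * alphas[a2][sp] for sp in range(nS))
                    for a2 in range(nA))
                for o in range(nO))
             for s in range(nS)]
            for a in range(nA)
        ]
    return alphas


def value(alphas, b):
    """Value at belief b: the best alpha vector's dot product with b."""
    return max(sum(x * p for x, p in zip(alpha, b)) for alpha in alphas)


# Hypothetical model: two states, two actions, two observations.
T = [[[0.9, 0.1], [0.5, 0.5]], [[0.1, 0.9], [0.5, 0.5]]]  # T[s][a][s']
O = [[[0.8, 0.2], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]]  # O[a][s'][o]
R = [[1.0, 0.0], [-1.0, 0.5]]                             # R[s][a]
gamma = 0.9

b = [0.5, 0.5]
lower = value(blind_lower_bound(T, R, gamma), b)
upper = value(fast_informed_bound(T, O, R, gamma), b)
gap = upper - lower
```

The gap between `upper` and `lower` at a belief is the kind of quantity a sampling heuristic could use to decide whether that belief is worth exploring further.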

Now:

where \(\mathcal{R}^{*}\) is the optimally reachable space, a subtree of the beliefs reachable from \(b_0\).

Backup

Perform a PBVI-style backup at the sampled beliefs to update the alpha vectors.
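A point-based backup at a single belief can be sketched as below. This is the standard textbook form under an assumed array layout (`T[s][a][s']`, `O[a][s'][o]`, `R[s][a]`); the function name, the toy model, and the initial worst-case alpha vector are illustrative assumptions.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def backup(b, alphas, T, O, R, gamma):
    """Return the new alpha vector that is best at belief b after one backup."""
    nS, nA, nO = len(R), len(R[0]), len(O[0][0])
    best_alpha, best_value = None, float("-inf")
    for a in range(nA):
        alpha_a = [R[s][a] for s in range(nS)]
        for o in range(nO):
            # Project every alpha vector back through (a, o):
            # g(s) = sum_s' O(o | a, s') T(s' | s, a) alpha(s')
            projections = [
                [sum(O[a][sp][o] * T[s][a][sp] * alpha[sp] for sp in range(nS))
                 for s in range(nS)]
                for alpha in alphas
            ]
            # Keep the projection that is best at this particular belief.
            g = max(projections, key=lambda v: dot(v, b))
            alpha_a = [alpha_a[s] + gamma * g[s] for s in range(nS)]
        v = dot(alpha_a, b)
        if v > best_value:
            best_alpha, best_value = alpha_a, v
    return best_alpha


# Hypothetical model: two states, two actions, two observations.
T = [[[0.9, 0.1], [0.5, 0.5]], [[0.1, 0.9], [0.5, 0.5]]]  # T[s][a][s']
O = [[[0.8, 0.2], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]]  # O[a][s'][o]
R = [[1.0, 0.0], [-1.0, 0.5]]                             # R[s][a]
gamma = 0.9

alphas = [[-10.0, -10.0]]  # worst-case value R_min / (1 - gamma)
b = [0.5, 0.5]
new_alpha = backup(b, alphas, T, O, R, gamma)
```

Because the maximization over alpha vectors is done only at `b`, the cost is linear in the number of alpha vectors rather than exponential in the number of observations.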

Pruning

We can prune anything that’s suboptimal: at every step, we perform alpha vector pruning.
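As a simple illustration of alpha vector pruning, the sketch below removes vectors that are pointwise dominated by another vector. This is a deliberately basic criterion chosen for the example; it is an assumption here and not necessarily the pruning rule SARSOP itself uses.

```python
def dominates(u, v):
    """True if u is at least as good as v in every state."""
    return all(x >= y for x, y in zip(u, v))


def prune(alphas):
    """Drop alpha vectors that are pointwise dominated by another vector.

    Exact duplicates are handled by keeping only the first occurrence.
    """
    kept = []
    for i, v in enumerate(alphas):
        dominated = any(
            dominates(u, v) and (u != v or j < i)
            for j, u in enumerate(alphas)
            if j != i
        )
        if not dominated:
            kept.append(v)
    return kept


pruned = prune([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5], [1.0, 0.0]])
```

Here `[0.5, -0.5]` is dominated by `[1.0, 0.0]` and the repeated `[1.0, 0.0]` is a duplicate, so only two vectors remain.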

Limitations

HSVI is better at handling systems with lower uncertainty.

  • Does not attempt to address the curse of dimensionality
  • Makes unproven theoretical claims
  • Does not compare to domain contraction
  • Compares the algorithm to only a single alternative
  • Compared to continuous state spaces
  • Subsection headings