POMCPOW

Last edited: August 8, 2025

POMDPs with continuous actions are hard. The baseline is POMCP, i.e. (belief update + MCTS).

So let's try improving on that. Unlike plain POMCP, we keep not only \(B(h)\) but also \(W(h)\), the weight of each sampled state. Naively applying POMCP to continuous states gives a very wide tree, because each newly sampled state will almost never match one sampled before.
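A minimal sketch of the weighted belief idea, assuming each history node stores its sampled states \(B(h)\) next to their weights \(W(h)\) and draws states in proportion to weight. The class name `WeightedBelief` and the example numbers are hypothetical, chosen only for illustration.

```python
import random


class WeightedBelief:
    """Hypothetical container pairing sampled states B(h) with weights W(h)."""

    def __init__(self):
        self.states = []   # B(h): states sampled at this history
        self.weights = []  # W(h): one weight per sampled state

    def add(self, state, weight):
        # Store the new state with its weight (e.g. an observation likelihood).
        self.states.append(state)
        self.weights.append(weight)

    def sample(self, rng):
        # Draw one state with probability proportional to its weight.
        return rng.choices(self.states, weights=self.weights, k=1)[0]


belief = WeightedBelief()
belief.add(0.13, 0.0)  # a state that does not explain the observation at all
belief.add(0.87, 2.5)  # a state that explains the observation well
drawn = belief.sample(random.Random(0))
```

Because states are reused through their weights, two distinct continuous states can share one node instead of each opening a new branch.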

double progressive widening

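We want to sample observations instead of enumerating them, widening the tree progressively. Done naively, this eventually leads to a suboptimal QMDP policy, because each sampled observation node then holds only a single state, so there is no state uncertainty.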
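The widening rule can be sketched as follows, assuming the common criterion that a node may add a new child only while its child count is at most \(k N^{\alpha}\), where \(N\) is the node's visit count. The function names and the default values of `k` and `alpha` are assumptions for illustration; "double" means the same test is applied at both the action level and the observation level.

```python
def should_widen(num_children, num_visits, k=1.0, alpha=0.5):
    """Return True if a node may add a new child under progressive widening."""
    return num_children <= k * num_visits ** alpha


def simulate_widening(total_visits, k=1.0, alpha=0.5):
    """Count how many children a single node ends up with after repeated visits."""
    children = 0
    for visits in range(total_visits):
        if should_widen(children, visits, k, alpha):
            children += 1  # add a newly sampled action or observation
        # otherwise an existing child would be reused
    return children


few = simulate_widening(100)
more = simulate_widening(400)
```

The number of children grows sublinearly in the visit count, which keeps the tree from becoming as wide as the number of simulations.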

POMDP Approximation

Last edited: August 8, 2025

Upper bounds of alpha vectors

QMDP and FIB both give upper bounds on the true optimal alpha vector values.

FIB generally gives a lower (tighter) upper bound than QMDP.
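A minimal QMDP sketch, assuming the standard update \(\alpha_a(s) \leftarrow R(s,a) + \gamma \sum_{s'} T(s'|s,a) \max_{a'} \alpha_{a'}(s')\), which acts as if the state becomes fully observable after one step. The two-state problem, the array layout `T[a][s][s2]` and `R[a][s]`, and the function names are hypothetical.

```python
def qmdp_alphas(T, R, gamma, iterations=200):
    """T[a][s][s2] = P(s2 | s, a); R[a][s] = reward. Returns alpha[a][s]."""
    n_actions, n_states = len(R), len(R[0])
    alpha = [[0.0] * n_states for _ in range(n_actions)]
    for _ in range(iterations):
        # Best value per state, as if the state were observed from the next step on.
        best = [max(alpha[a][s] for a in range(n_actions)) for s in range(n_states)]
        alpha = [
            [R[a][s] + gamma * sum(T[a][s][s2] * best[s2] for s2 in range(n_states))
             for s in range(n_states)]
            for a in range(n_actions)
        ]
    return alpha


def upper_bound(alpha, belief):
    """Upper bound on the value of a belief: max over alpha vectors of alpha . b."""
    return max(sum(b * v for b, v in zip(belief, row)) for row in alpha)


# Hypothetical problem: action 0 stays put, action 1 swaps the two states.
T = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
R = [[1.0, 0.0], [0.0, 0.0]]  # reward only for staying in state 0
alpha = qmdp_alphas(T, R, gamma=0.9)
bound = upper_bound(alpha, [0.5, 0.5])
```

The bound is optimistic because it ignores the cost of gathering information; FIB tightens it by taking the observation model into account.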

Lower bounds of alpha vectors

BAWS (best-action worst-state) and the blind lower bound both give lower bounds on the true optimal alpha vector values.
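Both bounds can be sketched in a few lines, assuming the usual definitions: BAWS takes the best action's worst-case reward forever, \(\max_a \min_s R(s,a) / (1-\gamma)\), and the blind lower bound evaluates the policy that commits to one action regardless of observations. The array layout `T[a][s][s2]`, `R[a][s]` and the two-state problem are hypothetical.

```python
def baws_lower_bound(R, gamma):
    """Best-action worst-state: R[a][s] is the reward of action a in state s."""
    return max(min(row) for row in R) / (1.0 - gamma)


def blind_lower_bound(T, R, gamma, iterations=200):
    """One alpha vector per action: the value of repeating that action forever."""
    n_actions, n_states = len(R), len(R[0])
    start = baws_lower_bound(R, gamma)  # a valid lower bound to start from
    alpha = [[start] * n_states for _ in range(n_actions)]
    for _ in range(iterations):
        alpha = [
            [R[a][s] + gamma * sum(T[a][s][s2] * alpha[a][s2] for s2 in range(n_states))
             for s in range(n_states)]
            for a in range(n_actions)
        ]
    return alpha


# Hypothetical problem: action 0 stays put, action 1 swaps the two states.
T = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
R = [[1.0, 0.0], [0.0, 0.0]]  # reward only for staying in state 0
baws = baws_lower_bound(R, gamma=0.9)
blind = blind_lower_bound(T, R, gamma=0.9)
```

Unlike the QMDP update, there is no max over next actions inside the loop, which is what makes the result a lower bound.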

Faster:

Slower:

Point-Based Value Iteration
- “Perseus”: Randomized PBVI
HSVI
SARSOP

point selection

see point selection

POMDP-lite

Last edited: August 8, 2025

What if the hidden part of our state never changes, or changes only deterministically? This happens, for instance, in localization. It should make solving the POMDP easier.

POMDP-lite

\(X\) fully observable states
\(\theta\) hidden parameter: finite amount of values \(\theta_{1 \dots N}\)
where \(S = X \times \theta\)

We then assume conditional independence between \(x\) and \(\theta\). So the transition model is \(T = P(x'|\theta, x, a)\), with \(P(\theta'|\theta,x,a) = 1\) for the next parameter value \(\theta'\) (“our hidden parameter is constant or deterministically changing”).
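Since \(x\) is observed directly, the belief only needs to track \(\theta\). A minimal sketch of that update, assuming a constant \(\theta\) and Bayes' rule \(b'(\theta) \propto b(\theta) P(x'|\theta, x, a)\); the function names and the slip-probability example are hypothetical.

```python
def update_theta_belief(belief, transition, x, a, x_next):
    """belief maps theta -> P(theta); transition(theta, x, a, x_next) = P(x' | theta, x, a)."""
    unnormalized = {
        theta: prob * transition(theta, x, a, x_next)
        for theta, prob in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observed transition has zero probability under every theta")
    return {theta: weight / total for theta, weight in unnormalized.items()}


def slip_transition(theta, x, a, x_next):
    """Hypothetical model: a move succeeds with probability 1 - theta, else x stays."""
    if x_next == x + a:
        return 1.0 - theta
    if x_next == x:
        return theta
    return 0.0


prior = {0.1: 0.5, 0.5: 0.5}  # theta is an unknown slip probability
posterior = update_theta_belief(prior, slip_transition, x=0, a=1, x_next=1)
```

Observing a successful move shifts the belief toward the smaller slip probability, while the belief stays a finite vector over \(\theta_{1 \dots N}\).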

POMDPs Index

Last edited: August 8, 2025

a class about POMDPs

Theme	Topics
Robot dogs	NeBula, AISR NeBula
Applications	POMDSoar,
Offline Solvers	PBVI, HSVI, Perseus
Offline Solvers	SARSOP, E-PCA, CALP
Policy Graphs	Hansen, MCVI, PGA
Online Solvers	AEMS, POMCP, DESPOT
Moar Online Methods	IS-DESPOT, POMCPOW, AdaOPS
POMDPish	MOMDP, POMDP-lite, rho-POMDPs
Memoryless + Policy Search	Sarsa (Lambda), JSJ, Pegasus
Hierarchical Decomposition	Option, MaxQ, LTRDP
Hybrid Planning	HybPlan, LetsDrive, BetaZero
LQR + Shared Autonomy	iLQR, Hindsight, TrustPOMDP
Multi-Agent	Factored MDPs, FV-POMCPs, G-DICE

Population Based Training

Last edited: August 8, 2025

Jury based work
