POMCPOW
Last edited: August 8, 2025
POMDPs with continuous actions are hard. The usual options are POMCP or (explicit belief update + MCTS).
So instead, let's try improving that. Unlike plain POMCP, we keep not only the particle set \(B(h)\) at each history node but also \(W(h)\), the weight of each sampled state. Naively applying POMCP to continuous states gives an extremely wide tree, because each newly sampled state will never match one seen before.
double progressive widening
We use sampling to draw observations, limited by progressive widening. Without state weighting, this eventually converges to a suboptimal QMDP-like policy: each observation node ends up holding a single state particle, so state uncertainty is effectively ignored.
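The widening rule can be sketched as follows. This is a minimal sketch, not from any specific implementation; the names `k_a`, `alpha_a`, and the helper functions are mine:

```python
import random

def dpw_allows_new_child(num_children, num_visits, k, alpha):
    """Allow a new child node only while |C(h)| <= k * N(h)^alpha."""
    return num_children <= k * num_visits ** alpha

def select_action(h_children, h_visits, action_space_sample,
                  k_a=10.0, alpha_a=0.5):
    """Sample a fresh continuous action only if widening permits it;
    otherwise reuse an existing child (keeps the tree narrow)."""
    if dpw_allows_new_child(len(h_children), h_visits, k_a, alpha_a):
        a = action_space_sample()      # draw a new continuous action
        h_children.append(a)
        return a
    return random.choice(h_children)   # revisit an existing action

# "Double" progressive widening applies the same test again at the
# observation nodes, limiting how many sampled observations each
# action node may keep.
```

The key design point is that the branching factor grows sublinearly with visit count, so repeated visits concentrate on existing children instead of always spawning new ones.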
POMDP Approximation
Upper bounds of alpha vectors
QMDP and FIB each represent an upper bound on the true optimal alpha vector values.
FIB generally gives a lower (tighter) upper bound than QMDP.
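A minimal sketch of how the QMDP bound could be computed, assuming a tabular model with \(T\) of shape \(S \times A \times S\) and \(R\) of shape \(S \times A\) (the array layout and function name are my own; FIB would look similar but folds the observation function into the backup):

```python
import numpy as np

def qmdp_alpha_vectors(T, R, gamma, iters=100):
    """One alpha vector per action: alpha_a(s) = Q_MDP(s, a).
    Upper-bounds V* because it pretends all uncertainty vanishes
    after one step (the state becomes fully observable)."""
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)                       # fully observable value
        Q = R + gamma * np.einsum('sat,t->sa', T, V)
    return Q.T                                  # shape (A, S)

# Upper bound at belief b: max_a sum_s b(s) * alpha_a(s)
```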
Lower bounds of alpha vectors
BAWS (best-action worst-state) and the blind lower bound represent lower bounds on the optimal alpha vector values.
Faster: BAWS (closed-form, no iteration)
Slower: blind lower bound (iterative, but tighter)
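A hedged sketch of both lower bounds, using the same tabular \(T\) (\(S \times A \times S\)) and \(R\) (\(S \times A\)) layout as above (function names are mine):

```python
import numpy as np

def baws_alpha_vectors(R, gamma):
    """Best-action worst-state: discounted return of repeatedly
    receiving the worst per-action reward; one constant alpha
    vector per action. Cheap but loose."""
    worst = R.min(axis=0) / (1.0 - gamma)        # shape (A,)
    S, A = R.shape
    return np.tile(worst[:, None], (1, S))       # shape (A, S)

def blind_alpha_vectors(T, R, gamma, iters=100):
    """Blind policy: always play the same action a, ignoring
    observations. Fixed point of
    alpha_a(s) = R(s,a) + gamma * sum_s' T(s'|s,a) alpha_a(s')."""
    S, A = R.shape
    alpha = np.zeros((A, S))
    for _ in range(iters):
        alpha = R.T + gamma * np.einsum('sat,at->as', T, alpha)
    return alpha
```

Any blind policy's value lower-bounds the optimal value, and BAWS in turn lower-bounds the blind bound, which matches the faster/looser vs. slower/tighter trade-off above.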
point selection
see point selection
POMDP-lite
What if the hidden part of our state never changes or changes deterministically, for instance in localization? This should make solving a POMDP easier.
POMDP-lite
- \(X\) fully observable states
- \(\theta\) hidden parameter: finite amount of values \(\theta_{1 \dots N}\)
- where \(S = X \times \theta\)
We then assume conditional independence between \(x\) and \(\theta\). So \(T = P(x'|\theta, x, a)\), where \(P(\theta'|\theta, x, a) = 1\) ("our hidden parameter is known or deterministically changing").
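Under this factorization the belief reduces to a distribution over the \(N\) values of \(\theta\), updated from the observed \(x\) transition. A minimal sketch, where the `T_x` callable and function name are illustrative, not from the paper:

```python
import numpy as np

def belief_update(b_theta, x, a, x_next, T_x):
    """Bayes update over theta only, since x is fully observed.
    T_x(i, x, a, x_next) returns P(x' | theta_i, x, a)."""
    likelihood = np.array([T_x(i, x, a, x_next)
                           for i in range(len(b_theta))])
    b = b_theta * likelihood
    return b / b.sum()   # renormalize over the N theta values
```

Because \(\theta\) is static, no transition step is needed for the hidden part; the belief only sharpens as \(x\) transitions are observed.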
POMDPs Index
A class about POMDPs.
| Theme | Topics |
|---|---|
| Robot dogs | NeBula, AISR NeBula |
| Applications | POMDSoar, |
| Offline Solvers | PBVI, HSVI, Perseus |
| Offline Solvers | SARSOP, E-PCA, CALP |
| Policy Graphs | Hansen, MCVI, PGA |
| Online Solvers | AEMS, POMCP, DESPOT |
| Moar Online Methods | IS-DESPOT, POMCPOW, AdaOPS |
| POMDPish | MOMDP, POMDP-lite, rho-POMDPs |
| Memoryless + Policy Search | Sarsa (Lambda), JSJ, Pegasus |
| Hierarchical Decomposition | Option, MaxQ, LTRDP |
| Hybrid Planning | HybPlan, LetsDrive, BetaZero |
| LQR + Shared Autonomy | iLQR, Hindsight, TrustPOMDP |
| Multi-Agent | Factored MDPs, FV-POMCPs, G-DICE |
Other Content
Population Based Training
Jury-based work