Posts

POMCPOW

Last edited: August 8, 2025

POMDPs with continuous actions are hard. The starting point is POMCP, that is, belief update + MCTS.

So instead, let’s try improving that. Unlike plain POMCP, each history node keeps not only the sampled states \(B(h)\) but also \(W(h)\), the weight of each sampled state. Naively applying POMCP to continuous states gives a very wide tree, because each newly sampled state will almost never match one sampled before.
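A minimal Python sketch of the weighted-belief idea, assuming each history node stores its sampled states next to their weights and resamples in proportion to those weights. The class name `WeightedBelief` and its methods are hypothetical, chosen only for illustration.

```python
import random


class WeightedBelief:
    """Particle belief for one history node h (hypothetical helper).

    `states` plays the role of B(h) and `weights` the role of W(h).
    """

    def __init__(self):
        self.states = []   # B(h): states sampled so far
        self.weights = []  # W(h): one non-negative weight per state

    def add(self, state, weight):
        """Store a sampled state together with its weight."""
        if weight < 0:
            raise ValueError("weights must be non-negative")
        self.states.append(state)
        self.weights.append(weight)

    def sample(self, rng=random):
        """Draw one stored state with probability proportional to its weight."""
        return rng.choices(self.states, weights=self.weights, k=1)[0]
```

In a continuous-observation setting the weight would typically be the observation likelihood of the sampled state, so states that explain the observation well are revisited more often.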

double progressive widening

We want to sample observations rather than enumerate them. On its own, though, this eventually leads to a suboptimal QMDP policy: each belief node ends up holding a single sampled state, so there is no state uncertainty left in the tree.
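The widening test itself is small. The sketch below assumes the common criterion of adding a new child only while the number of children is at most \(k N^{\alpha}\), where \(N\) is the visit count of the node; "double" means the same test is applied at two levels of the tree (for example, once for actions and once for observations). The function name and the default values of `k` and `alpha` are illustrative assumptions.

```python
def should_widen(num_children, num_visits, k=1.0, alpha=0.5):
    """Return True if a node may still add a new child.

    Assumed criterion: |children| <= k * N ** alpha.
    k and alpha are tuning parameters; the defaults are placeholders.
    """
    return num_children <= k * num_visits ** alpha


# Double progressive widening: the same test at two levels of the tree.
def may_add_action(n_action_children, n_visits_h, k_a=1.0, alpha_a=0.5):
    return should_widen(n_action_children, n_visits_h, k_a, alpha_a)


def may_add_observation(n_obs_children, n_visits_ha, k_o=1.0, alpha_o=0.5):
    return should_widen(n_obs_children, n_visits_ha, k_o, alpha_o)
```

With `alpha < 1` the number of children grows sublinearly in the visit count, so existing branches keep getting revisited instead of the tree only growing wider.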

POMDP Approximation

Last edited: August 8, 2025

Upper bounds of alpha vectors

QMDP and FIB both give upper bounds on the true optimal alpha vector values.

FIB is generally a tighter (lower) upper bound than QMDP.
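A small numerical sketch of both bounds, assuming the standard QMDP and fast informed bound (FIB) backups on a discrete POMDP. The array layout (`T[a, s, s']`, `O[a, s', o]`, `R[s, a]`) and the toy numbers are assumptions made for illustration, not taken from these notes.

```python
import numpy as np


def qmdp(T, R, gamma, iters=200):
    """alpha_a(s) = R(s,a) + gamma * sum_s' T(s'|s,a) * max_a' alpha_a'(s')."""
    n_a, n_s, _ = T.shape
    alpha = np.zeros((n_a, n_s))
    for _ in range(iters):
        v = alpha.max(axis=0)            # best value per next state
        alpha = R.T + gamma * (T @ v)    # (a, s, s') @ (s',) -> (a, s)
    return alpha


def fib(T, O, R, gamma, iters=200):
    """alpha_a(s) = R(s,a) + gamma * sum_o max_a' sum_s' O(o|a,s') T(s'|s,a) alpha_a'(s')."""
    n_a, n_s, _ = T.shape
    alpha = np.zeros((n_a, n_s))
    for _ in range(iters):
        new = np.empty_like(alpha)
        for a in range(n_a):
            # w[s, o, s'] = T(s'|s,a) * O(o|a,s')
            w = T[a][:, None, :] * O[a].T[None, :, :]
            q = w @ alpha.T              # (s, o, s') @ (s', a') -> (s, o, a')
            new[a] = R[:, a] + gamma * q.max(axis=2).sum(axis=1)
        alpha = new
    return alpha


# Toy 2-state, 2-action, 2-observation POMDP (made-up numbers).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.8, 0.2], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

alpha_qmdp = qmdp(T, R, gamma)
alpha_fib = fib(T, O, R, gamma)
```

The only difference is where the maximisation sits: QMDP maximises per next state, FIB maximises once per observation, which is why FIB can never exceed QMDP when both start from the same initial vectors.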

Lower bounds of alpha vectors

BAWS and the blind lower bound both give lower bounds on the true optimal alpha vector values.
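A short sketch of the two lower bounds, assuming BAWS stands for the best-action worst-state bound and that the blind lower bound evaluates the policy of repeating one fixed action forever. The array layout (`T[a, s, s']`, `R[s, a]`) and the toy numbers are illustrative assumptions.

```python
import numpy as np


def baws(R, gamma):
    """Best-action worst-state bound: max_a min_s R(s,a) / (1 - gamma)."""
    return R.min(axis=0).max() / (1.0 - gamma)


def blind_lower_bound(T, R, gamma, iters=500):
    """One alpha vector per action, for always taking that same action.

    alpha_a(s) = R(s,a) + gamma * sum_s' T(s'|s,a) * alpha_a(s')
    """
    n_a, n_s, _ = T.shape
    alpha = np.zeros((n_a, n_s))
    for _ in range(iters):
        alpha = R.T + gamma * np.einsum("asn,an->as", T, alpha)
    return alpha


# Toy 2-state, 2-action model (made-up numbers).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.5],
              [0.2, 2.0]])
gamma = 0.9

r_baws = baws(R, gamma)
alpha_blind = blind_lower_bound(T, R, gamma)
```

BAWS is a single scalar and needs no transition model; the blind bound costs an iteration per action but is at least as tight, since it uses the actual rewards reached instead of the worst one.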

Faster:

Slower:

point selection

see point selection

POMDP-lite

Last edited: August 8, 2025

What if our initial state never changes, or changes only deterministically? This happens, for instance, in localization. This should make solving a POMDP easier.

POMDP-lite

  • \(X\): fully observable states
  • \(\theta\): hidden parameter with a finite number of values \(\theta_{1 \dots N}\)
  • where \(S = X \times \theta\)

We then assume conditional independence between \(x\) and \(\theta\). So the transition model is \(T = P(x'|\theta, x, a)\), with \(P(\theta'|\theta,x,a) = 1\) for the next parameter value (“our hidden parameter is known or deterministically changing”).
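Because \(x\) is observed and \(\theta\) takes finitely many values, the belief reduces to a discrete distribution over \(\theta\). The sketch below assumes the simplest case where \(\theta\) stays constant, so the update is \(b'(\theta) \propto b(\theta)\, P(x'|\theta, x, a)\). The function names and the two-valued example model are hypothetical.

```python
def update_theta_belief(belief, x, a, x_next, trans_prob):
    """Bayes update of a discrete belief over the hidden parameter theta.

    belief:     dict mapping theta -> probability
    trans_prob: callable (x_next, theta, x, a) -> P(x_next | theta, x, a)
    Assumes theta itself does not change between steps.
    """
    unnorm = {th: p * trans_prob(x_next, th, x, a) for th, p in belief.items()}
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("observed transition has zero probability under the belief")
    return {th: p / total for th, p in unnorm.items()}


# Hypothetical model: theta is the probability that the next x equals 1.
def coin_transition(x_next, theta, x, a):
    return theta if x_next == 1 else 1.0 - theta


prior = {0.2: 0.5, 0.8: 0.5}
posterior = update_theta_belief(prior, x=0, a="step", x_next=1,
                                trans_prob=coin_transition)
```

The update costs \(O(N)\) per step, which is where the savings over a general POMDP belief update come from.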

POMDPs Index

Last edited: August 8, 2025

a class about POMDPs

Theme | Topics
Robot dogs | NeBula, AISR NeBula
Applications | POMDSoar
Offline Solvers | PBVI, HSVI, Perseus
Offline Solvers | SARSOP, E-PCA, CALP
Policy Graphs | Hansen, MCVI, PGA
Online Solvers | AEMS, POMCP, DESPOT
Moar Online Methods | IS-DESPOT, POMCPOW, AdaOPS
POMDPish | MOMDP, POMDP-lite, rho-POMDPs
Memoryless + Policy Search | Sarsa (Lambda), JSJ, Pegasus
Hierarchical Decomposition | Option, MaxQ, LTRDP
Hybrid Planning | HybPlan, LetsDrive, BetaZero
LQR + Shared Autonomy | iLQR, Hindsight, TrustPOMDP
Multi-Agent | Factored MDPs, FV-POMCPs, G-DICE

Other Content

Population Based Training

Last edited: August 8, 2025

Jury based work