Houjun Liu

MOMDP

MOMDP are POMDPs where some parts of the state are fully observable.


Motivation

scaling up POMDPs is really hard: exponential curse of dimensionality. Even discretization will cause the number of beliefs to really blow up.

Some of the state isn’t uncertain, some others are bounded uncertainty: this REDUCES scale a lot.

Solving

Solving the algorithm uses SARSOP, or any point-based system. Instead of sampling the full belief state, however, we sample from a tuple \((x, b_{y})\), whereby \(x\) is the observable part and \(b_{y}\) is the unobservable part.

How Exactly Tuple?

True Mixed Observability

Go about splitting about your space based on the true observability part. Say there are \(10\) states which are observable, you literally just initialize 10 sets of alpha vectors to create \(V_{1} … V_{10}\) for your observable states, where each one you have:

\begin{equation} V_{x_{i}}(b_{j}) = \dots \end{equation}

whereby all of your objectives and backup, etc., takes \(x\) your observable state as input. Then, during inference/backup looking at where you are in the observable part and use the value function from that part.

Pseudo-Full Observability

Train a fully observable model, and then use belief-weighted average during inference. This is where QMDP came from.

Bounded Uncertainty

Most of the time neither of the top two cases apply cleanly. Instead, most frequently your uncertainty in your observation is bounded by a significant degree.

Condition

For instance, your GPS maybe uncertain, but if it says you are in Kansas you are not in Shanghai. Formally, for \(h: O \to S\) (the hypothetical “preimage” of any observation), we expect that:

\begin{equation} \frac{h(o)}{S} = c \end{equation}

gives \(c \ll 1\).

Solution

If we know we have Bounded Uncertainty, we can reparameterize our POMDP to an MDP over observations (we call this \(X\)) plus a POMDP modeling uncertainty in offsets from those observations (we call this \(Y\)).

Whereby:

\begin{equation} \begin{cases} T_{x}(x’|x,y,a) = \sum_{s’ \in S} T(s’ | (x,y),a) O(x’|s’,a) \\ T_{y}(y’|x,x’,y,a) = \frac{T((x’,y’) | (x,y),a) O((x’,y’)|s’,a)}{T_{x}(x’|x,y,a)} \end{cases} \end{equation}

where our state space is now split into \(s \in S = X \times Y\) s.t. \(s=(x,y)\).