Houjun Liu

Partially Observable Markov Decision Process

A Partially Observable Markov Decision Process (POMDP) is a Markov Decision Process with state uncertainty: the agent cannot observe the true state directly, and instead acts on observations of it.

Components:

    • states
    • actions (given state)
    • transition function (given state and action)
    • reward function
    • observations (given state and action)
    • belief system (a distribution over possible states; see the sketch after this list)
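These components can be collected into a small container. Below is a minimal sketch, assuming a discrete formulation; the class and field names are illustrative, not from these notes:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A minimal, hypothetical container for the POMDP components above.
@dataclass
class POMDP:
    states: Sequence        # S: set of states
    actions: Sequence       # A: set of actions
    observations: Sequence  # O: set of observations
    T: Callable             # T(s, a, s_next) -> transition probability
    R: Callable             # R(s, a) -> reward
    O: Callable             # O(a, s_next, o) -> observation probability
    gamma: float = 0.9      # discount factor
```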

As always, we desire to find a policy \(\pi\) such that we can:

\begin{equation} \underset{\pi \in \Pi}{\text{maximize}}\ \mathbb{E} \qty[ \sum_{t=0}^{\infty} \gamma^{t} R(b_{t}, \pi(b_{t}))] \end{equation}

where our \(\pi\), instead of taking a state as input, takes a belief (a distribution over possible states) as input.
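Here \(R(b, a)\) denotes the expected reward under the belief; this standard definition is not written out above:

\begin{equation} R(b, a) = \sum_{s} b(s) R(s, a) \end{equation}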

observations and states

“where are we, and how sure are we about that?”

beliefs and filters
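A filter keeps the belief current as the agent acts and observes. After taking action \(a\) and seeing observation \(o\), the standard discrete Bayes update (a known result, stated here for completeness) is:

\begin{equation} b'(s') \propto O(o \mid a, s') \sum_{s} T(s' \mid s, a)\, b(s) \end{equation}

As a concrete sketch, assuming the transition and observation models are stored as matrices indexed by action (shapes noted in the comments; all names illustrative):

```python
import numpy as np

def update_belief(b, a, o, T, O):
    """Discrete Bayes filter. b: belief over |S| states; T[a]: |S| x |S'|
    transition matrix; O[a]: |S'| x |O| observation matrix. A sketch."""
    predicted = T[a].T @ b                 # sum_s T(s'|s,a) b(s)
    unnormalized = O[a][:, o] * predicted  # times O(o|a,s')
    return unnormalized / unnormalized.sum()
```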

policy representations

“how do we represent a policy”

  • a tree: conditional plan
  • a graph: controller
  • with utility: alpha vector
    • just take the top action of the conditional plan the alpha vector was computed from (see the sketch after this list)
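A minimal sketch of acting with alpha vectors, assuming each alpha vector stores the root action of the conditional plan it was computed from (names illustrative):

```python
import numpy as np

def alpha_vector_policy(b, alphas, actions):
    """Evaluate each alpha vector against the belief and act greedily.
    alphas: list of length-|S| arrays; actions[i]: top action of plan i."""
    utilities = [float(np.dot(alpha, b)) for alpha in alphas]
    best = int(np.argmax(utilities))
    return actions[best]
```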

policy evaluations

“how good is our policy / what’s the utility?”
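For a conditional plan \(\pi\) with root action \(a\) and subplan \(\pi(o)\) for each observation \(o\), the standard evaluation recursion is:

\begin{equation} U^{\pi}(s) = R(s, a) + \gamma \sum_{s'} T(s' \mid s, a) \sum_{o} O(o \mid a, s')\, U^{\pi(o)}(s') \end{equation}

and the utility at a belief is \(U^{\pi}(b) = \sum_{s} b(s) U^{\pi}(s)\), i.e. the dot product of the belief with the plan's alpha vector.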

policy solutions

“how do we make that policy better?”

exact solutions

approximate solutions

  • estimate an approximate value function, and then use a policy representation (a concrete example follows this list):
    • upper-bounds for value functions
    • lower-bounds for value functions
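As one concrete example of an upper bound (QMDP, named here as an assumption since the notes only say "upper-bounds"): solve the fully observable MDP and use its Q-values as alpha vectors. This over-estimates value because it pretends the state becomes fully observable after one step.

```python
import numpy as np

def qmdp_alpha_vectors(T, R, gamma, iters=100):
    """QMDP sketch. T[a]: |S| x |S| transition matrix; R: |S| x |A| rewards.
    Returns one alpha vector per action, an upper bound on the POMDP value."""
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        U = Q.max(axis=1)  # best achievable value per state
        Q = R + gamma * np.stack([T[a] @ U for a in range(n_actions)], axis=1)
    return [Q[:, a] for a in range(n_actions)]
```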

online solutions

Online POMDP Methods