MaxQ
Two Abstractions
- “temporal abstractions”: making decisions without considering time at every step / abstracting time away (as in an MDP)
- “state abstractions”: making decisions about groups of states at once
Graph
MaxQ formulates a policy as a graph, which represents a set of \(n\) sub-policies

Max Node
This is a “policy node”, connected to a series of \(Q\) nodes, from which it takes the max and propagates the result down. If we are at a leaf max node, the actual action is taken and control is passed back to the top of the graph.
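A minimal sketch of this control flow (the class names and the `env.step` interface are illustrative assumptions, not the paper's notation):

```python
class QNode:
    """Q node: wraps a child subtask and holds its learned value."""
    def __init__(self, child, value=0.0):
        self.child = child    # a MaxNode for the subtask
        self.value = value    # value of invoking this subtask

class MaxNode:
    """Max node (policy node): takes the max over its child Q nodes."""
    def __init__(self, name, q_children=(), primitive_action=None):
        self.name = name
        self.q_children = list(q_children)
        self.primitive_action = primitive_action  # set only on leaf nodes

    def act(self, env):
        if self.primitive_action is not None:
            # Leaf max node: execute the actual action, then control
            # passes back to the top of the graph.
            return env.step(self.primitive_action)
        # Internal max node: take the max over child Q nodes and recurse.
        best = max(self.q_children, key=lambda q: q.value)
        return best.child.act(env)
```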
MBP
MCVI
MCVI solves POMDPs with a continuous state space but discrete observation and action spaces. It does this by representing the POMDP policy as a graph.
Fast POMDP algorithms typically require a discretized state space, which makes many problems much harder to model. MCVI makes continuous state representations possible for complex domains.
MC Backup
The standard POMDP Bellman backup won't work well with a continuous state space.
Therefore, we reformulate our value backup as:
\begin{equation} V_{t+1}(b) = \max_{a \in A} \qty[\int_{s} R(s,a)b(s) \dd{s} + \gamma \sum_{o \in O} p(o|b,a) V_{t}(\mathrm{update}(b,a,o))] \end{equation}
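The integral and \(p(o \mid b, a)\) generally can't be computed in closed form, so they are estimated by sampling. A minimal sketch of a Monte Carlo backup in that spirit, assuming a particle representation of \(b\) and a hypothetical generative model `simulate(s, a)` (all names here are illustrative, not MCVI's actual API):

```python
import random
from collections import defaultdict

def mc_backup(particles, actions, simulate, value_fn, gamma, n_samples=100):
    """Monte Carlo estimate of the backup above.

    particles: states sampled from the belief b
    simulate(s, a): generative model returning (reward, next_state, observation)
    value_fn(ps): estimate of V_t at the belief represented by particles ps
    """
    best_value, best_action = float("-inf"), None
    for a in actions:
        total_reward = 0.0
        next_particles = defaultdict(list)  # observation -> sampled next states
        for _ in range(n_samples):
            s = random.choice(particles)
            reward, s_next, o = simulate(s, a)
            total_reward += reward
            next_particles[o].append(s_next)
        # Mean sampled reward approximates the integral of R(s,a) b(s) ds.
        value = total_reward / n_samples
        # The fraction of samples yielding o approximates p(o | b, a), and
        # the surviving particles approximate update(b, a, o).
        for o, ps in next_particles.items():
            value += gamma * (len(ps) / n_samples) * value_fn(ps)
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action
```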
meal replacement
mean average precision
- at each rank where a relevant result is returned, calculate the precision at that rank
- average those precisions to get the average precision for the query
- then average the average precision over all queries (see the sketch below)
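A minimal sketch of this computation, assuming relevance comes in as a boolean list per ranked result (the function names are illustrative):

```python
def average_precision(ranked_relevance):
    """AP for one query. ranked_relevance: True where the ranked result is relevant."""
    hits, precisions = 0, []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant result
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(per_query_relevance):
    """mAP: average the per-query APs."""
    aps = [average_precision(r) for r in per_query_relevance]
    return sum(aps) / len(aps)

# relevant results at ranks 1 and 3: AP = (1/1 + 2/3) / 2
assert abs(average_precision([True, False, True]) - (1 + 2 / 3) / 2) < 1e-9
```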
precision
\begin{equation} \frac{tp}{tp + fp} \end{equation}
recall
\begin{equation} \frac{tp}{tp+fn} \end{equation}
accuracy
\begin{equation} \frac{tp + tn}{tp+tn+fp+fn} \end{equation}
f1
\begin{equation} F_1 = \frac{2 (P\cdot R)}{P+R} \end{equation}
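A small sketch that computes all four metrics from raw confusion-matrix counts (the function name and example counts are made up for illustration):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}

# e.g. tp=8, fp=2, fn=4, tn=6: precision = 0.8, recall = 2/3, F1 ≈ 0.73
print(classification_metrics(8, 2, 4, 6))
```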