HSVI
Last edited: August 8, 2025
Improving PBVI without sacrificing quality.
Initialization
We first initialize HSVI with a set of alpha vectors \(\Gamma\) representing the lower bound, and a list of tuples \((b, U(b))\) named \(\Upsilon\) representing the upper bound. We call the value functions they generate \(\underline{V}\) and \(\bar{V}\), respectively.
Lower Bound
Set of alpha vectors: best-action worst-state (HSVI1), blind lower bound (HSVI2)
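The best-action worst-state initialization can be sketched as follows (a minimal sketch; the reward matrix `R`, shaped |A| x |S|, and discount `gamma` are assumed inputs, not from the note):

```python
import numpy as np

def baws_alpha(R, gamma):
    """Best-action worst-state (HSVI1) initialization: a single constant
    alpha vector whose value lower-bounds V* in every state."""
    worst_per_action = R.min(axis=1)          # worst-state reward of each action
    v = worst_per_action.max() / (1 - gamma)  # repeat the best such action forever
    return np.full(R.shape[1], v)
```

Repeating one action forever can only do worse than acting optimally, so this constant vector is a valid lower bound.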
Calculating \(\underline{V}(b)\)
\begin{equation} \underline{V}_{\Gamma}(b) = \max_{\alpha \in \Gamma} \alpha^{\top}b \end{equation}
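In code, evaluating the lower bound at a belief is a single max over dot products (a sketch, assuming `Gamma` is a list of numpy alpha vectors):

```python
import numpy as np

def lower_bound(Gamma, b):
    """Evaluate the lower-bound value function at belief b:
    the maximum of alpha . b over all alpha vectors in Gamma."""
    return max(float(np.dot(alpha, b)) for alpha in Gamma)
```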
Upper Bound
- solving fully-observable MDP
- Project \(b\) into the point-set
- Project the upper bound onto its convex hull (HSVI2: via approximate convex hull projection)
Calculating \(\bar{V}(b)\)
Recall that while the lower bound is given by alpha vectors, the upper bound is given as a set of tuples \((b, U(b)) \in \Upsilon\), so evaluating \(\bar{V}\) at a new belief requires interpolating between these points.
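One way to evaluate \(\bar{V}(b)\) from those tuples is the sawtooth (approximate convex hull) interpolation of HSVI2. The sketch below assumes the values at the corner beliefs \(e_s\) are stored separately from the interior points of \(\Upsilon\):

```python
import numpy as np

def sawtooth_upper_bound(b, corner_values, points):
    """Sawtooth interpolation of the upper bound at belief b.
    corner_values[s] is the upper bound at the corner belief e_s;
    points is a list of (b_i, v_i) pairs from Upsilon."""
    v0 = float(np.dot(b, corner_values))      # linear interpolation over corners
    best = v0
    for b_i, v_i in points:
        v0_i = float(np.dot(b_i, corner_values))
        mask = b_i > 0
        c = np.min(b[mask] / b_i[mask])       # largest weight b can put on b_i
        best = min(best, v0 + c * (v_i - v0_i))
    return best
```

Each interior point contributes a "sawtooth" dent below the corner interpolation; taking the min over all points gives the tightest such upper bound.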
Huffman Coding
The Huffman Coding scheme gives the best prefix-free code (the one with the smallest average codeword length) for a source with any arbitrary distribution.
Huffman Coding is Bounded
Huffman Coding has the property that:
\begin{equation} \bar{\ell} \leq H(X) + 1 \end{equation}
Recall also that any prefix-free code for a source has average codeword length at least the entropy; together these give:
\begin{equation} H(X) \leq \bar{\ell} \leq H(X)+1 \end{equation}
for a source \(X\) with entropy \(H(X)\), where \(\bar{\ell}\) is the average codeword length returned by Huffman Coding.
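The bound is easy to check empirically. Below is a minimal sketch of Huffman's greedy heap-merging construction (the symbols and probabilities are purely illustrative):

```python
import heapq
import math

def huffman_code(probs):
    """Build a Huffman code for a {symbol: probability} dict."""
    # each heap entry: (subtree probability, tie-breaker, {symbol: codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # merge the two least-probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
avg_len = sum(p * len(code[s]) for s, p in probs.items())
entropy = -sum(p * math.log2(p) for p in probs.values())
```

For a dyadic source like this one, Huffman achieves the entropy exactly (1.75 bits); in general only \(H(X) \leq \bar{\ell} \leq H(X)+1\) is guaranteed.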
Hungarian Method
HybPlan
“Can we come up with a policy that, if not fast, at least reaches the goal!”
Background
Stochastic Shortest-Path
We are at an initial state, we have a set of goal states, and we want to reach the goal states.
We can solve this just by:
- value iteration
- simulate a trajectory and only update reachable states: RTDP, LRTDP
- MBP
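Of these, plain value iteration is the simplest to sketch. The toy encoding below (explicit dictionaries for \(P\) and \(C\), goal states pinned at value 0) is hypothetical, not from the note:

```python
def ssp_value_iteration(S, A, P, C, G, tol=1e-6):
    """Value iteration for a stochastic shortest-path MDP.
    P[s][a] maps next states to probabilities, C[s][a] is the cost of
    taking action a in state s, and G is the set of absorbing goal states."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            if s in G:
                continue  # goals are absorbing and cost-free
            best = min(
                C[s][a] + sum(p * V[s2] for s2, p in P[s][a].items())
                for a in A
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```

On a chain with unit costs this converges to the expected number of steps to the goal from each state.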
Problem
MDP + Goal States
- \(S\): set of states
- \(A\): actions
- \(P(s'|s,a)\): transition probabilities
- \(C\): cost
- \(G\): absorbing goal states
Approach
Combining LRTDP with anytime dynamics
hypothesis testing
Hypothesis testing is the mechanism by which a hypothesis is tested statistically.
The core logic of hypothesis testing: state a hypothesis, collect data, and calculate the probability that an outcome at least as extreme could have occurred if the hypothesis were true.
Examples include
- t-test (for sample means)
- z-test (for sample proportions)
- chi-square test (for sample categories)
Common to all hypothesis tests are the following terms.
null hypothesis
A null hypothesis is a “no difference” hypothesis created as a part of hypothesis testing. It is usually stated as an equality.
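As a worked example of this logic, here is a one-sample z-test for a proportion, where the null hypothesis is the equality \(p = p_0\) (the numbers below are purely illustrative):

```python
import math

def z_test_proportion(successes, n, p0):
    """Two-sided one-sample z-test of the null hypothesis p = p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)        # standard error under the null
    z = (p_hat - p0) / se
    # two-sided p-value from the standard normal CDF (via the error function)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# e.g. 60 successes in 100 trials against the null hypothesis p = 0.5
z, p = z_test_proportion(60, 100, 0.5)
```

A small p-value means the observed proportion would be unlikely if the null hypothesis were true, which is grounds to reject it.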