HSVI
Last edited: August 8, 2025
Improving PBVI without sacrificing quality.
Initialization
We first initialize HSVI with a set of alpha vectors \(\Gamma\) representing the lower bound, and a list of tuples \((b, U(b))\) named \(\Upsilon\) representing the upper bound. We call the value functions they generate \(\underline{V}\) and \(\bar{V}\), respectively.
Lower Bound
Set of alpha vectors: best-action worst-state (HSVI1), blind lower bound (HSVI2)
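The best-action worst-state initialization can be sketched as follows (a minimal sketch; the reward matrix `R`, shaped |A| x |S|, and discount `gamma` are assumed inputs, not from the note):

```python
import numpy as np

def baws_alpha(R, gamma):
    """Best-action worst-state (HSVI1) initialization: a single constant
    alpha vector whose value lower-bounds V* in every state."""
    worst_per_action = R.min(axis=1)          # worst-state reward of each action
    v = worst_per_action.max() / (1 - gamma)  # repeat the best such action forever
    return np.full(R.shape[1], v)
```

Repeating one action forever can only do worse than acting optimally, so this constant vector is a valid lower bound.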
Calculating \(\underline{V}(b)\)
\begin{equation} \underline{V}_{\Gamma}(b) = \max_{\alpha \in \Gamma} \alpha^{\top}b \end{equation}
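In code, evaluating the lower bound at a belief is a single max over dot products (a sketch, assuming `Gamma` is a list of numpy alpha vectors):

```python
import numpy as np

def lower_bound(Gamma, b):
    """Evaluate the lower-bound value function at belief b:
    the maximum of alpha . b over all alpha vectors in Gamma."""
    return max(float(np.dot(alpha, b)) for alpha in Gamma)
```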
Upper Bound
- solving fully-observable MDP
- Project \(b\) into the point-set
- Project the upper bound onto its convex hull (HSVI2: via approximate convex hull projection)
Calculating \(\bar{V}(b)\)
Recall that while the lower bound is given by alpha vectors, the upper bound is given as a set of tuples \((b, U(b)) \in \Upsilon\), so evaluating \(\bar{V}\) at a new belief requires interpolating between these points.
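One way to evaluate \(\bar{V}(b)\) from those tuples is the sawtooth (approximate convex hull) interpolation of HSVI2. The sketch below assumes the values at the corner beliefs \(e_s\) are stored separately from the interior points of \(\Upsilon\):

```python
import numpy as np

def sawtooth_upper_bound(b, corner_values, points):
    """Sawtooth interpolation of the upper bound at belief b.
    corner_values[s] is the upper bound at the corner belief e_s;
    points is a list of (b_i, v_i) pairs from Upsilon."""
    v0 = float(np.dot(b, corner_values))      # linear interpolation over corners
    best = v0
    for b_i, v_i in points:
        v0_i = float(np.dot(b_i, corner_values))
        mask = b_i > 0
        c = np.min(b[mask] / b_i[mask])       # largest weight b can put on b_i
        best = min(best, v0 + c * (v_i - v0_i))
    return best
```

Each interior point contributes a "sawtooth" dent below the corner interpolation; taking the min over all points gives the tightest such upper bound.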
Huffman Coding
The Huffman Coding scheme gives the best prefix-free code (the one with the smallest average codeword length) for a source with any arbitrary distribution.
Huffman Coding is Bounded
Huffman Coding has the property that:
\begin{equation} \bar{\ell} \leq H(X) + 1 \end{equation}
Recall also that any prefix-free code for a source has average codeword length at least the entropy; together these give:
\begin{equation} H(X) \leq \bar{\ell} \leq H(X)+1 \end{equation}
for a source \(X\) with entropy \(H(X)\), where \(\bar{\ell}\) is the average codeword length returned by Huffman Coding.
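The bound is easy to check empirically. Below is a minimal sketch of Huffman's greedy heap-merging construction (the symbols and probabilities are purely illustrative):

```python
import heapq
import math

def huffman_code(probs):
    """Build a Huffman code for a {symbol: probability} dict."""
    # each heap entry: (subtree probability, tie-breaker, {symbol: codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # merge the two least-probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
avg_len = sum(p * len(code[s]) for s, p in probs.items())
entropy = -sum(p * math.log2(p) for p in probs.values())
```

For a dyadic source like this one, Huffman achieves the entropy exactly (1.75 bits); in general only \(H(X) \leq \bar{\ell} \leq H(X)+1\) is guaranteed.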
Hungarian Method
HybPlan
“Can we come up with a policy that, if not fast, at least reaches the goal!”
Background
Stochastic Shortest-Path
We are at an initial state, we have a set of goal states, and we want to reach the goal states.
We can solve this just by:
- value iteration
- simulate a trajectory and only update reachable states: RTDP, LRTDP
- MBP
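Of these, plain value iteration is the simplest to sketch. The toy encoding below (explicit dictionaries for \(P\) and \(C\), goal states pinned at value 0) is hypothetical, not from the note:

```python
def ssp_value_iteration(S, A, P, C, G, tol=1e-6):
    """Value iteration for a stochastic shortest-path MDP.
    P[s][a] maps next states to probabilities, C[s][a] is the cost of
    taking action a in state s, and G is the set of absorbing goal states."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            if s in G:
                continue  # goals are absorbing and cost-free
            best = min(
                C[s][a] + sum(p * V[s2] for s2, p in P[s][a].items())
                for a in A
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```

On a chain with unit costs this converges to the expected number of steps to the goal from each state.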
Problem
MDP + Goal States
- \(S\): set of states
- \(A\): actions
- \(P(s'|s,a)\): transition probabilities
- \(C\): cost
- \(G\): absorbing goal states
Approach
Combining LRTDP with anytime dynamics
hypothesis testing
Hypothesis testing is the mechanism by which a hypothesis is tested statistically.
The core logic of hypothesis testing: state a hypothesis, collect data, and calculate the probability that an outcome at least as extreme could have occurred if the hypothesis were true.
Examples include
- t-test (for sample means)
- z-test (for sample proportions)
- chi-square test (for sample categories)
Common to all hypothesis tests are the following terms.
null hypothesis
A null hypothesis is a “no difference” hypothesis created as a part of hypothesis testing. It is usually stated as an equality.
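As a worked example of this logic, here is a one-sample z-test for a proportion, where the null hypothesis is the equality \(p = p_0\) (the numbers below are purely illustrative):

```python
import math

def z_test_proportion(successes, n, p0):
    """Two-sided one-sample z-test of the null hypothesis p = p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)        # standard error under the null
    z = (p_hat - p0) / se
    # two-sided p-value from the standard normal CDF (via the error function)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# e.g. 60 successes in 100 trials against the null hypothesis p = 0.5
z, p = z_test_proportion(60, 100, 0.5)
```

A small p-value means the observed proportion would be unlikely if the null hypothesis were true, which is grounds to reject it.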