HybPlan

Last edited: August 8, 2025

“Can we come up with a policy that, if not fast, at least reaches the goal?”

Background

Stochastic Shortest-Path

We start at an initial state, we are given a set of goal states, and we want to reach one of the goal states.

We can solve this just by:

value iteration
simulating trajectories and updating only the reachable states: RTDP, LRTDP
MBP

Problem

MDP + Goal States

\(S\): set of states
\(A\): actions
\(P(s' \mid s, a)\): transition
\(C\): cost (the negative of a reward)
\(G\): absorbing goal states
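
As a rough illustration of how the tuple above can be solved with value iteration, here is a minimal sketch. The four-state chain, the action names `step` and `jump`, the unit costs, and the helper names are all hypothetical and chosen only to keep the example small; this is not HybPlan itself.

```python
# Hypothetical toy problem: states 0..3 on a line, state 3 is the goal.
S = [0, 1, 2, 3]
A = ["step", "jump"]
G = {3}

def P(s, a):
    """Return {next_state: probability} for taking action a in state s."""
    if s in G:
        return {s: 1.0}            # goal states are absorbing
    if a == "step":
        return {s + 1: 1.0}        # deterministic move one state ahead
    # "jump": two states ahead (capped at the goal) w.p. 0.5, else stay put
    return {min(s + 2, 3): 0.5, s: 0.5}

def C(s, a):
    """Cost of taking action a in state s: 1 per action, 0 at the goal."""
    return 0.0 if s in G else 1.0

def q_value(V, s, a):
    """Expected cost-to-go of taking a in s and then following V."""
    return C(s, a) + sum(p * V[s2] for s2, p in P(s, a).items())

def value_iteration(tol=1e-9, max_iters=10_000):
    V = {s: 0.0 for s in S}        # goal values stay at 0
    for _ in range(max_iters):
        delta = 0.0
        for s in S:
            if s in G:
                continue
            best = min(q_value(V, s, a) for a in A)  # minimise expected cost
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values
    policy = {s: min(A, key=lambda a: q_value(V, s, a)) for s in S if s not in G}
    return V, policy

V, policy = value_iteration()
```

In this toy chain the expected cost-to-go converges to roughly 3, 2, and 1 for states 0, 1, and 2. Value iteration sweeps every state on every pass; trial-based methods such as RTDP and LRTDP instead update only the states visited along simulated trajectories.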

Approach

Combining LRTDP with anytime dynamics

hypothesis testing

Last edited: August 8, 2025

Hypothesis testing is the mechanism by which a hypothesis is tested statistically.

The core logic of hypothesis testing: choose a metric, collect data, and calculate the probability that the observed outcome could have happened if the hypothesis about that metric were true.

Examples include

t-test (for sample means)
z-test (for sample proportions)
chi-square test (for sample categories)
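
To make the core logic concrete, here is a minimal sketch of a two-sided one-proportion z-test using only the Python standard library. The function name and the sample numbers (60 successes in 100 trials, tested against a proportion of 0.5) are made up for illustration.

```python
import math

def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of the hypothesis that the true proportion equals p0."""
    p_hat = successes / n                      # observed sample proportion
    se = math.sqrt(p0 * (1 - p0) / n)          # standard error assuming p0 is true
    z = (p_hat - p0) / se                      # test statistic
    # Standard normal CDF at |z|, computed via the error function
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)                    # probability of a result at least this extreme
    return z, p_value

z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5)
```

Here the statistic comes out to z = 2, with a p-value of about 0.0455: if the true proportion really were 0.5, a sample at least this far from it would occur about 4.6% of the time. The normal approximation used here is only reasonable when the sample is large enough.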

Common to all hypothesis tests are the following terms.

null hypothesis

A null hypothesis is a “no difference” hypothesis created as a part of hypothesis testing. It is usually stated as an equality.

IBM704

Last edited: August 8, 2025

The IBM704 was the first mass-produced computer with hardware floating-point arithmetic (“pretty much the only computer that could handle complex math”); it could perform 12,000 floating-point additions per second. IBM built 123 such machines between 1955 and 1960.

ICLR2025 Adaptive Computation

Last edited: August 8, 2025

Talks

ICLR2025 Context and Retrieval

Last edited: August 8, 2025

Talks

ICLR2025 Wu: Retrieval Head Explains Long Context