Posts

lossless compression

Last edited: August 8, 2025

lossy compression

Last edited: August 8, 2025

lottery

Last edited: August 8, 2025

A lottery is a choice problem, where each outcome has a certain probability:

\begin{equation} [S_1:p_1, \dots, S_{n}:p_{n}] \end{equation}

where outcome \(S_{j}\) occurs with probability \(p_{j}\).

utility of a lottery

For a lottery, the utility is the expected utility of its states: the sum, over all states, of the probability of each state times the utility of that state;

that is,

\begin{equation} U([S_1:p_1, \dots, S_{n}:p_{n}]) = \sum_{i=1}^{n} p_{i}U(S_{i}) \end{equation}
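As a minimal sketch of the formula above, the following computes the utility of a lottery represented as a list of (state, probability) pairs. The function name `lottery_utility`, the example states, and the payoff table are hypothetical and chosen only for illustration.

```python
def lottery_utility(lottery, utility):
    """Expected utility of a lottery given as (state, probability) pairs.

    `utility` is any callable mapping a state to a number (an assumption
    of this sketch; the notes only define U abstractly).
    """
    total_p = sum(p for _, p in lottery)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Sum of p_i * U(S_i) over all states.
    return sum(p * utility(s) for s, p in lottery)


# Hypothetical example: a 25% chance of a payoff worth 100, otherwise 0.
lottery = [("win", 0.25), ("lose", 0.75)]
payoff = {"win": 100.0, "lose": 0.0}
value = lottery_utility(lottery, payoff.get)  # 0.25 * 100 + 0.75 * 0 = 25.0
```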

Lp Norm

Last edited: August 8, 2025

For a vector \(x = (x_1, \dots, x_{n})\), we have:

\begin{equation} || x ||_{p} = \qty(| x_1 |^{p} + \dots + | x_{n} |^{p})^{\frac{1}{p}} \end{equation}
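A small sketch of the definition above in plain Python. The function name `lp_norm` is hypothetical, and the check `p >= 1` is an assumption added here, since the expression is only a norm in that range.

```python
def lp_norm(x, p):
    """Compute (|x_1|^p + ... + |x_n|^p)^(1/p) for a sequence of numbers."""
    if p < 1:
        # Assumption of this sketch: restrict to p >= 1, where the
        # expression satisfies the triangle inequality.
        raise ValueError("p must be at least 1")
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


x = [3.0, -4.0]
l1 = lp_norm(x, 1)  # |3| + |-4| = 7
l2 = lp_norm(x, 2)  # sqrt(9 + 16) = 5
```

With \(p = 1\) this is the sum of absolute values, and with \(p = 2\) it is the usual Euclidean length.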

LRTDP

Last edited: August 8, 2025

Real-Time Dynamic Programming

RTDP is an asynchronous value iteration scheme. Each RTDP trial repeatedly applies the following update to the states it visits:

\begin{equation} V(s) = \min_{a \in A(s)} \left[ c(a,s) + \sum_{s' \in S} P_{a}(s'|s)V(s') \right] \end{equation}

the algorithm halts when the residuals (the amount by which this update changes \(V(s)\)) are sufficiently small.
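The update and the trial loop can be sketched on a tiny example. Everything concrete below is an assumption for illustration: the three-state MDP, the zero initial values, the greedy action choice during simulation, and the stopping rule that ends once a whole trial produces no residual above `eps`.

```python
import random

# Hypothetical toy MDP: P[s][a] is a list of (next_state, probability) pairs
# and cost[s][a] is the cost c(a, s). "G" is an absorbing goal with value 0.
P = {
    "A": {"go": [("B", 1.0)], "slow": [("G", 1.0)]},
    "B": {"go": [("G", 0.8), ("B", 0.2)]},
}
cost = {"A": {"go": 1.0, "slow": 3.0}, "B": {"go": 1.0}}
GOAL = "G"


def q_value(V, s, a):
    """c(a, s) + sum over s' of P_a(s' | s) * V(s')."""
    return cost[s][a] + sum(p * V[s2] for s2, p in P[s][a])


def bellman_update(V, s):
    """Apply the update to V[s] in place and return the residual."""
    new_value = min(q_value(V, s, a) for a in P[s])
    residual = abs(new_value - V[s])
    V[s] = new_value
    return residual


def rtdp_trial(V, start, rng, max_steps=1000):
    """Simulate one trial from `start`; return the largest residual seen."""
    s, worst = start, 0.0
    for _ in range(max_steps):
        if s == GOAL:
            break
        worst = max(worst, bellman_update(V, s))
        # Follow the greedy action and sample the next state.
        a = min(P[s], key=lambda act: q_value(V, s, act))
        next_states, probs = zip(*P[s][a])
        s = rng.choices(next_states, weights=probs)[0]
    return worst


def rtdp(start, eps=1e-8, max_trials=10_000, seed=0):
    V = {"A": 0.0, "B": 0.0, GOAL: 0.0}  # assumed initial values
    rng = random.Random(seed)
    for _ in range(max_trials):
        if rtdp_trial(V, start, rng) < eps:
            break
    return V


V = rtdp("A")
```

For this toy problem the fixed point can be worked out by hand: \(V(B) = 1 + 0.2V(B)\) gives \(V(B) = 1.25\), and \(V(A) = 1 + V(B) = 2.25\), so the returned values should be close to those.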

Labeled RTDP

We want to label converged states so that we don’t need to keep investigating them:

a state is solved if:

  • the state has residual less than \(\epsilon\)
  • all states \(s'\) reachable from this state have residual less than \(\epsilon\)
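The two conditions above can be sketched as a graph search. This is a simplified illustration, not a full labeling procedure: it assumes "reachable" means reachable under the greedy action for the current values, it only checks residuals without updating any values, and the toy MDP and the name `check_solved` are hypothetical.

```python
# Hypothetical toy MDP: P[s][a] is a list of (next_state, probability) pairs
# and cost[s][a] is the cost c(a, s). "G" is an absorbing goal.
P = {
    "A": {"go": [("B", 1.0)], "slow": [("G", 1.0)]},
    "B": {"go": [("G", 0.8), ("B", 0.2)]},
}
cost = {"A": {"go": 1.0, "slow": 3.0}, "B": {"go": 1.0}}
GOAL = "G"


def greedy(V, s):
    """Return (greedy action, its Q-value) for state s under values V."""
    def q(a):
        return cost[s][a] + sum(p * V[t] for t, p in P[s][a])
    best = min(P[s], key=q)
    return best, q(best)


def check_solved(V, s, solved, eps=1e-6):
    """Label s and the states reachable from it as solved if every one of
    them has residual below eps; otherwise leave `solved` unchanged."""
    stack, seen = [s], set()
    while stack:
        u = stack.pop()
        if u == GOAL or u in solved or u in seen:
            continue
        seen.add(u)
        action, q = greedy(V, u)
        if abs(q - V[u]) >= eps:
            return False  # some reachable state has not converged yet
        stack.extend(t for t, _ in P[u][action])
    solved.update(seen)
    return True


converged = {"A": 2.25, "B": 1.25, "G": 0.0}
solved_states = set()
ok = check_solved(converged, "A", solved_states)

fresh = {"A": 0.0, "B": 0.0, "G": 0.0}
unsolved_states = set()
not_ok = check_solved(fresh, "A", unsolved_states)
```

With the converged values every residual is essentially zero, so both `A` and `B` get labeled; with the all-zero values the residual at `A` is 1, so nothing is labeled.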

Labelled RTDP

Each trial stochastically simulates forward one step at a time, applying a value-iteration update at every state not yet marked “solved”; the trial ends once it reaches a state marked “solved”.