bending

Last edited: August 8, 2025

Bending is what happens when you apply a transverse load to an object and it deflects (goes wooosh).

That’s cool. Now how does it work? See Euler-Bernoulli Theory.

Bernoulli distribution

Last edited: August 8, 2025

Consider a case where there’s only a single binary outcome:

  • “success”, with probability \(p\)
  • “failure”, with probability \(1-p\)

constituents

\begin{equation} X \sim \text{Bern}(p) \end{equation}

requirements

The probability mass function:

\begin{equation} P(X=k) = \begin{cases} p & \text{if } k=1\\ 1-p & \text{if } k=0 \end{cases} \end{equation}

This piecewise form is sadly not differentiable, which is a problem for Maximum Likelihood Parameter Learning. Therefore, we write:

\begin{equation} P(X=k) = p^{k} (1-p)^{1-k} \end{equation}

This matches the piecewise function at \(k=0\) and \(k=1\), and we kinda don’t care about any other place.
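
A minimal numerical sketch of why this form is convenient: the log-likelihood it gives is differentiable in \(p\), and maximizing it yields the sample mean (the data below is made up for illustration):

```python
# Log-likelihood of Bernoulli data under p^k (1-p)^(1-k)
import numpy as np

def log_likelihood(p: float, ks: np.ndarray) -> float:
    # sum_i [ k_i * log p + (1 - k_i) * log(1 - p) ]
    return float(np.sum(ks * np.log(p) + (1 - ks) * np.log(1 - p)))

ks = np.array([1, 0, 1, 1, 0])  # five observed outcomes
p_hat = ks.mean()               # MLE: setting d/dp of the log-likelihood to 0 gives p = mean(k)
print(p_hat, log_likelihood(p_hat, ks))
```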

Bessel's Equation

Last edited: August 8, 2025

\begin{equation} x^{2}y'' + xy' + (x^{2}-n^{2})y = 0 \end{equation}

This equation is very useful, but its solutions (the Bessel functions) have no well-defined closed form in terms of elementary functions.
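
Since there’s no elementary closed form, in practice we evaluate the two standard solutions \(J_n\) (first kind) and \(Y_n\) (second kind) numerically; a minimal sketch with SciPy:

```python
# Numerically evaluate the two standard solutions of Bessel's equation
from scipy.special import jv, yv

n, x = 2, 3.5
print(jv(n, x))  # Bessel function of the first kind, J_2(3.5)
print(yv(n, x))  # Bessel function of the second kind, Y_2(3.5)
```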

best-action worst-state

Last edited: August 8, 2025

best-action worst-state is a lower bound for alpha vectors:

\begin{equation} r_{baws} = \max_{a} \sum_{k=1}^{\infty} \gamma^{k-1} \min_{s}R(s,a) \end{equation}

The alpha vector corresponding to this bound has the same value \(r_{baws}\) in every entry.

This is the discounted return from always picking the best action while being stuck in the worst state, so any real policy can only do better. Since the geometric series sums to \(1/(1-\gamma)\), it reduces to \(\max_{a} \min_{s} R(s,a) / (1-\gamma)\).
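
A minimal sketch computing this bound from a reward table `R[s, a]` (the table values are made up for illustration):

```python
# best-action worst-state bound, using the geometric series
# sum_{k>=1} gamma^(k-1) = 1 / (1 - gamma)
import numpy as np

def best_action_worst_state(R: np.ndarray, gamma: float) -> float:
    # min over states for each action, then max over actions
    return R.min(axis=0).max() / (1.0 - gamma)

R = np.array([[1.0, 0.0],
              [0.5, 2.0]])  # rows: states, columns: actions
print(best_action_worst_state(R, gamma=0.9))  # ≈ 5.0
```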

BetaZero

Last edited: August 8, 2025

Background

recall AlphaZero’s MCTS loop (a minimal code sketch follows the list):

  1. Selection (UCB1, or DPW, etc.)
  2. Expansion (generate possible belief nodes)
  3. Simulation (if it’s a brand new node, Rollout, etc.)
  4. Backpropagation (backpropagate your values up)
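
A minimal, generic sketch of these four phases, assuming a hypothetical environment interface with `actions(state)`, `step(state, action) -> (next_state, reward)`, and `is_terminal(state)` (all names are assumptions, not from the paper):

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}        # action -> Node
        self.N, self.Q = 0, 0.0   # visit count, running mean value

def ucb1(parent, child, c=1.4):
    # exploration bonus shrinks as the child gets visited more
    return child.Q + c * math.sqrt(math.log(parent.N + 1) / (child.N + 1e-9))

def mcts(env, root_state, n_iters=1000, gamma=0.95, rollout_depth=20):
    root = Node(root_state)
    for _ in range(n_iters):
        node = root
        # 1. Selection: descend via UCB1 while the node is fully expanded
        while (not env.is_terminal(node.state) and node.children
               and len(node.children) == len(env.actions(node.state))):
            node = max(node.children.values(), key=lambda ch: ucb1(node, ch))
        # 2. Expansion: try one unexplored action
        if not env.is_terminal(node.state):
            a = random.choice([a for a in env.actions(node.state)
                               if a not in node.children])
            next_state, _ = env.step(node.state, a)
            node.children[a] = Node(next_state, parent=node)
            node = node.children[a]
        # 3. Simulation: random rollout to estimate the new node's value
        value, state, disc = 0.0, node.state, 1.0
        for _ in range(rollout_depth):
            if env.is_terminal(state):
                break
            state, r = env.step(state, random.choice(env.actions(state)))
            value += disc * r
            disc *= gamma
        # 4. Backpropagation: update statistics along the path to the root
        while node is not None:
            node.N += 1
            node.Q += (value - node.Q) / node.N
            node = node.parent
    # act greedily w.r.t. root visit counts
    return max(root.children.items(), key=lambda kv: kv[1].N)[0]
```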

Key Idea

Remove the need for hand-crafted heuristics in MCTS, removing the inductive bias they introduce

Approach

We keep the ol’ neural network:

\begin{equation} f_{\theta}(b_{t}) = (p_{t}, v_{t}) \end{equation}
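
As a concrete sketch (the layer sizes and the fixed-size belief featurization are assumptions, not from the paper), this might look like a small two-headed network:

```python
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    def __init__(self, belief_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(belief_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, n_actions)  # logits for p_t
        self.value_head = nn.Linear(hidden, 1)           # scalar v_t

    def forward(self, b):
        h = self.trunk(b)
        return self.policy_head(h), self.value_head(h).squeeze(-1)
```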

Policy Evaluation

Do \(n\) episodes of MCTS, then use a cross-entropy loss to improve \(f\)

The root node’s visit-count distribution serves as the ground-truth policy target
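
A minimal sketch of one such update, pairing with the `PolicyValueNet` sketch above and assuming we stored (belief features, root visit counts, observed return) tuples from the \(n\) episodes:

```python
import torch
import torch.nn.functional as F

def update(net, optimizer, beliefs, visit_counts, returns):
    # beliefs: (B, belief_dim), visit_counts: (B, n_actions), returns: (B,)
    logits, values = net(beliefs)
    policy_target = visit_counts / visit_counts.sum(dim=1, keepdim=True)
    # cross entropy against the soft visit-count distribution
    policy_loss = -(policy_target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    value_loss = F.mse_loss(values, returns)
    loss = policy_loss + value_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```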

Action Selection

Uses Double Progressive Widening
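
A minimal sketch of the widening test, applied once when sampling a new action at a belief node and once when sampling a new next belief under a chosen action; `k` and `alpha` are assumed hyperparameters, not values from the paper:

```python
def can_widen(num_children: int, num_visits: int,
              k: float = 1.0, alpha: float = 0.5) -> bool:
    # admit a new child only while |children| <= k * N^alpha, so the
    # branching factor grows sublinearly with the visit count
    return num_children <= k * num_visits ** alpha
```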