continuity correction
Last edited: August 8, 2025
We include rounding during a continuity correction to account for values that have been discretized: a discrete value \(k\) occupies the continuous interval \([k-0.5, k+0.5]\).
| Discrete | Continuous |
|---|---|
| P(X = 6) | P(5.5 <= X <= 6.5) |
| P(X >= 6) | P(X >= 5.5) |
| P(X > 6) | P(X >= 6.5) |
Basically, inclusive inequalities ("or equal to") move the boundary 0.5 toward the value, while strict inequalities move it 0.5 past the value.
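The table above can be sketched with a normal approximation to a binomial. This is a minimal example using only the standard library; the binomial parameters are hypothetical stand-ins.

```python
import math

def normal_cdf(x, mu, sigma):
    # Normal CDF via the error function (stdlib only)
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Hypothetical example: X ~ Binomial(n=100, p=0.5),
# approximated by N(mu = np, sigma^2 = np(1-p))
n, p = 100, 0.5
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# P(X = 50)  ->  P(49.5 <= X <= 50.5)
p_eq = normal_cdf(50.5, mu, sigma) - normal_cdf(49.5, mu, sigma)

# P(X >= 60) ->  P(X >= 59.5)
p_ge = 1 - normal_cdf(59.5, mu, sigma)

# P(X > 60)  ->  P(X >= 60.5): strict inequality moves the boundary past 60
p_gt = 1 - normal_cdf(60.5, mu, sigma)

print(p_eq, p_ge, p_gt)
```

Note that `p_ge > p_gt`, since the strict inequality excludes the half-interval around 60.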
continuous distribution
Last edited: August 8, 2025
This is a continuous distribution for which the probability can be quantified as:
\begin{equation} p(x) \dd{x} \end{equation}
You will note that, at any given exact point, the probability is \(\lim_{\dd{x} \to 0} p(x)\dd{x} = 0\). However, to get the actual probability, we take an integral over some range:
\begin{equation} \int_{-\infty}^{\infty} p(x) \dd{x} = 1 \end{equation}
See also cumulative distribution function which represents the chance of something happening up to a threshold.
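A quick numerical sketch of both facts: the density integrates to 1 over the whole line, while any single point carries zero mass and only an interval carries probability. The standard normal density is used here as a stand-in for \(p(x)\).

```python
import math

def p(x):
    # Standard normal density as an example p(x)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Midpoint Riemann sum approximating the integral of p(x) dx over [-8, 8]
# (the tails beyond that are negligible); should be close to 1
dx = 0.001
total = sum(p(-8 + (i + 0.5) * dx) * dx for i in range(int(16 / dx)))
print(total)

# Probability over a finite range, e.g. P(-1 <= X <= 1)
mass = sum(p(-1 + (i + 0.5) * dx) * dx for i in range(int(2 / dx)))
print(mass)
```

As \(\dd{x} \to 0\), each term \(p(x)\dd{x}\) vanishes; only the sum over a range stays finite.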
controller
Last edited: August 8, 2025
A controller is a policy representation that maintains its own state.
constituents
- \(X\): a set of nodes (hidden, internal states)
- \(\Psi(a|x)\): probability of taking action \(a\) from node \(x\)
- \(\eta(x'|x,a,o)\): transition probability between hidden states
requirements
Controllers are nice because:
- we don't have to maintain a belief over time: we only need an initial belief, after which the controller's node carries the state we need
- controllers can be more compact than conditional plans
additional information
finite state controller
A finite state controller has a finite amount of hidden internal state.
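A minimal sketch of the constituents above as a data structure: nodes \(X\), an action distribution \(\Psi\), and a node-transition distribution \(\eta\). The node, action, and observation names are hypothetical.

```python
import random

class FiniteStateController:
    """Finite-state controller: psi[x] maps actions to probabilities,
    eta[(x, a, o)] maps successor nodes to probabilities."""

    def __init__(self, psi, eta):
        self.psi = psi
        self.eta = eta

    def act(self, x):
        # Sample an action from Psi(a|x)
        actions, probs = zip(*self.psi[x].items())
        return random.choices(actions, probs)[0]

    def transition(self, x, a, o):
        # Sample the next hidden node from eta(x'|x, a, o)
        nodes, probs = zip(*self.eta[(x, a, o)].items())
        return random.choices(nodes, probs)[0]

# Two hidden nodes with deterministic distributions for simplicity
psi = {"x1": {"listen": 1.0}, "x2": {"open": 1.0}}
eta = {
    ("x1", "listen", "growl"): {"x1": 1.0},
    ("x1", "listen", "quiet"): {"x2": 1.0},
    ("x2", "open", "growl"): {"x1": 1.0},
    ("x2", "open", "quiet"): {"x1": 1.0},
}
ctrl = FiniteStateController(psi, eta)
x = "x1"
a = ctrl.act(x)                      # "listen"
x = ctrl.transition(x, a, "quiet")   # moves to "x2"
print(a, x)
```

Note that no belief update appears anywhere: the current node alone determines behavior.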
controller gradient ascent
Last edited: August 8, 2025
We aim to solve for a fixed-size controller-based policy using gradient ascent. This is the unconstrained variation on PGA.
Recall what we seek to optimize: for some initial node \(x^{(1)}\) and belief state \(b\), find the action and transition distributions \(\Psi\) and \(\eta\) which maximize the utility obtainable from the initial state:
\begin{equation} \sum_{s}b(s) U(x^{(1)}, s) \end{equation}
Recall that \(U(x,s)\) is given by:
\begin{equation} U(x, s) = \sum_{a}^{} \Psi(a|x) \qty[R(s,a) + \gamma \qty(\sum_{s'}^{} T(s'|s,a) \sum_{o}^{} O(o|a, s') \sum_{x'}^{} \eta(x'|x,a,o) U(x', s')) ] \end{equation}
\end{equation}
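For a fixed controller, \(U(x,s)\) can be computed by iterating the equation above to a fixed point (policy evaluation). This is a minimal sketch; all model quantities (`psi`, `eta`, `T`, `Obs`, `R`) are hypothetical stand-ins passed in as functions.

```python
def evaluate_controller(X, S, A, O, psi, eta, T, Obs, R, gamma, iters=200):
    """Iterate U(x,s) = sum_a psi(a|x) [R(s,a) +
    gamma * sum_{s'} T(s'|s,a) sum_o Obs(o|a,s') sum_{x'} eta(x'|x,a,o) U(x',s')]."""
    U = {(x, s): 0.0 for x in X for s in S}
    for _ in range(iters):
        U_new = {}
        for x in X:
            for s in S:
                total = 0.0
                for a in A:
                    inner = 0.0
                    for s2 in S:
                        for o in O:
                            for x2 in X:
                                inner += (T(s2, s, a) * Obs(o, a, s2)
                                          * eta(x2, x, a, o) * U[(x2, s2)])
                    total += psi(a, x) * (R(s, a) + gamma * inner)
                U_new[(x, s)] = total
        U = U_new
    return U

# Toy single-node, single-state model: reward 1 every step with gamma = 0.9,
# so U converges to 1 / (1 - 0.9) = 10
U = evaluate_controller(
    X=["x1"], S=["s1"], A=["a1"], O=["o1"],
    psi=lambda a, x: 1.0, eta=lambda x2, x, a, o: 1.0,
    T=lambda s2, s, a: 1.0, Obs=lambda o, a, s2: 1.0,
    R=lambda s, a: 1.0, gamma=0.9,
)
print(U[("x1", "s1")])
```

The objective is then \(\sum_{s} b(s)\, U(x^{(1)}, s)\), and gradient ascent adjusts \(\Psi\) and \(\eta\) to increase it.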