continuity correction
Last edited: August 8, 2025
We include rounding during a continuity correction to account for values that have been discretized: a discrete value \(k\) occupies the continuous interval \([k-0.5, k+0.5]\).
| Discrete | Continuous |
|---|---|
| P(X = 6) | P(5.5 <= X <= 6.5) |
| P(X >= 6) | P(X >= 5.5) |
| P(X > 6) | P(X >= 6.5) |
Basically, inclusive inequalities ("or equal to") move the boundary 0.5 toward the value, while strict inequalities move it 0.5 past the value.
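The table above can be sketched with a normal approximation to a binomial. This is a minimal example using only the standard library; the binomial parameters are hypothetical stand-ins.

```python
import math

def normal_cdf(x, mu, sigma):
    # Normal CDF via the error function (stdlib only)
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Hypothetical example: X ~ Binomial(n=100, p=0.5),
# approximated by N(mu = np, sigma^2 = np(1-p))
n, p = 100, 0.5
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# P(X = 50)  ->  P(49.5 <= X <= 50.5)
p_eq = normal_cdf(50.5, mu, sigma) - normal_cdf(49.5, mu, sigma)

# P(X >= 60) ->  P(X >= 59.5)
p_ge = 1 - normal_cdf(59.5, mu, sigma)

# P(X > 60)  ->  P(X >= 60.5): strict inequality moves the boundary past 60
p_gt = 1 - normal_cdf(60.5, mu, sigma)

print(p_eq, p_ge, p_gt)
```

Note that `p_ge > p_gt`, since the strict inequality excludes the half-interval around 60.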
continuous distribution
Last edited: August 8, 2025
This is a continuous distribution for which the probability can be quantified as:
\begin{equation} p(x) \dd{x} \end{equation}
You will note that, at any given exact point, the probability is \(\lim_{\dd{x} \to 0} p(x)\dd{x} = 0\). However, to get the actual probability, we take an integral over some range:
\begin{equation} \int_{-\infty}^{\infty} p(x) \dd{x} = 1 \end{equation}
See also cumulative distribution function which represents the chance of something happening up to a threshold.
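A quick numerical sketch of both facts: the density integrates to 1 over the whole line, while any single point carries zero mass and only an interval carries probability. The standard normal density is used here as a stand-in for \(p(x)\).

```python
import math

def p(x):
    # Standard normal density as an example p(x)
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Midpoint Riemann sum approximating the integral of p(x) dx over [-8, 8]
# (the tails beyond that are negligible); should be close to 1
dx = 0.001
total = sum(p(-8 + (i + 0.5) * dx) * dx for i in range(int(16 / dx)))
print(total)

# Probability over a finite range, e.g. P(-1 <= X <= 1)
mass = sum(p(-1 + (i + 0.5) * dx) * dx for i in range(int(2 / dx)))
print(mass)
```

As \(\dd{x} \to 0\), each term \(p(x)\dd{x}\) vanishes; only the sum over a range stays finite.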
controller
Last edited: August 8, 2025
A controller is a policy representation that maintains its own state.
constituents
- \(X\): a set of nodes (hidden, internal states)
- \(\Psi(a|x)\): probability of taking action \(a\) from node \(x\)
- \(\eta(x'|x,a,o)\): transition probability between hidden states
requirements
Controllers are nice because:
- we don't have to maintain a belief over time: we only need an initial belief, after which the controller's node carries the state we need
- controllers can be more compact than conditional plans
additional information
finite state controller
A finite state controller has a finite amount of hidden internal state.
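A minimal sketch of the constituents above as a data structure: nodes \(X\), an action distribution \(\Psi\), and a node-transition distribution \(\eta\). The node, action, and observation names are hypothetical.

```python
import random

class FiniteStateController:
    """Finite-state controller: psi[x] maps actions to probabilities,
    eta[(x, a, o)] maps successor nodes to probabilities."""

    def __init__(self, psi, eta):
        self.psi = psi
        self.eta = eta

    def act(self, x):
        # Sample an action from Psi(a|x)
        actions, probs = zip(*self.psi[x].items())
        return random.choices(actions, probs)[0]

    def transition(self, x, a, o):
        # Sample the next hidden node from eta(x'|x, a, o)
        nodes, probs = zip(*self.eta[(x, a, o)].items())
        return random.choices(nodes, probs)[0]

# Two hidden nodes with deterministic distributions for simplicity
psi = {"x1": {"listen": 1.0}, "x2": {"open": 1.0}}
eta = {
    ("x1", "listen", "growl"): {"x1": 1.0},
    ("x1", "listen", "quiet"): {"x2": 1.0},
    ("x2", "open", "growl"): {"x1": 1.0},
    ("x2", "open", "quiet"): {"x1": 1.0},
}
ctrl = FiniteStateController(psi, eta)
x = "x1"
a = ctrl.act(x)                      # "listen"
x = ctrl.transition(x, a, "quiet")   # moves to "x2"
print(a, x)
```

Note that no belief update appears anywhere: the current node alone determines behavior.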
controller gradient ascent
Last edited: August 8, 2025
We aim to solve for a fixed-size controller-based policy using gradient ascent. This is the unconstrained variation on PGA.
Recall what we seek to optimize: for some initial node \(x^{(1)}\) and belief state \(b\), find the action and transition distributions \(\Psi\) and \(\eta\) which maximize the utility obtainable from the initial state:
\begin{equation} \sum_{s}b(s) U(x^{(1)}, s) \end{equation}
Recall that \(U(x,s)\) is given by:
\begin{equation} U(x, s) = \sum_{a}^{} \Psi(a|x) \qty[R(s,a) + \gamma \qty(\sum_{s'}^{} T(s'|s,a) \sum_{o}^{} O(o|a, s') \sum_{x'}^{} \eta(x'|x,a,o) U(x', s')) ] \end{equation}
\end{equation}
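For a fixed controller, \(U(x,s)\) can be computed by iterating the equation above to a fixed point (policy evaluation). This is a minimal sketch; all model quantities (`psi`, `eta`, `T`, `Obs`, `R`) are hypothetical stand-ins passed in as functions.

```python
def evaluate_controller(X, S, A, O, psi, eta, T, Obs, R, gamma, iters=200):
    """Iterate U(x,s) = sum_a psi(a|x) [R(s,a) +
    gamma * sum_{s'} T(s'|s,a) sum_o Obs(o|a,s') sum_{x'} eta(x'|x,a,o) U(x',s')]."""
    U = {(x, s): 0.0 for x in X for s in S}
    for _ in range(iters):
        U_new = {}
        for x in X:
            for s in S:
                total = 0.0
                for a in A:
                    inner = 0.0
                    for s2 in S:
                        for o in O:
                            for x2 in X:
                                inner += (T(s2, s, a) * Obs(o, a, s2)
                                          * eta(x2, x, a, o) * U[(x2, s2)])
                    total += psi(a, x) * (R(s, a) + gamma * inner)
                U_new[(x, s)] = total
        U = U_new
    return U

# Toy single-node, single-state model: reward 1 every step with gamma = 0.9,
# so U converges to 1 / (1 - 0.9) = 10
U = evaluate_controller(
    X=["x1"], S=["s1"], A=["a1"], O=["o1"],
    psi=lambda a, x: 1.0, eta=lambda x2, x, a, o: 1.0,
    T=lambda s2, s, a: 1.0, Obs=lambda o, a, s2: 1.0,
    R=lambda s, a: 1.0, gamma=0.9,
)
print(U[("x1", "s1")])
```

The objective is then \(\sum_{s} b(s)\, U(x^{(1)}, s)\), and gradient ascent adjusts \(\Psi\) and \(\eta\) to increase it.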