Validation Index

Last edited: August 8, 2025

Key focus: validation of decision-making systems that operate over time. See also,

There is no silver bullet in validation; we must build a safety case.

Logistics

Lectures

SU-CS238V MAR042025

value iteration


We apply the Bellman Expectation Equation, selecting at each state the utility obtained by taking the optimal action under the current utility estimate:

\begin{equation} U_{k+1}(s) = \max_{a} \qty(R(s,a) + \gamma \sum_{s'} T(s' | s,a) U_{k}(s')) \end{equation}

Each application of this update is called a Bellman backup (or Bellman update).

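Repeating the backup produces a sequence of utilities

\begin{equation} U_1, \dots, U_{k}, \dots \to U^{*} \end{equation}

which, for a discount factor \(\gamma < 1\), eventually converges to the optimal value function. Once it has converged, we extract the greedy policy from the utility to get a policy to use.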
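A minimal sketch of the loop above in Python with NumPy. It assumes the transition model is stored as an array `T[a, s, s2]` and rewards as `R[s, a]`; that layout, the function name `value_iteration`, and the stopping tolerance `tol` are illustrative assumptions, not something these notes prescribe. It repeats the Bellman backup until the largest change in utility drops below `tol`, then extracts the greedy policy.

```python
import numpy as np


def value_iteration(T, R, gamma, tol=1e-8, max_iters=10_000):
    """Repeat Bellman backups until U stops changing, then extract a greedy policy.

    Assumed (illustrative) layout:
      T[a, s, s2] = T(s2 | s, a), shape (A, S, S)
      R[s, a]     = R(s, a),      shape (S, A)
    """
    num_states = T.shape[1]
    U = np.zeros(num_states)
    for _ in range(max_iters):
        # Q[s, a] = R(s, a) + gamma * sum_{s'} T(s' | s, a) U_k(s')
        Q = R + gamma * np.einsum("ast,t->sa", T, U)
        U_next = Q.max(axis=1)  # U_{k+1}(s) = max_a Q[s, a]
        converged = np.max(np.abs(U_next - U)) < tol
        U = U_next
        if converged:
            break
    # Greedy policy with respect to the final utility estimate
    Q = R + gamma * np.einsum("ast,t->sa", T, U)
    return U, Q.argmax(axis=1)


# Hypothetical 2-state, 2-action example: action 0 = stay, action 1 = switch.
T = np.zeros((2, 2, 2))
T[0] = np.eye(2)                           # stay where you are
T[1] = np.array([[0.0, 1.0], [1.0, 0.0]])  # move to the other state
R = np.array([[0.0, 0.0],                  # state 0 pays nothing
              [1.0, 1.0]])                 # state 1 pays 1 per step
U, policy = value_iteration(T, R, gamma=0.9)
# U is approximately [9, 10]; policy is [1, 0] (switch out of 0, stay in 1)
print(U, policy)
```

With \(\gamma = 0.9\), staying in state 1 forever is worth \(1/(1-0.9) = 10\), and switching from state 0 is worth \(0.9 \cdot 10 = 9\), which is what the greedy policy picks up. Stopping on a tolerance rather than a fixed iteration count is just one common choice.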

value iteration, in practice


Say we have a system:

States (4): school, internship, job, jungle
Actions (2): stay, graduate

create transition model

Create tables of size \(|S| \times |S|\) (that is, \(4 \times 4\)), one for each action. These are our transition models. Rows index the state in which we took the action, columns index the state that results from the action, and each entry is the probability of that transition given that we took the action. Each row is therefore a probability distribution over next states and sums to 1.
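The tables described above can be sketched as one NumPy array of shape (actions, states, states). The probabilities below are hypothetical placeholders chosen only to make the shape concrete; the notes do not give any numbers, and `transition_prob` is an illustrative helper name.

```python
import numpy as np

states = ["school", "internship", "job", "jungle"]
actions = ["stay", "graduate"]

# T[a][s, s2] = P(s2 | s, a): one |S| x |S| table per action.
# NOTE: all probabilities here are hypothetical placeholders.
T = np.zeros((len(actions), len(states), len(states)))

# "stay": mostly remain where you are (hypothetical)
T[0] = np.array([
    [0.9, 0.1, 0.0, 0.0],  # from school
    [0.0, 0.9, 0.1, 0.0],  # from internship
    [0.0, 0.0, 1.0, 0.0],  # from job
    [0.0, 0.0, 0.0, 1.0],  # from jungle
])

# "graduate": hypothetical outcomes
T[1] = np.array([
    [0.0, 0.5, 0.3, 0.2],  # from school
    [0.0, 0.0, 0.8, 0.2],  # from internship
    [0.0, 0.0, 1.0, 0.0],  # from job
    [0.0, 0.0, 0.0, 1.0],  # from jungle
])

# Every row is a distribution over next states, so it must sum to 1.
assert np.allclose(T.sum(axis=2), 1.0)


def transition_prob(s, a, s_next):
    """Look up P(s_next | s, a) by name (hypothetical helper)."""
    return T[actions.index(a), states.index(s), states.index(s_next)]


print(transition_prob("school", "graduate", "job"))  # 0.3
```

Checking that rows sum to 1 is a cheap sanity check worth keeping whenever transition tables are filled in by hand.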

value of information


VOI measures how much observing something could change the action you take, assuming you are a rational agent.

The value of information is a measure of how much observing an additional variable is expected to increase our utility. VOI is never negative, and it does not take into account the COST of performing the observation.

constituents

\(o\): the observation(s) we already have
\(O'\): a candidate observation we could make, whose possible outcomes are \(o'_{j}\)

requirements

\begin{equation} VOI(O'|o) = \qty(\sum_{o'} P(o'|o) EU^{*}(o, o')) - EU^{*}(o) \end{equation}
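A small numerical sketch of the formula above, under assumptions that are not in the notes: a discrete hidden state \(s\), a utility table \(U(s, a)\), \(EU^{*}(b) = \max_a \sum_s b(s)\, U(s, a)\) for a belief \(b\), and a sensor likelihood \(P(o'_j \mid s)\) that does not depend on \(o\) once \(s\) is known. The function names and all numbers are hypothetical.

```python
import numpy as np


def best_expected_utility(belief, utility):
    """EU*(belief) = max_a sum_s P(s) U(s, a)  (assumed utility model)."""
    return float(np.max(belief @ utility))


def value_of_information(belief, utility, likelihood):
    """VOI(O' | o) = sum_{o'} P(o' | o) EU*(o, o') - EU*(o).

    belief[s]        = P(s | o), current belief over the hidden state
    utility[s, a]    = U(s, a)
    likelihood[s, j] = P(O' = o'_j | s), assumed independent of o given s
    """
    expected_with_obs = 0.0
    for j in range(likelihood.shape[1]):
        joint = belief * likelihood[:, j]  # P(s, o'_j | o)
        p_obs = joint.sum()                # P(o'_j | o)
        if p_obs == 0.0:
            continue                       # impossible outcome contributes nothing
        posterior = joint / p_obs          # P(s | o, o'_j)
        expected_with_obs += p_obs * best_expected_utility(posterior, utility)
    return expected_with_obs - best_expected_utility(belief, utility)


# Hypothetical example: 2 states, 2 actions, reward 1 for "guessing" the state.
belief = np.array([0.7, 0.3])
utility = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
likelihood = np.array([[0.8, 0.2],   # sensor is right 80% of the time
                       [0.2, 0.8]])
voi = value_of_information(belief, utility, likelihood)
print(voi)  # approximately 0.1
```

Acting now is worth 0.7; after the noisy reading the expected best utility is \(0.56 + 0.24 = 0.8\), so the VOI is 0.1. A perfect sensor (`np.eye(2)`) gives 0.3, and an uninformative one (all entries 0.5) gives 0, consistent with VOI never being negative.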

variance


variance (also known as the second central moment) is a way of measuring spread:

\begin{align} Var(X) &= E[(X-E(X))^{2}] \\ &= E[X^{2}] - (E[X])^{2} \\ &= \qty(\sum_{x}^{} x^{2} p\qty(X=x)) - (E[X])^{2} \end{align}

“on average, how far (in squared distance) is \(X\) from its expectation”

The expressions are derived below. Recall that the standard deviation is the square root of the variance.
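As a quick check that the definition and the shortcut \(E[X^{2}] - (E[X])^{2}\) agree, here is a sketch computing both for a fair six-sided die (an example chosen here, not from the notes). Using `fractions.Fraction` keeps the arithmetic exact, so the two results can be compared with `==`; the function name `variance_two_ways` is hypothetical.

```python
from fractions import Fraction


def variance_two_ways(pmf):
    """pmf maps each value x to P(X = x); returns (definition, shortcut)."""
    mean = sum(x * p for x, p in pmf.items())
    # Definition: E[(X - E[X])^2], via LOTUS
    var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())
    # Shortcut: E[X^2] - (E[X])^2
    var_short = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2
    return var_def, var_short


die = {x: Fraction(1, 6) for x in range(1, 7)}
v_def, v_short = variance_two_ways(die)
print(v_def, v_short)  # 35/12 35/12
```

Here \(E[X] = 7/2\) and \(E[X^{2}] = 91/6\), so the shortcut gives \(91/6 - 49/4 = 35/12\), matching the definition.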

computing variance:

\begin{align} Var(X) &= E[(X - \mu)^{2}] \\ &= \sum_{x}^{} (x-\mu)^{2} p(X=x) \end{align}

This follows from the law of the unconscious statistician. Then, we do some algebra: