Posts

utility theory

Last edited: August 8, 2025

utility theory is a set of theories that deal with rational decision making by maximizing expected utility.

utility theory can be used to choose the right actions in the observe-act cycle in a graphical network via decision networks.

additional information

never have a utility function that’s infinite

If something has infinite utility, doing two of the good things is the same as doing one good thing, which is wrong.

Say going to a Taylor concert has \(+\infty\) utility. Then you would be indifferent between Taylor + Harry and Taylor only, because \(\infty\) plus any finite value is still \(\infty\). However, the former clearly has higher utility as long as the Harry concert doesn't have negative utility.
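The indifference can be checked numerically. Below is a minimal sketch using IEEE floating-point infinity; the finite utility assigned to the Harry concert is an arbitrary placeholder, not a value from these notes.

```python
import math

# Hypothetical utilities: the Taylor concert is (wrongly) given infinite utility.
u_taylor = math.inf
u_harry = 10.0  # placeholder: any finite, non-negative value behaves the same

u_taylor_only = u_taylor
u_both = u_taylor + u_harry

# The two options compare equal, so a utility maximizer cannot prefer
# "Taylor + Harry" over "Taylor only".
print(u_both == u_taylor_only)  # True
print(u_both > u_taylor_only)   # False
```

With finite utilities the comparison behaves as expected: the bundle is strictly better whenever `u_harry` is positive.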

Validation Index

Last edited: August 8, 2025

Key focus: validation of decision making systems that operate over time. See also,

There is no silver bullet in validation; we must build a safety case.

Logistics

Lectures

value iteration

Last edited: August 8, 2025

We repeatedly apply the Bellman optimality equation, setting each state's utility to the value obtained by taking the best action given the current utility estimate:

\begin{equation} U_{k+1}(s) = \max_{a} \qty(R(s,a) + \gamma \sum_{s'} T(s' | s,a) U_{k}(s')) \end{equation}

This iterative process is called the Bellman backup, or Bellman update.

\begin{equation} U_1 \dots U_{k} \dots U^{*} \end{equation}

This sequence eventually converges to the optimal value function. After that, we extract the greedy policy with respect to the converged utility to get a policy to use.
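The backup and the greedy extraction can be sketched in a few lines of plain Python. This is an illustrative sketch, not a reference implementation: the function names, the dictionary layout `T[a][s][s2]` and `R[s][a]`, the stopping tolerance, and the two-state toy problem are all assumptions made for the example.

```python
def q_value(s, a, U, states, T, R, gamma):
    """One-step lookahead: R(s,a) + gamma * sum_s' T(s'|s,a) U(s')."""
    return R[s][a] + gamma * sum(T[a][s][s2] * U[s2] for s2 in states)


def value_iteration(states, actions, T, R, gamma, tol=1e-10, max_iters=10_000):
    """Repeat the Bellman backup until the largest change falls below tol."""
    U = {s: 0.0 for s in states}
    for _ in range(max_iters):
        U_next = {
            s: max(q_value(s, a, U, states, T, R, gamma) for a in actions)
            for s in states
        }
        delta = max(abs(U_next[s] - U[s]) for s in states)
        U = U_next
        if delta < tol:
            break
    return U


def greedy_policy(states, actions, T, R, U, gamma):
    """Pick, in each state, the action with the highest lookahead value."""
    return {
        s: max(actions, key=lambda a: q_value(s, a, U, states, T, R, gamma))
        for s in states
    }


# Hypothetical deterministic toy problem: "move" swaps states, "stay" does not.
states = ["a", "b"]
actions = ["stay", "move"]
T = {
    "stay": {"a": {"a": 1.0, "b": 0.0}, "b": {"a": 0.0, "b": 1.0}},
    "move": {"a": {"a": 0.0, "b": 1.0}, "b": {"a": 1.0, "b": 0.0}},
}
R = {"a": {"stay": 0.0, "move": 0.0}, "b": {"stay": 1.0, "move": 0.0}}
gamma = 0.5

U = value_iteration(states, actions, T, R, gamma)
policy = greedy_policy(states, actions, T, R, U, gamma)
print(policy)
```

For this toy problem the fixed point can be checked by hand: staying in `b` forever is worth \(1 / (1 - 0.5) = 2\), and moving from `a` to `b` is worth \(0.5 \times 2 = 1\).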

value iteration, in practice

Last edited: August 8, 2025

Say we have a system:

  1. States: 4—school, internship, job, jungle
  2. Actions: 2—stay, graduate

create transition model

Create one table of size \(|S| \times |S|\) (here, 4x4) for each action. These are our transition models. Rows are the states in which we took the action, columns are the states that result from the action, and each entry is the probability of that transition happening given that we took the action. Each row must therefore sum to 1.
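As a sketch of what these tables might look like in code, the example below stores one 4x4 nested list per action. Every probability is an invented placeholder, since the notes do not give the actual numbers; `check_transition_model` is a hypothetical helper that only verifies each table has the right shape and that each row is a valid distribution.

```python
STATES = ["school", "internship", "job", "jungle"]
ACTIONS = ["stay", "graduate"]

# T[action][i][j] = P(next = STATES[j] | current = STATES[i], action)
# All numbers below are made-up placeholders for illustration only.
T = {
    "stay": [
        [0.9, 0.1, 0.0, 0.0],  # from school
        [0.0, 0.8, 0.1, 0.1],  # from internship
        [0.0, 0.0, 0.9, 0.1],  # from job
        [0.0, 0.0, 0.0, 1.0],  # from jungle
    ],
    "graduate": [
        [0.0, 0.5, 0.3, 0.2],  # from school
        [0.0, 0.0, 0.7, 0.3],  # from internship
        [0.0, 0.0, 0.9, 0.1],  # from job
        [0.0, 0.0, 0.0, 1.0],  # from jungle
    ],
}


def check_transition_model(T, n_states):
    """Return True if every table is n x n and every row sums to 1."""
    for table in T.values():
        if len(table) != n_states:
            return False
        for row in table:
            if len(row) != n_states:
                return False
            if any(p < 0.0 or p > 1.0 for p in row):
                return False
            if abs(sum(row) - 1.0) > 1e-9:
                return False
    return True


is_valid = check_transition_model(T, len(STATES))

# Look up P(job | school, graduate) by row and column index.
i, j = STATES.index("school"), STATES.index("job")
p_school_to_job = T["graduate"][i][j]
```

Keeping one table per action makes the lookup `T[action][row][column]` mirror the notation \(T(s' | s, a)\) directly.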

value of information

Last edited: August 8, 2025

VOI is a measure of how much observing something changes your action if you are a rational agent.

The value of information is a measure of how much observing an additional variable is expected to increase our utility. VOI can never be negative, and it does not take into account the cost of performing the observation.

constituents

  • \(o\): an observation
  • \(O'\): a possible observation to make, which yields one of several outcomes \(o'_{j}\)

requirements

\begin{equation} VOI(O'|o) = \qty(\sum_{o'} P(o'|o) EU^{*}(o, o')) - EU^{*}(o) \end{equation}
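To make the formula concrete, here is a small sketch for a single hidden variable and a single candidate observation. The umbrella scenario, every probability, and every utility are hypothetical. The current evidence \(o\) is assumed to be already folded into the belief `P_X`, so `best_expected_utility(P_X)` plays the role of \(EU^{*}(o)\).

```python
ACTIONS = ["umbrella", "no_umbrella"]

# Belief over the hidden state given what we have observed so far (placeholder).
P_X = {"rain": 0.3, "sun": 0.7}

# P(o' | x): how the candidate observation behaves in each state (placeholder).
P_OBS_GIVEN_X = {
    "rain": {"wet": 0.8, "dry": 0.2},
    "sun": {"wet": 0.1, "dry": 0.9},
}

# U(action, state): placeholder utilities.
UTILITY = {
    ("umbrella", "rain"): 8.0,
    ("umbrella", "sun"): 6.0,
    ("no_umbrella", "rain"): 0.0,
    ("no_umbrella", "sun"): 10.0,
}


def best_expected_utility(belief):
    """EU*: expected utility of the best action under the given belief."""
    return max(
        sum(belief[x] * UTILITY[(a, x)] for x in belief) for a in ACTIONS
    )


def value_of_information(prior, likelihood):
    """VOI = sum_o' P(o') * EU*(belief after o') - EU*(current belief)."""
    outcomes = list(next(iter(likelihood.values())).keys())
    eu_after = 0.0
    for o in outcomes:
        p_o = sum(prior[x] * likelihood[x][o] for x in prior)
        if p_o == 0.0:
            continue  # an impossible outcome contributes nothing
        posterior = {x: prior[x] * likelihood[x][o] / p_o for x in prior}
        eu_after += p_o * best_expected_utility(posterior)
    return eu_after - best_expected_utility(prior)


voi = value_of_information(P_X, P_OBS_GIVEN_X)
print(round(voi, 2))  # 1.64
```

Without the observation the best action is `no_umbrella`, with expected utility 7.0. With it, the agent switches to `umbrella` when the outcome is `wet`, which raises the expected utility to 8.64. An observation that never changes the chosen action would give a VOI of 0.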