utility theory
Last edited: August 8, 2025

utility theory is a family of theories that model rational decision making as maximizing expected utility.
utility theory can be leveraged to choose the right actions in the observe-act cycle of a graphical network via decision networks.
additional information
never have a utility function that’s infinite
If something has infinite utility, doing two good things is no better than doing one good thing, which is wrong.
Say going to a Taylor concert has \(+\infty\) utility. Then, since \(\infty + x = \infty\), you would be indifferent between Taylor + Harry and Taylor only. However, the former case clearly has higher utility as long as the Harry concert doesn’t have negative utility.
Validation Index
Key focus: validation of decision making systems that operate over time. See also,
No silver bullet in validation, we must build a safety case.
Logistics
Lectures
value iteration
We apply the Bellman Expectation Equation, selecting the utility computed by taking the optimal action under the current utility estimate:
\begin{equation} U_{k+1}(s) = \max_{a} \qty(R(s,a) + \gamma \sum_{s'} T(s' | s,a) U_{k}(s')) \end{equation}
This iterative process is called the Bellman backup, or Bellman update.
\begin{equation} U_1, \dots, U_{k}, \dots \to U^{*} \end{equation}
This sequence eventually converges to the optimal value function. After convergence, we just extract the greedy policy from the utility to get a policy to use.
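The backup-then-extract loop above can be sketched as follows. This is a minimal illustration, not code from these notes: the 2-state MDP at the bottom (a "work" state and an absorbing "goal" state) and all its numbers are made up for demonstration.

```python
import numpy as np

def value_iteration(T, R, gamma=0.9, tol=1e-6):
    """T[a, s, s2] = P(s2 | s, a); R[s, a] = reward for taking a in s.
    Returns the converged utilities and the greedy policy."""
    n_states = T.shape[1]
    U = np.zeros(n_states)
    while True:
        # Bellman backup: Q[s, a] = R(s, a) + gamma * sum_s' T(s'|s,a) U_k(s')
        Q = R + gamma * (T @ U).T
        U_next = Q.max(axis=1)               # best action's value at each state
        if np.max(np.abs(U_next - U)) < tol: # successive utilities stop changing
            return U_next, Q.argmax(axis=1)  # extract the greedy policy
        U = U_next

# Made-up example: states (0 = "work", 1 = "goal"), actions (0 = stay, 1 = move).
T = np.array([[[1.0, 0.0], [0.0, 1.0]],   # stay: remain in place
              [[0.0, 1.0], [0.0, 1.0]]])  # move: go to / remain at the goal
R = np.array([[0.0, 0.0],                 # no reward while working
              [1.0, 1.0]])                # reward 1 per step at the goal
U, policy = value_iteration(T, R)
```

With \(\gamma = 0.9\), the goal state's utility converges to \(1/(1-\gamma) = 10\) and the work state's to \(0.9 \times 10 = 9\), and the greedy policy chooses "move" from the work state.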
value iteration, in practice
Say we have a system:

- States: 4—school, internship, job, jungle
- Actions: 2—stay, graduate
create transition model
Create tables of size \(S \times S\) (that is, \(4 \times 4\)), one for each action. These are our transition models. Rows are the states in which we took the action, columns are the states that result from the action, and each value is the probability of that transition happening given that action.
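These per-action tables can be written down directly. The probabilities below are illustrative placeholders (the notes do not specify numbers), but the structure matches: one \(4 \times 4\) row-stochastic table per action.

```python
import numpy as np

states = ["school", "internship", "job", "jungle"]
actions = ["stay", "graduate"]

# One S x S table per action. Values are made-up for illustration;
# rows index the state where the action was taken, columns the result state.
T = {
    "stay": np.array([
        [0.9, 0.1, 0.0, 0.0],   # from school, "stay" mostly keeps you in school
        [0.0, 0.8, 0.2, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]),
    "graduate": np.array([
        [0.1, 0.3, 0.5, 0.1],
        [0.0, 0.1, 0.8, 0.1],
        [0.0, 0.0, 0.9, 0.1],
        [0.0, 0.0, 0.0, 1.0],
    ]),
}

# Each row is a distribution over next states, so every row must sum to 1.
for table in T.values():
    assert np.allclose(table.sum(axis=1), 1.0)

# Look up P(job | school, graduate):
p = T["graduate"][states.index("school"), states.index("job")]
```

Storing one row-stochastic matrix per action is exactly the shape value iteration consumes: stacking them gives the \(A \times S \times S\) tensor \(T(s' \mid s, a)\).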
value of information
VOI is a measure of how much observing something changes your action if you are a rational agent.
The value of information is a measure of how much observing an additional variable is expected to increase our utility. VOI can never be negative, and it does not take into account the cost of performing the observation.
constituents
- \(o\): an observation
- \(O'\): a candidate observation to make, whose possible outcomes are \(o'_{j}\)
requirements
\begin{equation} VOI(O'|o) = \qty(\sum_{o'} P(o'|o) EU^{*}(o, o')) - EU^{*}(o) \end{equation}
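The formula can be evaluated directly once you have the outcome distribution and the post-observation expected utilities. The numbers below (a hypothetical test with "pos"/"neg" outcomes) are made up for illustration; `value_of_information` is not a library function.

```python
def value_of_information(p_outcome, eu_after, eu_before):
    """VOI(O'|o) = sum_o' P(o'|o) * EU*(o, o')  -  EU*(o).

    p_outcome: dict mapping each outcome o' to P(o'|o)
    eu_after:  dict mapping each outcome o' to EU*(o, o'), the best expected
               utility achievable after also observing that outcome
    eu_before: EU*(o), the best expected utility acting on o alone
    """
    expected_eu = sum(p * eu_after[o] for o, p in p_outcome.items())
    return expected_eu - eu_before

# Made-up example: a diagnostic test that comes back positive 30% of the time.
voi = value_of_information(
    p_outcome={"pos": 0.3, "neg": 0.7},
    eu_after={"pos": 10.0, "neg": 6.0},  # best EU given each test result
    eu_before=6.5,                       # best EU without running the test
)
# 0.3 * 10 + 0.7 * 6 - 6.5 = 0.7
```

Note the non-negativity claim in action: because each \(EU^{*}(o, o')\) maximizes over actions with strictly more information, the probability-weighted sum can never fall below \(EU^{*}(o)\).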
