
utility function

Last edited: August 8, 2025

quadratic utility

\begin{equation} U(x) = \lambda x - x^{2} \end{equation}

where \(\lambda > 0\) controls risk aversion: as \(x\) increases, utility grows concavely up to a peak at \(x = \lambda/2\), after which it falls
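
Setting the derivative to zero locates that peak:

\begin{equation} U'(x) = \lambda - 2x = 0 \implies x = \frac{\lambda}{2} \end{equation}

so quadratic utility is only sensible for outcomes below \(\lambda/2\); past that point, more \(x\) is rated as worse.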

exponential utility

\begin{equation} U(x) = 1 - e^{-\lambda x} \end{equation}

where \(\lambda > 0\) controls risk aversion. Although mathematically convenient, this is usually not a plausible model of human utility, since it implies the same (constant) absolute risk aversion at every wealth level.
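
As an illustration (the gamble and numbers here are hypothetical), a short Python sketch that computes the expected exponential utility of a coin-flip gamble and its certainty equivalent \(-\ln \mathbb{E}[e^{-\lambda x}] / \lambda\):

```python
import numpy as np

# Hypothetical gamble: win 10 or 0 with equal probability.
outcomes = np.array([10.0, 0.0])
probs = np.array([0.5, 0.5])
lam = 0.3  # risk-aversion parameter (lambda > 0)

def exp_utility(x, lam):
    """Exponential utility U(x) = 1 - e^{-lambda x}."""
    return 1.0 - np.exp(-lam * x)

expected_utility = probs @ exp_utility(outcomes, lam)

# Certainty equivalent: the sure amount with the same utility,
# solved from 1 - e^{-lambda CE} = E[U(x)].
ce = -np.log(probs @ np.exp(-lam * outcomes)) / lam

print(expected_utility, ce)
```

The certainty equivalent comes out below the gamble's mean of 5, which is exactly the risk-averse behavior that \(\lambda\) encodes.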

power utility

see power utility

utility fusion

Last edited: August 8, 2025

Take the utility functions from several POMDPs and combine them with a fusion function.

\begin{equation} U^{*}(b, a) = f\qty(U^{*}(b_{1}, a), \dots, U^{*}(b_{n}, a)) \end{equation}

where \(f\) can be, for instance, a sum or a min. The overall belief \(b = (b_{1}, \dots, b_{n})\) lives in the joint belief space \(B_{1} \times \dots \times B_{n}\), which combines all of the individual beliefs.
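
A minimal Python sketch of the fusion step, assuming each sub-POMDP's optimal utility \(U^{*}_{i}\) is already available as a callable (all names here are hypothetical):

```python
def fused_utility(beliefs, action, sub_utilities, f=sum):
    """Fuse per-POMDP utilities U*(b_i, a) with a fusion function f.

    beliefs:       tuple (b_1, ..., b_n), one belief per sub-POMDP
    sub_utilities: list of callables, sub_utilities[i](b_i, a) = U*_i(b_i, a)
    f:             fusion function over the individual utilities, e.g. sum or min
    """
    return f(U(b, action) for U, b in zip(sub_utilities, beliefs))

# e.g. fused = fused_utility((b1, b2), a, [U1, U2], f=min)
```

Choosing \(f = \min\) gives a conservative agent that optimizes its worst sub-task, while \(f = \sum\) trades the sub-tasks off against one another.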

utility theory

Last edited: August 8, 2025

utility theory is the framework for rational decision making that chooses actions by maximizing expected utility.

utility theory can be leveraged to choose actions in the observe-act cycle of a graphical model via decision networks (see the sketch below)
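
For concreteness, a minimal sketch of the maximum-expected-utility decision rule; the probabilities and utilities here are hypothetical placeholders for what inference in a decision network would supply:

```python
# P(outcome | action) and U(outcome), e.g. produced by a decision network.
p = {"act": {"good": 0.7, "bad": 0.3},
     "wait": {"good": 0.4, "bad": 0.6}}
u = {"good": 10.0, "bad": -5.0}

def expected_utility(action):
    """E[U | action] = sum over outcomes of P(outcome | action) * U(outcome)."""
    return sum(p[action][o] * u[o] for o in u)

best = max(p, key=expected_utility)  # the rational choice under utility theory
print(best, expected_utility(best))
```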

additional information

never have a utility function that’s infinite

If something has infinite utility, then obtaining two good things is valued no higher than obtaining just one, which is wrong.

Say going to a Taylor concert has \(+\infty\) utility. Then you would be indifferent between Taylor + Harry and Taylor only. However, the former clearly has higher utility as long as the Harry concert has positive utility.
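
In symbols: if \(U(\text{Taylor}) = +\infty\) and \(U(\text{Harry})\) is any finite positive number, then

\begin{equation} U(\text{Taylor}) + U(\text{Harry}) = \infty = U(\text{Taylor}) \end{equation}

so infinite utility erases the strict improvement that the second concert provides.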

Validation Index

Last edited: August 8, 2025

Key focus: validation of decision-making systems that operate over time.

There is no silver bullet in validation; we must build a safety case.

Logistics

Lectures

value iteration

Last edited: August 8, 2025

We repeatedly apply the Bellman optimality backup, updating each state's utility with the value of the best action under the current utility estimate:

\begin{equation} U_{k+1}(s) = \max_{a} \qty(R(s,a) + \gamma \sum_{s'} T(s' | s,a) U_{k}(s')) \end{equation}

This iterative process is called the Bellman backup, or Bellman update.

\begin{equation} U_{1}, U_{2}, \dots, U_{k} \to U^{*} \end{equation}

This sequence of estimates eventually converges to the optimal value function. After convergence, we simply extract the greedy policy from \(U^{*}\) to obtain a policy to act with.
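
A minimal sketch of value iteration for a tabular MDP, assuming the dynamics are supplied as numpy arrays T[s, a, s'] and R[s, a] (both hypothetical inputs):

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-6):
    """Iterate the Bellman backup U <- max_a (R + gamma * T @ U) to convergence.

    T: transition probabilities, shape (S, A, S'); R: rewards, shape (S, A).
    Returns the optimal utility U* and the greedy policy extracted from it.
    """
    n_states = T.shape[0]
    U = np.zeros(n_states)
    while True:
        # Q[s, a] = R(s, a) + gamma * sum_s' T(s' | s, a) U(s')
        Q = R + gamma * (T @ U)
        U_next = Q.max(axis=1)
        if np.max(np.abs(U_next - U)) < tol:
            break
        U = U_next
    return U_next, Q.argmax(axis=1)  # greedy policy: pi(s) = argmax_a Q(s, a)

# e.g. with hypothetical random inputs for S states and A actions:
# T = np.random.dirichlet(np.ones(S), size=(S, A)); R = np.random.rand(S, A)
```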