utility function
quadratic utility
\begin{equation} U(x) = \lambda x - x^{2} \end{equation}
where \(\lambda>0\) controls risk aversion: as \(x\) increases, utility increases concavely, then eventually falls
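Quick sanity check on where the peak sits: setting the derivative to zero,
\begin{equation} \frac{dU}{dx} = \lambda - 2x = 0 \implies x^{*} = \frac{\lambda}{2} \end{equation}
so a larger \(\lambda\) pushes the peak further out before utility starts to fall.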
exponential utility
\begin{equation} U(x) = 1 - e^{-\lambda x} \end{equation}
where \(\lambda >0\) controls risk aversion. This is usually not plausible as a utility model, because people's utility does not actually follow an exponential curve.
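A minimal sketch of both forms above in Python (function names are mine, just for illustration); it shows the quadratic form rising then falling, while the exponential form saturates toward 1:

```python
import math

def quadratic_utility(x, lam):
    """Quadratic utility: rises concavely, peaks at x = lam / 2, then falls."""
    return lam * x - x ** 2

def exponential_utility(x, lam):
    """Exponential utility: concave, saturating toward 1 as x grows."""
    return 1 - math.exp(-lam * x)

lam = 2.0
for x in [0.0, 0.5, 1.0, 2.0, 4.0]:
    print(x, quadratic_utility(x, lam), exponential_utility(x, lam))
```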
power utility
see power utility
utility fusion
Take the utility functions from a collection of POMDPs and combine them using a fusion function.
\begin{equation} U^{*}(b,a) = f\qty(U^{*}(b_{1}, a), \dots, U^{*}(b_{n}, a)) \end{equation}
where \(f\) can be a sum or a min. The overall belief \(b\) is simply the product \(B_{1} \times \dots \times B_{n}\), which combines all of the individual beliefs together.
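A minimal sketch of the fusion step, assuming each sub-POMDP exposes its optimal utility as a plain callable of (belief, action); the names here are illustrative, not from any particular library:

```python
def fused_utility(component_utilities, beliefs, action, fuse=sum):
    """Fuse the optimal utilities of n sub-POMDPs for a single action.

    component_utilities: list of callables U_i(b_i, a)
    beliefs: list of per-component beliefs (b_1, ..., b_n)
    fuse: fusion function over the component values, e.g. sum or min
    """
    values = [U(b, action) for U, b in zip(component_utilities, beliefs)]
    return fuse(values)

# Pessimistic fusion takes the worst component utility:
# worst_case = fused_utility(utilities, beliefs, action, fuse=min)
```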
utility theory
utility theory is a set of theories that deal with rational decision making by maximizing expected utility.
utility theory can be leveraged to choose the right actions in the observe-act cycle in a graphical network via decision networks
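A minimal sketch of the core operation, picking the action that maximizes expected utility, assuming we are handed an outcome distribution and a utility table (the dictionary layout is illustrative):

```python
def best_action(actions, outcome_probs, utility):
    """Return the action with the highest expected utility.

    outcome_probs: dict mapping action -> {outcome: probability}
    utility: dict mapping outcome -> utility value
    """
    def expected_utility(a):
        return sum(p * utility[s] for s, p in outcome_probs[a].items())

    return max(actions, key=expected_utility)
```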
additional information
never have a utility function that’s infinite
If something has infinite utility, doing two of the good things is the same as doing one good thing, which is wrong.
Say going to a Taylor concert has \(+\infty\) utility. Then you would be indifferent between Taylor + Harry and Taylor only. However, the former clearly has higher utility as long as the Harry concert doesn't have negative utility.
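A concrete way to see the indifference problem, using Python's float infinity as a stand-in for the infinite utility:

```python
taylor = float("inf")  # claimed infinite utility of the Taylor concert
harry = 10.0           # some positive utility for the Harry concert

# Adding Harry changes nothing once an infinite term is involved,
# so the model says you are indifferent, which is wrong:
print(taylor + harry == taylor)  # True
```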
Validation Index
Key focus: validation of decision-making systems that operate over time. See also:
There is no silver bullet in validation; we must build a safety case.
Logistics
Lectures
value iteration
We apply the Bellman Expectation Equation, selecting the utility obtained by taking the optimal action under the current utility estimate:
\begin{equation} U_{k+1}(s) = \max_{a} \qty(R(s,a) + \gamma \sum_{s'} T(s' | s,a) U_{k}(s')) \end{equation}
This iterative process is called the Bellman backup, or Bellman update.
\begin{equation} U_{1}, \dots, U_{k}, \dots \to U^{*} \end{equation}
so the estimates eventually converge to the optimal value function. After that, we just extract the greedy policy from the converged utility to get a policy to use.
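A minimal sketch of value iteration with greedy policy extraction, assuming the MDP is given as plain dictionaries (R[s][a] for rewards, T[s][a] mapping next states to probabilities); the representation is illustrative:

```python
def value_iteration(states, actions, R, T, gamma=0.95, iters=100):
    """Repeatedly apply the Bellman backup U(s) <- max_a [R(s,a) + gamma * sum_s' T(s'|s,a) U(s')]."""
    U = {s: 0.0 for s in states}
    for _ in range(iters):
        U = {
            s: max(
                R[s][a] + gamma * sum(p * U[s2] for s2, p in T[s][a].items())
                for a in actions
            )
            for s in states
        }
    return U

def greedy_policy(states, actions, R, T, U, gamma=0.95):
    """Extract the greedy policy from the (converged) utility."""
    return {
        s: max(
            actions,
            key=lambda a: R[s][a]
            + gamma * sum(p * U[s2] for s2, p in T[s][a].items()),
        )
        for s in states
    }
```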