Houjun Liu

Decision Making Index

# index

Lecture notes taken during CS238, decision making. Stanford Intelligent Systems Laboratory (SISL: planning and validation of intelligent systems).

Big Ideas


  1. There’s a principled mathematical framework for defining rational behavior
  2. There are computational techniques that could lead to better, and perhaps counterintuitive, decisions
  3. Successful application depends on your choice of representation and approximation
    • you typically can’t solve mathematical models exactly
    • so we have to rely on good models and approximations
  4. The same computational approaches can be applied to different application domains
    • the same set of abstractions can be carried through life
    • send Mykel a note about where these topics get applied

These algorithms drive high quality decisions on a tight timeline. You can’t fuck up: people die.


  • Fundamental understanding of mathematical models and solution methods—ungraded book exercises
    • Three quizzes: one question per chapter
      1. chapters 2, 3, 5
  • Implement and extend key algorithms for learning and decision making
  • Identify an application of the theory of this course and formulate it mathematically (proposal)
    • what are the i/o
    • what are the sensor measurements
    • what are the decisions to be made
  • [one other thing]

Course Outline

1-shot: Probabilistic Reasoning

  • models of distributions over many variables
  • using distributions to make inferences
  • utility theory
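A minimal sketch of how the one-shot pieces fit together: take a distribution over outcomes for each action, score each outcome with a utility, and pick the action with the highest expected utility. The umbrella scenario, the names `expected_utility` and `best_action`, and every number here are made up for illustration; they are not from the lecture.

```python
def expected_utility(outcome_probs, utility):
    """Expected utility of one action: sum over outcomes of P(outcome) * U(outcome)."""
    return sum(p * utility[o] for o, p in outcome_probs.items())


def best_action(action_models, utility):
    """Return the action whose outcome distribution has maximum expected utility."""
    return max(action_models, key=lambda a: expected_utility(action_models[a], utility))


# Hypothetical utilities and outcome distributions (illustration only).
utility = {"dry": 10.0, "wet": -20.0, "dry_but_burdened": 6.0}
action_models = {
    "leave_umbrella": {"dry": 0.7, "wet": 0.3},
    "take_umbrella": {"dry_but_burdened": 1.0},
}

choice = best_action(action_models, utility)
print(choice)
```

Leaving the umbrella has expected utility 0.7 * 10 + 0.3 * (-20) = 1, which is lower than the certain 6 from taking it, so the sketch picks `take_umbrella`.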

n-shot: Sequential Problems

Model Uncertainty

  • deal with situations where we don’t know what the best action is at any given step
  • i.e.: future rewards, etc.
  • introduce reinforcement learning and its challenges
    1. Rewards may be received long after important decisions
    2. Agents must generalize from limited exploration experience

State Uncertainty

  • deal with situations where we don’t know what is actually happening: we only have a probabilistic state
  • introduce Partially Observable Markov Decision Process
    1. keep a distribution of beliefs
    2. update the distribution of beliefs
    3. make decisions based on the distribution
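The three steps above can be sketched as a discrete Bayes filter: predict with the transition model, weight by the observation likelihood, then normalize. This is a rough sketch, assuming a tiny two-state problem; the dictionary layouts for `T` and `O`, the probabilities, and the threshold rule in step 3 are all assumptions for illustration.

```python
def update_belief(belief, action, observation, T, O):
    """Discrete belief update: b'(s') is proportional to
    O(o | a, s') * sum_s T(s' | s, a) * b(s)."""
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Prediction step: push the current belief through the transition model.
        predicted = sum(T[(s, action)][s_next] * belief[s] for s in states)
        # Correction step: weight by how likely the observation is in s_next.
        unnormalized[s_next] = O[(action, s_next)][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical two-state model (illustration only).
T = {
    ("hungry", "ignore"): {"hungry": 1.0, "sated": 0.0},
    ("sated", "ignore"): {"hungry": 0.1, "sated": 0.9},
}
O = {
    ("ignore", "hungry"): {"cry": 0.8, "quiet": 0.2},
    ("ignore", "sated"): {"cry": 0.1, "quiet": 0.9},
}

belief = {"hungry": 0.5, "sated": 0.5}                   # 1. keep a belief
belief = update_belief(belief, "ignore", "cry", T, O)    # 2. update the belief
action = "feed" if belief["hungry"] > 0.5 else "ignore"  # 3. decide (made-up threshold rule)
print(belief, action)
```

A real POMDP solver would replace the threshold in step 3 with a policy computed over beliefs; the sketch only shows where the belief enters the decision.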

Multiagent Systems


probabilistic reasoning relating to single decisions

Bayesian Networks, and how to deal with them.
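As a small sketch of "dealing with" a Bayesian network, here is exact inference by enumeration on a three-node network (Rain and Sprinkler both point to WetGrass). The network structure, the variable names, and every probability are hypothetical; enumeration is just one simple inference method and does not scale to large networks.

```python
from itertools import product

# Hypothetical conditional probability tables (illustration only).
P_RAIN = 0.2
P_SPRINKLER = 0.1
# P(wet = True | rain, sprinkler)
P_WET = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability from the chain rule of the network:
    P(rain) * P(sprinkler) * P(wet | rain, sprinkler)."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    p_wet = P_WET[(rain, sprinkler)]
    return p * (p_wet if wet else 1 - p_wet)


def posterior_rain_given_wet():
    """P(rain = True | wet = True), summing out the sprinkler variable."""
    numer = sum(joint(True, s, True) for s in (True, False))
    denom = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numer / denom


posterior = posterior_rain_given_wet()
print(posterior)
```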

a chain of reasoning with feedback

A Markov Decision Process uses policies, which are scored through policy evaluation: utility, the Bellman Equation, value functions, etc.
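A minimal sketch of iterative policy evaluation, assuming a small discrete MDP: repeatedly apply the Bellman expectation backup U(s) = R(s, π(s)) + γ Σ T(s' | s, π(s)) U(s') until the values stop changing. The two-state MDP, the nested-dictionary layout of `T` and `R`, and the discount of 0.9 are assumptions for illustration.

```python
def policy_evaluation(states, T, R, policy, gamma=0.9, tol=1e-10):
    """Iterate the Bellman expectation backup for a fixed policy until convergence."""
    U = {s: 0.0 for s in states}
    while True:
        U_new = {}
        for s in states:
            a = policy[s]
            U_new[s] = R[s][a] + gamma * sum(p * U[s2] for s2, p in T[s][a].items())
        delta = max(abs(U_new[s] - U[s]) for s in states)
        U = U_new
        if delta < tol:
            return U


# Hypothetical two-state MDP: staying in "b" pays 1, everything else pays 0.
states = ["a", "b"]
T = {
    "a": {"stay": {"a": 1.0}, "move": {"b": 1.0}},
    "b": {"stay": {"b": 1.0}, "move": {"a": 1.0}},
}
R = {
    "a": {"stay": 0.0, "move": 0.0},
    "b": {"stay": 1.0, "move": 0.0},
}
policy = {"a": "move", "b": "stay"}

U = policy_evaluation(states, T, R, policy)
print(U)
```

Under this policy the utility of "b" is the geometric sum 1 / (1 - 0.9) = 10, and "a" is one discounted step behind it at 9.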

If we know the state space fully, we can use policy iteration and value iteration to determine an objectively optimal policy. If we don’t (or if the state space is too large), we can try to discretize our state space and approximate through Approximate Value Functions, or use online planning approaches to compute a good policy as we go.
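For the fully known case, value iteration can be sketched as below: apply the Bellman optimality backup until convergence, then read off a greedy policy. This assumes a small discrete MDP stored in nested dictionaries; the two-state example and all of its numbers are hypothetical.

```python
def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-10):
    """Return (U, policy) from repeated Bellman optimality backups."""

    def lookahead(s, a, U):
        # One-step lookahead: immediate reward plus discounted expected utility.
        return R[s][a] + gamma * sum(p * U[s2] for s2, p in T[s][a].items())

    U = {s: 0.0 for s in states}
    while True:
        U_new = {s: max(lookahead(s, a, U) for a in actions) for s in states}
        delta = max(abs(U_new[s] - U[s]) for s in states)
        U = U_new
        if delta < tol:
            break
    # Greedy policy with respect to the converged utilities.
    policy = {s: max(actions, key=lambda a: lookahead(s, a, U)) for s in states}
    return U, policy


# Hypothetical two-state MDP: staying in "b" pays 1, everything else pays 0.
states = ["a", "b"]
actions = ["stay", "move"]
T = {
    "a": {"stay": {"a": 1.0}, "move": {"b": 1.0}},
    "b": {"stay": {"b": 1.0}, "move": {"a": 1.0}},
}
R = {
    "a": {"stay": 0.0, "move": 0.0},
    "b": {"stay": 1.0, "move": 0.0},
}

U_opt, pi_opt = value_iteration(states, actions, T, R)
print(U_opt, pi_opt)
```

Each sweep touches every state and action, which is exactly why this stops being practical once the state space is too large to enumerate.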

If none of those things are feasible (i.e. your state space is too big or too complex to be discretized, because sampling will cause you to lose the structure of the problem), you can do some lovely Policy Optimization, which keeps you in continuous space while iterating on the policy directly. Some nerds lmao like Policy Gradient methods if your policy is differentiable.
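A rough sketch of iterating on a policy directly in continuous space: a one-parameter linear policy u = -θx, scored by a rollout, and improved by gradient ascent on θ. To keep it short this uses a central finite-difference gradient estimate rather than a likelihood-ratio policy gradient. The dynamics, the reward, the step size, and the starting θ are all assumptions chosen so the toy problem is well behaved.

```python
def rollout_return(theta, x0=1.0, horizon=20, gamma=0.95):
    """Discounted return of the policy u = -theta * x on the toy dynamics x' = x + u.
    Reward penalizes distance from the origin and control effort."""
    x, total, discount = x0, 0.0, 1.0
    for _ in range(horizon):
        u = -theta * x
        total += discount * (-(x * x) - 0.1 * (u * u))
        x = x + u
        discount *= gamma
    return total


def finite_difference_gradient(f, theta, eps=1e-4):
    """Central-difference estimate of df/dtheta."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)


def optimize_policy(theta=0.5, step=0.1, iters=200):
    """Plain gradient ascent on the policy parameter."""
    for _ in range(iters):
        theta += step * finite_difference_gradient(rollout_return, theta)
    return theta


theta_opt = optimize_policy()
print(theta_opt, rollout_return(theta_opt))
```

The rollout here is deterministic, so one trajectory per evaluation is enough; with stochastic dynamics the return would have to be averaged over sampled trajectories.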

Now, Policy Optimization methods all require sampling a certain set of trajectories and optimizing over them in order to work. How do we know how much sampling to do before we start optimizing? That’s an Exploration and Exploitation question. We can try really hard to collect trajectories, but then we’d lose out on collecting intermediate reward.
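One simple way to make the trade-off concrete is an ε-greedy strategy on a multi-armed bandit: with probability ε pull a random arm (explore), otherwise pull the arm with the best estimate so far (exploit). This is a sketch under assumed numbers; the arm payoff probabilities, ε = 0.1, and the function name `epsilon_greedy_bandit` are illustrative, and ε-greedy is only one of several exploration strategies.

```python
import random


def epsilon_greedy_bandit(true_probs, pulls=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms; return (counts, estimates, total_reward)."""
    rng = random.Random(seed)
    n = len(true_probs)
    counts = [0] * n
    estimates = [0.0] * n
    total_reward = 0.0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: uniformly random arm
        else:
            arm = max(range(n), key=lambda i: estimates[i])  # exploit: best estimate
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean update of the payoff estimate for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward


# Hypothetical arm payoff probabilities (illustration only).
counts, estimates, total_reward = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(estimates)), key=lambda i: estimates[i])
print(counts, estimates, total_reward, best_arm)
```

A larger ε gives better estimates of every arm but spends more pulls on arms already known to be worse, which is the intermediate reward being given up.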

POMDP bomp bomp bomp


  • Change the action space
  • Change the reward function
  • Change the transition function
  • Improve the solver
  • Don’t worry about it
  • Don’t deploy the system

Words of Wisdom from Mykel

“The belief update is central to learning. The point of education is to change your beliefs; look for opportunities to change your belief.”

“What’s in the action space, how do we maximize it?”

From MDPs, “we can learn from the past, but the past doesn’t influence you.”

“Optimism under uncertainty”: Exploration and Exploitation “you should try things”