Posts

SU-CS161 NOV112025

Last edited: November 11, 2025

Key Sequence

Notation

New Concepts

Important Results / Claims

Questions

Interesting Factoids

SU-CS161 NOV132025

Last edited: November 11, 2025

Key Sequence

Notation

New Concepts

Important Results / Claims

Questions

Interesting Factoids

SU-CS229 NOV122025

Last edited: November 11, 2025

Key Sequence

Notation

New Concepts

Important Results / Claims

Questions

Interesting Factoids

continuous state MDP

Last edited: November 11, 2025

The Bellman equation, etc., are really designed for discrete state spaces. However, we’d really like to be able to support continuous state spaces! Suppose each state \(s \in \mathbb{R}^{n}\): what can we do?

Discretization

We can just pretend that our system is a discrete-state MDP by chopping the state space up into small blocks (grid cells). Doing so turns \(V\) into a step function that is constant on each cell. Recall that this blows up quickly: for \(s \in \mathbb{R}^{n}\), if we divide each axis into \(k\) values, we get \(k^{n}\) discrete states!
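Below is a minimal Python sketch of this discretization idea (the bounds, the bin count \(k\), and the helper names are illustrative assumptions, not from the lecture): each continuous state is mapped to one of the \(k^{n}\) grid cells, and the step-function \(V\) is stored as a table indexed by cell.

```python
import numpy as np

# Illustrative sketch (bounds, k, and all names are assumptions, not from the
# notes): map a continuous state s in R^n to one of k^n grid cells, and store
# the step-function value estimate V as a table indexed by cell.

def make_discretizer(lows, highs, k):
    """Return a function mapping a continuous state to a flat cell index."""
    lows, highs = np.asarray(lows, dtype=float), np.asarray(highs, dtype=float)
    n = len(lows)

    def to_index(s):
        # per-axis bin, clipped so states on/near the boundary stay valid
        bins = np.clip(((s - lows) / (highs - lows) * k).astype(int), 0, k - 1)
        # flatten the n per-axis bins into a single table index in [0, k^n)
        return int(np.ravel_multi_index(tuple(bins), dims=(k,) * n))

    return to_index


# Example: n = 4 state dimensions with k = 10 bins per axis already needs a
# table of 10**4 = 10,000 entries; k = 100 per axis would need 100**4 = 10**8.
k, n = 10, 4
to_index = make_discretizer(lows=[-1.0] * n, highs=[1.0] * n, k=k)
V = np.zeros(k ** n)                      # tabular, piecewise-constant V
s = np.array([0.3, -0.7, 0.0, 0.9])
print(to_index(s), V[to_index(s)])
```

The point of the example is only the bookkeeping: the table size is \(k^{n}\), which is exactly the exponential blow-up described above.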

SU-CS229 NOV102025

Last edited: November 11, 2025

Key Sequence

Notation

New Concepts

Important Results / Claims

Questions

Interesting Factoids

  • “sometimes we may want to model at a slower timescale than the data is collected at; for instance, your helicopter really doesn’t move anywhere in a 100th of a second, but you can collect data that fast”

Debugging RL

RL should work when

  1. The simulator is good
  2. The RL algorithm correctly maximizes \(V^{\pi}\)
  3. The reward is designed so that the maximum expected payoff corresponds to your goal

Diagnostics

  • check your simulator: if your policy works in sim but not IRL, your sim is bad
  • if \(V^{\text{RL}} < V^{\text{human}}\), then your RL algorithm is failing to maximize \(V^{\pi}\) (condition 2): improve the optimization/training
  • if \(V^{\text{RL}} \geq V^{\text{human}}\) but the learned policy still performs worse than the human at the actual task, then your objective/reward function is bad (condition 3): maximizing it does not correspond to achieving your goal (a rough Monte-Carlo sketch of this comparison is below)
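A rough, self-contained Python sketch of the value-comparison diagnostic (the toy simulator, policies, and all names below are placeholder assumptions, not anything from the lecture): estimate \(V^{\pi}(s_0)\) for the learned and human policies by Monte-Carlo rollouts in the simulator, then apply the two checks above.

```python
import numpy as np

# Hypothetical sketch, not from the notes: compare Monte-Carlo estimates of
# V^{pi}(s_0) for the learned policy and a human baseline in the simulator.

class ToySim:
    """Toy 1-D task standing in for the real simulator: drive s toward 0."""
    def reset(self):
        self.s = np.random.uniform(-1.0, 1.0)
        return self.s

    def step(self, a):
        self.s += a                            # action nudges the state
        return self.s, -abs(self.s), False     # reward is -|s|, never "done"


def estimate_value(sim, policy, gamma=0.99, horizon=200, n_rollouts=200):
    """Monte-Carlo estimate of V^pi(s_0) = E[sum_t gamma^t R(s_t)]."""
    returns = []
    for _ in range(n_rollouts):
        s = sim.reset()
        total, discount = 0.0, 1.0
        for _ in range(horizon):
            s, r, done = sim.step(policy(s))
            total += discount * r
            discount *= gamma
            if done:
                break
        returns.append(total)
    return float(np.mean(returns))


sim = ToySim()
pi_rl = lambda s: -0.1 * s       # stand-in for the learned policy
pi_human = lambda s: -0.5 * s    # stand-in for the human baseline

v_rl, v_human = estimate_value(sim, pi_rl), estimate_value(sim, pi_human)
if v_rl < v_human:
    print("RL algorithm is failing to maximize V^pi (diagnostic 2).")
else:
    # If v_rl >= v_human but the policy still does worse than the human at the
    # real task, the reward/objective is misspecified (diagnostic 3).
    print("Optimization looks fine; check the reward design next.")
```

In practice you would plug in your real simulator and policies; the point is only that these diagnostics reduce to comparing two value estimates and then checking real-task performance.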