SU-CS161 NOV112025
Last edited: November 11, 2025
Key Sequence
Notation
New Concepts
Important Results / Claims
Questions
Interesting Factoids
SU-CS161 NOV132025
Last edited: November 11, 2025
Key Sequence
Notation
New Concepts
Important Results / Claims
Questions
Interesting Factoids
SU-CS229 NOV122025
Last edited: November 11, 2025
Key Sequence
Notation
New Concepts
Important Results / Claims
Questions
Interesting Factoids
continuous state MDP
Last edited: November 11, 2025
The Bellman Equation, etc., are really designed for state spaces that are discrete. However, we’d really like to be able to support continuous state spaces! Suppose we have \(S \in \mathbb{R}^{n}\): what can we do?
Discretization
We can just pretend that our system is a discrete-state MDP by chopping the state space up into small blocks. Doing this turns \(V\) into a step function over the continuous state space (constant on each block). Recall that this blows up quickly: for \(S \in \mathbb{R}^{n}\), if we divide each axis into \(k\) values, we get \(k^{n}\) discrete states!
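A minimal sketch of this (not from the lecture; the function name, grid bounds, and dimensions are made up for illustration), assuming we bucket each axis into \(k\) bins and flatten the per-axis indices into one cell index:

```python
import numpy as np

def discretize(s, low, high, k):
    """Map a continuous state s in R^n to a single cell index in {0, ..., k^n - 1}."""
    s, low, high = np.asarray(s), np.asarray(low), np.asarray(high)
    # per-axis bin index in {0, ..., k-1}
    bins = np.clip(((s - low) / (high - low) * k).astype(int), 0, k - 1)
    # flatten the n per-axis indices into one cell index (base-k encoding)
    return int(np.ravel_multi_index(bins, dims=[k] * len(bins)))

# Example: a 6-dimensional state with k = 10 bins per axis already needs
# a value table with 10^6 = 1,000,000 entries -- the curse of dimensionality.
n, k = 6, 10
low, high = np.zeros(n), np.ones(n)
print(discretize(np.full(n, 0.37), low, high, k))  # cell index of this state
print(k ** n)                                      # number of table entries
```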
SU-CS229 NOV102025
Last edited: November 11, 2025
Key Sequence
Notation
New Concepts
Important Results / Claims
Questions
Interesting Factoids
- “sometimes we may want to model at a slower timescale than the data is collected; for instance, your helicopter doesn’t really move anywhere within a 100th of a second, so there’s nothing to learn at that rate, but you can collect data that fast”
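A tiny sketch of that idea (all names and numbers are illustrative, not from the notes): log data at 100 Hz, but fit the dynamics model \(s_{t+1} = f(s_t, a_t)\) at a coarser 10 Hz timestep by subsampling:

```python
import numpy as np

log_hz, model_hz = 100, 10
stride = log_hz // model_hz          # keep every 10th logged timestep

states = np.random.randn(5000, 12)   # stand-in for logged states
actions = np.random.randn(5000, 4)   # stand-in for logged controls

s_coarse = states[::stride]
a_coarse = actions[::stride]
# training pairs (s_t, a_t) -> s_{t+1} at the coarser 10 Hz modeling timescale
X, y = np.hstack([s_coarse[:-1], a_coarse[:-1]]), s_coarse[1:]
```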
Debugging RL
RL should work when
- The simulator is good
- The RL algorithm correctly maximizes \(V^{\pi}\)
- The reward is such that maximizing expected payoff corresponds to achieving your goal
Diagnostics
- check your simulator: if your policy works in sim but not IRL, your sim is bad
- if \(V^{\text{RL}} < V^{\text{human}}\), then your RL algorithm is the problem: it isn’t maximizing \(V^{\pi}\) well enough
- if \(V^{\text{RL}} \geq V^{\text{human}}\) but the learned policy still performs worse than the human, then your objective (reward) function is the problem: maximizing it doesn’t correspond to your actual goal
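A minimal sketch of this diagnostic checklist as code (the inputs are assumed to be measured elsewhere; the function name and values are made up):

```python
def diagnose(works_in_sim, works_in_real_life, v_rl, v_human):
    """Return which component to suspect: simulator, RL algorithm, or reward."""
    if works_in_sim and not works_in_real_life:
        return "simulator"            # policy is fine in sim, fails IRL
    if v_rl < v_human:
        return "RL algorithm"         # it isn't maximizing V^pi well enough
    # v_rl >= v_human, yet the controller still underperforms the human:
    return "reward / objective"       # maximizing V^pi doesn't match the goal

print(diagnose(works_in_sim=True, works_in_real_life=False, v_rl=1.2, v_human=1.5))
# -> "simulator"
```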
