SU-CS238 OCT192023
Last edited: August 8, 2025

Key Sequence
Notation
New Concepts
- Markov Decision Process
- Bellman Residual
- for continuous state spaces: Approximate Value Function
- use global approximation or local approximation methods
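For continuous state spaces, a local approximation method such as kernel smoothing estimates the value at an unseen state from values at nearby sampled states. A minimal sketch (the 1-D state space, sample states, values, and bandwidth below are all invented for illustration):

```python
import math

# Hypothetical 1-D continuous state space: utilities are known only at
# a few sampled states; U(s) elsewhere is approximated locally by
# kernel smoothing. All numbers here are illustrative.
sample_states = [0.0, 1.0, 2.0, 3.0]
sample_values = [0.0, 1.0, 4.0, 9.0]

def gaussian_kernel(s, s_i, bandwidth=0.5):
    """Weight that decays with the distance between two states."""
    return math.exp(-((s - s_i) ** 2) / (2 * bandwidth ** 2))

def approx_value(s, bandwidth=0.5):
    """Kernel-smoothed estimate: weighted average of sampled utilities."""
    weights = [gaussian_kernel(s, s_i, bandwidth) for s_i in sample_states]
    return sum(w * v for w, v in zip(weights, sample_values)) / sum(weights)
```

A global approximation method would instead fit one parametric function (e.g. a linear model over features) to all samples at once.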
Important Results / Claims
- policy and utility
- creating a good utility function / policy from instantaneous rewards: either policy iteration or value iteration
- creating a policy from a utility function: value-function policy (“at each state, choose the action with the highest expected utility”)
- calculating the utility function a policy currently uses: use policy evaluation
- kernel smoothing
- value iteration, in practice
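The pipeline above can be sketched end to end on a toy MDP: value iteration produces a utility, greedy extraction turns it into a value-function policy, and policy evaluation recovers the utility of that fixed policy. The 2-state, 2-action MDP below is invented for illustration:

```python
gamma = 0.9
states = [0, 1]
actions = ["stay", "move"]

# T[s][a] = list of (next_state, probability); R[s][a] = reward
T = {
    0: {"stay": [(0, 1.0)], "move": [(1, 1.0)]},
    1: {"stay": [(1, 1.0)], "move": [(0, 1.0)]},
}
R = {
    0: {"stay": 0.0, "move": 1.0},
    1: {"stay": 2.0, "move": 0.0},
}

def backup(U, s, a):
    """One Bellman backup: R(s, a) + gamma * E[U(s')]."""
    return R[s][a] + gamma * sum(p * U[sp] for sp, p in T[s][a])

def value_iteration(tol=1e-8):
    """Sweep Bellman backups until the Bellman residual is small."""
    U = {s: 0.0 for s in states}
    while True:
        U_new = {s: max(backup(U, s, a) for a in actions) for s in states}
        if max(abs(U_new[s] - U[s]) for s in states) < tol:
            return U_new
        U = U_new

def greedy_policy(U):
    """Value-function policy: best-valued action in each state."""
    return {s: max(actions, key=lambda a: backup(U, s, a)) for s in states}

def policy_evaluation(pi, tol=1e-8):
    """Utility of a fixed policy, by iterating its (non-max) backup."""
    U = {s: 0.0 for s in states}
    while True:
        U_new = {s: backup(U, s, pi[s]) for s in states}
        if max(abs(U_new[s] - U[s]) for s in states) < tol:
            return U_new
        U = U_new
```

In this toy MDP the optimal policy is to move from state 0 and stay in state 1, and evaluating that policy returns the same utilities value iteration converged to.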
Questions
Interesting Factoids
SU-CS238 OCT242023
Key Sequence
Notation
New Concepts
Important Results / Claims
Questions
Interesting Factoids
SU-CS238 OCT262023
Big day. Policy Gradient.
New Concepts
Important Results / Claims
- Monte Carlo policy evaluation
- Finite-Difference Gradient Estimation
- Linear Regression Gradient Estimate
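The first two items above compose naturally: estimate a policy's expected return by averaging Monte Carlo rollouts, then estimate its gradient with finite differences. A minimal sketch with a single-parameter "policy" whose true expected return is an invented quadratic (all names and numbers below are illustrative, not from the course):

```python
import random

def rollout_return(theta, rng):
    """One noisy rollout's return; the true expected return is
    -(theta - 2)^2, maximized at theta = 2 (invented for this demo)."""
    return -(theta - 2.0) ** 2 + rng.gauss(0.0, 0.01)

def mc_policy_evaluation(theta, n=200, seed=0):
    """Monte Carlo policy evaluation: average return over n rollouts.
    A fixed seed gives common random numbers across evaluations."""
    rng = random.Random(seed)
    return sum(rollout_return(theta, rng) for _ in range(n)) / n

def finite_difference_gradient(theta, delta=0.1):
    """Central finite difference: (U(theta+d) - U(theta-d)) / (2d)."""
    up = mc_policy_evaluation(theta + delta)
    down = mc_policy_evaluation(theta - delta)
    return (up - down) / (2 * delta)
```

The linear-regression variant generalizes this: evaluate the return at many random perturbations of theta and fit a linear model, whose slope is the gradient estimate; it is less sensitive to the choice of a single delta.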