Key Sequence
Notation
New Concepts
- Markov Decision Process
- Bellman Residual (see the worked equation after this list)
- Approximate Value Function, for continuous (or very large) state spaces
- represented with either global approximation or local approximation methods (kernel smoothing, below, is a local one)
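A worked form of the Bellman residual referenced above, under standard MDP notation that these notes assume but do not define (reward R, transition model T, discount γ): one Bellman backup and its residual are

$$U_{k+1}(s) = \max_{a}\Big(R(s,a) + \gamma \sum_{s'} T(s' \mid s, a)\, U_k(s')\Big), \qquad \delta_k = \lVert U_{k+1} - U_k \rVert_\infty .$$

Stopping once $\delta_k < \delta$ guarantees $\lVert U_{k+1} - U^* \rVert_\infty < \delta\gamma/(1-\gamma)$, which is why the residual serves as a practical convergence test.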
Important Results / Claims
- policy and utility: a policy maps each state to an action; the utility of a policy at a state is the expected discounted sum of rewards from following it there
- creating a good utility function / policy from instantaneous rewards: either policy iteration (alternate policy evaluation with greedy improvement) or value iteration (both sketched after this list)
- creating a policy from a utility function: the value-function policy (“in each state, take the action with the best expected utility”, i.e. greedy one-step lookahead; sketched below)
- calculating the utility function induced by a given policy: policy evaluation (sketched below)
- kernel smoothing: a local approximation method, estimating the utility at an arbitrary state as a kernel-weighted average of the utilities at known states (sketched below)
- value iteration, in practice: repeat Bellman backups until the Bellman residual falls below a threshold, which bounds the remaining distance to the optimal utility (sketched below)
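A minimal sketch of iterative policy evaluation, assuming a made-up two-state, two-action toy MDP (the numbers in T, R, and gamma are illustrative, not from these notes):

```python
import numpy as np

# Hypothetical two-state, two-action toy MDP (all numbers are made-up).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],    # T[a, s, s'] for action 0
              [[0.5, 0.5], [0.0, 1.0]]])   # ... and action 1
R = np.array([[0.0, 1.0],                  # R[s, a]
              [0.5, 0.0]])
gamma = 0.9

def policy_evaluation(pi, iters=1000):
    """Iteratively compute the utility of a fixed policy pi (state -> action)."""
    U = np.zeros(T.shape[1])
    for _ in range(iters):
        # Bellman expectation backup under the fixed policy.
        U = np.array([R[s, pi[s]] + gamma * T[pi[s], s] @ U
                      for s in range(len(U))])
    return U

pi = np.array([0, 1])          # an arbitrary fixed policy
print(policy_evaluation(pi))   # utility of each state under pi
```

Exact policy evaluation would instead solve the linear system U = R_π + γ T_π U; the iterative form above is the one that generalizes to large state spaces.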
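The value-function policy from the bullet above is a greedy one-step lookahead over a utility estimate; a sketch reusing the same made-up toy MDP:

```python
import numpy as np

# Same made-up toy MDP as in the previous sketch.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])   # T[a, s, s']
R = np.array([[0.0, 1.0],
              [0.5, 0.0]])                 # R[s, a]
gamma = 0.9

def greedy_policy(U):
    """Value-function policy: in each state, take the best-valued action."""
    # Q[s, a] = R(s, a) + gamma * sum_s' T(s' | s, a) * U(s')
    Q = R + gamma * np.einsum('ast,t->sa', T, U)
    return np.argmax(Q, axis=1)

U = np.array([1.0, 2.0])       # some utility estimate
print(greedy_policy(U))        # best action in each state
```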
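Value iteration in practice, with the Bellman residual as the stopping test, again on the made-up toy MDP:

```python
import numpy as np

# Same made-up toy MDP as in the previous sketches.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])   # T[a, s, s']
R = np.array([[0.0, 1.0],
              [0.5, 0.0]])                 # R[s, a]
gamma = 0.9

def value_iteration(delta=1e-6):
    """Repeat Bellman backups until the Bellman residual drops below delta."""
    U = np.zeros(T.shape[1])
    while True:
        Q = R + gamma * np.einsum('ast,t->sa', T, U)
        U_next = Q.max(axis=1)                    # Bellman (optimality) backup
        if np.max(np.abs(U_next - U)) < delta:    # Bellman residual test
            return U_next
        U = U_next

print(value_iteration())   # near-optimal utility for each state
```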
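A sketch of kernel smoothing as a local approximation method; the Gaussian kernel, the bandwidth, and the anchor states are illustrative choices, since the notes do not fix a kernel:

```python
import numpy as np

def kernel_smoothing_utility(s, anchors, U_anchors, bandwidth=1.0):
    """Local approximation: utility at s is a kernel-weighted average of the
    utilities stored at a finite set of anchor states."""
    # Gaussian kernel; one common choice among many.
    k = np.exp(-np.sum((anchors - s) ** 2, axis=1) / (2 * bandwidth ** 2))
    beta = k / k.sum()          # normalized weights, sum to 1
    return beta @ U_anchors

# Made-up 1-D continuous state space with three anchor states.
anchors = np.array([[0.0], [1.0], [2.0]])
U_anchors = np.array([0.0, 1.0, 0.5])
print(kernel_smoothing_utility(np.array([0.6]), anchors, U_anchors))
```

Because the weights are normalized to sum to 1, the estimate interpolates between stored utilities, and with a narrow kernel only nearby anchor states meaningfully influence it, which is what makes this a local rather than global method.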