continuous state MDP

The Bellman equation, value iteration, etc., are really designed for state spaces that are discrete. However, we’d really like to be able to support continuous state spaces! Suppose we have \(S \subseteq \mathbb{R}^{n}\) — what can we do?

Discretization

We can just pretend that our system is a discrete-state MDP by chopping the state space up into small blocks; the resulting \(V\) is then a step function over those blocks. Recall that this can explode with dimension: for \(S \subseteq \mathbb{R}^{n}\), if we divide each axis into \(k\) values, we get \(k^{n}\) discrete states (the curse of dimensionality)!

Also, instead of using the same grid resolution for every state variable, we can use finer steps along the dimensions to which the value is more sensitive.
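A minimal sketch of this kind of discretization, assuming a hypothetical 2-D state (position, velocity) with hand-picked bounds and per-axis bin counts:

```python
import numpy as np

# Hypothetical 2-D state: position in [0, 1], velocity in [-2, 2].
# Per-axis bin counts can differ: more bins where V is more sensitive.
bins_per_dim = [50, 10]
lows = np.array([0.0, -2.0])
highs = np.array([1.0, 2.0])

def discretize(s, lows, highs, bins_per_dim):
    """Map a continuous state to a tuple of grid indices."""
    idx = []
    for x, lo, hi, k in zip(s, lows, highs, bins_per_dim):
        # clip so boundary states fall into the last bin
        i = int(np.clip((x - lo) / (hi - lo) * k, 0, k - 1))
        idx.append(i)
    return tuple(idx)

# The tabular V is then a 50 x 10 array -- k^n entries in general.
V = np.zeros(bins_per_dim)
print(discretize([0.5, 0.0], lows, highs, bins_per_dim))  # -> (25, 5)
```

The value lookup becomes `V[discretize(s, ...)]`, a step function over the grid cells.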

Using a Function

Use a traditional function approximator (linear regression, a neural network, etc.) as a proxy for \(V\).

To do this, we need a model \(f\) of the MDP such that \(s’ = f\qty(s,a)\) with \(s’ \sim T\qty(\cdot | s,a)\). You may also have stochastic models, namely \(s’ = f\qty(s,a) + \varepsilon\) where \(\varepsilon \sim \mathcal{N}\qty(0, \Sigma)\), to make a more robust model. You can obtain \(f\) from data or from physics / expert design.
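As a sketch, here is a hypothetical hand-designed model \(f\) for a 1-D point mass (state = position and velocity, action = acceleration, Euler step of size dt — all assumptions for illustration), with its stochastic variant:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical physics model f: state s = (position, velocity),
# action a = acceleration, Euler integration step of size dt.
def f(s, a, dt=0.1):
    pos, vel = s
    return np.array([pos + vel * dt, vel + a * dt])

# Stochastic variant: s' = f(s, a) + eps, with eps ~ N(0, sigma^2 I).
def f_noisy(s, a, sigma=0.01):
    return f(s, a) + rng.normal(0.0, sigma, size=2)

print(f(np.array([0.0, 1.0]), 2.0))  # position 0.1, velocity 1.2
```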

After we have this, given a state \(s\), call \(\phi\qty(s)\) the features of state \(s\). Then, we write:

\begin{equation} V\qty(s) = \theta^{T} \phi\qty(s) \end{equation}

Recall also, if we determinize our MDP, we have the Bellman equation as:

\begin{equation} V\qty(s) = R\qty(s) + \gamma \max_{a} V\qty(s’), \text{ where } s’ = T\qty(s,a) \end{equation}
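This determinized backup is a one-liner; a sketch, assuming a finite action set and \(R\), \(T\), \(V\) supplied as plain functions (the toy dynamics in the usage example are made up):

```python
# Determinized Bellman backup at a single state s (assumptions: finite
# action set, deterministic transition T, reward R, current estimate V).
def bellman_backup(s, actions, R, T, V, gamma=0.95):
    return R(s) + gamma * max(V(T(s, a)) for a in actions)

# Toy usage: shift-by-a transitions on the real line.
v = bellman_backup(0.0, [-1.0, 0.0, 1.0],
                   R=lambda s: 1.0,
                   T=lambda s, a: s + a,
                   V=lambda s: s,
                   gamma=0.9)
print(v)  # 1 + 0.9 * max(-1, 0, 1) = 1.9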

Now we have all the pieces to perform a particular type of value iteration:

Fitted Value Iteration

Sample \(s^{(1)}, \ldots, s^{(n)} \in S\). Initialize parameters \(\theta\) to seed a model \(V_{\theta}\qty(s) = \theta^{T} \phi\qty(s)\).

Repeat, for each \(i = 1 … n\)

  • compute: \(y^{(i)} = R\qty(s^{(i)}) + \gamma \max_{a} V_{\theta}\qty(T\qty(s^{(i)},a))\)
  • update your model \(V_{\theta}\) as usual, i.e. fit \(\theta\) by regression of the targets \(y^{(i)}\) against the features \(\phi\qty(s^{(i)})\)

If your transitions are stochastic, we can just sample the next state several times (10, say) and average the resulting value estimates in a Monte Carlo way.
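The whole loop can be sketched end to end. Everything concrete here is an assumption for illustration: a 1-D state, made-up linear dynamics \(f\), reward \(-s^2\), hand-picked polynomial features \(\phi\), and a least-squares fit as the "update as usual" step:

```python
import numpy as np

rng = np.random.default_rng(0)

gamma = 0.95
actions = [-1.0, 0.0, 1.0]

def f(s, a):            # hypothetical deterministic model s' = f(s, a)
    return 0.9 * s + 0.1 * a

def R(s):               # hypothetical reward: prefer states near 0
    return -s ** 2

def phi(s):             # hand-picked features of state s
    return np.array([1.0, s, s ** 2])

# Sample states s^(1), ..., s^(n); initialize theta.
states = rng.uniform(-2, 2, size=50)
theta = np.zeros(3)

for _ in range(100):
    # Bellman targets y^(i); with a stochastic model, average the value
    # over several sampled next states here instead (Monte Carlo).
    y = np.array([R(s) + gamma * max(theta @ phi(f(s, a)) for a in actions)
                  for s in states])
    # "Update the model as usual": least-squares fit of theta to targets.
    Phi = np.array([phi(s) for s in states])
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

After convergence, `theta @ phi(s)` approximates \(V\qty(s)\), and the greedy policy just maximizes `theta @ phi(f(s, a))` over actions.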