Ingredients:

- \(\mathcal{P}\) problem (states, transitions, etc.)
- \(d\) depth (how many steps of next states to look into); deeper search is more accurate but slower
- \(U\) value function estimate, used as the utility at depth \(d\) (the leaves of the search)

We essentially roll forward through all possible next states up to depth \(d\), then back up utilities from the value function estimate at the leaves.

Define subroutine `forward_search(depth_remaining, value_function_estimate_at_d, state)`:

- if `depth_remaining = 0`, return `(action=None, utility=value_function_estimate_at_d(state))`
- otherwise,
    - let `best = (action=None, utility=-infinity)`
    - for each possible action at our state:
        - get an action-value for our current state, where the utility of each next state is the utility given by `forward_search(depth_remaining-1, value_function_estimate_at_d, next_state)`
        - if the action-value is higher than what we have, set `best = (action, action-value)`
    - return `best`
What this essentially does is exhaustively try every sequence of actions from your current state down to depth \(d\), and pick the action leading toward the highest utility \(U(s)\). Note this blows up quickly: it requires on the order of \((|A| \cdot |S|)^d\) evaluations, which is why increasing \(d\) is slow.
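The routine above can be sketched in Python. This is a minimal sketch under assumptions: the MDP interface (`actions`, `transitions`, `reward`, `discount`) and the `TinyMDP` example are hypothetical stand-ins, not part of the original notes.

```python
class TinyMDP:
    """Hypothetical example MDP, two states (0 and 1).

    Action "go" moves from 0 to 1; "stay" does nothing.
    Being in state 1 yields reward 1 for any action.
    """
    discount = 0.9

    def actions(self, state):
        return ["stay", "go"]

    def transitions(self, state, action):
        # Deterministic here: a list of (next_state, probability) pairs.
        next_state = 1 if action == "go" else state
        return [(next_state, 1.0)]

    def reward(self, state, action):
        return 1.0 if state == 1 else 0.0


def forward_search(mdp, U, state, depth_remaining):
    # Base case: no depth left, fall back on the value function estimate U.
    if depth_remaining == 0:
        return (None, U(state))
    best = (None, float("-inf"))
    for action in mdp.actions(state):
        # Action-value: immediate reward plus discounted expected utility,
        # where each next state's utility comes from the recursive call.
        q = mdp.reward(state, action)
        for next_state, prob in mdp.transitions(state, action):
            _, utility = forward_search(mdp, U, next_state, depth_remaining - 1)
            q += mdp.discount * prob * utility
        if q > best[1]:
            best = (action, q)
    return best


# With a zero value-function estimate at the leaves and depth 2,
# the search finds that "go" is the best action from state 0.
print(forward_search(TinyMDP(), lambda s: 0.0, state=0, depth_remaining=2))
```

Note the recursion mirrors the pseudocode exactly: the value function estimate only ever appears at `depth_remaining = 0`, and everything above it is computed by exhaustive lookahead.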