controller POMDP policies with FST. Previous approaches suffered exponential blowup in controller size.

The successor function is **deterministic**.

## policy iteration

Use FST as policy representation:

- deterministic controller POMDP evaluation
- for all \((a,o,x)\), add a new node \(x'\) and evaluate it to see if it is needed
- then, we perform pruning
- prune everything that is dominated, i.e. prune node \(x\) if there exists a node \(x'\) with \(U(x,s) < U(x',s)\) for all \(s\): the expected utility of \(x'\) dominates that of \(x\) in every state
- prune new nodes that are duplicates in terms of action and transitions

When you are done, extract the policy: find the node that maximizes the expected utility under the initial belief, i.e. \(\arg\max_x \sum_s b_0(s)\, U(x,s)\).
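The pruning and extraction steps above can be sketched as follows. This is a minimal illustration, not the full policy-iteration loop; the function names `prune_dominated` and `extract_policy_node` are hypothetical helpers, and `U` is assumed to be the node-by-state utility matrix produced by controller evaluation.

```python
import numpy as np

def prune_dominated(U):
    """Return indices of nodes to keep, dropping every node x that is
    strictly dominated by some other node x'
    (i.e. U[x, s] < U[x', s] for all states s)."""
    keep = []
    X = U.shape[0]
    for x in range(X):
        dominated = any(xp != x and np.all(U[xp] > U[x]) for xp in range(X))
        if not dominated:
            keep.append(x)
    return keep

def extract_policy_node(U, b0):
    """Pick the starting node: the one maximizing expected utility
    under the initial belief b0."""
    return int(np.argmax(U @ b0))
```

For example, with `U = [[1, 1], [2, 2], [0, 3]]`, node 0 is strictly dominated by node 1 and gets pruned, while node 2 survives because it is better than node 1 in the second state.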

## heuristic search

Optimize the value function only from the initial belief state, not for all states. Add nodes only when they yield an improvement starting from the initial belief.
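The acceptance test for heuristic search might look like the sketch below: a candidate node is kept only if it improves the value at the initial belief. The helper name `improves_at_b0` and its signature are assumptions for illustration; `U` is the existing node-by-state utility matrix and `u_candidate` the evaluated utilities of the new node.

```python
import numpy as np

def improves_at_b0(U, b0, u_candidate):
    """Accept a candidate node only if its expected utility at the
    initial belief b0 exceeds that of every existing node."""
    best = max(b0 @ U[x] for x in range(U.shape[0]))
    return bool(b0 @ u_candidate > best)
```

This is the key difference from plain policy iteration: instead of keeping every non-dominated node, we grow the controller only where it helps at \(b_0\).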

## deterministic controller POMDP evaluation

Recall that controllers are defined over belief states and, unlike stochastic finite-state controller evaluation, the transitions are not distributions: the successor node \(x'(x,a,o)\) is deterministic, so we have:

\begin{equation} U(x,s) = R(s,a(x)) + \gamma \sum_{s'} T(s' \mid s, a(x)) \sum_{o} O(o \mid s', a(x))\, U(x'(x,a,o), s') \end{equation}
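Because the successor function is deterministic, this equation is linear in \(U\) over (node, state) pairs and can be solved exactly as a linear system rather than by iteration. A minimal sketch, assuming tensor layouts `T[a, s, s']`, `O[a, s', o]`, `R[s, a]` (these array conventions are my own, not from the source):

```python
import numpy as np

def evaluate_controller(T, O, R, act, succ, gamma=0.95):
    """Solve U(x,s) = R(s,a(x)) + gamma * sum_{s'} T(s'|s,a(x))
                      * sum_o O(o|s',a(x)) * U(x'(x,o), s')
    as a linear system, exploiting the deterministic successor succ[x, o].

    act[x]     : action chosen at controller node x
    succ[x, o] : deterministic successor node for (node, observation)
    Returns U with shape (num_nodes, num_states)."""
    X, S = len(act), T.shape[1]
    n = X * S
    A = np.eye(n)          # (I - gamma * P) on the left-hand side
    b = np.zeros(n)
    for x in range(X):
        a = act[x]
        for s in range(S):
            i = x * S + s
            b[i] = R[s, a]
            for sp in range(S):
                for o in range(O.shape[2]):
                    j = succ[x, o] * S + sp
                    A[i, j] -= gamma * T[a, s, sp] * O[a, sp, o]
    return np.linalg.solve(A, b).reshape(X, S)
```

As a sanity check, a one-state, one-observation POMDP with reward 1 and \(\gamma = 0.9\) gives \(U = 1/(1-0.9) = 10\).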