## Motivation

- Imperfect sensors in robot control lead to **partial observations**.
- Manipulators face a tradeoff between **sensing** and **acting**.
- Planning under partial observability suffers from both the curse of dimensionality and the curse of history.

## Belief-Space Planning

Perhaps we should plan over all possible distributions over the state space, i.e., over beliefs, turning the problem into a belief-state MDP.

But the belief dynamics are **nonlinear** and **stochastic**. In fact, random events may affect the dynamics themselves.

Big problem: the belief space is continuous and high-dimensional, so exact planning over it is intractable.

## Belief iLQR

“determinize and replan”: simplify the dynamics at each step, plan, take an action, and replan.

- tracks the belief via observations
- simplifies the belief-state dynamics by linearizing around the maximum-likelihood observation

Once the dynamics are linearized, a Linear-Quadratic Regulator (LQR) can solve for a policy. The resulting policy is suboptimal, but it is one you can actually compute.
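As a minimal sketch of that step, assuming generic linearized dynamics x_{t+1} = A x_t + B a_t and quadratic costs (names and shapes here are illustrative, not the paper's), the finite-horizon LQR gains come from a backward Riccati recursion:

```
import numpy as np

def lqr_backward_pass(A, B, Q, R, Q_final, horizon):
    """Finite-horizon discrete LQR via the backward Riccati recursion.

    Assumes linearized dynamics x_{t+1} = A x_t + B a_t and quadratic
    stage cost x^T Q x + a^T R a (illustrative names, not the paper's).
    Returns feedback gains K_t such that a_t = -K_t x_t.
    """
    P = Q_final                  # cost-to-go Hessian at the horizon
    gains = []
    for _ in range(horizon):
        # Gain minimizing the one-step-earlier quadratic cost-to-go.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update of the cost-to-go.
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()              # gains[t] now applies at time t
    return gains
```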

### Previous Work

- “just solve for the most-likely state”: doesn’t take actions that explore and reduce uncertainty about the state.
- “belief roadmap”: not really planning in the belief space itself.

### Approach

#### Belief Update

We use Bayesian updates for the state probabilities:

\begin{equation} P(s_{t+1}) = \eta \, P(o_{t+1} \mid s_{t+1}) \int_{s_t} P(s_{t+1} \mid s_t, a_t) \, P(s_t) \, ds_t \end{equation}

and then the actual beliefs are updated with an Extended Kalman Filter (EKF).

Importantly, propagating the belief forward would normally require taking an expectation over every possible observation; instead, we assume each future observation takes its maximum-likelihood value, which makes the planned belief dynamics deterministic.
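As a minimal sketch of one EKF step on a Gaussian belief (function names, Jacobians, and noise covariances here are illustrative assumptions, not the paper's notation):

```
import numpy as np

def ekf_update(mean, cov, a, o, f, h, F, H, W, V):
    """One EKF step on a Gaussian belief b = (mean, cov).

    f, h: nonlinear dynamics and observation models;
    F, H: their Jacobians evaluated at the current mean;
    W, V: process and observation noise covariances.
    (Illustrative names, not the paper's notation.)
    """
    # Predict: push the belief through the linearized dynamics.
    mean_pred = f(mean, a)
    Ft = F(mean, a)
    cov_pred = Ft @ cov @ Ft.T + W

    # Correct: fold in the observation via the Kalman gain.
    Ht = H(mean_pred)
    S = Ht @ cov_pred @ Ht.T + V
    K = cov_pred @ Ht.T @ np.linalg.inv(S)
    mean_new = mean_pred + K @ (o - h(mean_pred))
    cov_new = (np.eye(len(mean)) - K @ Ht) @ cov_pred
    return mean_new, cov_new
```

Note that under the maximum-likelihood-observation assumption used for planning, the observation is taken to equal its prediction, so the innovation `o - h(mean_pred)` vanishes and only the covariance update matters.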

#### Belief Update Cost

Ideally, we want to lower the covariance of the belief so that we are more confident about the state.

- first term: penalizes large trajectories (verify)
- second term: stabilization
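As a hedged example of what such a cost can look like (the exact terms should be checked against the paper; m_t, \Lambda_t, Q, and R are illustrative symbols for the belief mean, belief covariance, and weight matrices), a common belief-space objective penalizes final covariance, distance to the goal, and control effort:

\begin{equation} J = \operatorname{tr}(\Lambda_T) + (m_T - m_{goal})^\top Q \, (m_T - m_{goal}) + \sum_{t} a_t^\top R \, a_t \end{equation}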

#### Replanning Strategy

```
while not at_goal(b):
    # Replan from the current belief: a nominal action sequence and the
    # belief trajectory it predicts.
    (b_plan, a_plan, mean_b) = create_initial_plan(b)
    for t in range(depth):
        # LQR feedback around the nominal plan picks the next action.
        a_t = solve_lqr_for_plan_at_time(b, a_plan, mean_b, t)
        o = environment.step(a_t)
        # Track the true belief with an EKF update.
        b = extended_kalman(b, a_t, o)
        # Belief has become too uncertain: break out and replan.
        if uncertainty(b) > max_allowed_belief_uncertainty:
            break
```
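Breaking out of the inner loop triggers a replan: once the tracked belief becomes too uncertain (or drifts too far from the nominal trajectory), the local linearization behind the LQR policy is no longer trustworthy, so we re-plan from the current belief.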