If we are tele-operating a robot, we ideally want assistance that minimizes the cost of completing the task. Since the goal is not given, the system estimates the user's goal from their inputs: predict the most likely goal, then assist toward it. Goal prediction is posed as an inverse problem: "find a cost function for which the user input \(u\) is optimal". Two complications:
- the system does not know the user's goal a priori
- the user is assumed not to change their goal on a whim
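As a rough illustration of the goal-inference step, here is a minimal sketch assuming a MaxEnt-style observation model \(P(u \mid x, g) \propto \exp(-Q_g(x, u))\) and a Bayes update of the belief over goals. The function names, the toy cost, and the `rationality` parameter are all hypothetical, not taken from the paper.

```python
import numpy as np

# Hypothetical per-goal cost of a user input: penalize inputs that
# point away from the goal position (a stand-in for Q_g(x, u)).
def q_goal(x, u, goal):
    to_goal = goal - x
    to_goal = to_goal / (np.linalg.norm(to_goal) + 1e-8)
    return 1.0 - float(np.dot(u, to_goal))  # low cost if u points at the goal

def update_belief(belief, x, u, goals, rationality=5.0):
    """Bayesian update: P(g | u) ∝ P(u | x, g) * b(g),
    with P(u | x, g) ∝ exp(-rationality * Q_g(x, u))."""
    likelihoods = np.array([np.exp(-rationality * q_goal(x, u, g)) for g in goals])
    posterior = likelihoods * belief
    return posterior / posterior.sum()

# Example: two candidate goals, user pushes the joystick toward the first one.
goals = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
belief = np.array([0.5, 0.5])
x, u = np.zeros(2), np.array([1.0, 0.0])
print(update_belief(belief, x, u, goals))  # belief shifts toward the first goal
```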
Hindsight Optimization
To solve this, we use QMDP: select actions by estimating the cost-to-go as if the goal were fully observable after the current step.
\begin{equation} Q(b,a,u) = \sum_{g} b(g)\, Q_{g}(x,a,u) \end{equation}
where \(b(g)\) is the current belief that the user's goal is \(g\), \(x\) is the current state, and \(Q_{g}\) is the cost-to-go under goal \(g\).
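A minimal sketch of this action selection, assuming costs are minimized and using a hypothetical stub for the per-goal cost-to-go \(Q_g\):

```python
import numpy as np

# Hypothetical per-goal cost-to-go Q_g(x, a, u): here a stub that charges
# the distance to the goal after applying robot action a (u is unused).
def q_g(x, a, u, goal):
    return float(np.linalg.norm((x + a) - goal))

def qmdp_q(belief, x, a, u, goals):
    """QMDP approximation: Q(b, a, u) = sum_g b(g) * Q_g(x, a, u)."""
    return sum(b * q_g(x, a, u, g) for b, g in zip(belief, goals))

def select_action(belief, x, u, actions, goals):
    """Pick the robot action with the lowest expected cost-to-go
    under the current belief over goals (costs, so argmin)."""
    values = [qmdp_q(belief, x, a, u, goals) for a in actions]
    return actions[int(np.argmin(values))]

# Example: belief favors the first goal, so the selected action moves toward it.
goals = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
actions = [np.array([0.1, 0.0]), np.array([0.0, 0.1])]
belief = np.array([0.9, 0.1])
print(select_action(belief, np.zeros(2), np.zeros(2), actions, goals))
```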
Result
Users felt less in control with Hindsight Optimization, despite reaching the goal faster under this policy. This highlights a tension between "task completion" and "user satisfaction" as measures of success.