If we are tele-operating a robot, we ideally want assistance that minimizes the cost of completing the task. Since the goal is not given, the system estimates the user's goal from their inputs: predict the most likely goal, then assist toward it. Goal prediction is posed as an inverse problem: "find a cost function for which the user input \(u\) is optimal". Two complications:
- the system does not know the user's goal a priori
- the user is assumed not to change their goal on a whim
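As a rough illustration of the goal-inference step, here is a minimal sketch assuming a MaxEnt-style observation model \(P(u \mid x, g) \propto \exp(-Q_g(x, u))\) and a Bayes update of the belief over goals. The function names, the toy cost, and the `rationality` parameter are all hypothetical, not taken from the paper.

```python
import numpy as np

# Hypothetical per-goal cost of a user input: penalize inputs that
# point away from the goal position (a stand-in for Q_g(x, u)).
def q_goal(x, u, goal):
    to_goal = goal - x
    to_goal = to_goal / (np.linalg.norm(to_goal) + 1e-8)
    return 1.0 - float(np.dot(u, to_goal))  # low cost if u points at the goal

def update_belief(belief, x, u, goals, rationality=5.0):
    """Bayesian update: P(g | u) ∝ P(u | x, g) * b(g),
    with P(u | x, g) ∝ exp(-rationality * Q_g(x, u))."""
    likelihoods = np.array([np.exp(-rationality * q_goal(x, u, g)) for g in goals])
    posterior = likelihoods * belief
    return posterior / posterior.sum()

# Example: two candidate goals, user pushes the joystick toward the first one.
goals = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
belief = np.array([0.5, 0.5])
x, u = np.zeros(2), np.array([1.0, 0.0])
print(update_belief(belief, x, u, goals))  # belief shifts toward the first goal
```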
Hindsight Optimization
To solve this, we use QMDP: select actions by estimating the cost-to-go as if the goal were fully observable after the current step.
\begin{equation} Q(b,a,u) = \sum_{g} b(g)\, Q_{g}(x,a,u) \end{equation}
where \(b(g)\) is the current belief that the user's goal is \(g\), \(x\) is the current state, and \(Q_{g}\) is the cost-to-go under goal \(g\).
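A minimal sketch of this action selection, assuming costs are minimized and using a hypothetical stub for the per-goal cost-to-go \(Q_g\):

```python
import numpy as np

# Hypothetical per-goal cost-to-go Q_g(x, a, u): here a stub that charges
# the distance to the goal after applying robot action a (u is unused).
def q_g(x, a, u, goal):
    return float(np.linalg.norm((x + a) - goal))

def qmdp_q(belief, x, a, u, goals):
    """QMDP approximation: Q(b, a, u) = sum_g b(g) * Q_g(x, a, u)."""
    return sum(b * q_g(x, a, u, g) for b, g in zip(belief, goals))

def select_action(belief, x, u, actions, goals):
    """Pick the robot action with the lowest expected cost-to-go
    under the current belief over goals (costs, so argmin)."""
    values = [qmdp_q(belief, x, a, u, goals) for a in actions]
    return actions[int(np.argmin(values))]

# Example: belief favors the first goal, so the selected action moves toward it.
goals = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
actions = [np.array([0.1, 0.0]), np.array([0.0, 0.1])]
belief = np.array([0.9, 0.1])
print(select_action(belief, np.zeros(2), np.zeros(2), actions, goals))
```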
Result
Users felt less in control with Hindsight Optimization, despite reaching the goal faster under this policy. This highlights a tension between "task completion" and "user satisfaction" as measures of success.