Houjun Liu

HybPlan

“Can we come up with a policy that, if not fast, at least reaches the goal?”

Background

Stochastic Shortest-Path

we start at an initial state and have a set of goal states, and we want to reach one of the goal states.

We can formalize this as the following problem:

Problem

MDP + Goal States

  • \(S\): set of states
  • \(A\): actions
  • \(P(s'|s,a)\): transition function
  • \(C(s,a)\): cost function (an SSP minimizes cost rather than maximizing reward)
  • \(G\): absorbing goal states
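
For reference, the standard SSP optimality condition (implicit in these notes, but worth writing out) is:

\[ V^*(s) = \min_{a \in A} \Big[ C(s,a) + \sum_{s' \in S} P(s' \mid s, a)\, V^*(s') \Big], \qquad V^*(g) = 0 \text{ for all } g \in G \]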

Approach

Combining LRTDP (via GPT) with MBP to get anytime behavior:

  1. run GPT (not the transformer; “GPT” here is the General Planning Tool, an exact solver in the LRTDP family)
  2. use the GPT policy for states that are solved or have been visited more than a certain threshold
  3. use the MBP policy for all other states
  4. run policy evaluation to check for convergence (see the sketch after the quote below)

“use the GPT solution as much as possible, and for states the search trajectories never reached, use MBP to supplement the solution”
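
A minimal sketch of this dispatch plus the step-4 policy evaluation, assuming we already have a GPT/LRTDP policy with a solved set and visit counts, and an MBP fallback policy. All names here (gpt_policy, mbp_policy, visit_count, threshold) are hypothetical illustrations, not from the paper:

  # Hypothetical HybPlan-style dispatch; all names are illustrative.
  def hybrid_action(s, gpt_policy, solved, visit_count, mbp_policy, threshold=10):
      """Prefer the GPT (LRTDP) policy where it is reliable; fall back to MBP."""
      if s in solved or visit_count.get(s, 0) >= threshold:
          return gpt_policy[s]   # solved or well-explored: trust GPT
      return mbp_policy[s]       # rarely/never visited: supplement with MBP

  def evaluate_policy(policy, P, C, states, goals, tol=1e-6):
      """Iterative policy evaluation for an SSP:
      V(s) = C(s, a) + sum over s' of P(s'|s,a) * V(s'), with V(g) = 0 at goals."""
      V = {s: 0.0 for s in states}
      while True:
          delta = 0.0
          for s in states:
              if s in goals:
                  continue  # absorbing goal states have zero cost-to-go
              a = policy(s)
              v = C(s, a) + sum(p * V[s2] for s2, p in P(s, a).items())
              delta = max(delta, abs(v - V[s]))
              V[s] = v
          if delta < tol:
              return V  # expected cost-to-go under the given policy

Running evaluate_policy on the hybrid policy yields its expected cost-to-go, which is the convergence check of step 4.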