Houjun Liu

G-DICE

Motivation

It's the same as always: the curse of dimensionality and the curse of history.

Goal: to solve decentralized multi-agent MDPs.

Key Insights

  1. use macro-actions (MAs) to reduce computational complexity (like hierarchical planning)
  2. use the cross-entropy method to make the infinite-horizon problem tractable

Prior Approaches

  • masked Monte Carlo search (MMCS): heuristic-based, no optimality guarantees
  • MCTS: poor performance

Direct Cross Entropy

see also Cross Entropy Method

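  1. sample \(k\) candidates from a distribution parameterized by \(\theta\) and evaluate the value of each
  2. take the \(n\) samples with the highest values
  3. update the parameter \(\theta\) to fit those \(n\) samples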
  4. resample until distribution convergence
  5. take the best sample \(x\)
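
The loop above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation from the G-DICE work: it assumes a one-dimensional Gaussian sampling distribution with \(\theta\) = (mean, std), and the names cross_entropy_maximize and toy_score are hypothetical.

```python
import random
import statistics


def cross_entropy_maximize(score, mean=0.0, std=5.0, k=50, n=10,
                           iterations=30, seed=0):
    """Cross-entropy search over one real number (illustrative sketch).

    Assumption: theta = (mean, std) of a Gaussian sampling distribution.
    """
    rng = random.Random(seed)
    best_x, best_score = None, float("-inf")
    for _ in range(iterations):
        # 1. sample k candidates from the current distribution
        samples = [rng.gauss(mean, std) for _ in range(k)]
        # 2. keep the n samples with the highest values
        elite = sorted(samples, key=score, reverse=True)[:n]
        # 5. remember the best sample seen so far
        if score(elite[0]) > best_score:
            best_x, best_score = elite[0], score(elite[0])
        # 3. update theta by refitting the Gaussian to the elite samples
        mean = statistics.fmean(elite)
        std = statistics.pstdev(elite)
        # 4. stop resampling once the distribution has collapsed
        if std < 1e-6:
            break
    return best_x, best_score


def toy_score(x):
    # Hypothetical objective with its maximum at x = 3.
    return -(x - 3.0) ** 2


best_x, best_score = cross_entropy_maximize(toy_score)
print(best_x, best_score)
```

The values of \(k\), \(n\), and the stopping threshold here are arbitrary choices for the toy problem. A larger \(n\) relative to \(k\) slows convergence but lowers the risk of collapsing onto a poor region too early.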

G-DICE

  1. create a graph with an exogenously chosen number \(N\) of nodes and \(O\) outgoing edges (the structure is fixed ahead of time)
  2. use Direct Cross Entropy to solve for the best policy
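
As a rough illustration of these two steps, the sketch below runs a cross-entropy search over a small policy graph. Everything specific in it is an assumption made for the example and is not taken from the notes: each node is assumed to carry one macro-action, each of a node's \(O\) outgoing edges is assumed to be selected by an observation, there is a single controller instead of a team of agents, and the toy evaluate function stands in for simulating the real decentralized problem.

```python
import random

N_NODES, N_OBS, N_ACTIONS = 2, 2, 2
# Hypothetical fixed observation stream used by the toy evaluator.
OBS_SEQUENCE = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
HORIZON = len(OBS_SEQUENCE)


def evaluate(policy):
    """Toy return: +1 whenever the macro-action equals the previous observation."""
    actions, successors = policy
    node, prev_obs, total = 0, 0, 0
    for obs in OBS_SEQUENCE:
        total += int(actions[node] == prev_obs)
        node = successors[node][obs]  # follow the edge labelled by obs
        prev_obs = obs
    return total


def sample_policy(action_probs, successor_probs, rng):
    """Draw one graph policy: an action per node, a successor per (node, obs)."""
    actions = [rng.choices(range(N_ACTIONS), weights=action_probs[q])[0]
               for q in range(N_NODES)]
    successors = [[rng.choices(range(N_NODES), weights=successor_probs[q][o])[0]
                   for o in range(N_OBS)] for q in range(N_NODES)]
    return actions, successors


def refit(old, picks, size, rate):
    """Move a categorical distribution toward the frequencies seen in picks."""
    freq = [picks.count(i) / len(picks) for i in range(size)]
    return [(1 - rate) * p + rate * f for p, f in zip(old, freq)]


def g_dice_sketch(k=40, n=8, iterations=20, rate=0.5, seed=0):
    rng = random.Random(seed)
    # theta: one categorical distribution per node and per (node, observation).
    action_probs = [[1 / N_ACTIONS] * N_ACTIONS for _ in range(N_NODES)]
    successor_probs = [[[1 / N_NODES] * N_NODES for _ in range(N_OBS)]
                       for _ in range(N_NODES)]
    best_policy, best_value = None, float("-inf")
    for _ in range(iterations):
        samples = [sample_policy(action_probs, successor_probs, rng)
                   for _ in range(k)]
        elite = sorted(samples, key=evaluate, reverse=True)[:n]
        if evaluate(elite[0]) > best_value:
            best_policy, best_value = elite[0], evaluate(elite[0])
        for q in range(N_NODES):
            action_probs[q] = refit(action_probs[q],
                                    [a[q] for a, _ in elite], N_ACTIONS, rate)
            for o in range(N_OBS):
                successor_probs[q][o] = refit(
                    successor_probs[q][o],
                    [s[q][o] for _, s in elite], N_NODES, rate)
    return best_policy, best_value


best_policy, best_value = g_dice_sketch()
print(best_policy, best_value)
```

In this toy task the graph nodes act as memory: a policy that moves to node \(o\) after seeing observation \(o\) and plays action \(q\) in node \(q\) earns the full return of HORIZON. The smoothing rate keeps every probability strictly positive, so no graph policy is ever ruled out entirely.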

Results

  1. demonstrates improved performance over MMCS and MCTS
  2. does not require communication between robots
  3. guarantees convergence for both finite and infinite horizons
  4. the number of nodes is chosen exogenously, so it can be kept small to gain computational savings