## Motivation

It's the same as before and hasn't changed: the curses of dimensionality and history.

Goal: to solve decentralized multi-agent MDPs.

## Key Insights

- macro-actions (MAs) to reduce computational complexity (like hierarchical planning)
- uses the cross-entropy method to make the infinite-horizon problem tractable

## Prior Approaches

- **masked Monte Carlo search** (MMCS): heuristic-based, no optimality guarantees
- **MCTS**: poor performance

## Direct Cross Entropy

see also Cross Entropy Method

- sample \(k\) candidates from a distribution parameterized by \(\theta\) and evaluate each one
- take the \(n\) highest-valued samples
- update the parameter \(\theta\) from those \(n\) samples
- resample until the distribution converges
- take the best sample \(x\)
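
The loop above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation: it assumes a one-dimensional search space and a Gaussian sampling distribution, so \(\theta = (\mu, \sigma)\). The function name `cross_entropy_maximize` and its default parameters are made up for this example.

```python
import random
import statistics

def cross_entropy_maximize(score, mu=0.0, sigma=5.0, k=50, n=10, iters=30, seed=0):
    """Maximize score(x) over real x using a Gaussian sampling distribution.

    Assumption for this sketch: theta = (mu, sigma) of a 1-D Gaussian.
    """
    rng = random.Random(seed)
    best_x, best_val = None, float("-inf")
    for _ in range(iters):
        # Sample k candidates from the current distribution.
        samples = [rng.gauss(mu, sigma) for _ in range(k)]
        # Keep the n highest-scoring samples (the elite set).
        elite = sorted(samples, key=score, reverse=True)[:n]
        # Remember the best sample seen so far.
        if score(elite[0]) > best_val:
            best_x, best_val = elite[0], score(elite[0])
        # Update theta by refitting the Gaussian to the elite set.
        mu = statistics.fmean(elite)
        sigma = statistics.pstdev(elite)
        # Stop once the distribution has collapsed (converged).
        if sigma < 1e-6:
            break
    return best_x, best_val
```

For example, `cross_entropy_maximize(lambda x: -(x - 3.0) ** 2)` searches for the maximizer of a concave quadratic whose peak is at \(x = 3\).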

## G-DICE

- create a graph with \(N\) nodes and \(O\) outgoing edges per node; the graph size is exogenous (fixed before solving)
- use Direct Cross Entropy to solve for the best policy
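
A rough sketch of how these two steps might fit together, under several assumptions that are not in these notes: each node is labelled with one macro-action, each of a node's outgoing edges corresponds to one observation, and both choices are sampled from categorical distributions that are refit to the elite samples. It handles a single controller, and `toy_evaluate` is a hypothetical stand-in for simulating the real decentralized problem; the actual update rule in the paper may differ.

```python
import random

def sample_controller(node_probs, edge_probs, rng):
    """Draw one graph policy: an action per node, a successor per (node, observation)."""
    n_nodes, n_actions = len(node_probs), len(node_probs[0])
    actions = [rng.choices(range(n_actions), weights=p)[0] for p in node_probs]
    edges = [[rng.choices(range(n_nodes), weights=p)[0] for p in row]
             for row in edge_probs]
    return actions, edges

def refit(choices, n_options):
    """Empirical categorical distribution over elite choices, lightly smoothed."""
    counts = [0.01] * n_options
    for c in choices:
        counts[c] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

def graph_cross_entropy(evaluate, n_nodes, n_obs, n_actions,
                        k=40, n_elite=8, iters=25, seed=0):
    rng = random.Random(seed)
    # Start from uniform distributions over node labels and edge targets.
    node_probs = [[1.0 / n_actions] * n_actions for _ in range(n_nodes)]
    edge_probs = [[[1.0 / n_nodes] * n_nodes for _ in range(n_obs)]
                  for _ in range(n_nodes)]
    best, best_val = None, float("-inf")
    for _ in range(iters):
        # Sample k controllers and score each one.
        scored = []
        for _ in range(k):
            policy = sample_controller(node_probs, edge_probs, rng)
            scored.append((evaluate(policy), policy))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        elite = [policy for _, policy in scored[:n_elite]]
        if scored[0][0] > best_val:
            best_val, best = scored[0]
        # Refit every categorical distribution to the elite controllers.
        for q in range(n_nodes):
            node_probs[q] = refit([a[q] for a, _ in elite], n_actions)
            for o in range(n_obs):
                edge_probs[q][o] = refit([e[q][o] for _, e in elite], n_nodes)
    return best, best_val

def toy_evaluate(policy, horizon=20):
    """Hypothetical task: reward 1 whenever the action equals the previous observation."""
    actions, edges = policy
    rng = random.Random(1)  # fixed observation sequence so scores are repeatable
    node, prev_obs, total = 0, 0, 0
    for _ in range(horizon):
        total += 1 if actions[node] == prev_obs else 0
        obs = rng.randrange(2)
        node = edges[node][obs]
        prev_obs = obs
    return total
```

Calling `graph_cross_entropy(toy_evaluate, n_nodes=2, n_obs=2, n_actions=2)` returns the best controller found and its score. The toy task needs one bit of memory, which is what the graph nodes provide.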

## Results

- demonstrates improved performance over MMCS and MCTS
- does not need robot communication
- guarantees convergence for both finite and infinite horizons
- the number of nodes is exogenous, so it can be chosen to trade policy size for computational savings