G-DICE
Last edited: August 8, 2025
Motivation
It’s the same. It hasn’t changed: the curses of dimensionality and history.
Goal: to solve decentralized multi-agent MDPs.
Key Insights
- macro-actions (MAs) to reduce computational complexity (like hierarchical planning)
- cross entropy to make the infinite-horizon problem tractable
Prior Approaches
- masked Monte Carlo search (MMCS): heuristic-based, no optimality guarantees
- MCTS: poor performance
Direct Cross Entropy
see also Cross Entropy Method
- sample a value function \(k\)
- take the \(n\) highest sampled values
- update parameter \(\theta\)
- resample until distribution convergence
- take the best sample \(x\)
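The loop above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: \(k\) is read as the number of samples drawn per round, the parameter \(\theta\) is taken to be the mean and standard deviation of a Gaussian over a single scalar \(x\), and `toy_score` is a hypothetical objective invented for the example (none of these choices come from the notes).

```python
import random
import statistics


def cross_entropy_search(score, mu=0.0, sigma=5.0, k=100, n=10,
                         max_rounds=50, tol=1e-3, seed=0):
    """Maximize `score` over a scalar x using a Gaussian sampling distribution.

    Assumption: theta = (mu, sigma); k samples per round; n elite samples kept.
    """
    rng = random.Random(seed)
    best_x, best_score = None, float("-inf")
    for _ in range(max_rounds):
        # Sample k candidates from the current distribution.
        samples = [rng.gauss(mu, sigma) for _ in range(k)]
        # Keep the n highest-scoring samples (the "elite" set).
        elite = sorted(samples, key=score, reverse=True)[:n]
        # Track the best sample seen so far.
        if score(elite[0]) > best_score:
            best_x, best_score = elite[0], score(elite[0])
        # Update theta by refitting the Gaussian to the elite set.
        mu = statistics.fmean(elite)
        sigma = statistics.pstdev(elite)
        # Stop once the sampling distribution has (nearly) collapsed.
        if sigma < tol:
            break
    return best_x, best_score


def toy_score(x):
    # Hypothetical objective with its maximum at x = 3.
    return -(x - 3.0) ** 2


best_x, best_score = cross_entropy_search(toy_score)
print(best_x, best_score)
```

Refitting the distribution to the elite set is the simplest possible update; a smoothed update (mixing the old and new parameters with a learning rate) is a common alternative when premature collapse is a concern.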
G-DICE
- create a graph with an exogenously chosen \(N\) nodes and \(O\) outgoing edges (fixed before solving)
- use Direct Cross Entropy to solve for the best policy
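A rough sketch of how these two steps might fit together, for a single controller only. The assumptions are substantial and are not taken from the notes: each node is assigned one macro-action, each node has one outgoing edge per observation, \(\theta\) is a set of categorical distributions updated with a smoothing rate `lr`, and `toy_evaluate` is a hypothetical stand-in for a policy evaluator. A decentralized version would hold one such graph per agent and score the joint policy.

```python
import random


def sample_controller(theta_act, theta_next, rng):
    # One macro-action per node, one successor node per (node, observation).
    actions = [rng.choices(range(len(p)), weights=p)[0] for p in theta_act]
    nxt = [[rng.choices(range(len(p)), weights=p)[0] for p in row]
           for row in theta_next]
    return actions, nxt


def smooth(dist, counts, lr):
    # Move the categorical distribution toward the elite frequencies.
    total = sum(counts)
    return [(1 - lr) * d + lr * c / total for d, c in zip(dist, counts)]


def graph_cross_entropy(evaluate, n_nodes, n_actions, n_obs,
                        k=100, n_best=10, rounds=30, lr=0.2, seed=0):
    rng = random.Random(seed)
    # theta starts uniform over macro-actions and over successor nodes.
    theta_act = [[1 / n_actions] * n_actions for _ in range(n_nodes)]
    theta_next = [[[1 / n_nodes] * n_nodes for _ in range(n_obs)]
                  for _ in range(n_nodes)]
    best, best_val = None, float("-inf")
    for _ in range(rounds):
        batch = [sample_controller(theta_act, theta_next, rng) for _ in range(k)]
        elite = sorted(batch, key=evaluate, reverse=True)[:n_best]
        top_val = evaluate(elite[0])
        if top_val > best_val:
            best, best_val = elite[0], top_val
        for q in range(n_nodes):
            counts = [sum(1 for a, _ in elite if a[q] == m)
                      for m in range(n_actions)]
            theta_act[q] = smooth(theta_act[q], counts, lr)
            for o in range(n_obs):
                counts = [sum(1 for _, t in elite if t[q][o] == j)
                          for j in range(n_nodes)]
                theta_next[q][o] = smooth(theta_next[q][o], counts, lr)
    return best, best_val


def toy_evaluate(controller, horizon=10):
    # Hypothetical evaluator: observations alternate 0, 1, 0, 1, ...;
    # reward 1 whenever the macro-action at the new node equals the observation.
    actions, nxt = controller
    node, total = 0, 0
    for t in range(horizon):
        obs = t % 2
        node = nxt[node][obs]
        total += int(actions[node] == obs)
    return total


best, best_val = graph_cross_entropy(toy_evaluate, n_nodes=2, n_actions=2, n_obs=2)
print(best, best_val)
```

Because the number of nodes is fixed up front, the size of \(\theta\) is fixed too, which is where the computational savings from choosing a small graph would come from in this sketch.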

Results
- demonstrates improved performance over MMCS and MCTS
- does not need robot communication
- guarantees convergence for both finite and infinite horizons
- the number of nodes can be chosen exogenously to gain computational savings
Galactica
Last edited: August 8, 2025
Galactica is a large language model for generating research papers, made by Meta AI.
Galton Board
Last edited: August 8, 2025
One of these things. It is actually a binomial distribution.
You can phrase the probability at
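As a quick check of the binomial claim, the sketch below drops simulated balls through rows of pegs and compares the bin frequencies with the binomial probabilities. It assumes every peg deflects left or right with probability 0.5; the function names and the board size are hypothetical choices for illustration.

```python
import math
import random


def galton_board(n_rows, n_balls, seed=0):
    """Return how many balls land in each of the n_rows + 1 bins."""
    rng = random.Random(seed)
    counts = [0] * (n_rows + 1)
    for _ in range(n_balls):
        # The bin index is the number of rightward bounces (fair pegs assumed).
        rights = sum(rng.random() < 0.5 for _ in range(n_rows))
        counts[rights] += 1
    return counts


def binomial_pmf(n, k, p=0.5):
    # Probability of exactly k rightward bounces out of n.
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n_rows, n_balls = 10, 20000
counts = galton_board(n_rows, n_balls)
for k, c in enumerate(counts):
    print(k, c / n_balls, round(binomial_pmf(n_rows, k), 4))
```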
GAMMA
Last edited: August 8, 2025
Past Work
- self play: this is a \(\text{coNP}\) vs \(\text{NP}\) problem: whereas competitive self-play attempts to defend against all strategies, collaborative self-play only needs to find one useful strategy; this doesn’t generalize well because humans are never among the training partners
- behavior cloning:
- Population Based Training: computational super e
Novelty
- instead, learn a generative model from simulated agents, human data, or both
- then, sample from this generative model
Notable Methods
Key Figs
New Concepts
Notes
GARCH
Last edited: August 8, 2025
The GARCH model is a model for heteroskedastic variation in which changes in the variance are assumed to be autocorrelated: though the variance changes, it changes in a predictable manner.
It is especially useful to
GARCH(1,1)
Conditional mean:
\begin{equation} y_{t} = x'_{t} \theta + \epsilon_{t} \end{equation}
Then, the error term \(\epsilon_{t}\):
\begin{equation} \epsilon_{t} = \sigma_{t}z_{t} \end{equation}
where:
\begin{equation} z_{t} \sim \mathcal{N}(0,1) \end{equation}
and the conditional variance:
\begin{equation} {\sigma_{t}}^{2} = \omega + \lambda {\epsilon_{t-1}}^{2} + \beta {\sigma_{t-1}}^{2} \end{equation}
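To see the volatility clustering this produces, here is a minimal simulation sketch. It uses the standard GARCH(1,1) recursion, in which the \(\lambda\) term multiplies the squared previous shock \({\epsilon_{t-1}}^{2}\) and the \(\beta\) term multiplies the previous variance \({\sigma_{t-1}}^{2}\). The conditional-mean part \(x'_{t}\theta\) is left out, the series is started at the unconditional variance \(\omega / (1 - \lambda - \beta)\), and the parameter values are hypothetical.

```python
import math
import random


def simulate_garch11(omega, lam, beta, n_steps, seed=0):
    """Simulate shocks eps_t = sigma_t * z_t with GARCH(1,1) variance."""
    if not (omega > 0 and lam >= 0 and beta >= 0 and lam + beta < 1):
        raise ValueError("need omega > 0, lam >= 0, beta >= 0, lam + beta < 1")
    rng = random.Random(seed)
    # Assumption: start from the unconditional (long-run) variance.
    sigma2 = [omega / (1.0 - lam - beta)]
    eps = [math.sqrt(sigma2[0]) * rng.gauss(0.0, 1.0)]
    for _ in range(1, n_steps):
        # Conditional variance from the previous shock and previous variance.
        s2 = omega + lam * eps[-1] ** 2 + beta * sigma2[-1]
        sigma2.append(s2)
        # z_t ~ N(0, 1), scaled by the current conditional standard deviation.
        eps.append(math.sqrt(s2) * rng.gauss(0.0, 1.0))
    return eps, sigma2


eps, sigma2 = simulate_garch11(omega=0.1, lam=0.1, beta=0.8, n_steps=1000)
print(min(sigma2), max(sigma2))
```

A large shock raises the next period's variance through the \(\lambda\) term, and the \(\beta\) term makes that elevated variance decay gradually rather than vanish at once.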