Posts

G-DICE

Last edited: August 8, 2025

Motivation

It's the same. It hasn't changed: the curses of dimensionality and history.

Goal: to solve decentralized multi-agent MDPs.

Key Insights

  1. macro-actions (MAs) to reduce computational complexity (like hierarchical planning)
  2. use cross entropy to make the infinite-horizon problem tractable

Prior Approaches

  • masked Monte Carlo search (MMCS): heuristic-based, no optimality guarantees
  • MCTS: poor performance

Direct Cross Entropy

see also Cross Entropy Method

  1. sample a value function \(k\)
  2. take the \(n\) highest sampled values
  3. update the parameter \(\theta\)
  4. resample until the distribution converges
  5. take the best sample \(x\)
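
The loop above can be sketched in a few lines. This is a minimal illustration, not the method from the paper: it assumes a one-dimensional Gaussian sampling distribution with \(\theta = (\mu, \sigma)\), and the function name `cross_entropy_maximize`, its parameters, and the toy objective are all made up for the example.

```python
import random
import statistics

def cross_entropy_maximize(score, mu=0.0, sigma=5.0, n_samples=100,
                           n_elite=10, max_iters=50, tol=1e-3, seed=0):
    """Maximize score(x) over real x using a Gaussian sampling distribution."""
    rng = random.Random(seed)
    best_x, best_score = mu, score(mu)
    for _ in range(max_iters):
        # 1. sample candidates from the current distribution N(mu, sigma^2)
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # 2. keep the n_elite highest-scoring samples
        elite = sorted(samples, key=score, reverse=True)[:n_elite]
        # remember the best sample seen so far (step 5)
        if score(elite[0]) > best_score:
            best_x, best_score = elite[0], score(elite[0])
        # 3. update theta = (mu, sigma) by refitting to the elite samples
        mu = statistics.fmean(elite)
        sigma = statistics.pstdev(elite)
        # 4. stop once the distribution has (nearly) collapsed
        if sigma < tol:
            break
    return best_x

# Toy objective with its maximum at x = 3 (an assumption for illustration).
best = cross_entropy_maximize(lambda x: -(x - 3.0) ** 2)
```

With this toy objective the returned sample should land close to 3. The elite size `n_elite` plays the role of \(n\) in the steps above.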

G-DICE

  1. create a graph with \(N\) nodes (chosen exogenously) and \(O\) outgoing edges (designed beforehand)
  2. use Direct Cross Entropy to solve for the best policy
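
A rough sketch of the two steps, under loud assumptions: a single agent only (the decentralized setting would keep one such graph per agent), each node holds one action, each node has one outgoing edge per observation, and the cross-entropy update is a smoothed frequency count over the elite samples. The toy reward in `evaluate` (repeat the previous observation) and every name here are hypothetical, chosen only so the example runs.

```python
import random

N_NODES, N_OBS, N_ACTIONS = 2, 2, 2   # hypothetical toy sizes

def sample_controller(rng, action_probs, edge_probs):
    """Draw one policy graph: an action per node, a successor per (node, obs)."""
    actions = [rng.choices(range(N_ACTIONS), weights=p)[0] for p in action_probs]
    edges = [[rng.choices(range(N_NODES), weights=p)[0] for p in row]
             for row in edge_probs]
    return actions, edges

def evaluate(controller, observations):
    """Toy return: +1 each time the action repeats the previous observation."""
    actions, edges = controller
    node, prev_obs, total = 0, 0, 0
    for obs in observations:
        total += int(actions[node] == prev_obs)
        node, prev_obs = edges[node][obs], obs
    return total

def refit(elite_values, size, old, alpha=0.7):
    """Smoothed frequency update of one categorical distribution."""
    freq = [elite_values.count(v) / len(elite_values) for v in range(size)]
    return [alpha * f + (1 - alpha) * o for f, o in zip(freq, old)]

def gdice_sketch(observations, iters=30, n_samples=50, n_elite=5, seed=0):
    rng = random.Random(seed)
    # start from uniform distributions over node actions and edge targets
    action_probs = [[1 / N_ACTIONS] * N_ACTIONS for _ in range(N_NODES)]
    edge_probs = [[[1 / N_NODES] * N_NODES for _ in range(N_OBS)]
                  for _ in range(N_NODES)]
    best, best_value = None, float("-inf")
    for _ in range(iters):
        samples = [sample_controller(rng, action_probs, edge_probs)
                   for _ in range(n_samples)]
        samples.sort(key=lambda c: evaluate(c, observations), reverse=True)
        elite = samples[:n_elite]
        top_value = evaluate(elite[0], observations)
        if top_value > best_value:
            best, best_value = elite[0], top_value
        # refit every categorical distribution to the elite controllers
        for q in range(N_NODES):
            action_probs[q] = refit([a[q] for a, _ in elite],
                                    N_ACTIONS, action_probs[q])
            for o in range(N_OBS):
                edge_probs[q][o] = refit([e[q][o] for _, e in elite],
                                         N_NODES, edge_probs[q][o])
    return best, best_value

obs_seq = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
best_controller, best_value = gdice_sketch(obs_seq)
```

The graph size is fixed up front (`N_NODES`), so the search space does not grow with the horizon; only the distributions over node actions and edge targets are learned.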

Results

  1. demonstrates improved performance over MMCS and MCTS
  2. does not need robot communication
  3. guarantees convergence for both finite and infinite horizons
  4. the number of nodes can be chosen exogenously to gain computational savings

Galactica

Last edited: August 8, 2025

Galactica is a large language model for generating research papers, made by Meta research.

Galton Board

Last edited: August 8, 2025

One of these things. It is actually a binomial distribution.
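
A small simulation makes the binomial claim concrete. It assumes each ball bounces left or right with probability 0.5 at every row of pegs, so the bin index is the number of rightward bounces; the function names are illustrative.

```python
import random
from math import comb

def galton_board(n_rows, n_balls, seed=0):
    """Drop n_balls through n_rows of pegs; return the count per bin."""
    rng = random.Random(seed)
    bins = [0] * (n_rows + 1)
    for _ in range(n_balls):
        # each row is a fair coin flip; the bin is the number of right bounces
        rights = sum(rng.random() < 0.5 for _ in range(n_rows))
        bins[rights] += 1
    return bins

def binomial_pmf(n, k, p=0.5):
    """Probability of exactly k right bounces out of n rows."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

bins = galton_board(n_rows=10, n_balls=10_000)
expected = [10_000 * binomial_pmf(10, k) for k in range(11)]
```

Comparing `bins` with `expected` bin by bin shows the empirical counts tracking the binomial probabilities.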

You can phrase the probability at

GAMMA

Last edited: August 8, 2025

Past Work

  • self-play: this is a \(\text{coNP}\) vs. \(\text{NP}\) problem: whereas competitive self-play attempts to defend against all strategies, collaborative self-play only needs to find one useful strategy; this doesn't generalize well because humans were not among the partners
  • behavior cloning:
  • Population Based Training: computational super e

Novelty

  • instead, learn a generative model from simulated agents, human data, or both
  • then, sample from this generative model

Notable Methods

Key Figs

New Concepts

Notes

GARCH

Last edited: August 8, 2025

The GARCH model is a model for heteroskedastic variation in which changes in the variance are assumed to be autocorrelated: that is, though the variance changes, it changes in a predictable manner.

It is especially useful to

GARCH(1,1)

Conditional mean:

\begin{equation} y_{t} = x'_{t} \theta + \epsilon_{t} \end{equation}

Then, the error term \(\epsilon_{t}\):

\begin{equation} \epsilon_{t} = \sigma_{t}z_{t} \end{equation}

where:

\begin{equation} z_{t} \sim \mathcal{N}(0,1) \end{equation}

and:

conditional variance

\begin{equation} {\sigma_{t}}^{2} = \omega + \lambda {\epsilon_{t-1}}^{2} + \beta {\sigma_{t-1}}^{2} \end{equation}
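
A minimal simulation sketch of this process, assuming the standard GARCH(1,1) recursion in which \(\lambda\) multiplies the lagged squared shock \({\epsilon_{t-1}}^{2}\) and \(\beta\) multiplies the lagged variance \({\sigma_{t-1}}^{2}\). The parameter values, the function name, and the choice to start at the unconditional variance \(\omega / (1 - \lambda - \beta)\) are assumptions for illustration; the conditional mean is left out.

```python
import random

def simulate_garch11(n, omega=0.1, lam=0.1, beta=0.8, seed=0):
    """Simulate n shocks eps_t = sigma_t * z_t with GARCH(1,1) variance."""
    assert lam + beta < 1, "needed for a finite unconditional variance"
    rng = random.Random(seed)
    sigma2 = omega / (1 - lam - beta)   # start at the unconditional variance
    eps, variances = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)         # z_t ~ N(0, 1)
        e = (sigma2 ** 0.5) * z         # eps_t = sigma_t * z_t
        eps.append(e)
        variances.append(sigma2)
        # next conditional variance from the current shock and variance
        sigma2 = omega + lam * e**2 + beta * sigma2
    return eps, variances

eps, variances = simulate_garch11(1000)
```

Plotting `variances` would show the clustering: a large shock raises the next period's variance, which then decays back at a rate set by \(\beta\).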