
FV-POMCPs

Last edited: August 8, 2025

Main problem: the number of joint actions and joint observations grows exponentially with the number of agents.

Solution: Sample-based online planning for multiagent systems. We do this with the factored-value POMCP (FV-POMCP).

  • factored statistics: reduces the number of joint actions (through action selection statistics)
  • factored trees: reduces the number of histories
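To get a feel for the savings from factored statistics, here is a minimal counting sketch. It assumes every agent has the same number of actions and that the factors are pairwise and form a chain; both are illustrative assumptions, and the function names are hypothetical.

```python
def joint_action_count(actions_per_agent: int, num_agents: int) -> int:
    """Number of joint actions when statistics are kept per joint action."""
    return actions_per_agent ** num_agents


def factored_statistic_count(actions_per_agent: int, factors) -> int:
    """Number of statistics when each factor (a tuple of agent indices)
    only tracks the local joint actions of the agents it contains."""
    return sum(actions_per_agent ** len(factor) for factor in factors)


num_agents = 6
# Assumed structure: agent i only interacts with agent i + 1.
chain = [(i, i + 1) for i in range(num_agents - 1)]

flat = joint_action_count(3, num_agents)        # 3^6 joint actions
factored = factored_statistic_count(3, chain)   # 5 factors * 3^2 entries
print(flat, factored)
```

The flat count grows exponentially in the number of agents, while the factored count grows linearly in the number of factors (and exponentially only in the factor size).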

Multiagent Definition

  • \(I\) set of agents
  • \(S\) set of states
  • \(A_{i}\) set of actions for each agent \(i\)
  • \(T\) state transitions
  • \(R\) reward function
  • \(Z_{i}\) set of observations for each agent \(i\) (joint observations are \(Z = \times_{i} Z_{i}\))
  • \(O\) observation probabilities

Coordination Graphs

You can use sum-product elimination to shrink the Bayesian Network of the agents' Coordination Graph (which encodes how agents influence each other).
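As a rough sketch of elimination on a coordination graph: the code below uses the max variant of variable elimination (maximizing over each eliminated agent's action instead of summing) to pick a joint action from local payoff tables. The swap from sum to max, the function name `eliminate_max`, and the toy payoff tables are all assumptions for illustration.

```python
from itertools import product


def eliminate_max(factors, order, domain):
    """Max-variant variable elimination.

    factors: list of (scope, table); scope is a tuple of agent ids and
             table maps a tuple of actions (aligned with scope) to a payoff.
    order:   elimination order covering every agent.
    domain:  actions available to every agent.
    Returns (best total payoff, {agent: action}).
    """
    factors = list(factors)
    back_pointers = []
    for var in order:
        involved = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        new_scope = tuple(sorted({v for s, _ in involved for v in s if v != var}))
        new_table, best = {}, {}
        for assignment in product(domain, repeat=len(new_scope)):
            context = dict(zip(new_scope, assignment))

            def total(value):
                full = {**context, var: value}
                return sum(t[tuple(full[v] for v in s)] for s, t in involved)

            choice = max(domain, key=total)
            new_table[assignment] = total(choice)
            best[assignment] = choice
        factors.append((new_scope, new_table))
        back_pointers.append((var, new_scope, best))
    value = sum(table[()] for _, table in factors)
    # Recover the maximizing joint action in reverse elimination order.
    joint = {}
    for var, scope, best in reversed(back_pointers):
        joint[var] = best[tuple(joint[v] for v in scope)]
    return value, joint


# Toy chain 0 - 1 - 2 with two actions per agent (hypothetical payoffs).
f01 = ((0, 1), {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 1})
f12 = ((1, 2), {(0, 0): 0, (0, 1): 3, (1, 0): 1, (1, 1): 0})
value, joint = eliminate_max([f01, f12], order=[0, 1, 2], domain=(0, 1))
print(value, joint)
```

Each elimination step only touches the factors that mention the eliminated agent, so the cost depends on the graph structure rather than on the full joint action space.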

G-DICE

Last edited: August 8, 2025

Motivation

It’s the same. It hasn’t changed: the curses of dimensionality and history.

Goal: to solve decentralized multi-agent MDPs.

Key Insights

  1. macro-actions (MAs) to reduce computational complexity (like hierarchical planning)
  2. uses cross entropy to make infinite horizon problem tractable

Prior Approaches

  • masked Monte Carlo search (MMCS): heuristic based, no optimality guarantees
  • MCTS: poor performance

Direct Cross Entropy

see also Cross Entropy Method

  1. sample \(k\) value functions from the distribution parameterized by \(\theta\)
  2. take the \(n\) highest sampled values
  3. update the parameter \(\theta\) from those samples
  4. resample until the distribution converges
  5. take the best sample \(x\)
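The loop above can be sketched for a one-dimensional toy objective. This is a generic cross-entropy maximizer, assuming a Gaussian sampling distribution with \(\theta = (\mu, \sigma)\); the objective, the default hyperparameters, and the function name are illustrative assumptions, not taken from the G-DICE paper.

```python
import random
import statistics


def cross_entropy_maximize(f, mu=0.0, sigma=5.0, k=100, n=10, iters=30, seed=0):
    """Maximize f by repeatedly refitting a Gaussian to the n best of k samples."""
    rng = random.Random(seed)
    best_x, best_val = None, float("-inf")
    for _ in range(iters):
        # 1. sample k candidates from the current distribution
        samples = [rng.gauss(mu, sigma) for _ in range(k)]
        # 2. keep the n highest-valued samples
        elite = sorted(samples, key=f, reverse=True)[:n]
        # track the best sample seen so far (step 5)
        if f(elite[0]) > best_val:
            best_x, best_val = elite[0], f(elite[0])
        # 3. update theta = (mu, sigma) from the elite set
        mu = statistics.fmean(elite)
        sigma = statistics.pstdev(elite) + 1e-6  # keep sigma strictly positive
        # 4. the next pass of the loop resamples from the updated distribution
    return best_x, best_val


x, val = cross_entropy_maximize(lambda x: -(x - 3.0) ** 2)
print(x, val)
```

A fixed iteration count stands in for a real convergence check here; a stricter version would stop once \(\sigma\) drops below a threshold.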

G-DICE

  1. create a graph with an exogenously chosen number of nodes \(N\) and \(O\) outgoing edges (fixed in advance)
  2. use Direct Cross Entropy to solve for the best policy
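A heavily simplified sketch of these two steps, for a single agent: each of the \(N\) nodes picks an action and has one outgoing edge per observation, and cross entropy searches over both. The toy environment (a state that alternates between 0 and 1 and is observed directly), the smoothing rate `alpha`, and all names are assumptions for illustration; the real method works with macro-actions and multiple decentralized agents.

```python
import random

N_NODES, ACTIONS, OBS = 2, 2, 2  # hypothetical toy sizes


def sample_controller(theta_act, theta_next, rng):
    """Sample one graph policy: an action per node, a next node per (node, obs)."""
    acts = [rng.choices(range(ACTIONS), weights=theta_act[q])[0]
            for q in range(N_NODES)]
    nxt = [[rng.choices(range(N_NODES), weights=theta_next[q][o])[0]
            for o in range(OBS)] for q in range(N_NODES)]
    return acts, nxt


def evaluate(controller, horizon=10):
    """Toy rollout: reward 1 whenever the chosen action matches the state."""
    acts, nxt = controller
    state, node, total = 0, 0, 0
    for _ in range(horizon):
        node = nxt[node][state]          # the observation equals the state here
        total += int(acts[node] == state)
        state = 1 - state                # the state alternates deterministically
    return total


def g_dice_sketch(k=50, n=10, iters=20, alpha=0.5, seed=0):
    rng = random.Random(seed)
    theta_act = [[1 / ACTIONS] * ACTIONS for _ in range(N_NODES)]
    theta_next = [[[1 / N_NODES] * N_NODES for _ in range(OBS)]
                  for _ in range(N_NODES)]
    best, best_val = None, float("-inf")
    for _ in range(iters):
        population = [sample_controller(theta_act, theta_next, rng) for _ in range(k)]
        elite = sorted(population, key=evaluate, reverse=True)[:n]
        if evaluate(elite[0]) > best_val:
            best, best_val = elite[0], evaluate(elite[0])
        # Move each categorical distribution toward the elite frequencies.
        for q in range(N_NODES):
            for a in range(ACTIONS):
                freq = sum(c[0][q] == a for c in elite) / n
                theta_act[q][a] = (1 - alpha) * theta_act[q][a] + alpha * freq
            for o in range(OBS):
                for q2 in range(N_NODES):
                    freq = sum(c[1][q][o] == q2 for c in elite) / n
                    theta_next[q][o][q2] = ((1 - alpha) * theta_next[q][o][q2]
                                            + alpha * freq)
    return best, best_val


best_controller, best_value = g_dice_sketch()
print(best_controller, best_value)
```

Because the number of nodes is fixed up front, the search space stays the same size no matter how long the horizon is, which is where the computational savings come from.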

Results

  1. demonstrates improved performance over MMCS and MCTS
  2. does not need robot communication
  3. guarantees convergence for both finite and infinite horizons
  4. can choose exogenous number of nodes in order to gain computational savings

Galactica

Last edited: August 8, 2025

Galactica is a large language model for generating research papers, made by Meta Research.

Galton Board

Last edited: August 8, 2025

A Galton board drops balls through rows of pegs into bins. The resulting distribution of balls across bins is actually a binomial distribution.
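A quick simulation sketch of that claim, assuming each ball bounces left or right with probability 1/2 at every row, so the bin index is the number of rightward bounces. The function names and the choice of 8 rows and 20000 balls are illustrative.

```python
import random
from math import comb


def galton_counts(rows, balls, seed=0):
    """Drop `balls` balls through `rows` rows of pegs; return counts per bin."""
    rng = random.Random(seed)
    counts = [0] * (rows + 1)
    for _ in range(balls):
        # The bin index is the number of rightward bounces.
        rights = sum(rng.random() < 0.5 for _ in range(rows))
        counts[rights] += 1
    return counts


def binomial_pmf(rows, k, p=0.5):
    """P(bin = k) for a Binomial(rows, p) distribution."""
    return comb(rows, k) * p ** k * (1 - p) ** (rows - k)


rows, balls = 8, 20000
counts = galton_counts(rows, balls)
for k, c in enumerate(counts):
    print(k, round(c / balls, 3), round(binomial_pmf(rows, k), 3))
```

The empirical frequencies in the second column should track the binomial probabilities in the third.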

You can phrase the probability at

GAMMA

Last edited: August 8, 2025

Past Work

  • self play: this is a \(\text{coNP}\) vs. \(\text{NP}\) problem. Whereas competitive self-play attempts to defend against all strategies, collaborative self-play only needs to find one useful strategy; this doesn’t generalize well to humans, because the agent never trains with a human partner
  • behavior cloning:
  • Population Based Training: computationally super expensive

Novelty

  • instead, learn a generative model from simulated agents, human data, or both
  • then, sample from this generative model

Notable Methods

Key Figs

New Concepts

Notes