FV-POMCPs
Main problem: joint actions and observations are exponential in the number of agents.
Solution: sample-based online planning for multiagent systems. This is done with Factored-Value POMCP (FV-POMCP).
- factored statistics: reduces the number of joint-action statistics (action-selection statistics are kept per factor rather than per joint action); see the sketch after this list
- factored trees: additionally reduces the number of histories (each factor keeps its own local search tree)
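A minimal sketch of the factored-statistics idea, assuming a pairwise coordination graph; the class names, the UCB form, and the brute-force joint maximization are my own illustration, not the paper's implementation:

```python
import math
from itertools import product
from collections import defaultdict

class FactorStats:
    """UCB statistics for one edge (i, j) of the coordination graph,
    kept over pairs of local actions instead of full joint actions."""
    def __init__(self):
        self.n = defaultdict(int)      # visit counts per (a_i, a_j)
        self.q = defaultdict(float)    # running mean value per (a_i, a_j)

    def ucb(self, pair, total_visits, c=1.0):
        if self.n[pair] == 0:
            return float("inf")
        return self.q[pair] + c * math.sqrt(math.log(total_visits + 1) / self.n[pair])

    def update(self, pair, value):
        self.n[pair] += 1
        self.q[pair] += (value - self.q[pair]) / self.n[pair]

def select_joint_action(edges, stats, local_actions, total_visits):
    """Pick the joint action maximizing the sum of per-factor UCB scores.
    Brute force over joint actions here just to keep the sketch short;
    coordination-graph maximization (next section) exploits the structure."""
    agents = sorted({i for e in edges for i in e})
    best, best_score = None, -float("inf")
    for joint in product(*(local_actions[i] for i in agents)):
        a = dict(zip(agents, joint))
        score = sum(stats[e].ucb((a[e[0]], a[e[1]]), total_visits) for e in edges)
        if score > best_score:
            best, best_score = a, score
    return best
```

The point is only that the statistics scale with the number of graph edges rather than with the exponential joint-action space.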
Multiagent Definition
- \(I\) set of agents
- \(S\) set of states
- \(A_{i}\) set of actions for each agent \(i\)
- \(T\) state transitions
- \(R\) reward function
- \(Z_{i}\) set of observations for each agent \(i\)
- \(O\) observation function
Coordination Graphs
You can use sum-product (variable) elimination on the coordination graph (which encodes how agents influence each other) to compute the best joint action without enumerating all joint actions; see the sketch below.
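A hedged sketch of max-sum variable elimination for picking the best joint action on a coordination graph; the factor format and function name are assumptions for illustration:

```python
from itertools import product

def eliminate_agents(factors, actions, order):
    """Max-sum variable elimination on a coordination graph.

    factors: list of (scope, table), where scope is a tuple of agent ids and
             table maps a tuple of their local actions to a payoff.
    actions: dict mapping agent id -> list of local actions.
    order:   elimination order over the agent ids.
    Returns the maximizing joint action as a dict.
    """
    factors = list(factors)
    backpointers = []  # (agent, scope of new factor, best-choice table)

    for agent in order:
        involved = [f for f in factors if agent in f[0]]
        rest = [f for f in factors if agent not in f[0]]
        # The new factor is over the not-yet-eliminated neighbours of `agent`.
        new_scope = tuple(sorted({a for s, _ in involved for a in s if a != agent}))
        new_table, choice = {}, {}
        for ctx in product(*(actions[a] for a in new_scope)):
            ctx_map = dict(zip(new_scope, ctx))
            best_val, best_act = -float("inf"), None
            for act in actions[agent]:
                ctx_map[agent] = act
                val = sum(t[tuple(ctx_map[a] for a in s)] for s, t in involved)
                if val > best_val:
                    best_val, best_act = val, act
            new_table[ctx] = best_val
            choice[ctx] = best_act
        factors = rest + [(new_scope, new_table)]
        backpointers.append((agent, new_scope, choice))

    # Recover the argmax by walking the backpointers in reverse.
    assignment = {}
    for agent, scope, choice in reversed(backpointers):
        assignment[agent] = choice[tuple(assignment[a] for a in scope)]
    return assignment

# Example: 3 agents in a chain 0-1-2 with binary actions.
factors = [((0, 1), {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 3.0}),
           ((1, 2), {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0})]
actions = {0: [0, 1], 1: [0, 1], 2: [0, 1]}
print(eliminate_agents(factors, actions, order=[0, 2, 1]))  # all agents pick action 1
```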
G-DICE
Motivation
The motivation hasn't changed: the curses of dimensionality and history.
Goal: solve decentralized POMDPs (Dec-POMDPs).
Key Insights
- macro-actions (MAs) to reduce computational complexity (like hierarchical planning)
- uses the cross-entropy method to make the infinite-horizon problem tractable
Prior Approaches
- masked Monte Carlo search: heuristic based, no optimality guarantees
- MCTS: poor performance
Direct Cross Entropy
see also Cross Entropy Method
- sample \(k\) candidate solutions from the distribution parameterized by \(\theta\) and evaluate their values
- take the \(n\) highest-valued samples
- update the parameter \(\theta\) toward those elite samples
- resample until the distribution converges
- take the best sample \(x\) (see the sketch after this list)
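A minimal sketch of this loop, assuming a Gaussian sampling distribution over a continuous parameter vector; the names (value_fn, n_samples, n_elite) are my own, not G-DICE's interface:

```python
import numpy as np

def cross_entropy_search(value_fn, dim, n_samples=100, n_elite=10,
                         n_iters=50, tol=1e-3, seed=0):
    """Generic cross-entropy method: sample candidates, keep the elite,
    refit the sampling distribution, repeat until it converges."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)            # theta = (mu, sigma)
    best_x, best_v = None, -np.inf

    for _ in range(n_iters):
        xs = rng.normal(mu, sigma, size=(n_samples, dim))  # sample k candidates
        vs = np.array([value_fn(x) for x in xs])           # evaluate them
        elite = xs[np.argsort(vs)[-n_elite:]]              # n highest-valued samples
        if vs.max() > best_v:
            best_v, best_x = vs.max(), xs[np.argmax(vs)]
        mu_new, sigma_new = elite.mean(axis=0), elite.std(axis=0) + 1e-8
        converged = np.abs(mu_new - mu).max() < tol
        mu, sigma = mu_new, sigma_new
        if converged:                                      # distribution converged
            break

    return best_x, best_v

# Example: maximize a simple concave function of two variables.
x_best, v_best = cross_entropy_search(lambda x: -np.sum((x - 3.0) ** 2), dim=2)
```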
G-DICE
- create a controller graph with an exogenously chosen number of nodes \(N\), each node having \(|O|\) outgoing edges (one per observation); the graph structure is fixed before optimization
- use Direct Cross Entropy to solve for the best policy over this graph (see the sketch below)
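A hedged sketch of the kind of sampling distribution this implies over a finite-state controller: a categorical distribution over actions per node and over next nodes per (node, observation) edge. The class name and the learning-rate update are my own assumptions, not the paper's exact rule:

```python
import numpy as np

class ControllerDistribution:
    """Sampling distribution over an N-node policy controller:
    P(action | node) and P(next node | node, observation)."""
    def __init__(self, n_nodes, n_actions, n_obs, rng):
        self.rng = rng
        self.p_act = np.full((n_nodes, n_actions), 1.0 / n_actions)
        self.p_next = np.full((n_nodes, n_obs, n_nodes), 1.0 / n_nodes)

    def sample(self):
        """Draw one deterministic controller: node -> action, (node, obs) -> node."""
        act = np.array([self.rng.choice(len(p), p=p) for p in self.p_act])
        nxt = np.array([[self.rng.choice(len(p), p=p) for p in row]
                        for row in self.p_next])
        return act, nxt

    def update(self, elite, alpha=0.2):
        """Shift both distributions toward the elite controllers (learning rate alpha)."""
        act_counts = np.zeros_like(self.p_act)
        nxt_counts = np.zeros_like(self.p_next)
        for act, nxt in elite:
            act_counts[np.arange(len(act)), act] += 1
            for node in range(nxt.shape[0]):
                for obs in range(nxt.shape[1]):
                    nxt_counts[node, obs, nxt[node, obs]] += 1
        self.p_act = (1 - alpha) * self.p_act + alpha * act_counts / len(elite)
        self.p_next = (1 - alpha) * self.p_next + alpha * nxt_counts / len(elite)
```

The surrounding loop is the same cross-entropy loop sketched above: sample a controller for every agent, evaluate the joint policy in a simulator, keep the best samples, and call update.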

Results
- demonstrates improved performance over MMCS and MCTS
- does not need robot communication
- guarantees convergence for both finite and infinite horizons
- the number of controller nodes can be chosen exogenously to gain computational savings
Galactica
Galactica is a large language model for generating research papers and other scientific text, made by Meta.
Galton Board
A Galton board: balls fall through \(n\) rows of pegs, bouncing left or right with equal probability at each peg. The resulting bin counts follow a binomial distribution. You can phrase the probability of landing in bin \(k\) as \(P(k) = \binom{n}{k}\left(\tfrac{1}{2}\right)^{n}\).
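A quick sanity-check simulation (my own, just to illustrate the claim): drop balls through n rows of pegs, count the rightward bounces, and compare the empirical bin frequencies to the binomial pmf.

```python
import random
from collections import Counter
from math import comb

def galton(n_rows=10, n_balls=100_000, seed=0):
    rng = random.Random(seed)
    # Each ball's bin index = number of rightward bounces over n_rows pegs.
    bins = Counter(sum(rng.random() < 0.5 for _ in range(n_rows))
                   for _ in range(n_balls))
    for k in range(n_rows + 1):
        empirical = bins[k] / n_balls
        binomial = comb(n_rows, k) * 0.5 ** n_rows
        print(f"bin {k:2d}: empirical {empirical:.4f}  binomial {binomial:.4f}")

galton()
```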
GAMMA
Past Work
- self-play: this is a \(\text{coNP}\) vs. \(\text{NP}\) distinction: whereas competitive self-play has to defend against all strategies, collaborative self-play only needs to find one useful strategy; it doesn't generalize well because humans are not among the training partners
- behavior cloning:
- Population Based Training: computationally super expensive
Novelty
- instead, learn a generative model of partner behavior from both simulated-agent and human data
- then, sample partners from this generative model
