## Motivation

Multiple agents need to collaborate to achieve common goal.

**Joint Utility Maximization**: maximize the joint utility between various agents.

## Possible Approaches

**Using a traditional MDP**: an MDP considers “action” as a joint action between all agents (exponential blow up because the agent actions multiply)**Local Optimization**: share rewards/values among agents**Local Optimization**: search and maximize joint utility explicitly (no need to model the entire action space)

Problems with single Reward Sharing:

### Credit Assignment Problem

In collective reward situations, determining which action out of the cohort actually contributed to the award is hard.

### Free Ride Problem

Agents can benefit from reward without actually doing anything by being carried.

## Factored MDPs Representation

- Using factored linear value function to approximate the joint value function
- Using linear programming to avoid exponential blow up

### Background

#### Coordination Graphs

- modeling each agent as a node
- each edge is a dependency

#### factored Markov Decision Process

- MDPs are not good at large problems
- factor the state and action spaces as a random variable factors, etc.

### action selection

- each agent maintains a local \(Q\) function indicating its population
- the \(Q\) function of each agent maybe influenced by other agents:
- the coordination graph of the agent is used to calculate contribution

We optimize by using **one agent at a time**: we optimize one agent, then