Houjun Liu

Factored MDPs

Motivation

Multiple agents need to collaborate to achieve a common goal.

Joint Utility Maximization: maximize the joint utility across all agents.

Possible Approaches

  • Using a traditional MDP: treat the “action” as the joint action of all agents (the joint action space blows up exponentially, since the agents’ individual action sets multiply; see the worked example after this list)
  • Local Optimization (reward sharing): share rewards/values among agents
  • Local Optimization (explicit coordination): search for and maximize the joint utility explicitly (no need to model the entire joint action space)
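As a quick sanity check on the blow-up (the numbers are illustrative): with \(n\) agents that each choose from \(|A|\) actions, the joint action space has

\[ \left| A_1 \times \dots \times A_n \right| = |A|^n \]

elements, so even 10 agents with 4 actions each already give \(4^{10} \approx 10^6\) joint actions for a flat MDP to enumerate.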

Problems with simple reward sharing:

Credit Assignment Problem

In collective reward situations, determining which agent’s action out of the cohort actually contributed to the reward is hard.

Free Ride Problem

Agents can benefit from the shared reward without contributing anything, effectively being carried by the rest of the group.

Factored MDPs Representation

  • Using a factored linear value function to approximate the joint value function (see the form sketched after this list)
  • Using linear programming to avoid the exponential blow-up
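A sketch of the factored linear form these bullets refer to (notation assumed here, following the factored-MDP literature): the joint value function is approximated by a weighted sum of basis functions, each depending only on a small subset of the state variables,

\[ V(x) \;\approx\; \sum_{j=1}^{k} w_j \, h_j\big(x[C_j]\big), \]

where \(x[C_j]\) is the restriction of the state to the small variable subset \(C_j\) that basis \(h_j\) touches. Because each \(h_j\) has small scope, the weights \(w_j\) can be fit by a linear program whose constraints stay compact instead of enumerating the full joint state space.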

Background

Coordination Graphs

  • each agent is modeled as a node
  • each edge encodes a coordination dependency between two agents’ payoffs (see the sketch after this list)
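A minimal sketch of this structure in Python (the agent ids and edges are hypothetical, just to make the idea concrete):

    # A coordination graph as an adjacency map: nodes are agents, and an
    # edge means the two agents' payoffs depend on each other's actions.
    coordination_graph = {
        1: {2},      # agent 1 coordinates with agent 2
        2: {1, 3},   # agent 2 coordinates with agents 1 and 3
        3: {2},      # agent 3 coordinates with agent 2
    }

    def scope(agent):
        # An agent's local payoff may depend only on itself and its neighbors.
        return {agent} | coordination_graph[agent]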

factored Markov Decision Process

  • standard MDPs scale poorly to large problems
  • factor the state and action spaces into collections of random variables (factors); see the factorization below
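Concretely (a standard factorization, spelled out here as an assumed illustration): write the state as a vector of variables \(x = (x_1, \dots, x_n)\). The transition model then factors into a product of small conditionals, one per variable, as in a dynamic Bayesian network:

\[ T(x' \mid x, a) \;=\; \prod_{i=1}^{n} P\big(x_i' \mid \mathrm{pa}(x_i'),\, a\big), \]

where \(\mathrm{pa}(x_i')\) is the small set of current-state variables that influence \(x_i'\).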

action selection

  • each agent maintains a local \(Q\) function indicating its contribution to the joint value
  • the \(Q\) function of each agent may be influenced by other agents:
    • the agent’s coordination graph determines which other agents’ actions enter its local \(Q\) (see the decomposition after this list)
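In the decomposition typically used with coordination graphs (notation assumed), the joint \(Q\) is a sum of local terms, each depending only on an agent’s own action and those of its neighbors \(N(i)\):

\[ Q(x, a) \;=\; \sum_{i} Q_i\big(x,\, a_i,\, a_{N(i)}\big). \]

Maximizing this sum over the joint action \(a\) is exactly the problem the elimination procedure below solves.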

We optimize one agent at a time: compute one agent’s best response conditioned on its neighbors, eliminate that agent from the graph, and repeat with the next agent; once all agents are eliminated, backtrack through the recorded best responses to read off the maximizing joint action.
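Below is a minimal, self-contained Python sketch of this elimination scheme (the agent ids, the binary action set, and the toy payoff tables are assumptions for illustration, not from the notes):

    from itertools import product

    ACTIONS = (0, 1)  # assumed: every agent picks from the same binary action set

    def tabulate(scope, fn):
        """Turn a payoff function over `scope` into a lookup table."""
        return {asg: fn(*asg) for asg in product(ACTIONS, repeat=len(scope))}

    # Toy local Q components: (scope of agent ids, table over their actions).
    # The joint value to maximize is the sum of all components.
    factors = [
        ((1, 2), tabulate((1, 2), lambda a1, a2: 1.0 if a1 == a2 else 0.0)),
        ((2, 3), tabulate((2, 3), lambda a2, a3: 2.0 if a2 != a3 else 0.0)),
    ]

    def eliminate_agents(factors, order):
        """Maximize the summed factors one agent at a time.

        Returns (max joint value, {agent: best action})."""
        strategies = []  # (agent, conditioning scope, best-response table)
        factors = list(factors)
        for agent in order:
            involved = [f for f in factors if agent in f[0]]
            factors = [f for f in factors if agent not in f[0]]
            # The new factor is conditioned on the not-yet-eliminated neighbors.
            new_scope = tuple(sorted({v for s, _ in involved for v in s} - {agent}))
            new_table, best = {}, {}
            for asg in product(ACTIONS, repeat=len(new_scope)):
                ctx = dict(zip(new_scope, asg))
                def score(a):  # value of the involved factors if `agent` plays a
                    ctx[agent] = a
                    return sum(t[tuple(ctx[v] for v in s)] for s, t in involved)
                a_star = max(ACTIONS, key=score)
                new_table[asg], best[asg] = score(a_star), a_star
            factors.append((new_scope, new_table))
            strategies.append((agent, new_scope, best))
        value = sum(t[()] for _, t in factors)  # only constants remain
        # Backtrack in reverse elimination order to recover the joint action.
        action = {}
        for agent, scope, best in reversed(strategies):
            action[agent] = best[tuple(action[v] for v in scope)]
        return value, action

    print(eliminate_agents(factors, order=[1, 3, 2]))
    # -> (3.0, {2: 0, 3: 1, 1: 0}): agents 1 and 2 match, agent 3 differs

Each elimination step only enumerates the actions of one agent and its remaining neighbors, so the cost is exponential in the graph’s induced width rather than in the total number of agents.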