
Factored MDPs

Last edited: August 8, 2025

Motivation

Multiple agents need to collaborate to achieve a common goal.

Joint Utility Maximization: maximize the joint utility between various agents.

Possible Approaches

  • Using a traditional MDP: treat the “action” as a joint action across all agents (exponential blow-up, because the joint action space is the product of the individual agents’ action spaces)
  • Local Optimization: share rewards/values among agents
  • Local Optimization: search over and maximize the joint utility explicitly (no need to model the entire joint action space)
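The exponential blow-up in the first approach is easy to see concretely. A minimal sketch (the agents and their action names are made-up examples): treating the team as one big agent makes the MDP's action set the Cartesian product of the individual action sets, so \(k\) actions per agent and \(n\) agents gives \(k^n\) joint actions.

```python
from itertools import product

def joint_action_space(actions_per_agent):
    """Enumerate every joint action: one tuple per combination of
    individual agent actions (the Cartesian product)."""
    return list(product(*actions_per_agent))

# Hypothetical example: 4 agents, each with the same 3 primitive actions.
agents = [["left", "right", "stay"]] * 4
joint = joint_action_space(agents)
print(len(joint))  # 3**4 = 81 joint actions
```

Adding a fifth agent multiplies the count by another factor of 3, which is why modeling the full joint action space quickly becomes infeasible.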

Problems with single reward sharing:

Failure Distribution


For a trajectory distribution \(p\qty(\tau)\), the failure distribution is \(p\qty(\tau \mid \tau \not\in \psi)\)—the probability of a particular trajectory given that it is a failure:

\begin{equation} p \qty( \tau \mid \tau \not \in \psi) = \frac{\mathbb{1}\qty {\tau \not \in \psi} p\qty(\tau)}{ \int \mathbb{1}\qty {\tau \not \in \psi} p\qty(\tau) \dd{\tau}} \end{equation}

The bottom integral could be very difficult to compute; the numerator, in contrast, only requires evaluating the failure indicator and the trajectory density.


So ultimately we can give up on normalizing and use methods that can draw samples from unnormalized probability densities (such as MCMC).
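A minimal sketch of this idea, under made-up assumptions: trajectories are summarized by a single score \(x\), nominal behavior is standard normal, and "failure" means \(x > 2\). Metropolis–Hastings samples from the unnormalized density \(\mathbb{1}\{x > 2\}\, p(x)\) because the unknown normalizer cancels in the acceptance ratio.

```python
import math
import random

def unnormalized_failure_density(x):
    """Indicator-of-failure times (unnormalized) nominal density."""
    indicator = 1.0 if x > 2.0 else 0.0
    return indicator * math.exp(-0.5 * x * x)

def metropolis_hastings(target, x0, steps, step_size=0.5, seed=0):
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Accept with probability min(1, target(x') / target(x));
        # the normalizing integral cancels in this ratio.
        if target(x) > 0 and rng.random() < min(1.0, target(proposal) / target(x)):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis_hastings(unnormalized_failure_density, x0=2.5, steps=5000)
print(min(samples) > 2.0)  # every sample is a failure (x > 2)
```

Proposals that leave the failure region have target value zero and are never accepted, so the chain stays inside the failure set without the denominator integral ever being computed.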

failure mode characterization


take a collection of failure trajectories and cluster them; this can possibly be done over signal temporal logic (STL) specifications
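The clustering step can be sketched with any standard algorithm; here is a minimal pure-Python k-means, with made-up assumptions: each failure trajectory is summarized by a hypothetical feature vector (say, time of failure and maximum deviation), and clusters of those vectors serve as candidate failure modes.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on tuples of floats: assign each point to the
    nearest center, then recompute centers as cluster means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        centers = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters

# Synthetic failures with two obvious modes:
# early/small-deviation vs. late/large-deviation.
failures = [(1.0, 0.1), (1.2, 0.2), (0.9, 0.15),
            (9.0, 3.0), (9.5, 3.2), (8.8, 2.9)]
centers, clusters = kmeans(failures, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Each resulting cluster is a candidate failure mode that can then be inspected or characterized further.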

fairness


fairness through unawareness

procedural fairness, or fairness through unawareness, is a fairness criterion

The idea: if you have no information about membership in protected groups, you cannot base decisions on it.

  1. exclude sensitive features from datasets
  2. exclude proxies of protected groups

Problem: deeply correlated information (such as people’s preferences) is hard to get rid of—each individual feature may do little to predict a protected attribute, but taken together a group of features can recover protected-group information.

fairness through awareness

in contrast, we only care about the outcome of the decision, not which features were used to make it

falsification


falsification is the process of systematically finding failures of a particular system to inform future design decisions.

Goals:

  • enhance system sensors
  • change the agent’s policy
  • revise the system requirements
  • adapt the training of human operators
  • recognize a system as having limitations
  • …or abandon the project

Here are some methods:

direct falsification

  • roll out trajectories to a certain depth
  • check whether each trajectory is a failure, and collect the ones that are
  • return the collected failures

drawback: this doesn’t work well for rare failure events, since naive rollouts seldom encounter them
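The steps above can be sketched directly. This is a minimal toy, with made-up assumptions: the system is a 1-D random walk standing in for the agent-plus-environment dynamics, and a failure is any trajectory whose state ever leaves \([-5, 5]\).

```python
import random

def rollout(depth, rng):
    """Roll out the (stubbed) system dynamics to a fixed depth."""
    x, traj = 0.0, [0.0]
    for _ in range(depth):
        x += rng.gauss(0.0, 1.0)  # placeholder for policy + environment step
        traj.append(x)
    return traj

def is_failure(traj, limit=5.0):
    """Failure: the state leaves the safe interval [-limit, limit]."""
    return any(abs(x) > limit for x in traj)

def direct_falsification(n_rollouts=1000, depth=20, seed=0):
    rng = random.Random(seed)
    # Roll out, check each trajectory, and collect the failures.
    return [t for t in (rollout(depth, rng) for _ in range(n_rollouts))
            if is_failure(t)]

failures = direct_falsification()
print(len(failures) > 0)
```

Here failures are common enough that plain rollouts find them; if the failure probability were tiny, almost every rollout would be wasted, which is exactly the rare-event drawback noted above.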