Smith 20??

Last edited: August 8, 2025

HSVI

One-Liner

“the impact of approximation error decreases with the number of steps from the root node”
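A hedged way to make this precise, assuming a discount factor \(\gamma \in (0,1)\), a target root precision \(\epsilon\), and bounds \(\underline{V} \le V^{*} \le \overline{V}\) (these symbols are not in the notes above): an error made at a belief \(b_t\) reached \(t\) steps below the root is discounted by \(\gamma^{t}\) before it reaches the root, so the bound gap tolerated at that depth can grow with \(t\).

\[
\gamma^{t}\,\bigl(\overline{V}(b_t) - \underline{V}(b_t)\bigr) \le \epsilon
\quad\Longleftrightarrow\quad
\overline{V}(b_t) - \underline{V}(b_t) \le \epsilon\,\gamma^{-t}
\]

Under this reading, forward search can stop descending once the gap at depth \(t\) falls below \(\epsilon\gamma^{-t}\).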

Novelty

combines alpha-vector value bounds with forward search heuristics to choose which belief states to back up
reported as roughly 100x faster than PBVI
scales to huge environments

Goal: minimize “regret” (the gap between the value of the current policy and that of the optimal policy)

Novelty HSVI 2

projects the upper bound onto a convex hull (HSVI2: via an approximate convex hull projection)
uses a blind-policy lower bound
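As a minimal sketch of what a blind lower bound can look like, assuming the usual reading that a "blind" policy repeats one fixed action forever regardless of observations: each action gets one alpha vector, and the bound at a belief is the best of those vectors. The function names, the nested-list layout of `T` and `R`, and the toy two-state model are all hypothetical, chosen only for illustration.

```python
def blind_lower_bound(T, R, gamma, iters=200):
    """Approximate one alpha vector per action for the 'always take action a' policy.

    T[a][s][s2] is the transition probability, R[a][s] the immediate reward.
    Iterates alpha_a(s) = R[a][s] + gamma * sum_s2 T[a][s][s2] * alpha_a(s2).
    """
    n_actions = len(R)
    n_states = len(R[0])
    alphas = [[0.0] * n_states for _ in range(n_actions)]
    for _ in range(iters):
        alphas = [
            [
                R[a][s]
                + gamma * sum(T[a][s][s2] * alphas[a][s2] for s2 in range(n_states))
                for s in range(n_states)
            ]
            for a in range(n_actions)
        ]
    return alphas


def lower_bound_value(alphas, belief):
    """Value of the bound at a belief: the best blind policy's expected return."""
    return max(sum(b * v for b, v in zip(belief, alpha)) for alpha in alphas)


# Toy model (hypothetical): both actions leave the state unchanged;
# action 0 pays 1 in state 0, action 1 pays 1 in state 1.
identity = [[1.0, 0.0], [0.0, 1.0]]
T = [identity, identity]
R = [[1.0, 0.0], [0.0, 1.0]]
alphas = blind_lower_bound(T, R, gamma=0.5)
uniform_value = lower_bound_value(alphas, [0.5, 0.5])
```

Because every blind policy is a valid (if poor) policy, its value can never exceed the optimal value, which is what makes the maximum over these vectors a lower bound.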

Notable Methods

Key Figs

New Concepts

Notes

smooth function

Last edited: August 8, 2025

a function is called smoo

Social Learning

Last edited: August 8, 2025

Social Learning is the property of learning with other agents in your environment.

learning from feedback
human ai coordination
social learning
adversarial training

Social Network

Last edited: August 8, 2025

A Social Network is a scheme for studying the relationships and interactions amongst groups of people.

people: \(V\)
relationship: \(E\)
system: a network \(G(V,E)\)

Importantly, the “labels” of \(E\) often do not matter as we frequently want to study only the graphical structure of the Social Network.

degree (node)

The degree of a node is the number of edges touching that node (incoming, outgoing, or undirected).

The in-degree and out-degree are the numbers of edges pointing into and out of that node, respectively.
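The definitions above can be sketched in a few lines, assuming a directed network \(G(V,E)\) stored as a list of `(source, target)` pairs. The function name and the example people are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Return (in_degree, out_degree, degree) counters for a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in edges:
        out_deg[source] += 1  # edge leaves the source node
        in_deg[target] += 1   # edge enters the target node
    # Total degree counts every edge touching the node, in or out.
    degree = in_deg + out_deg
    return in_deg, out_deg, degree


# Hypothetical three-person network: ann -> bob, ann -> cat, bob -> cat
E = [("ann", "bob"), ("ann", "cat"), ("bob", "cat")]
in_deg, out_deg, degree = degrees(E)
```

Note that only the structure of \(E\) is used here; the edge labels play no role.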

Social Reinforcement Learning

Last edited: August 8, 2025

Key question: can multi-agent optimization problems help reinforcement learning?

using deep RL for combinatorial optimization

fast inference that scales well with instance size
may struggle to discover the optimal solution: high sample complexity, or failure to find good solutions
doesn’t generalize well

why multi-agent works

decentralized training to improve sample efficiency
adversarial training
