Smith 20??
Last edited: August 8, 2025
One-Liner
“the impact of approximation decreases with the number of steps from the root node”
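A rough way to read this, assuming a discount factor \(\gamma \in [0,1)\) and writing \(\delta\) for the approximation error at a belief \(t\) steps below the root (both symbols are introduced here for illustration, not taken from the notes):

\[
\text{error contributed at the root} \;\le\; \gamma^{t}\,\delta
\]

So to reach a target precision \(\epsilon\) at the root, a belief at depth \(t\) only needs its bounds to be within about \(\epsilon\,\gamma^{-t}\) of each other, which loosens as \(t\) grows.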
Novelty
- combined alpha-vector and forward heuristics to guide search of belief states before backup
- 100x faster than PBVI
- scales to huge environments
Goal: minimize “regret” (difference from the optimal policy)
Novelty HSVI 2
- Projected the upper bound onto a convex hull (HSVI2: via approximate convex hull projection)
- uses blind lower bound
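The blind lower bound mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a small tabular POMDP, and the names `T`, `R`, `gamma`, `blind_lower_bound`, and `lower_bound_value` are hypothetical. A "blind" policy ignores observations and always takes the same action, so its value gives one alpha vector per action.

```python
def blind_lower_bound(T, R, gamma, iters=500):
    """One alpha vector per action, for the policy that always takes that action.

    Assumed layout (hypothetical): T[a][s][s2] = P(s2 | s, a), R[a][s] = reward.
    """
    n_states = len(R[0])
    alphas = []
    for a in range(len(R)):
        # Worst-case discounted return of always taking a: a valid starting lower bound.
        alpha = [min(R[a]) / (1.0 - gamma)] * n_states
        for _ in range(iters):
            # Fixed-policy evaluation: alpha(s) = R(s,a) + gamma * sum_s2 T(s2|s,a) * alpha(s2)
            alpha = [
                R[a][s] + gamma * sum(T[a][s][s2] * alpha[s2] for s2 in range(n_states))
                for s in range(n_states)
            ]
        alphas.append(alpha)
    return alphas


def lower_bound_value(alphas, belief):
    """Lower bound at a belief: the best alpha vector's dot product with the belief."""
    return max(sum(b * v for b, v in zip(belief, alpha)) for alpha in alphas)


# Toy problem: two states that never change; action 0 pays in state 0, action 1 in state 1.
identity = [[1.0, 0.0], [0.0, 1.0]]
T = [identity, identity]
R = [[1.0, 0.0], [0.0, 1.0]]
alphas = blind_lower_bound(T, R, gamma=0.5)
print(lower_bound_value(alphas, [1.0, 0.0]))
print(lower_bound_value(alphas, [0.5, 0.5]))
```

The maximum over these alpha vectors is a lower bound on the optimal value at every belief, because each vector is the value of a policy that can actually be executed.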
Notable Methods
Key Figs
New Concepts
Notes
smooth function
Last edited: August 8, 2025
a function is called smoo
Social Learning
Last edited: August 8, 2025
Social Learning is the property of learning with other agents in your environment.
- learning from feedback
- human ai coordination
- social learning
- adversarial training
Social Network
Last edited: August 8, 2025
A Social Network is a scheme for studying the relationships and interactions amongst groups of people.
- people: \(V\)
- relationship: \(E\)
- system: a network \(G(V,E)\)
Importantly, the “labels” of \(E\) often do not matter as we frequently want to study only the graphical structure of the Social Network.
degree (node)
The degree of a node is the number of edges that are touching that node (whether in or out, or undirected).
The in-degree and out-degree are the numbers of edges going into and out of that node, respectively.
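As a small illustration of these definitions, here is a sketch that counts degrees for a directed network \(G(V,E)\) stored as a node list and an edge list. The function name `degrees` and the example people are made up for this note.

```python
def degrees(V, E):
    """Return {node: (in_degree, out_degree, degree)} for a directed graph.

    V is an iterable of nodes; E is an iterable of (source, target) pairs.
    """
    in_deg = {v: 0 for v in V}
    out_deg = {v: 0 for v in V}
    for u, v in E:
        out_deg[u] += 1  # edge leaves u
        in_deg[v] += 1   # edge enters v
    # Total degree counts every edge touching the node, in or out.
    return {v: (in_deg[v], out_deg[v], in_deg[v] + out_deg[v]) for v in in_deg}


V = ["ana", "bo", "cy", "dee"]
E = [("ana", "bo"), ("bo", "cy"), ("ana", "cy")]
for node, (d_in, d_out, d) in degrees(V, E).items():
    print(node, d_in, d_out, d)
```

Note that the edge labels play no role here: only the structure of who is connected to whom is used.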
Social Reinforcement Learning
Last edited: August 8, 2025
Key question: can multi-agent optimization problems help reinforcement learning?
using deep RL for combinatorial optimization
- fast inference that scales well with instance size
- maybe difficult to actually discover optimal solution: high sample complexity, or failing to find good solutions
- doesn’t generalize well
why multi-agent works
- decentralized training to improve sample efficiency
- adversarial training
