ICLR2025 Koyejo
Proposal: Focus AI measurements on the validity of specific terms.
Five pillars of claim making:
- content validity: does your evaluation cover all valuable cases?
- criterion validity: does your evaluation correlate with a known validated standard?
- construct validity: does your evaluation measure the intended construct?
- external validity: does your evaluation generalize across different environments or settings?
- consequential validity: does your evaluation consider the real-world impact of test interpretation and use?
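Of the five, criterion validity is the easiest to check numerically. A minimal sketch, assuming we have per-model scores on a new evaluation and on an already validated standard (the score lists below are invented for illustration and do not come from the talk; `pearson` is a hypothetical helper):

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for four models (illustrative numbers only).
new_benchmark = [0.2, 0.4, 0.6, 0.8]
validated_standard = [0.1, 0.5, 0.5, 0.9]

r = pearson(new_benchmark, validated_standard)
print(f"correlation with validated standard: {r:.3f}")
```

A high correlation is only evidence for criterion validity; it says nothing about the other four pillars.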
Open problem: validity of measurement for claims of HAIC.
ICLR2025 Evans: AI Diversity NOT Alignment for Sustained Innovation in Human-AI Evolution
When AI systems align with user values, users rank them as more helpful.
For unpredictable systems, the best approach is to build in checks and balances plus diverse systems.
“finding ways honor and value big-bad failures—to build objectives”
ICLR2025 Laidlaw: Scalable Assistance Games
- fix a human model learned from data
- learn a model: AssistanceZero
AssistanceZero
Multi-agent environment for solving factored POMDPs while a human agent is doing something.
ICLR2025 Musaffar: Learning to Lie: Adversarial Attacks Driven by Reinforcement Learning Damage Human-AI Teams and LLMs
- RL-driven attacks are effective at tricking humans
- Chain-of-thought models are more sensitive to attacks