Houjun Liu

ICLR2025 HAIC

ICLR2025 Koyejo

Proposal: Focus AI measurements on the validity of specific terms.

Five pillars of claim making:

  • content validity: does your evaluation cover all valuable cases?
  • criterion validity: does your evaluation correlate with a known validated standard?
  • construct validity: does your evaluation measure the intended construct?
  • external validity: does your evaluation generalize across different environments or settings?
  • consequential validity: does your evaluation consider the real-world impact of test interpretation and use?
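
Of the five pillars, criterion validity is the easiest to check numerically: score the same systems with the new evaluation and with a validated standard, then see how strongly the two agree. A minimal sketch, assuming Pearson correlation as the agreement measure (the talk notes do not name one); the function name and all scores below are hypothetical illustration data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for four systems (assumed data, not from the talk).
new_eval = [0.2, 0.4, 0.6, 0.8]    # scores from the evaluation under test
reference = [1.0, 2.0, 3.0, 4.0]   # scores from a validated standard

r = pearson(new_eval, reference)
print(f"criterion validity (Pearson r): {r:.3f}")
```

A high correlation only supports criterion validity; it says nothing about the other four pillars.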

Open problem: validity of measurement for claims of HAIC.

ICLR2025 Evans: AI Diversity NOT Alignment for Sustained Innovation in Human-AI Evolution

When AI systems align with user values, users rank them as more helpful.

For an unpredictable system, the best approach is to build in checks and balances plus diverse systems.

“finding ways honor and value big-bad failures—to build objectives”

ICLR2025 Laidlaw: Scalable Assistance Games

  1. fix a human model learned from data
  2. learn a model: AssistanceZero

AssistanceZero

Multi-agent environment to solve factored POMDPs while a human agent is doing something.
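
The partial observability in an assistance game comes from the assistant not knowing the human's goal, so it has to infer the goal from what the human does. A minimal sketch of that inference step, assuming a fixed human model given as a table of P(action | goal). This is a generic Bayes update, not AssistanceZero itself, and the goal names, action names, and probabilities are hypothetical.

```python
def update_belief(belief, action, human_model):
    """Bayes update: P(goal | action) is proportional to P(action | goal) * P(goal)."""
    posterior = {
        goal: prior * human_model[goal].get(action, 0.0)
        for goal, prior in belief.items()
    }
    total = sum(posterior.values())
    if total == 0:
        # The action is impossible under every goal; keep the prior unchanged.
        return dict(belief)
    return {goal: p / total for goal, p in posterior.items()}


# Hypothetical fixed human model: P(action | goal).
human_model = {
    "build_house": {"place_block": 0.8, "dig": 0.2},
    "dig_tunnel": {"place_block": 0.1, "dig": 0.9},
}

# The assistant starts out uncertain about the human's goal.
belief = {"build_house": 0.5, "dig_tunnel": 0.5}

# The assistant observes the human digging and updates its belief.
belief = update_belief(belief, "dig", human_model)
print(belief)
```

The assistant would then pick its own action against this belief rather than against a known reward.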

ICLR2025 Musaffar: Learning to Lie: Adversarial Attacks Driven by Reinforcement Learning Damage Human-AI Teams and LLMs

  1. RL-driven attacks are effective at tricking humans
  2. Chain of thought models are more sensitive to attacks