Houjun Liu

ICLR2025 Wu: Retrieval Head Explains Long Context

Motivation

Previous work has identified “heads” that implement specific mechanisms for retrieving information from context.

Retrieval Head

The authors show that retrieval heads exist in transformers, using the Needle in a Haystack framework.

Key Insight

There exist certain heads that perform retrieval, as measured by the retrieval score.

Methods

Measuring Retrieval Behavior

“retrieval score”: how often a head engages in copy-paste behavior. At a given decoding step, a head counts as copy-pasting when both conditions hold:

  1. token inclusion: the currently generated token \(w\) is in the needle
  2. maximal attention: that same token receives the head's maximum attention score
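
The two criteria above can be sketched for a single head as follows. This is a minimal illustration, not the authors' implementation: the function name `retrieval_score`, its arguments, and the choice to normalize by the number of decoding steps are all assumptions made for the example, and the paper's exact normalization may differ.

```python
from typing import Sequence, Tuple


def retrieval_score(
    generated: Sequence[str],
    context: Sequence[str],
    needle_span: Tuple[int, int],
    attention: Sequence[Sequence[float]],
) -> float:
    """Fraction of decoding steps at which one head copy-pastes from the needle.

    generated[t] : token emitted at decoding step t
    context[j]   : token at context position j
    needle_span  : (start, end) positions of the needle in context, end exclusive
    attention[t] : this head's attention over context positions at step t

    Normalizing by the number of steps is an assumption of this sketch.
    """
    if not generated:
        return 0.0
    start, end = needle_span
    hits = 0
    for w, row in zip(generated, attention):
        # Context position this head attends to most strongly at this step.
        j = max(range(len(row)), key=row.__getitem__)
        # Both criteria at once: the top-attended position lies inside the
        # needle (so w is a needle token) and holds exactly the token w.
        if start <= j < end and context[j] == w:
            hits += 1
    return hits / len(generated)


# Toy example: the needle is "is blue" at context positions 2 and 3.
context = ["the", "sky", "is", "blue", "today"]
generated = ["is", "blue", "sky"]
attention = [
    [0.1, 0.1, 0.6, 0.1, 0.1],  # peak on "is" (in needle, matches)
    [0.1, 0.1, 0.1, 0.6, 0.1],  # peak on "blue" (in needle, matches)
    [0.1, 0.6, 0.1, 0.1, 0.1],  # peak on "sky" (outside the needle)
]
score = retrieval_score(generated, context, (2, 4), attention)
print(score)  # 2 of the 3 steps satisfy both criteria
```

In the toy example, only the first two steps count: the third step copies a token the head attends to most, but that token is not part of the needle.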