Motivation
Previous work has identified attention “heads” that each carry out a specific mechanism related to retrieving information from the context.
Retrieval Head
The authors show that retrieval heads exist in transformers, using the Needle-in-a-Haystack framework.
Key Insight
There exist certain heads that perform retrieval, as measured by the retrieval score.
Methods
Measuring Retrieval Behavior
“Retrieval score”: how often a head engages in copy-paste behavior. A head counts as copy-pasting a token when two criteria are met:
- token inclusion: the currently generated token \(w\) is in the needle
- maximal attention: that same token in the needle receives the head's maximum attention score
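The two criteria above can be sketched in code for a single head. This is a minimal illustration, not the authors' implementation. The function name `retrieval_score`, its arguments, and the choice to normalize by the number of distinct needle tokens are all assumptions made for this example. The notes only say the score reflects how often the head copy-pastes.

```python
from typing import Sequence, Tuple


def retrieval_score(
    generated: Sequence[int],
    context: Sequence[int],
    needle_span: Tuple[int, int],
    attention: Sequence[Sequence[float]],
) -> float:
    """Hypothetical retrieval score for one attention head.

    generated:   token ids produced at each decoding step
    context:     token ids of the input (haystack with the needle inside)
    needle_span: (start, end) positions of the needle in `context`, end exclusive
    attention:   attention[t][i] is this head's weight on context position i
                 while generating generated[t]
    """
    start, end = needle_span
    needle_tokens = set(context[start:end])
    if not needle_tokens:
        return 0.0

    copied = set()
    for step, token in enumerate(generated):
        weights = attention[step]
        # Position that receives this head's largest attention weight.
        top = max(range(len(weights)), key=weights.__getitem__)

        # Criterion 1 (token inclusion): the generated token is in the needle.
        token_inclusion = token in needle_tokens
        # Criterion 2 (maximal attention): the most-attended position lies in
        # the needle and holds the same token that was just generated.
        maximal_attention = start <= top < end and context[top] == token

        if token_inclusion and maximal_attention:
            copied.add(token)

    # Assumed normalization: fraction of distinct needle tokens copied.
    return len(copied) / len(needle_tokens)
```

With a two-token needle of which the head copies only one, this sketch returns 0.5. A head that never attends most strongly to the needle while emitting a needle token scores 0.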