ICLR2025 Thursday Morning Posters
Last edited: August 8, 2025
ICLR2025 Hu: Belief State Transformer
Key insight: the residual stream at the last token can be thought of as a belief state encoding future tokens; uncertainty in this last residual directly correlates with the diversity of the output.
Method: train a forward transformer and a reverse transformer (like what Robert wanted), then correlate the two.
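A minimal numpy sketch of the "uncertainty correlates with diversity" idea: treat the next-token distribution read off the last residual as the belief, and measure its entropy. The logits here stand in for a real output head, and `belief_entropy` is a hypothetical name, not the paper's code.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def belief_entropy(logits):
    """Entropy of the next-token distribution implied by the last
    residual state -- a proxy for how uncertain the belief state is,
    and hence how diverse the model's continuations will be."""
    p = softmax(np.asarray(logits, dtype=float))
    return float(-(p * np.log(p + 1e-12)).sum())
```

A uniform belief (all logits equal) has maximal entropy, while a peaked belief has near-zero entropy, matching the claim that a low-uncertainty residual yields low-diversity output.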
ICLR2025 Lingam: Diversity of Thoughts
Key insight: use iterative sampling to achieve higher diversity in self-reflection, which in turn yields better outputs.
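One way to make "iterative sampling for diversity" concrete is rejection sampling against the candidates kept so far. This is a sketch under assumptions: `generate` is a hypothetical stand-in for LLM sampling, and word-level Jaccard similarity stands in for whatever similarity measure the paper actually uses.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two strings."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)

def diverse_samples(generate, n, max_sim=0.5, max_tries=100):
    """Iteratively sample until n mutually dissimilar candidates are
    kept; a candidate is rejected if it is too similar to any kept one."""
    kept = []
    for _ in range(max_tries):
        if len(kept) >= n:
            break
        cand = generate()
        if all(jaccard(cand, k) < max_sim for k in kept):
            kept.append(cand)
    return kept
```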
ICLR2025 Tokenizer-Free Approaches
Talks
Downsides of Subword Tokenization
- not learned end to end: vocab is fixed, can’t adapt to difficulty
- non-smoothness: similar inputs get mapped to very different token sequences
- [token][ization]
- typo: [token][zi][ation] <- suddenly bad despite small typo
- huge vocabs: embedding and output layers grow with vocabulary size
- non-adaptive compression ratio: you can’t decide how much to compress (affects FLOPs/document)
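The non-smoothness bullet can be reproduced with a toy greedy longest-match tokenizer over a tiny hand-picked vocab. Both the vocab and the matching rule are illustrative assumptions, not any real tokenizer:

```python
def greedy_tokenize(text, vocab):
    """Greedy longest-match subword tokenization over a toy vocab."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

vocab = {"token", "ization", "zi", "ation"}
```

A one-character typo changes the entire suffix segmentation, which is exactly the non-smoothness described above: `"tokenization"` becomes `[token][ization]` while `"tokenziation"` becomes `[token][zi][ation]`.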
ICLR2025 Wu: Retrieval Head Explains Long Context
Motivation
Previous works find “heads” that perform specific mechanisms for context retrieval.
Retrieval Head
The authors show that retrieval heads exist in transformers, using the Needle-in-a-Haystack framework.
Key Insight
There exist certain heads which perform retrieval, as measured by the retrieval score.
Methods
Measuring Retrieval Behavior
“retrieval score”: how often a head engages in copy-paste behavior, requiring both:
- token inclusion: the currently generated token \(w\) appears in the needle
- maximal attention: the head’s maximum attention score falls on that same token \(w\)
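The two criteria can be sketched as a single scoring function. The tensor layout, span convention, and function name here are assumptions for illustration, not the paper's code:

```python
import numpy as np

def retrieval_score(attn, context_tokens, needle_span, generated):
    """Fraction of decode steps where (a) the generated token w appears
    in the needle and (b) the head's max attention lands on that token.
    attn: [num_steps, context_len] attention weights of one head;
    needle_span: (start, end) indices of the needle in the context."""
    s, e = needle_span
    hits = 0
    for step, w in enumerate(generated):
        argmax = int(np.argmax(attn[step]))
        if s <= argmax < e and context_tokens[argmax] == w:
            hits += 1
    return hits / max(len(generated), 1)
```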
ICLR2025 Yue: Inference Scaling for Long-Context RAG
“RAG performance can scale almost linearly w.r.t. log inference FLOPs”
Demonstration Based RAG (DRAG)
Method
Add k demonstrations as in-context examples.
Prompt: documents, input query, final answer.
Parameters: number of documents, number of in context samples, number of iterations upper bound.
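A sketch of how such a prompt might be assembled from the pieces listed above. The template text is an assumption; the paper's exact prompt format may differ.

```python
def build_drag_prompt(docs, demos, query):
    """Assemble a DRAG-style prompt: k in-context demonstrations
    (each with its own documents, query, and final answer), then the
    retrieved documents and the input query."""
    parts = []
    for demo_docs, demo_query, demo_answer in demos:
        parts += ["Documents:\n" + "\n".join(demo_docs),
                  "Question: " + demo_query,
                  "Answer: " + demo_answer]
    parts += ["Documents:\n" + "\n".join(docs),
              "Question: " + query,
              "Answer:"]
    return "\n\n".join(parts)
```

Inference compute then scales with the number of documents and the number of demonstrations, which is the knob the scaling claim above is about.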
Iterative Demonstration Based RAG (IterDRAG)
Method
Run DRAG as above, but the model can then generate a new sub-query, which is answered with fresh retrieval; the model decides when to stop and emit the final answer.
Parameters: number of documents, number of in context samples, number of iterations upper bound.
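The iterative loop can be sketched as follows. Here `llm` and `retrieve` are hypothetical stand-ins, and the "Final answer:" convention for detecting termination is an assumption, not the paper's protocol.

```python
def iter_drag(llm, retrieve, query, max_iters=5):
    """IterDRAG-style loop: at each step the model either emits a
    sub-query (answered with fresh retrieval) or a final answer;
    the model decides when to stop, up to an iteration cap."""
    context = retrieve(query)
    transcript = []
    for _ in range(max_iters):
        out = llm(query, context, transcript)
        if out.startswith("Final answer:"):
            return out[len("Final answer:"):].strip()
        # otherwise treat the output as a sub-query and retrieve for it
        context = context + retrieve(out)
        transcript.append(out)
    # iteration cap reached: force a final answer
    return llm(query, context, transcript + ["give final answer"])
```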
identity
An identity allows another number to retain its identity after an operation.
Which identities are applicable is group dependent; identities are almost always object dependent.
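A small check of the definition: an element e is a (two-sided) identity for an operation exactly when combining it with any element, on either side, leaves that element unchanged. The helper name is illustrative.

```python
def is_identity(e, elements, op):
    """Check that e is a two-sided identity for op over the elements:
    op(e, x) == x and op(x, e) == x for every x."""
    return all(op(e, x) == x and op(x, e) == x for x in elements)
```

For example, 0 is the identity for addition but not for multiplication, illustrating that the identity depends on the operation.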
