ICLR2025 Thursday Morning Posters
Last edited: August 8, 2025
ICLR2025 Hu: Belief State Transformer
Key insight: the residual stream at the last token can be thought of as a belief state encoding future tokens; uncertainty in this last residual directly correlates with the diversity of the output.
Method: train a forward transformer and a reverse transformer (like what Robert wanted), then correlate the two.
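A minimal numpy sketch of the "uncertainty correlates with diversity" idea: treat the next-token distribution read off the last residual as the belief, and measure its entropy. The logits here stand in for a real output head, and `belief_entropy` is a hypothetical name, not the paper's code.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def belief_entropy(logits):
    """Entropy of the next-token distribution implied by the last
    residual state -- a proxy for how uncertain the belief state is,
    and hence how diverse the model's continuations will be."""
    p = softmax(np.asarray(logits, dtype=float))
    return float(-(p * np.log(p + 1e-12)).sum())
```

A uniform belief (all logits equal) has maximal entropy, while a peaked belief has near-zero entropy, matching the claim that a low-uncertainty residual yields low-diversity output.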
ICLR2025 Lingam: Diversity of Thoughts
Key insight: use iterative sampling to achieve higher diversity in self-reflection, which in turn yields better outputs.
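One way to make "iterative sampling for diversity" concrete is rejection sampling against the candidates kept so far. This is a sketch under assumptions: `generate` is a hypothetical stand-in for LLM sampling, and word-level Jaccard similarity stands in for whatever similarity measure the paper actually uses.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two strings."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)

def diverse_samples(generate, n, max_sim=0.5, max_tries=100):
    """Iteratively sample until n mutually dissimilar candidates are
    kept; a candidate is rejected if it is too similar to any kept one."""
    kept = []
    for _ in range(max_tries):
        if len(kept) >= n:
            break
        cand = generate()
        if all(jaccard(cand, k) < max_sim for k in kept):
            kept.append(cand)
    return kept
```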
ICLR2025 Tokenizer-Free Approaches
Talks
Downsides of Subword Tokenization
- not learned end to end: vocab is fixed, can’t adapt to difficulty
- non-smoothness: similar inputs get mapped to very different token sequences
- [token][ization]
- typo: [token][zi][ation] <- suddenly bad despite small typo
- huge vocabs: embedding and output layers grow with vocabulary size
- non-adaptive compression ratio: you can’t decide how much to compress (affects FLOPs/document)
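The non-smoothness bullet can be reproduced with a toy greedy longest-match tokenizer over a tiny hand-picked vocab. Both the vocab and the matching rule are illustrative assumptions, not any real tokenizer:

```python
def greedy_tokenize(text, vocab):
    """Greedy longest-match subword tokenization over a toy vocab."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

vocab = {"token", "ization", "zi", "ation"}
```

A one-character typo changes the entire suffix segmentation, which is exactly the non-smoothness described above: `"tokenization"` becomes `[token][ization]` while `"tokenziation"` becomes `[token][zi][ation]`.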
ICLR2025 Wu: Retrieval Head Explains Long Context
Motivation
Previous works find “heads” that perform specific mechanisms for context retrieval.
Retrieval Head
The authors show that retrieval heads exist in transformers, using the Needle-in-a-Haystack framework.
Key Insight
There exist certain heads which perform retrieval, as measured by the retrieval score.
Methods
Measuring Retrieval Behavior
“retrieval score”: how often a head engages in copy-paste behavior, requiring both:
- token inclusion: the currently generated token \(w\) appears in the needle
- maximal attention: the head’s maximum attention score falls on that same token \(w\)
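The two criteria can be sketched as a single scoring function. The tensor layout, span convention, and function name here are assumptions for illustration, not the paper's code:

```python
import numpy as np

def retrieval_score(attn, context_tokens, needle_span, generated):
    """Fraction of decode steps where (a) the generated token w appears
    in the needle and (b) the head's max attention lands on that token.
    attn: [num_steps, context_len] attention weights of one head;
    needle_span: (start, end) indices of the needle in the context."""
    s, e = needle_span
    hits = 0
    for step, w in enumerate(generated):
        argmax = int(np.argmax(attn[step]))
        if s <= argmax < e and context_tokens[argmax] == w:
            hits += 1
    return hits / max(len(generated), 1)
```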
ICLR2025 Yue: Inference Scaling for Long-Context RAG
“RAG performance can scale almost linearly w.r.t. log inference FLOPs”
Demonstration Based RAG (DRAG)
Method
Add k demonstrations as in-context examples.
Prompt: documents, input query, final answer.
Parameters: number of documents, number of in context samples, number of iterations upper bound.
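A sketch of how such a prompt might be assembled from the pieces listed above. The template text is an assumption; the paper's exact prompt format may differ.

```python
def build_drag_prompt(docs, demos, query):
    """Assemble a DRAG-style prompt: k in-context demonstrations
    (each with its own documents, query, and final answer), then the
    retrieved documents and the input query."""
    parts = []
    for demo_docs, demo_query, demo_answer in demos:
        parts += ["Documents:\n" + "\n".join(demo_docs),
                  "Question: " + demo_query,
                  "Answer: " + demo_answer]
    parts += ["Documents:\n" + "\n".join(docs),
              "Question: " + query,
              "Answer:"]
    return "\n\n".join(parts)
```

Inference compute then scales with the number of documents and the number of demonstrations, which is the knob the scaling claim above is about.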
Iterative Demonstration Based RAG (IterDRAG)
Method
Run DRAG as above, but the model can then generate a new sub-query, which is answered with fresh retrieval; the model decides when to stop and emit the final answer.
Parameters: number of documents, number of in context samples, number of iterations upper bound.
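The iterative loop can be sketched as follows. Here `llm` and `retrieve` are hypothetical stand-ins, and the "Final answer:" convention for detecting termination is an assumption, not the paper's protocol.

```python
def iter_drag(llm, retrieve, query, max_iters=5):
    """IterDRAG-style loop: at each step the model either emits a
    sub-query (answered with fresh retrieval) or a final answer;
    the model decides when to stop, up to an iteration cap."""
    context = retrieve(query)
    transcript = []
    for _ in range(max_iters):
        out = llm(query, context, transcript)
        if out.startswith("Final answer:"):
            return out[len("Final answer:"):].strip()
        # otherwise treat the output as a sub-query and retrieve for it
        context = context + retrieve(out)
        transcript.append(out)
    # iteration cap reached: force a final answer
    return llm(query, context, transcript + ["give final answer"])
```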
identity
An identity allows another number to retain its identity after an operation.
Which identities are applicable is group dependent; identities are almost always object dependent.
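A small check of the definition: an element e is a (two-sided) identity for an operation exactly when combining it with any element, on either side, leaves that element unchanged. The helper name is illustrative.

```python
def is_identity(e, elements, op):
    """Check that e is a two-sided identity for op over the elements:
    op(e, x) == x and op(x, e) == x for every x."""
    return all(op(e, x) == x and op(x, e) == x for x in elements)
```

For example, 0 is the identity for addition but not for multiplication, illustrating that the identity depends on the operation.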
