ICLR2025 Morris: contextual document embeddings
Take a set of neighboring document embeddings as additional input to produce a new document embedding that is now contextual
ICLR2025 Noukhovich: asynchronous reinforcement learning for language models
Rollout and tune concurrently
ICLR2025 Yao: CR-CTC consistency regularization
CTC loss can be made more robust if you regularize the model to minimize the difference between its outputs on two augmented views of the same mel spectrogram
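A minimal sketch of that idea, assuming a model that returns frame-level logits of shape [T, B, vocab]; the weight lam and the symmetric-KL form of the consistency term are my assumptions, not necessarily the paper's exact recipe:

```python
# Sketch only: consistency-regularized CTC under the assumptions stated above.
import torch
import torch.nn.functional as F

def cr_ctc_loss(model, view1, view2, targets, input_lens, target_lens, lam=0.2):
    # Two augmented (e.g. SpecAugment) views of the same utterance -> frame-level log-probs [T, B, vocab]
    logp1 = model(view1).log_softmax(-1)
    logp2 = model(view2).log_softmax(-1)
    # Standard CTC loss on both views
    ctc = (F.ctc_loss(logp1, targets, input_lens, target_lens) +
           F.ctc_loss(logp2, targets, input_lens, target_lens))
    # Consistency term: symmetric KL between the two frame-wise distributions (assumed form)
    kl = (F.kl_div(logp1, logp2, log_target=True, reduction="batchmean") +
          F.kl_div(logp2, logp1, log_target=True, reduction="batchmean"))
    return ctc + lam * kl
```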
ICLR2025 Sun: ReDeEP detecting hallucination using mechanistic interpretability
Find the layers most prone to inserting information, and measure that insertion using the logit lens before and after the FFN; a strong change across a hallucination-prone FFN signals hallucination
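A rough sketch of that probe, not the full ReDeEP pipeline; final_norm and unembed are assumed stand-ins for the model's final normalization and unembedding, and measuring "information insertion" as a KL shift is my guess:

```python
# Sketch only: how much does the token distribution shift across one FFN, read through the logit lens?
import torch.nn.functional as F

def ffn_insertion_score(resid_pre_ffn, resid_post_ffn, final_norm, unembed):
    # Logit lens: project the residual stream to vocabulary space before and after the FFN
    p_pre = F.softmax(unembed(final_norm(resid_pre_ffn)), dim=-1)
    logp_post = F.log_softmax(unembed(final_norm(resid_post_ffn)), dim=-1)
    # Large shifts at hallucination-prone FFN layers are treated as a hallucination signal
    return F.kl_div(logp_post, p_pre, reduction="batchmean")
```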
ICLR2025 Fu: CHiP
For multimodal preference optimization, combine four different loss terms, each a different type of preference loss, to get the best results
ICLR2025 Faysse: ColPali
Embed images of the text instead of the text itself during RAG
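ColPali scores query-page pairs with ColBERT-style late interaction over patch embeddings of the page image; a minimal sketch of that scoring, with the embedding models omitted and the names assumed:

```python
# Sketch only: MaxSim late-interaction scoring between query token embeddings and page patch embeddings.
import torch

def late_interaction_score(query_emb, page_emb):
    # query_emb: [num_query_tokens, dim], page_emb: [num_patches, dim], both L2-normalized
    sim = query_emb @ page_emb.T              # similarity of every query token to every patch
    return sim.max(dim=1).values.sum()        # best-matching patch per query token, summed
```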
ICLR2025 Liu: DeLLMa
Key insight: cast the decision problem as a POMDP by asking the language model to produce value judgments, normalizing them, and running standard value iteration
ICLR2025 Wijmans: cut your losses in large-vocabulary language models
Instead of decoding directly into the full logit matrix, which is memory intensive, there is a trick that avoids having to store the entire output projection's logits in memory
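The paper implements this with custom kernels; the sketch below only illustrates the underlying identity: cross-entropy needs just the correct token's logit plus a log-sum-exp over the vocabulary, which can be accumulated in chunks so the full [tokens, vocab] logit matrix is never materialized. Function and argument names are mine.

```python
# Sketch only: memory-light cross-entropy by streaming the output projection in vocab chunks.
import torch

def chunked_cross_entropy(hidden, out_proj, targets, chunk_size=8192):
    # hidden: [n, d] final hidden states, out_proj: [vocab, d], targets: [n] token ids
    n = hidden.size(0)
    lse = torch.full((n,), float("-inf"), device=hidden.device)
    for start in range(0, out_proj.size(0), chunk_size):
        logits = hidden @ out_proj[start:start + chunk_size].T   # only [n, chunk] at a time
        lse = torch.logaddexp(lse, torch.logsumexp(logits, dim=-1))
    target_logits = (hidden * out_proj[targets]).sum(-1)         # logit of each label token
    return (lse - target_logits).mean()                          # = mean(-log softmax[target])
```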
ICLR2025 Gao: regressing the relative future
Solve multi-turn RLHF by regressing the policy's Q value and optimizing it over discounted future rewards
ICLR2025 Xiao: SimPER, preference alignment without hyperparameters
Remove the log term of DPO, thereby removing the hyperparameter beta that it requires
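My reading of the note's objective, sketched below: score each response by its length-normalized (inverse) perplexity rather than a beta-scaled log-ratio; treat the exact form as an assumption.

```python
# Sketch only: a SimPER-style, hyperparameter-free preference loss under the assumption above.
import torch

def simper_loss(logp_chosen, logp_rejected, len_chosen, len_rejected):
    # logp_*: summed token log-probs of each response under the policy being tuned
    inv_ppl_chosen = torch.exp(logp_chosen / len_chosen)        # length-normalized likelihood
    inv_ppl_rejected = torch.exp(logp_rejected / len_rejected)
    return (inv_ppl_rejected - inv_ppl_chosen).mean()           # prefer chosen over rejected
```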
ICLR2025 Xiong: from tokens to lattices
Masked language models learn conditional relationships between tokens of the same entity, thereby implicitly creating a graph (lattice)
ICLR2025 Pagliardini: AdEMAMix
Key insight: Adam with two EMAs of the gradient (two different betas), so the update uses gradient information from many past steps for stabler and faster convergence
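A sketch of an AdEMAMix-style step, assuming my recollection of the rule: keep Adam's fast EMA plus a second, much slower EMA of the gradient and mix it into the numerator with a weight alpha; the paper's schedules for alpha and beta3 are omitted.

```python
# Sketch only: single AdEMAMix-style parameter update (schedules and bias details simplified).
import torch

def ademamix_step(p, g, state, lr=1e-3, betas=(0.9, 0.999), beta3=0.9999, alpha=5.0, eps=1e-8):
    m1, m2, v = state["m1"], state["m2"], state["v"]
    t = state["t"] = state["t"] + 1
    m1.mul_(betas[0]).add_(g, alpha=1 - betas[0])        # fast EMA, as in Adam
    m2.mul_(beta3).add_(g, alpha=1 - beta3)              # slow EMA: long gradient memory
    v.mul_(betas[1]).addcmul_(g, g, value=1 - betas[1])  # second moment, as in Adam
    m1_hat = m1 / (1 - betas[0] ** t)                    # bias correction for the fast EMA
    v_hat = v / (1 - betas[1] ** t)
    p.add_(-lr * (m1_hat + alpha * m2) / (v_hat.sqrt() + eps))
```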
ICLR2025 Fan: loop transformers for length generalization
Key insight: UT-like (Universal Transformer-style) approaches with loops generalize better for tasks of a specific kind
ICLR2025 Lee: multiple non-asymptotic rates for value iteration
Key insight: anchoring to the initial value V0 speeds up average-reward value iteration
That is: V_t = a*V_0 + b*T(V_{t-1}), with a + b = 1
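A generic sketch of anchored value iteration on a discounted tabular MDP to make the formula concrete; the anchor schedule a_t = 1/(t+1) is an assumption, and the paper's average-reward setting differs in details.

```python
# Sketch only: value iteration anchored to the initial value V0.
import numpy as np

def anchored_value_iteration(P, R, gamma=0.99, iters=1000):
    # P: [A, S, S] transition probabilities, R: [A, S] rewards
    num_states = R.shape[1]
    V0 = np.zeros(num_states)
    V = V0.copy()
    for t in range(1, iters + 1):
        TV = (R + gamma * (P @ V)).max(axis=0)   # Bellman optimality operator applied to V
        a = 1.0 / (t + 1)                        # assumed anchor schedule, shrinking over time
        V = a * V0 + (1 - a) * TV                # anchored update: V_t = a*V0 + (1-a)*T(V_{t-1})
    return V
```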
ICLR2025 Liu: linear combination of saved checkpoints makes diffusion and consistency models better
Key insight: as titled; use evolutionary search to figure out the best mixture of checkpoint weights
ICLR2025 Ramapuram: theory, analysis, and best practices for sigmoid self-attention
Key insight: sigmoid self-attention reduces all-gather costs, and they have a bunch of tricks to make it work
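A minimal sketch of the core change, assuming the -log(seq_len) bias term I recall from the paper; none of their kernel or systems tricks are shown.

```python
# Sketch only: attention with an elementwise sigmoid instead of a row-wise softmax.
import math
import torch

def sigmoid_attention(q, k, v):
    # q, k, v: [batch, heads, seq, dim]
    n, d = q.size(-2), q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)   # [batch, heads, seq, seq]
    weights = torch.sigmoid(scores - math.log(n))     # no normalization across the row
    return weights @ v
```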
ICLR2025 Sun: block verification accelerates speculative decoding
Key insight: when using a small language model to speculatively decode a large language model, evaluate the likelihood a block of tokens at a time rather than token by token
ICLR2025 Chang: scalable influence and fact tracing
Key insight: do attribution using a normalized gradient dot product between training examples and model outputs
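A bare-bones sketch of normalized gradient-dot-product attribution as the note describes it; the paper's scaling and projection choices are not reproduced, and the helper names are mine.

```python
# Sketch only: score training batches by cosine similarity of their loss gradient with the query's gradient.
import torch

def influence_scores(model, loss_fn, train_batches, query_batch):
    def normalized_grad(batch):
        model.zero_grad()
        loss_fn(model, batch).backward()
        g = torch.cat([p.grad.reshape(-1) for p in model.parameters() if p.grad is not None])
        return g / (g.norm() + 1e-8)                   # normalize the flattened gradient
    q = normalized_grad(query_batch)                   # gradient for the output being traced
    return [torch.dot(normalized_grad(b), q).item() for b in train_batches]
```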
ICLR2025 Hu: how to visualize training dynamics
Key insight: take whatever summary statistics you have for each checkpoint and run classical dimensionality reduction on them, such as PCA
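A small sketch of that recipe with scikit-learn PCA; the choice of statistics per checkpoint is up to you.

```python
# Sketch only: project per-checkpoint summary statistics to 2D to plot a training trajectory.
import numpy as np
from sklearn.decomposition import PCA

def trajectory_2d(checkpoint_stats):
    # checkpoint_stats: list of 1D feature vectors, one per saved checkpoint
    X = np.stack(checkpoint_stats)               # [num_checkpoints, num_features]
    X = (X - X.mean(0)) / (X.std(0) + 1e-8)      # standardize each feature
    return PCA(n_components=2).fit_transform(X)  # [num_checkpoints, 2]
```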
ICLR2025 Addepalli: safety training of LLMs generalizes to semantically related prompts
Key insight: take some jailbreak that doesn't work anymore, make a semantic perturbation of it, and check if it still works. Often, it does.
ICLR2025 Georgiev: attribute-to-delete
Key insight: learn a datamodel that lets you identify which pieces of the pre-training data are relevant to a given output; use it, together with a counterfactual for what the correctly unlearned output would be, and then tune the model against that.