ICLR2025 Cassidy: AssistanceZero
- Train a reward predictor so the assistant still has reward estimates at test time
- MCTS
- Policy is trained to match the MCTS root-node visit distribution via a KL loss
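A minimal sketch of that root-node target, assuming an AlphaZero-style setup (function and variable names are illustrative, not from the paper): the policy head is trained toward the normalized MCTS visit counts at the root.

```python
import torch
import torch.nn.functional as F

def mcts_policy_loss(policy_logits: torch.Tensor, visit_counts: torch.Tensor) -> torch.Tensor:
    """KL between the MCTS root visit distribution and the policy head (hypothetical names)."""
    target = visit_counts / visit_counts.sum(dim=-1, keepdim=True)   # normalize visit counts
    log_policy = F.log_softmax(policy_logits, dim=-1)
    return F.kl_div(log_policy, target, reduction="batchmean")       # KL(target || policy)
```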
ICLR2025 Liu: synthesizing programmatic reinforcement learning policies with LLM guided search
Hill climbing with partial mutations of LLM-generated programs
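Rough sketch of the search loop as described; `llm_mutate` and `evaluate_policy` are hypothetical stand-ins for an LLM call that rewrites part of the program and an environment rollout score.

```python
def hill_climb(initial_program: str, llm_mutate, evaluate_policy, steps: int = 100) -> str:
    """Greedy hill climbing over programs, keeping only improving LLM mutations."""
    best, best_score = initial_program, evaluate_policy(initial_program)
    for _ in range(steps):
        candidate = llm_mutate(best)        # LLM rewrites a fragment of the current program
        score = evaluate_policy(candidate)  # run the candidate as a policy, get its return
        if score > best_score:
            best, best_score = candidate, score
    return best
```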
ICLR2025 Weller: Promptriever
??
ICLR2025 Yu: robust LLM safeguard via refusal feature adversarial training
With mechanistic interpretability, we can find a subspace correlated with refusal and use that refusal feature for adversarial training
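Sketch of the standard difference-of-means way to get such a refusal direction (my assumption about the mechanics; the adversarial-training part of the paper is not shown).

```python
import torch

def refusal_direction(h_refuse: torch.Tensor, h_comply: torch.Tensor) -> torch.Tensor:
    """h_refuse / h_comply: [n_prompts, d_model] activations on refused vs. complied prompts."""
    direction = h_refuse.mean(dim=0) - h_comply.mean(dim=0)   # difference of means
    return direction / direction.norm()

def perturb(h: torch.Tensor, direction: torch.Tensor, alpha: float) -> torch.Tensor:
    """Shift activations along the refusal direction, e.g. to simulate attacks during training."""
    return h + alpha * direction
```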
ICLR2025 Thrush: improving pre-training data using perplexity correlations
For each pre-training data domain, measure perplexity with many existing language models, correlate it with those models' benchmark performance, and train a fastText data sampler on the domains whose perplexity correlates strongly with performance
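Sketch of the correlation step only, with illustrative variable names: per-domain perplexities of existing models are rank-correlated with those models' benchmark scores, and the most (negatively) correlated domains are what the fastText sampler is then trained to upsample.

```python
import numpy as np
from scipy.stats import spearmanr

def domain_correlations(ppl: np.ndarray, bench: np.ndarray) -> np.ndarray:
    """ppl: [n_models, n_domains] perplexities; bench: [n_models] benchmark scores."""
    # Lower perplexity on a useful domain should go with higher benchmark scores,
    # so a strongly negative rank correlation marks a domain worth upsampling.
    return np.array([spearmanr(ppl[:, d], bench)[0] for d in range(ppl.shape[1])])

# top_domains = np.argsort(domain_correlations(ppl, bench))[:k]  # most negative correlations
# then train a fastText classifier (top_domains vs. rest) as a cheap sampler over raw data.
```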
ICLR2025 Muennighoff: generative representational instruction tuning
Train the model to do both instruction following (generation) and embeddings
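Very rough sketch of what a joint objective could look like (HF-style model API, mean pooling, temperature, and the loss weighting are my assumptions): a causal LM loss on instruction data plus an in-batch contrastive loss on query/document pairs, sharing one backbone.

```python
import torch
import torch.nn.functional as F

def grit_step(model, gen_batch, emb_batch, lam: float = 1.0) -> torch.Tensor:
    # generative part: ordinary next-token loss on instruction-following data
    lm_loss = model(**gen_batch, labels=gen_batch["input_ids"]).loss

    # representational part: mean-pool hidden states, InfoNCE over in-batch negatives
    q = model(**emb_batch["query"], output_hidden_states=True).hidden_states[-1].mean(dim=1)
    d = model(**emb_batch["doc"], output_hidden_states=True).hidden_states[-1].mean(dim=1)
    sim = F.normalize(q, dim=-1) @ F.normalize(d, dim=-1).T / 0.05
    contrastive = F.cross_entropy(sim, torch.arange(sim.size(0), device=sim.device))

    return lm_loss + lam * contrastive
```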
ICLR2025 Aycock: can LLMs really learn to translate a low-resource language from grammar books
For translation, LLMs mostly learn from the parallel examples in the grammar book; the gain from actually having the grammar shows up only on linguistic tasks
ICLR2025 Kaplan: from tokens to words
Interpretability results show that multi-word (multi-token) units are often represented in the last token's hidden state in language models; so, use that last-token embedding as the new embedding when inserting more vocabulary into the model. Also, for typos, the logit lens can trace at which point a typo becomes corrected
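Sketch of the vocabulary-expansion idea, assuming an HF-style decoder; reading the last layer's hidden state is my guess about the layer choice.

```python
import torch

@torch.no_grad()
def add_word(model, tokenizer, word: str):
    """Install the last-token hidden state of a multi-token word as a new embedding row."""
    ids = tokenizer(word, return_tensors="pt", add_special_tokens=False).input_ids
    hidden = model(ids, output_hidden_states=True).hidden_states[-1]  # [1, n_tokens, d_model]
    new_vec = hidden[0, -1]                     # last-token representation of the word
    tokenizer.add_tokens([word])
    model.resize_token_embeddings(len(tokenizer))
    model.get_input_embeddings().weight[-1] = new_vec
```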
ICLR2025 Kumarappan: lifelong learning for formal theorem proving
Lean prover; tree search with intermediate rewards to identify useful partial steps, online learning on those steps, and sequencing problems in curriculum-learning fashion to build on simple proofs
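Sketch of the kind of reward-guided best-first proof search this suggests; the state/tactic interfaces (`gen_tactics`, `apply_tactic`, `reward`, `state.is_solved`) are hypothetical.

```python
import heapq

def best_first_search(init_state, gen_tactics, apply_tactic, reward, budget: int = 200):
    """Expand partial proof states in order of an intermediate reward; return a proof trace."""
    frontier = [(-reward(init_state), 0, init_state, [])]   # max-heap via negated reward
    counter = 1                                             # tie-breaker so states never compare
    while frontier and budget > 0:
        _, _, state, trace = heapq.heappop(frontier)
        budget -= 1
        if state.is_solved:
            return trace                                     # complete proof found
        for tactic in gen_tactics(state):
            nxt = apply_tactic(state, tactic)
            if nxt is not None:                              # tactic applied successfully
                heapq.heappush(frontier, (-reward(nxt), counter, nxt, trace + [tactic]))
                counter += 1
    return None
```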
ICLR2025 Goyal: context-parametric inversion
Although instruction tuning improves context reliance at first, context reliance then drops as instruction fine-tuning continues, even though standard benchmark performance keeps increasing
ICLR2025 Wan: self improving many shot reasoners
For in-context learning, run Bayesian-optimization steps over some of the model's own outputs and use the result to generate new in-context examples
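Sketch of that self-improvement loop; I replace the Bayesian-optimization search over demonstration sets with plain random search to keep it short, and the candidate pool and `score_on_dev` are hypothetical.

```python
import random

def improve_demos(pool, score_on_dev, k: int = 16, rounds: int = 20):
    """pool: model-generated (question, reasoning, answer) candidates for many-shot prompts."""
    best = random.sample(pool, k)
    best_score = score_on_dev(best)
    for _ in range(rounds):
        cand = random.sample(pool, k)      # propose a new demonstration set
        s = score_on_dev(cand)
        if s > best_score:                 # keep the set that scores best on the dev split
            best, best_score = cand, s
    return best
```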
ICLR2025 Hsu: grounding by trying
Do RAG by trying out several queries; for the good ones do SFT and then DPO
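Sketch of how the tried queries could be turned into training data (my reading of the pipeline): queries whose retrieval found evidence become SFT targets, and (good, bad) query pairs for the same question become DPO preferences.

```python
def build_training_data(attempts):
    """attempts: list of (question, query, retrieval_score) from sampled query attempts."""
    by_question = {}
    for question, query, score in attempts:
        by_question.setdefault(question, []).append((score, query))

    sft, dpo = [], []
    for question, scored in by_question.items():
        scored.sort(reverse=True)                        # best-scoring query first
        if scored[0][0] > 0:                             # retrieval actually found evidence
            sft.append((question, scored[0][1]))
        if len(scored) > 1 and scored[0][0] > scored[-1][0]:
            dpo.append((question, scored[0][1], scored[-1][1]))   # (chosen, rejected)
    return sft, dpo
```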
ICLR2025 Jia: improved techniques for optimization-based jailbreaking
Update multiple tokens at once to make the attack more efficient
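Sketch of a multi-position variant of a GCG-style step (my simplification, not the paper's exact procedure); `token_gradients` and `loss_on` are hypothetical helpers returning the gradient of the attack loss w.r.t. one-hot suffix tokens and the loss of a candidate suffix.

```python
import torch

def multi_token_step(suffix_ids, token_gradients, loss_on, n_positions: int = 4, topk: int = 64):
    grads = token_gradients(suffix_ids)        # [suffix_len, vocab_size] gradient of attack loss
    # positions where some substitution promises the largest loss decrease
    positions = torch.topk(grads.min(dim=1).values, n_positions, largest=False).indices
    new_ids = suffix_ids.clone()
    for pos in positions.tolist():             # swap several positions within a single step
        cands = torch.topk(-grads[pos], topk).indices.tolist()
        scored = []
        for tok in cands:
            trial = new_ids.clone()
            trial[pos] = tok
            scored.append((float(loss_on(trial)), tok))
        new_ids[pos] = min(scored)[1]          # greedily keep the lowest-loss substitution
    return new_ids
```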
ICLR2025 Kang: visual attention sink in large language models
Vision-language models tend to attend to tokens that are largely irrelevant; a good intervention is to redistribute attention weight from the irrelevant tokens to the relevant ones
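Sketch of the redistribution step on a single attention map; how sink tokens are detected is not shown (I assume a boolean mask is given).

```python
import torch

def redistribute_attention(attn: torch.Tensor, sink_mask: torch.Tensor) -> torch.Tensor:
    """attn: [n_queries, n_keys] attention weights; sink_mask: [n_keys] bool, True at sink tokens."""
    sink_mass = attn[:, sink_mask].sum(dim=-1, keepdim=True)    # mass wasted on irrelevant sinks
    keep = ~sink_mask
    out = attn.clone()
    out[:, sink_mask] = 0.0
    # spread the recovered mass over the remaining (relevant) tokens, proportionally
    denom = attn[:, keep].sum(dim=-1, keepdim=True).clamp_min(1e-9)
    out[:, keep] += sink_mass * attn[:, keep] / denom
    return out
```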
ICLR2025 Huang: steering LLM behavior with concept activation vectors
Method: extract last-token embeddings using positive and negative instructions, then steer the LLM using these embeddings
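Minimal activation-steering sketch, assuming the steering vector is the mean difference of the two sets of last-token embeddings and is added to the residual stream via a forward hook (layer index and scale are illustrative).

```python
import torch

def steering_vector(h_pos: torch.Tensor, h_neg: torch.Tensor) -> torch.Tensor:
    """h_pos / h_neg: [n_prompts, d_model] last-token activations on positive vs. negative instructions."""
    return (h_pos - h_neg).mean(dim=0)

def make_hook(vec: torch.Tensor, alpha: float = 4.0):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * vec          # shift the residual stream toward the concept
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

# usage (Llama-style module path assumed):
# model.model.layers[15].register_forward_hook(make_hook(vec))
```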
ICLR2025 Wang: perplexity trap
Documents, under semantic perturbation, are considered more relevant when their perplexity is lower