_index.org

ICLR2025 Mathur: MIND Adaptive Thinking with Dynamic Computation

Last edited: August 8, 2025

Motivation

Standard networks spend the same fixed amount of computation on every input: the compute budget doesn’t adapt to how hard the example is.

Fixed-Point Iteration for Adaptation

method: CNN

  1. for every layer, run fixed-point iteration until convergence and use the result to mask out (what exactly?)
  2. additionally supervise an “introspection model” so the whole fixed-point iteration can be skipped
  3. loss: LM loss + a supervision term for the introspection model (sketch below)
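
A minimal sketch of how I read this mechanism (the toy layer, the introspection head, and the MSE supervision term are my assumptions, not the paper’s exact formulation): each layer is iterated to a fixed point, and an introspection head learns to predict that fixed point directly so the iteration can be skipped.

#+begin_src python
import numpy as np

rng = np.random.default_rng(0)
d = 16
W = rng.normal(scale=0.05, size=(d, d))        # toy layer weights, kept small so the map is contractive
W_intro = rng.normal(scale=0.05, size=(d, d))  # hypothetical introspection head

def layer_step(h, x):
    """One application of the layer on its own output."""
    return np.tanh(x + W @ h)

def fixed_point(x, tol=1e-5, max_iters=50):
    """Iterate until the hidden state stops changing: the adaptive part --
    easy inputs converge in few steps, hard ones take more."""
    h = np.zeros_like(x)
    for t in range(max_iters):
        h_next = layer_step(h, x)
        if np.linalg.norm(h_next - h) < tol:
            return h_next, t + 1
        h = h_next
    return h, max_iters

def introspect(x):
    """Introspection model: predicts the fixed point directly, skipping the iteration."""
    return np.tanh(W_intro @ x)

x = rng.normal(size=d)
h_star, n_iters = fixed_point(x)

# Training would combine the usual LM loss with a term that pulls the
# introspection prediction toward the computed fixed point, e.g. an MSE:
intro_loss = np.mean((introspect(x) - h_star) ** 2)
print(f"converged in {n_iters} iters, introspection MSE (untrained): {intro_loss:.4f}")
#+end_src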

method: MIND-transformer

  1. for every layer, run fixed-point iteration until the attention activations converge
  2. same introspection model and supervision as above

ICLR2025 Neitemeier: Hierarchical Autoregressive Transformers

Last edited: August 8, 2025

“A Byte Level transformer, with some compression”

Key insight: put a [CLS] token in front of every word and train a small “tokenizer” encoder over each word’s bytes; run a normal transformer over the [CLS] tokens; then autoregressively decode the individual bytes.

Method

Hierarchical Autoregressive Transformers

We put a [CLS] in front of every word, so the input looks like

[CLS] M y _ [CLS] n a m e _ [CLS] i s

We then run a small encoder over each word’s byte sequence and take the encoded [CLS] as that word’s embedding; a normal autoregressive transformer then runs over the sequence of [CLS] embeddings, and a small decoder autoregressively emits the bytes of the next word.
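
A toy sketch of the input layout and the three stages (the splitting rule and all names here are my guesses, just to make the structure concrete):

#+begin_src python
CLS = "[CLS]"

def word_chunks(text):
    """Split into word chunks, keeping the trailing space with each word
    (characters stand in for bytes; the paper's exact splitting rule may differ)."""
    chunks, current = [], []
    for ch in text:
        current.append(ch)
        if ch == " ":
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks

def build_input(text):
    """Prepend a [CLS] token to every word, as in the example above."""
    return [[CLS] + chunk for chunk in word_chunks(text)]

print(build_input("My name is"))
# [['[CLS]', 'M', 'y', ' '], ['[CLS]', 'n', 'a', 'm', 'e', ' '], ['[CLS]', 'i', 's']]

# The three stages would then be:
#   1. a small encoder runs over each chunk; its [CLS] output is that word's embedding
#   2. a normal autoregressive transformer runs over the sequence of [CLS] embeddings
#   3. a small decoder autoregressively emits the bytes of the next word from the
#      backbone's output
#+end_src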

ICLR2025 Saturday Posters

Last edited: August 8, 2025

ICLR2025 Cassidy: AssistanceZero

  1. Train a reward predictor so that rewards are also available at test time
  2. MCTS
  3. Learn to match the root node’s distribution (KL); see the sketch after this list
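
A tiny sketch of how I read item 3 (AlphaZero-style: the policy is trained toward the MCTS root node’s visit distribution; the visit counts, logits, and exact loss form below are my assumptions, not taken from the paper):

#+begin_src python
import numpy as np

def policy_kl_loss(root_visit_counts, policy_logits):
    """Pull the policy toward the search-improved distribution at the root."""
    target = root_visit_counts / root_visit_counts.sum()                # improved policy from MCTS
    log_probs = policy_logits - np.log(np.sum(np.exp(policy_logits)))   # log-softmax
    return float(-(target * log_probs).sum())                           # cross-entropy = KL up to a constant

counts = np.array([10.0, 45.0, 5.0])   # hypothetical visit counts at the root
logits = np.array([0.2, 0.1, -0.3])    # current policy logits for the same actions
print(policy_kl_loss(counts, logits))
#+end_src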

Hill climbing with partial mutations of LLM-generated programs

ICLR2025 Weller: Promptriever

??

ICLR2025 Yu: Robust LLM Safeguarding via Refusal Feature Adversarial Training

With mechanistic interpretability, we can find a subspace that is correlated with refusal and pull that feature up.
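
A sketch of one generic way to get such a refusal direction and scale activations along it (difference of mean activations between refused and answered prompts; whether the paper computes the subspace exactly like this is my assumption):

#+begin_src python
import numpy as np

def refusal_direction(acts_refused, acts_complied):
    """Unit vector pointing from 'complies' activations toward 'refuses' activations."""
    direction = acts_refused.mean(axis=0) - acts_complied.mean(axis=0)
    return direction / np.linalg.norm(direction)

def scale_along(acts, direction, alpha):
    """'Pull that up': amplify (or, with negative alpha, remove) the refusal component."""
    coeffs = acts @ direction                       # projection of each activation onto the direction
    return acts + alpha * np.outer(coeffs, direction)

rng = np.random.default_rng(0)
d = 64
acts_refused = rng.normal(size=(100, d)) + 0.5      # toy activations on prompts the model refuses
acts_complied = rng.normal(size=(100, d))           # toy activations on prompts it answers
r = refusal_direction(acts_refused, acts_complied)
boosted = scale_along(acts_complied, r, alpha=2.0)  # activations with the refusal feature pulled up
#+end_src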

ICLR2025 Snell: Optimality of Scaling LLM Test-Time Compute

Last edited: August 8, 2025

Compute-Optimal Scaling

Compute-optimal scaling is the notion of selecting the optimal test-time configuration (beam width, search budget, etc.) dynamically, per question-difficulty bin, rather than using one fixed configuration everywhere.

Approaches to “Scaling Test-Time Compute”

Three primary approaches (the first two are sketched in code after this list):

  • best-of-N: sample a bunch of complete answers, keep/reject with a verifier
  • beam search: score intermediate steps and keep only the best beams
  • lookahead search: MCTS-ish (do lookahead rollouts to score each step)
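
A toy sketch of the first two approaches, with a random generator and a dummy scorer standing in for the LLM and the verifier (all names and budgets are illustrative):

#+begin_src python
import random

def generate_step(prefix):
    """Stand-in for sampling one more reasoning step from the LLM (hypothetical)."""
    return prefix + [random.random()]

def score(candidate):
    """Stand-in for a verifier / reward model scoring a (partial) solution."""
    return sum(candidate)

def best_of_n(n, steps=4):
    """best-of-N: sample N complete answers independently, keep the verifier's favorite."""
    candidates = []
    for _ in range(n):
        c = []
        for _ in range(steps):
            c = generate_step(c)
        candidates.append(c)
    return max(candidates, key=score)

def beam_search(width, expand, steps=4):
    """beam search: at every step keep only the `width` best partial solutions."""
    beams = [[]]
    for _ in range(steps):
        expanded = [generate_step(b) for b in beams for _ in range(expand)]
        beams = sorted(expanded, key=score, reverse=True)[:width]
    return beams[0]

# Compute-optimal scaling = choosing between these methods (and their budgets)
# per difficulty bin instead of using one fixed configuration for every question.
print(score(best_of_n(8)), score(beam_search(width=4, expand=2)))
#+end_src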

Key insight

  • On easy questions, beam search shows over-optimization and best-of-N is better
  • On medium/hard questions, beam search is better

Lookahead seems bad?