ICLR2025 Li: MoE is secretly an embedding
Last edited: August 8, 2025
Motivation
Can we directly extract embeddings from MoE forwarding routing weights (i.e., compared to traditional residual stream information)?
Key Insight
Residual states and forwarding weights offer complementary strengths as semantic search embeddings (i.e., when one method fails, the other tends to succeed)
Method
Create an aggregate embedding:
\begin{equation} E_{j} = X_{j} + \alpha W_{j} \end{equation}
where \(X_{j}\) is the residual state, \(W_{j}\) is the routing weight for that residual, and \(\alpha\) scales the routing term.
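A minimal sketch of the aggregate embedding above, taking the formula literally. It assumes \(X_{j}\) and \(W_{j}\) have already been brought to the same shape so they can be added; the function name `aggregate_embedding` and the default `alpha` are hypothetical, not from the paper.

```python
import numpy as np


def aggregate_embedding(x, w, alpha=1.0):
    """Compute E_j = X_j + alpha * W_j for one item j.

    x: residual state X_j
    w: routing weight W_j (assumed here to share x's shape)
    alpha: scalar weight on the routing term (hypothetical default)
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if x.shape != w.shape:
        raise ValueError("this sketch assumes X_j and W_j share a shape")
    return x + alpha * w


# Toy vectors for illustration only, not real model activations.
e = aggregate_embedding([1.0, 2.0], [0.5, 0.5], alpha=2.0)
```

Setting `alpha=0` recovers the plain residual embedding, so `alpha` controls how much the routing signal contributes.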
ICLR2025 Mathur: MIND Adaptive Thinking with Dynamic Computation
Last edited: August 8, 2025
Motivation
Standard models spend the same amount of computation on every input, regardless of how hard it is.
Fixed-Point Iteration for Adaptation
method: CNN
- for every layer, perform fixed-point iteration until convergence to mask out (what exactly?)
- supervise also an “introspection model” to skip the entire fixed point
- loss: LM + supervision for the introspection model
method: MIND-transformer
- for every layer, perform fixed-point iteration until attention activation convergence
- ditto introspection as above
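The "fixed-point iteration until convergence" step in both methods can be sketched generically as below. This is only an illustration of the iteration itself, not the MIND architecture: the toy `layer`, the tolerance, and the iteration cap are all assumptions, and the introspection model that skips the loop is not modeled.

```python
import numpy as np


def fixed_point(f, x0, tol=1e-6, max_iters=100):
    """Iterate x <- f(x) until successive iterates differ by less than tol.

    Returns the final iterate and the number of steps taken.
    """
    x = x0
    for step in range(1, max_iters + 1):
        x_next = f(x)
        if np.linalg.norm(x_next - x) < tol:
            return x_next, step
        x = x_next
    return x, max_iters


# Toy "layer" (hypothetical): tanh of an affine map. W is rescaled to
# spectral norm 0.5 so the map is a contraction and the loop converges.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
W = 0.5 * W / np.linalg.norm(W, 2)
b = rng.standard_normal(4)


def layer(h):
    return np.tanh(W @ h + b)


h_star, n_steps = fixed_point(layer, np.zeros(4))
```

The number of steps `n_steps` varies with the input, which is the sense in which computation adapts; easy inputs converge in fewer iterations.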
ICLR2025 MoE
Last edited: August 8, 2025
Talks
ICLR2025 Neitemeier: Hierarchical Autoregressive Transformers
Last edited: August 8, 2025
“A Byte Level transformer, with some compression”
Key insight: put a [CLS] token in front of every word to train a small “tokenizer”, run a normal transformer over the [CLS] tokens, and then autoregressively decode out the individual bytes.
Method
Hierarchical Autoregressive Transformers
We put a [CLS] in front of every word, so the input looks like:
[CLS] M y _ [CLS] n a m e _ [CLS] i s
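A minimal sketch of building that input format. It works on characters rather than raw bytes (the two coincide for ASCII text), and the function name `add_word_cls` and the `_` space symbol handling are assumptions for illustration, not the paper's preprocessing code.

```python
import re

CLS = "[CLS]"


def add_word_cls(text, space_symbol="_"):
    """Prefix every word with [CLS] and split it into single characters.

    Each chunk is a word plus its trailing whitespace; whitespace is
    rendered as `space_symbol`. Leading whitespace is dropped in this sketch.
    """
    tokens = []
    for chunk in re.findall(r"\S+\s*", text):
        tokens.append(CLS)
        for ch in chunk:
            tokens.append(space_symbol if ch.isspace() else ch)
    return tokens


tokens = add_word_cls("My name is")
print(" ".join(tokens))  # [CLS] M y _ [CLS] n a m e _ [CLS] i s
```

Each [CLS] position marks where a word-level representation would be read out after the small encoder runs.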
We then run a small encoder over each sequence. And then you take the encoded [CLS], and run
ICLR2025 Saturday Posters
Last edited: August 8, 2025
ICLR2025 Cassidy: AssistanceZero
- Train reward predictor to also have rewards at test time
- MCTS
- Learn to match root node KL
ICLR2025 Liu: synthesizing programmatic reinforcement learning policies with LLM guided search
Hill climbing with partial mutations of LLM-generated programs
ICLR2025 Weller: Promptriever
??
ICLR2025 Yu: robust LLM safeguard via refusal feature adversarial training
With mechanistic interpretability, we can find a subspace that is correlated with refusal, and pull that up
