Motivation
Standard networks spend the same fixed amount of computation on every input; the computation doesn't adapt to how hard the example is.
Fixed-Point Iteration for Adaptation
method: CNN
- for every layer, run a fixed-point iteration until the activations converge, then use the converged state to mask out (what exactly? units/channels? still to be pinned down) — see the sketch after this list
- also train an "introspection model" that predicts the converged state directly, so the full fixed-point iteration can be skipped
- loss: LM loss + a supervision term for the introspection model (presumably matching its prediction to the converged fixed point)
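A minimal sketch of the per-layer fixed-point step, assuming the converged state is thresholded into a per-channel mask (the note leaves "mask out what exactly?" open, so channel masking, the 0.1 threshold, and the tanh update are all assumptions, not the method itself):

```python
import torch
import torch.nn as nn


class FixedPointConvBlock(nn.Module):
    """Iterate h <- f(h, x) until convergence, then mask weak channels (assumed)."""

    def __init__(self, channels, tol=1e-3, max_iters=50):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.tol = tol
        self.max_iters = max_iters

    def forward(self, x):
        # Fixed-point iteration on the layer's own output, capped at max_iters.
        h = torch.zeros_like(x)
        for _ in range(self.max_iters):
            h_next = torch.tanh(self.conv(h) + x)
            if (h_next - h).norm() < self.tol * (h_next.norm() + 1e-8):
                h = h_next
                break
            h = h_next
        # Assumption: derive a per-channel mask from the converged state so
        # weakly-active channels are zeroed out for downstream layers.
        mask = (h.abs().mean(dim=(2, 3), keepdim=True) > 0.1).float()
        return h * mask


if __name__ == "__main__":
    block = FixedPointConvBlock(channels=16)
    y = block(torch.randn(2, 16, 8, 8))
    print(y.shape)
```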
method: MIND-transformer
- for every layer, run a fixed-point iteration until the attention activations converge
- same introspection model and supervision as in the CNN case (sketched below)
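A sketch of how the transformer variant and the introspection supervision could fit together, under my reading of the note: one encoder layer is reused as the fixed-point map, iteration stops when its output stops changing, and a small introspection head is trained to predict that converged state so the loop can be skipped. All names here (IntrospectiveLayer, the MSE target, the one-shot residual form) are illustrative assumptions, not the source's definitions:

```python
import torch
import torch.nn as nn


class IntrospectiveLayer(nn.Module):
    def __init__(self, d_model=64, n_heads=4, tol=1e-3, max_iters=20):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        # Small head meant to predict the fixed point in one shot.
        self.introspect = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )
        self.tol = tol
        self.max_iters = max_iters

    def fixed_point(self, x):
        # Reapply the same layer until its activations stop changing.
        h = x
        for _ in range(self.max_iters):
            h_next = self.layer(h)
            if (h_next - h).norm() < self.tol * (h.norm() + 1e-8):
                return h_next
            h = h_next
        return h

    def forward(self, x, use_introspection=False):
        if use_introspection:
            # Skip the iteration entirely: one-shot prediction of the fixed point.
            return x + self.introspect(x)
        return self.fixed_point(x)


def training_loss(model, x, lm_loss):
    # Total loss = LM/task loss + supervision pulling the introspection head
    # toward the converged fixed point (detached so it acts as a target).
    target = model.fixed_point(x).detach()
    pred = model(x, use_introspection=True)
    return lm_loss + nn.functional.mse_loss(pred, target)
```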