Houjun Liu

ICLR 2025, Mathur: MIND Adaptive Thinking with Dynamic Computation

Motivation

Standard networks spend a fixed amount of computation on every input; the compute doesn’t adapt to input difficulty.

Fixed-Point Iteration for Adaptation

method: CNN

  1. for every layer, perform fixed-point iteration until convergence, then use the result to mask out (what exactly?)
  2. also supervise an “introspection model” that learns to skip the entire fixed-point iteration
  3. loss: LM loss + a supervision term for the introspection model
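The per-layer loop above can be sketched roughly as follows. This is my own minimal illustration, not the paper's code: `layer_fn`, `fixed_point_iterate`, and the logistic `introspection_skip` head are all hypothetical names, and the "skip" head here is untrained (the paper supervises it against the real iteration).

```python
import numpy as np

def layer_fn(x, W):
    # one layer update; tanh keeps activations bounded and, with small
    # weights W, makes the update a contraction (so a fixed point exists)
    return np.tanh(W @ x)

def fixed_point_iterate(x, W, tol=1e-4, max_iters=50):
    """Apply the layer to its own output until activations stop changing."""
    for t in range(max_iters):
        x_next = layer_fn(x, W)
        if np.max(np.abs(x_next - x)) < tol:  # convergence check
            return x_next, t + 1
        x = x_next
    return x, max_iters

def introspection_skip(x, w_intro, threshold=0.5):
    # tiny hypothetical introspection head: a logistic score deciding
    # whether the fixed-point loop can be skipped for this input
    p_skip = 1.0 / (1.0 + np.exp(-float(w_intro @ x)))
    return p_skip > threshold

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((8, 8))  # small spectral norm -> contraction
x0 = rng.standard_normal(8)
x_star, iters = fixed_point_iterate(x0, W)
```

Easy inputs converge in few iterations and hard ones in many, which is where the adaptive-compute saving comes from; the introspection head amortizes even that by predicting the outcome up front.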

method: MIND-transformer

  1. for every layer, perform fixed-point iteration until the attention activations converge
  2. same introspection supervision as above (a model trained to skip the fixed-point loop)
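A rough sketch of the transformer variant, under my own assumptions (not the paper's code): a single self-attention layer is applied to its own output with a damped update, and the loop stops once the attention map itself stops changing. All function names and the damping factor `alpha` are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_step(x, Wq, Wk, Wv, alpha=0.5):
    # damped self-attention update; damping keeps the iteration contractive
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))
    return (1 - alpha) * x + alpha * (attn @ v), attn

def iterate_attention(x, Wq, Wk, Wv, tol=1e-4, max_iters=60):
    """Repeat the attention layer until the attention map stops changing."""
    prev = None
    for t in range(max_iters):
        x, attn = attention_step(x, Wq, Wk, Wv)
        if prev is not None and np.max(np.abs(attn - prev)) < tol:
            return x, attn, t + 1  # attention activations converged
        prev = attn
    return x, attn, max_iters

rng = np.random.default_rng(1)
d, n = 4, 6
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
x = rng.standard_normal((n, d))
x_star, attn_star, iters = iterate_attention(x, Wq, Wk, Wv)
```

Checking convergence on the attention map (rather than the hidden states) matches the note above: the loop halts as soon as the layer's attention pattern has settled.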