Motivation
Standard networks spend the same fixed amount of computation on every input; the computation doesn't adapt to how hard the example is.
Fixed-Point Iteration for Adaptation
method: CNN
- for every layer, run a fixed-point iteration until the activations converge, then use the converged state to mask out (what exactly? units/channels? still to be pinned down) — see the sketch after this list
- also train an "introspection model" that predicts the converged state directly, so the full fixed-point iteration can be skipped
- loss: LM loss + a supervision term for the introspection model (presumably matching its prediction to the converged fixed point)
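A minimal sketch of the per-layer fixed-point step, assuming the converged state is thresholded into a per-channel mask (the note leaves "mask out what exactly?" open, so channel masking, the 0.1 threshold, and the tanh update are all assumptions, not the method itself):

```python
import torch
import torch.nn as nn


class FixedPointConvBlock(nn.Module):
    """Iterate h <- f(h, x) until convergence, then mask weak channels (assumed)."""

    def __init__(self, channels, tol=1e-3, max_iters=50):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.tol = tol
        self.max_iters = max_iters

    def forward(self, x):
        # Fixed-point iteration on the layer's own output, capped at max_iters.
        h = torch.zeros_like(x)
        for _ in range(self.max_iters):
            h_next = torch.tanh(self.conv(h) + x)
            if (h_next - h).norm() < self.tol * (h_next.norm() + 1e-8):
                h = h_next
                break
            h = h_next
        # Assumption: derive a per-channel mask from the converged state so
        # weakly-active channels are zeroed out for downstream layers.
        mask = (h.abs().mean(dim=(2, 3), keepdim=True) > 0.1).float()
        return h * mask


if __name__ == "__main__":
    block = FixedPointConvBlock(channels=16)
    y = block(torch.randn(2, 16, 8, 8))
    print(y.shape)
```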
method: MIND-transformer
- for every layer, run a fixed-point iteration until the attention activations converge
- same introspection model and supervision as in the CNN case (sketched below)
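A sketch of how the transformer variant and the introspection supervision could fit together, under my reading of the note: one encoder layer is reused as the fixed-point map, iteration stops when its output stops changing, and a small introspection head is trained to predict that converged state so the loop can be skipped. All names here (IntrospectiveLayer, the MSE target, the one-shot residual form) are illustrative assumptions, not the source's definitions:

```python
import torch
import torch.nn as nn


class IntrospectiveLayer(nn.Module):
    def __init__(self, d_model=64, n_heads=4, tol=1e-3, max_iters=20):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        # Small head meant to predict the fixed point in one shot.
        self.introspect = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )
        self.tol = tol
        self.max_iters = max_iters

    def fixed_point(self, x):
        # Reapply the same layer until its activations stop changing.
        h = x
        for _ in range(self.max_iters):
            h_next = self.layer(h)
            if (h_next - h).norm() < self.tol * (h.norm() + 1e-8):
                return h_next
            h = h_next
        return h

    def forward(self, x, use_introspection=False):
        if use_introspection:
            # Skip the iteration entirely: one-shot prediction of the fixed point.
            return x + self.introspect(x)
        return self.fixed_point(x)


def training_loss(model, x, lm_loss):
    # Total loss = LM/task loss + supervision pulling the introspection head
    # toward the converged fixed point (detached so it acts as a target).
    target = model.fixed_point(x).detach()
    pred = model(x, use_introspection=True)
    return lm_loss + nn.functional.mse_loss(pred, target)
```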