One-Liner
There’s a low-rank “shared” universal subspace across many pretrained LMs, which can be leveraged to adapt a model to new tasks more easily.
Notable Methods
Ran PCA over LoRAs trained for different tasks, then measured how much variance carries over when the principal subspace from one architecture is projected onto others (see the sketch below).
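
A minimal sketch of that flavor of analysis, assuming flattened LoRA delta weights of matched dimensionality; all names (`principal_subspace`, `lora_deltas_a`, `k`, the toy data) are hypothetical placeholders, not taken from the source.

```python
# Sketch: how much of one model's LoRA variance lands inside another model's
# principal subspace. Toy data only; real LoRA deltas would be flattened
# update matrices collected from adapters trained on different tasks.
import numpy as np

def principal_subspace(deltas: np.ndarray, k: int) -> np.ndarray:
    """PCA via SVD on mean-centered deltas (n_loras x d).
    Returns the top-k right singular vectors as a (d x k) orthonormal basis."""
    centered = deltas - deltas.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def explained_variance_ratio(deltas: np.ndarray, basis: np.ndarray) -> float:
    """Fraction of total variance in `deltas` that survives projection
    onto the given orthonormal basis."""
    centered = deltas - deltas.mean(axis=0, keepdims=True)
    projected = centered @ basis @ basis.T
    return float(np.sum(projected ** 2) / np.sum(centered ** 2))

# Toy usage: deltas from "architecture A" define the subspace; then ask how
# much of "architecture B"'s LoRA variance falls inside it.
rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 4096))  # hypothetical shared low-rank directions
lora_deltas_a = rng.normal(size=(32, 8)) @ shared + 0.1 * rng.normal(size=(32, 4096))
lora_deltas_b = rng.normal(size=(32, 8)) @ shared + 0.1 * rng.normal(size=(32, 4096))

basis_a = principal_subspace(lora_deltas_a, k=8)
print("variance of B explained by A's subspace:",
      explained_variance_ratio(lora_deltas_b, basis_a))
```

A high ratio here would be the kind of evidence the one-liner points at: adapters trained for unrelated tasks (or on different models) mostly move weights within the same low-rank subspace.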
