MOEReview Kaushik: Universal Subspace Hypothesis

One-Liner

There’s a low-rank “shared” universal subspace across many pretrained LMs, which could thus be leveraged to adapt a model to new tasks more easily.

Notable Methods

Ran PCA on LoRA updates and measured how much variance carries over when one architecture’s subspace is projected onto another’s (i.e. across LoRAs trained for different tasks).
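A minimal sketch of that projected-variance test, on synthetic data (the shared subspace, dimensions, and noise level are all illustrative assumptions, not values from the paper): fit PCA to one set of flattened LoRA updates, then measure what fraction of a second set's variance survives projection onto that basis, versus a random-subspace baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: both "architectures" draw their LoRA updates from a shared
# low-rank subspace (rank r) of a d-dim parameter space, plus small noise.
d, r = 200, 5
shared_basis = np.linalg.qr(rng.normal(size=(d, r)))[0]  # (d, r), orthonormal
A = rng.normal(size=(60, r)) @ shared_basis.T + 0.01 * rng.normal(size=(60, d))
B = rng.normal(size=(60, r)) @ shared_basis.T + 0.01 * rng.normal(size=(60, d))

def pca_basis(X, k):
    """Top-k principal directions of the centered rows of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T  # (d, k)

def captured_variance(X, basis):
    """Fraction of X's variance retained after projecting onto `basis`."""
    Xc = X - X.mean(axis=0)
    proj = Xc @ basis @ basis.T
    return (proj ** 2).sum() / (Xc ** 2).sum()

U_a = pca_basis(A, r)
random_basis = np.linalg.qr(rng.normal(size=(d, r)))[0]
print(f"variance of B captured by A's subspace: {captured_variance(B, U_a):.3f}")
print(f"baseline (random subspace):             {captured_variance(B, random_basis):.3f}")
```

If the subspace really is shared, the cross-projection retains nearly all of B's variance while a random subspace of the same rank retains only about r/d of it.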