GenSLMs are a LLM, but genome sequence
- Take genome sequence
- Throw transformers at it
- create “semantic embedding”
- autoregression happens
This is trained as a foundational model to organize the genomic sequence
Turns out, the embedding space above can be used to discover relations. See proteins can be encoded as hierarchies