Extrinsic Evaluation, also known as In-Vivo Evaluation, focuses on benchmarking two language models in terms of their differing performance on a test task.
In-Vitro Evaluation or Intrinsic Evaluation focuses on evaluating the language models’ performance at, well, language modeling.
Typically, we use perplexity.
- directly measure language model performance
- doesn’t necessarily correspond with real applications