Houjun Liu

Model Evaluation

Extrinsic Evaluation

Extrinsic Evaluation, also known as In-Vivo Evaluation, compares two language models by running each on the same downstream test task and measuring the difference in task performance.
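A minimal sketch of that comparison, assuming a toy sentiment task. The two "models" (`model_a`, `model_b`), the `accuracy` helper, and the four-example test set are hypothetical stand-ins invented for illustration; in practice each would be a real system built around a different language model.

```python
def accuracy(predict, examples):
    """Fraction of (input, label) pairs for which predict(input) == label."""
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


# Hypothetical stand-ins for two systems, each built on a different language model.
def model_a(text):
    return "positive" if "good" in text else "negative"


def model_b(text):
    return "positive"  # always predicts the same class


# Hypothetical labeled test set for the downstream task.
test_set = [
    ("good film", "positive"),
    ("bad film", "negative"),
    ("good plot", "positive"),
    ("dull", "negative"),
]

score_a = accuracy(model_a, test_set)
score_b = accuracy(model_b, test_set)
print(f"model_a: {score_a:.2f}  model_b: {score_b:.2f}")
```

The point is only that both models are scored on the same task with the same metric, so the gap between the scores is attributable to the models.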

Intrinsic Evaluation

Intrinsic Evaluation, also known as In-Vitro Evaluation, measures a language model’s performance at language modeling itself, independent of any downstream application.

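Typically, we use perplexity: the exponentiated average negative log-likelihood the model assigns to a held-out test set. Lower perplexity means the model found the test data less surprising.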

  • directly measures language model performance
  • doesn’t necessarily correspond to performance in real applications
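As a rough sketch of how perplexity is computed: given the probability a model assigned to each token of a test sequence, perplexity is exp of the average negative log probability. The function name `perplexity` and the input format (a plain list of per-token probabilities) are assumptions made for illustration, not tied to any particular library.

```python
import math


def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to each test token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural log).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Exponentiate to get back to an "effective branching factor".
    return math.exp(avg_nll)


# A model that spreads probability evenly over 4 choices at every step
# has perplexity of about 4, regardless of sequence length.
uniform = [0.25] * 8
confident = [0.9, 0.8, 0.95, 0.85]

print(perplexity(uniform))
print(perplexity(confident))
```

A model that assigns higher probability to the observed tokens gets a lower perplexity, which is why `confident` scores better than `uniform` here.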