Houjun Liu

Model Evaluation

Extrinsic Evaluation

Extrinsic Evaluation, also known as In-Vivo Evaluation, compares two language models by their performance on a downstream test task.

Intrinsic Evaluation

Intrinsic Evaluation, also known as In-Vitro Evaluation, focuses on evaluating a language model's performance at, well, language modeling itself.

Typically, we use perplexity.

directly measures language model performance
doesn’t necessarily correspond with performance in real applications
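
As a rough illustration of the perplexity metric mentioned above, here is a minimal sketch. It assumes the standard definition (the exponential of the average negative log-likelihood per token) and that we already have the probability the model assigned to each token in a held-out sequence. The function name `perplexity` and the example probabilities are made up for illustration, not taken from any particular library or model.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the probability the model
    assigned to each token (hypothetical helper, for illustration)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural log).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Perplexity is the exponential of that average.
    return math.exp(avg_nll)


# A model that spreads its probability evenly over 4 choices at every step.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])

# A model that is more confident about the tokens that actually occur
# (made-up probabilities).
confident = perplexity([0.9, 0.8, 0.95])
```

Lower is better: the model that guesses uniformly among 4 options has a perplexity of about 4, while the more confident model scores lower. This is also why the metric can drift from real applications, since it only rewards assigning high probability to the observed text.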