Posts

Model Evaluation

Last edited: August 8, 2025

Extrinsic Evaluation

Extrinsic Evaluation, also known as In-Vivo Evaluation, compares two language models by how well each performs on a downstream test task.

Intrinsic Evaluation

Intrinsic Evaluation, also known as In-Vitro Evaluation, focuses on evaluating a language model's performance at, well, language modeling.

Typically, we use perplexity.

  • directly measures language model performance
  • doesn't necessarily correspond to performance in real applications
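To make perplexity concrete, here is a minimal sketch. It assumes we already have the probability the model assigned to each token of a held-out sequence; the function name `perplexity` and the input format are illustrative, not taken from any particular library. Perplexity is the exponential of the average negative log-likelihood per token, so lower is better.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the model's probability for each token.

    token_probs is a hypothetical input format: one probability in (0, 1]
    per token, as assigned by the language model being evaluated.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # Perplexity is the exponential of that average.
    return math.exp(avg_nll)


# A model that spreads its probability uniformly over 4 choices at every step
# is "as confused as" a 4-way guess.
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```

A model that assigns probability 1 to every token would reach the minimum perplexity of 1.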

model-based reinforcement learning

Last edited: August 8, 2025

Step 1: Getting Model

We want to learn a model, which consists of:

  • \(T\): transition probability
  • \(R\): rewards

Maximum Likelihood Parameter Learning Method

We keep a table:

\begin{equation} N(s,a,s') \end{equation}

which counts the transitions from \(s,a\) to \(s'\); we increment it each time \(s, a, s'\) is observed. Maximum Likelihood Parameter Learning then gives:

\begin{equation} T(s' | s,a) = \frac{N(s,a,s')}{\sum_{s''} N(s,a,s'')} \end{equation}

We also keep a table:

\begin{equation} p(s,a) \end{equation}

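which holds the sum of rewards received when taking action \(a\) in state \(s\). To estimate the reward, we take the average over the number of times \(s,a\) was visited:

\begin{equation} R(s,a) \approx \frac{p(s,a)}{N(s,a)}, \quad N(s,a) = \sum_{s'} N(s,a,s') \end{equation}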
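The two tables can be sketched in a few lines of Python. This is a minimal illustration under some assumptions: states and actions are hashable values, the class and method names (`MaximumLikelihoodModel`, `observe`, `visits`) are made up for this example, and returning 0 for a state-action pair that was never visited is an arbitrary choice.

```python
from collections import defaultdict


class MaximumLikelihoodModel:
    """Count-based estimates of T and R (hypothetical helper, for illustration)."""

    def __init__(self):
        self.N = defaultdict(int)    # N[(s, a, s_next)]: transition counts
        self.p = defaultdict(float)  # p[(s, a)]: sum of rewards seen at (s, a)
        self.successors = defaultdict(set)  # next states seen from (s, a)

    def observe(self, s, a, r, s_next):
        # Increment the count and accumulate the reward for this observation.
        self.N[(s, a, s_next)] += 1
        self.p[(s, a)] += r
        self.successors[(s, a)].add(s_next)

    def visits(self, s, a):
        # N(s, a): total count over all observed next states.
        return sum(self.N.get((s, a, sp), 0)
                   for sp in self.successors.get((s, a), ()))

    def T(self, s_next, s, a):
        n = self.visits(s, a)
        if n == 0:
            return 0.0  # assumption: no data means no estimate
        return self.N.get((s, a, s_next), 0) / n

    def R(self, s, a):
        n = self.visits(s, a)
        if n == 0:
            return 0.0  # assumption: no data means no estimate
        return self.p.get((s, a), 0.0) / n


model = MaximumLikelihoodModel()
for r, s_next in [(1.0, "s1"), (0.0, "s0"), (1.0, "s1"), (2.0, "s1")]:
    model.observe("s0", "go", r, s_next)
```

Here three of the four observed transitions from `("s0", "go")` land in `"s1"`, so the estimated transition probability to `"s1"` is 3/4, and the estimated reward is the total reward 4 divided by the 4 visits.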

model-free reinforcement learning

Last edited: August 8, 2025

In model-based reinforcement learning, we tried real hard to get \(T\) and \(R\). What if we just estimated \(Q(s,a)\) directly? Model-free reinforcement learning tends to be quite slow compared to model-based reinforcement learning methods.

review: estimating mean of a random variable

Suppose we have \(m\) points \(x^{(1 \dots m)} \in X\); what is the mean of \(X\)?

\begin{equation} \hat{x}_{m} = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \end{equation}

Equivalently, we can update the estimate incrementally as each new point arrives:

\begin{equation} \hat{x}_{m} = \hat{x}_{m-1} + \frac{1}{m} (x^{(m)} - \hat{x}_{m-1}) \end{equation}
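As a quick check that the incremental form matches the batch average, here is a small sketch. The function name `running_mean` and the sample values are made up for illustration.

```python
def running_mean(xs):
    """Mean of xs computed one point at a time with the incremental update."""
    mean = 0.0
    for m, x in enumerate(xs, start=1):
        # x_hat_m = x_hat_{m-1} + (1/m) * (x^(m) - x_hat_{m-1})
        mean += (x - mean) / m
    return mean


samples = [2.0, 4.0, 9.0]
batch = sum(samples) / len(samples)  # the usual average over all points
incremental = running_mean(samples)  # same value, without storing the points
```

The incremental form only needs the previous estimate and the count \(m\), which is what makes it handy when samples arrive one at a time.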

modeling

Last edited: August 8, 2025

Here are the main steps of generic modeling.