Houjun Liu

SU-CS224N APR162024

Why do Neural Nets Work Suddenly?

Regularization

We want to be able to manipulate our parameters so that our models learn better; for instance, we want to keep our weights small, which L2 regularization does by adding a penalty on their squared magnitude to the unregularized loss:

\begin{equation} J_{reg}(\theta) = J(\theta) + \lambda \sum_{k} \theta^{2}_{k} \end{equation}

or good ol’ dropout, i.e. “feature-dependent regularization”.
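As a concrete illustration, here is a minimal PyTorch-style sketch of adding the L2 penalty from the equation above to a training loss; the tiny model, the fake data, and the coefficient lam are placeholder assumptions, not anything specific from lecture:

```python
import torch
import torch.nn as nn

# Hypothetical tiny model and batch, just to illustrate the penalty term.
model = nn.Linear(10, 2)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))

criterion = nn.CrossEntropyLoss()
lam = 1e-4  # assumed regularization strength (lambda)

logits = model(x)
data_loss = criterion(logits, y)                                 # J(theta)
l2_penalty = sum((p ** 2).sum() for p in model.parameters())     # sum_k theta_k^2
loss = data_loss + lam * l2_penalty                              # J_reg(theta)
loss.backward()
```

In practice the same effect is usually obtained through an optimizer's weight_decay argument (e.g. in torch.optim.SGD), which matches the explicit penalty exactly only for plain SGD.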

Motivation

  • classic view: regularization prevents overfitting when we have a lot of features relative to the amount of data
  • new view with big models: regularization produces models that generalize well even when the parameter count is very large

Dropout

Dropout: prevents feature co-adaptation, which results in good regularization
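A minimal PyTorch-style sketch of how dropout is applied in practice (the layer sizes and dropout rate here are illustrative assumptions): activations are randomly zeroed during training so units cannot co-adapt, and dropout is disabled at evaluation time.

```python
import torch
import torch.nn as nn

# Hypothetical two-layer net with dropout between the layers.
net = nn.Sequential(
    nn.Linear(100, 50),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # each hidden activation is zeroed with probability 0.5 during training
    nn.Linear(50, 10),
)

x = torch.randn(4, 100)

net.train()              # dropout active: a fresh random mask on every forward pass
train_out = net(x)

net.eval()               # dropout disabled: full network used (PyTorch uses inverted dropout,
test_out = net(x)        # so no extra rescaling is needed at test time)
```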

Language Model

See Language Model