Houjun Liu

cross entropy loss

Cross-entropy loss is a “conditional MLE” objective, whereby we try to maximize (see the objective written out after this list):

  • the log prob
  • of the true y labels in the training data
  • given the observations
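
Written out, and assuming the usual notation of \(\theta\) for the model parameters and \((x^{(i)}, y^{(i)})\) for the training pairs (this notation is not in the original note), the conditional MLE objective is:

\begin{equation} \hat{\theta} = \arg\max_{\theta} \sum_{i} \log P\qty(y^{(i)} \mid x^{(i)}; \theta) \end{equation}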

Derivation

Recall the Bernoulli distribution, and specifically:

\begin{equation} P(Y=y) = p^{y} (1-p)^{1-y} \end{equation}

Meaning, we want to maximize:

\begin{equation} \log P(Y=y) = y \log p + (1-y)\log (1-p) \end{equation}

Equivalently, since we frame this as a loss, we’d like to minimize the negative log probability:

\begin{equation} -[y \log p + (1-y)\log (1-p)] \end{equation}
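
This minimization is the binary cross-entropy loss. As a minimal sketch, assuming y holds the true 0/1 labels and p the model’s predicted probabilities (the names here are illustrative, not from the note):

  import numpy as np

  def binary_cross_entropy(y, p, eps=1e-12):
      # mean negative log-likelihood of Bernoulli labels y under predictions p
      p = np.clip(p, eps, 1 - eps)  # keep log() away from 0 and 1
      return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))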

Intuition

This function should be

  • smaller when the model’s estimate is close to the correct label
  • larger when the model is confused or wrong (see the numeric check below)
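
As a rough numeric check (numbers chosen here for illustration): with a true label \(y = 1\), a confident correct estimate \(p = 0.9\) gives a loss of \(-\log 0.9 \approx 0.105\), while a confident wrong estimate \(p = 0.1\) gives \(-\log 0.1 \approx 2.303\), so the loss grows as the model’s estimate moves away from the truth.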