mutual information is a measure of the dependence between two random variables in information theory. Applications include collocation extraction, which requires measuring how strongly two words co-occur (in a strong collocation, one word contributes much less entropy than the other).

## constituents

- \(X, Y\) random variables
- \(D_{KL}\) the KL divergence
- \(P_{(X,Y)}\) the joint distribution of \(X,Y\)
- \(P_{X}, P_{Y}\) the marginal distributions of \(X,Y\)

## requirements

mutual information is defined as

\begin{equation} I(X ; Y) = D_{KL}\left(P_{(X, Y)} \,\|\, P_{X} \otimes P_{Y}\right) \end{equation}
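for discrete variables, expanding the KL divergence against the product measure gives the familiar double sum:

\begin{equation} I(X ; Y) = \sum_{x} \sum_{y} P_{(X,Y)}(x, y) \log \frac{P_{(X,Y)}(x, y)}{P_{X}(x)\, P_{Y}(y)} \end{equation}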

intuitively, the mutual information between \(X\) and \(Y\) is the additional information carried by the joint distribution \(P_{(X,Y)}\) beyond what the independent product \(P_{X} \otimes P_{Y}\) accounts for.
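a minimal sketch of this definition for discrete variables, assuming NumPy; the function name `mutual_information` and the example `joint` table are illustrative, not from the source:

```python
import numpy as np

def mutual_information(joint: np.ndarray) -> float:
    """I(X;Y) computed directly as D_KL(P_(X,Y) || P_X (x) P_Y), in bits."""
    joint = joint / joint.sum()                # normalise to a proper distribution
    p_x = joint.sum(axis=1, keepdims=True)     # marginal P_X
    p_y = joint.sum(axis=0, keepdims=True)     # marginal P_Y
    product = p_x * p_y                        # outer product P_X (x) P_Y
    mask = joint > 0                           # convention: 0 log 0 = 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / product[mask])))

# hypothetical example: X and Y agree 80% of the time
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(mutual_information(joint))               # ~0.278 bits
```

the base of the logarithm is a free choice: base 2 yields bits, the natural log yields nats; the definition via \(D_{KL}\) is agnostic to it.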