A Gaussian mixture model is a density estimation technique, useful among other things for detecting out-of-distribution samples.
We model the dataset as a superposition of a group of Gaussian distributions that together explain the data.
Suppose the data was generated from a mixture of Gaussians; then for every data point \(x^{(i)}\) there is a latent \(z^{(i)}\) which tells you which Gaussian your data point was generated from.
So, for \(k\) Gaussians in your mixture:
- \(z^{(i)} \in \qty {1, \dots, k}\) such that \(z^{(i)} \sim \text{MultiNom}\qty(\phi)\) (where \(\phi_{j} \geq 0\) and \(\sum_{j=1}^{k} \phi_{j} = 1\))
- \(x^{(i)} \mid z^{(i)} = j \sim \mathcal{N}\qty(\mu_{j}, \Sigma_{j})\)
Recall that:
\begin{equation} P\qty(x^{(i)}, z^{(i)}) = P\qty(x^{(i)} \mid z^{(i)}) P\qty(z^{(i)}) \end{equation}
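As a concrete illustration, here is a minimal sketch of sampling from this generative story in Python; the values of \(k\), \(\phi\), \(\mu_{j}\), and \(\Sigma_{j}\) below are made-up examples, not from the source.

```python
# A minimal sketch of the generative story above; k, phi, mus, and
# Sigmas are hypothetical example values.
import numpy as np

rng = np.random.default_rng(0)

k = 3
phi = np.array([0.5, 0.3, 0.2])                  # mixing weights, sum to 1
mus = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 2.0]])
Sigmas = np.array([np.eye(2)] * k)               # one covariance per Gaussian

def sample_point():
    # z^(i) ~ MultiNom(phi): pick which Gaussian generates this point
    z = rng.choice(k, p=phi)
    # x^(i) | z^(i) = j ~ N(mu_j, Sigma_j)
    x = rng.multivariate_normal(mus[z], Sigmas[z])
    return x, z

samples = [sample_point() for _ in range(1000)]
```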
additional information
expectation maximization
to solve for the assignments \(z^{(i)} = j\), see expectation maximization
why density estimation
Given:
\begin{equation} \qty {x^{(1)} \dots x^{(n)}} \end{equation}
we would like to estimate \(p\qty(x)\), the probability that we see a particular data point \(x\).
To do this, we:
- estimate \(p\qty(x)\)
- if \(p\qty(x) \leq \epsilon\), we flag it as an anomaly
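As an example, here is a minimal sketch of this thresholding using scikit-learn's GaussianMixture; the dataset X, the component count, and the choice of \(\epsilon\) are all hypothetical placeholders.

```python
# A minimal sketch of density-based anomaly flagging with a GMM;
# X, the component count, and epsilon are hypothetical placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(500, 2))   # stand-in dataset

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
p_x = np.exp(gmm.score_samples(X))                   # p(x) for each point

epsilon = np.quantile(p_x, 0.01)                     # example threshold
anomalies = X[p_x <= epsilon]                        # flag p(x) <= epsilon
```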
If the data is clumped into several clusters, getting a good estimate of \(p\qty(x)\) with a single Gaussian is hard!
motivation
Gaussian models are typically unimodal, meaning they have one peak (the density increases up to that peak and decreases after it).
Therefore, in order to model something more complex with multiple peaks, we just take a weighted average of multiple Gaussian models:
\begin{equation} p\qty(x \mid \dots) = \sum_{i=1}^{n} p_{i}\, \mathcal{N}\qty(x \mid \mu_{i}, \sigma_{i}^{2}) \end{equation}
where we want the weights \(p_{i}\) to sum to \(1\) so that the overall mixture still integrates to \(1\).
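As a quick illustration, here is a minimal sketch that evaluates this weighted-average density in one dimension; the weights, means, and standard deviations are made-up example values.

```python
# A minimal sketch of the weighted-average density above; the weights
# p_i, means, and standard deviations are made-up example values.
import numpy as np
from scipy.stats import norm

p = np.array([0.6, 0.4])      # weights p_i, must sum to 1
mu = np.array([-2.0, 3.0])    # component means
sigma = np.array([1.0, 0.5])  # component standard deviations

def mixture_pdf(x):
    # p(x) = sum_i p_i * N(x | mu_i, sigma_i^2)
    return sum(p_i * norm.pdf(x, loc=m, scale=s)
               for p_i, m, s in zip(p, mu, sigma))

xs = np.linspace(-8.0, 8.0, 400)
density = mixture_pdf(xs)
# Riemann-sum check: the mixture still integrates to ~1
print(density.sum() * (xs[1] - xs[0]))
```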