A Gaussian mixture model is a density estimation technique, useful among other things for detecting out-of-distribution samples.
We model the dataset as a superposition of a group of Gaussian distributions that together explain the data.
Suppose the data was generated from a mixture of Gaussians; then for every data point \(x^{(i)}\) there is a latent \(z^{(i)}\) which tells you which Gaussian your data point was generated from.
So, for \(k\) Gaussians in your mixture:
- \(z^{(i)} \in \qty {1, \dots, k}\) such that \(z^{(i)} \sim \text{MultiNom}\qty(\phi)\) (where \(\phi_{j} \geq 0\) and \(\sum_{j=1}^{k} \phi_{j} = 1\))
- \(x^{(i)} \mid z^{(i)} = j \sim \mathcal{N}\qty(\mu_{j}, \Sigma_{j})\)
Recall that:
\begin{equation} P\qty(x^{(i)}, z^{(i)}) = P\qty(x^{(i)} \mid z^{(i)}) P\qty(z^{(i)}) \end{equation}
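As a concrete illustration, here is a minimal sketch of sampling from this generative story in Python; the values of \(k\), \(\phi\), \(\mu_{j}\), and \(\Sigma_{j}\) below are made-up examples, not from the source.

```python
# A minimal sketch of the generative story above; k, phi, mus, and
# Sigmas are hypothetical example values.
import numpy as np

rng = np.random.default_rng(0)

k = 3
phi = np.array([0.5, 0.3, 0.2])                  # mixing weights, sum to 1
mus = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 2.0]])
Sigmas = np.array([np.eye(2)] * k)               # one covariance per Gaussian

def sample_point():
    # z^(i) ~ MultiNom(phi): pick which Gaussian generates this point
    z = rng.choice(k, p=phi)
    # x^(i) | z^(i) = j ~ N(mu_j, Sigma_j)
    x = rng.multivariate_normal(mus[z], Sigmas[z])
    return x, z

samples = [sample_point() for _ in range(1000)]
```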
additional information
expectation maximization
to solve for the assignments \(z^{(i)} = j\), see expectation maximization
why density estimation
Given:
\begin{equation} \qty {x^{(1)} \dots x^{(n)}} \end{equation}
we would like to estimate \(p\qty(x)\), the probability that we see a particular data point \(x\).
To do this, we:
- estimate \(p\qty(x)\)
- if \(p\qty(x) \leq \epsilon\), we flag it as an anomaly
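As an example, here is a minimal sketch of this thresholding using scikit-learn's GaussianMixture; the dataset X, the component count, and the choice of \(\epsilon\) are all hypothetical placeholders.

```python
# A minimal sketch of density-based anomaly flagging with a GMM;
# X, the component count, and epsilon are hypothetical placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(500, 2))   # stand-in dataset

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
p_x = np.exp(gmm.score_samples(X))                   # p(x) for each point

epsilon = np.quantile(p_x, 0.01)                     # example threshold
anomalies = X[p_x <= epsilon]                        # flag p(x) <= epsilon
```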
If the data is clumped into several clusters, getting a good estimate of \(p\qty(x)\) with a single Gaussian is hard!
motivation
Gaussian models are typically unimodal, meaning they have one peak (the density increases up to that peak and decreases after it).
Therefore, in order to model something more complex with multiple peaks, we just take a weighted average of multiple Gaussian models:
\begin{equation} p\qty(x \mid \dots) = \sum_{i=1}^{n} p_{i}\, \mathcal{N}\qty(x \mid \mu_{i}, \sigma_{i}^{2}) \end{equation}
where we want the weights \(p_{i}\) to sum to \(1\) so that the overall mixture still integrates to \(1\).
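As a quick illustration, here is a minimal sketch that evaluates this weighted-average density in one dimension; the weights, means, and standard deviations are made-up example values.

```python
# A minimal sketch of the weighted-average density above; the weights
# p_i, means, and standard deviations are made-up example values.
import numpy as np
from scipy.stats import norm

p = np.array([0.6, 0.4])      # weights p_i, must sum to 1
mu = np.array([-2.0, 3.0])    # component means
sigma = np.array([1.0, 0.5])  # component standard deviations

def mixture_pdf(x):
    # p(x) = sum_i p_i * N(x | mu_i, sigma_i^2)
    return sum(p_i * norm.pdf(x, loc=m, scale=s)
               for p_i, m, s in zip(p, mu, sigma))

xs = np.linspace(-8.0, 8.0, 400)
density = mixture_pdf(xs)
# Riemann-sum check: the mixture still integrates to ~1
print(density.sum() * (xs[1] - xs[0]))
```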