constituents
- \(x \in \qty {0,1}^{m}\) for \(m\) features
- \(y \in \qty {0,1}\) for labels
- \(\phi_{j|y=1}\), for \(p\qty(x_{j} = 1 | y=1)\)
- \(\phi_{j|y=0}\), for \(p\qty(x_{j} = 1 | y=0)\)
- \(\phi_{y}\), for \(p\qty(y=1)\), the “cost prior”
assumption
ASSUME: features in \(x\) are conditionally independent given \(y\)
That is, we assume that:
\begin{align} p\qty(x|y) &= p\qty(x_1, x_2, \dots, x_{1000} | y) \\ &= p\qty(x_1|y) p\qty(x_2|y, x_1) p\qty(x_3|y, x_1, x_2) \dots p\qty(x_{1000}|y, x_1, \dots, x_{999}) \end{align}
This is insane to compute! But if we assume all \(x\) slots are conditionally independent, we write:
\begin{equation} p\qty(x|y) = p\qty(x_1|y) p\qty(x_2|y) \dots = \prod_{j=1}^{n} p\qty(x_{j}|y) \end{equation}
requirements
To figure out best parameters for Maximum Likelihood Parameter Learning:
\begin{equation} \mathcal{L}\qty(\phi) = \prod_{i=1}^{n} p\qty(x^{(i)}, y^{(i)} \mid \phi) \end{equation}
You get exactly what you expect:
\begin{equation} \phi_{y} = p\qty(y=1) = \frac{\sum_{i=1}^{n} 1\qty {y^{(i)}=1}}{n} \end{equation}
\begin{equation} \phi_{j|y=1} = p\qty(x_{j} = 1 | y=1)= \frac{\sum_{i=1}^{n}1 \qty {x_{j}^{(i)} =1, y^{(i)}= 1}}{\sum_{i=1}^{n} 1\qty {y^{(i)}=1}} \end{equation}
\begin{equation} \phi_{j|y=0} = p\qty(x_{j} = 1 | y=0)= \frac{\sum_{i=1}^{n}1 \qty {x_{j}^{(i)} =1, y^{(i)}= 0}}{\sum_{i=1}^{n} 1\qty {y^{(i)}=0}} \end{equation}
and you can just check if \(p\qty(y|x)\) more likely using Bayes rule.
additional information
pseudocounting
One problem with this approach is that it won’t handle OOD text that well. In particular, suppose you never see a particular feature being \(1\):
\begin{equation} p\qty(x_{k} | y= 1) = 0 \end{equation}
for some \(k\). So in practice, we actually estimate probability to add pseudocounts:
Laplace Smoothing
\begin{equation} \phi_{j|y=1} = p\qty(x_{j} = 1 | y=1)= \frac{1+\sum_{i=1}^{n}1 \qty {x_{j}^{(i)} =1, y^{(i)}= 1}}{2+\sum_{i=1}^{n} 1\qty {y^{(i)}=1}} \end{equation}
\begin{equation} \phi_{j|y=0} = p\qty(x_{j} = 1 | y=0)= \frac{1+ \sum_{i=1}^{n}1 \qty {x_{j}^{(i)} =1, y^{(i)}= 0}}{2+ \sum_{i=1}^{n} 1\qty {y^{(i)}=0}} \end{equation}