Naive Bayes

constituents

  • \(x \in \qty {0,1}^{m}\) for \(m\) features
  • \(y \in \qty {0,1}\) for labels
  • \(\phi_{j|y=1}\), for \(p\qty(x_{j} = 1 | y=1)\)
  • \(\phi_{j|y=0}\), for \(p\qty(x_{j} = 1 | y=0)\)
  • \(\phi_{y}\), for \(p\qty(y=1)\), the “class prior”

assumption

ASSUME: features in \(x\) are conditionally independent given \(y\)

By the chain rule (which holds exactly, with no assumptions yet):

\begin{align} p\qty(x|y) &= p\qty(x_1, x_2, \dots, x_{m} | y) \\ &= p\qty(x_1|y) p\qty(x_2|y, x_1) p\qty(x_3|y, x_1, x_2) \dots p\qty(x_{m}|y, x_1, \dots, x_{m-1}) \end{align}

This is insane to compute! But if we assume all features of \(x\) are conditionally independent given \(y\), the conditioning terms drop out and we write:

\begin{equation} p\qty(x|y) = p\qty(x_1|y) p\qty(x_2|y) \dots = \prod_{j=1}^{m} p\qty(x_{j}|y) \end{equation}
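The factorized likelihood above is just a product of per-feature Bernoulli terms. As a minimal sketch (the function name and parameter layout are illustrative, assuming the \(\phi_{j|y}\) for one class are given as a NumPy array):

```python
import numpy as np

def p_x_given_y(x, phi):
    # x: binary feature vector of length m
    # phi: phi[j] = p(x_j = 1 | y) for the chosen class
    # Each factor is phi[j] when x_j = 1, and (1 - phi[j]) when x_j = 0.
    return np.prod(np.where(x == 1, phi, 1 - phi))

x = np.array([1, 0, 1])
phi = np.array([0.5, 0.25, 0.8])
print(p_x_given_y(x, phi))  # 0.5 * 0.75 * 0.8 = 0.3
```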

requirements

To find the best parameters via Maximum Likelihood Parameter Learning, we maximize the joint likelihood over the \(n\) training examples:

\begin{equation} \mathcal{L}\qty(\phi) = \prod_{i=1}^{n} p\qty(x^{(i)}, y^{(i)} \mid \phi) \end{equation}

You get exactly what you expect:

\begin{equation} \phi_{y} = p\qty(y=1) = \frac{\sum_{i=1}^{n} 1\qty {y^{(i)}=1}}{n} \end{equation}

\begin{equation} \phi_{j|y=1} = p\qty(x_{j} = 1 | y=1)= \frac{\sum_{i=1}^{n}1 \qty {x_{j}^{(i)} =1, y^{(i)}= 1}}{\sum_{i=1}^{n} 1\qty {y^{(i)}=1}} \end{equation}

\begin{equation} \phi_{j|y=0} = p\qty(x_{j} = 1 | y=0)= \frac{\sum_{i=1}^{n}1 \qty {x_{j}^{(i)} =1, y^{(i)}= 0}}{\sum_{i=1}^{n} 1\qty {y^{(i)}=0}} \end{equation}
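These closed-form estimates are just counting. A minimal sketch (hypothetical helper names, assuming a binary NumPy feature matrix):

```python
import numpy as np

def fit_naive_bayes(X, y):
    # MLE estimates for Bernoulli naive Bayes.
    # X: (n, m) binary feature matrix; y: (n,) binary labels
    phi_y = np.mean(y == 1)                        # p(y = 1)
    phi1 = X[y == 1].sum(axis=0) / (y == 1).sum()  # p(x_j = 1 | y = 1)
    phi0 = X[y == 0].sum(axis=0) / (y == 0).sum()  # p(x_j = 1 | y = 0)
    return phi_y, phi1, phi0

# Toy data: 4 examples, 2 binary features
X = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
y = np.array([1, 1, 0, 0])
phi_y, phi1, phi0 = fit_naive_bayes(X, y)
# phi_y = 0.5, phi1 = [1.0, 0.5], phi0 = [0.0, 0.5]
```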

At prediction time, apply Bayes’ rule: compute \(p\qty(x|y)p\qty(y)\) for \(y=0\) and \(y=1\), and predict whichever class is more likely (the shared denominator \(p\qty(x)\) cancels in the comparison).
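The Bayes-rule comparison can be sketched as follows (a toy example with made-up parameter values; in practice these come from the MLE counts above):

```python
import numpy as np

def predict(x, phi_y, phi1, phi0):
    # Bayes' rule numerators; the shared evidence p(x) cancels in the comparison.
    joint1 = np.prod(np.where(x == 1, phi1, 1 - phi1)) * phi_y        # p(x, y=1)
    joint0 = np.prod(np.where(x == 1, phi0, 1 - phi0)) * (1 - phi_y)  # p(x, y=0)
    return int(joint1 > joint0)

# Made-up parameters: feature 0 indicates y=1, feature 1 indicates y=0
phi_y, phi1, phi0 = 0.5, np.array([0.9, 0.2]), np.array([0.1, 0.6])
print(predict(np.array([1, 0]), phi_y, phi1, phi0))  # 1
print(predict(np.array([0, 1]), phi_y, phi1, phi0))  # 0
```

In practice you would compare log-probabilities instead of products, to avoid underflow when \(m\) is large.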

additional information

pseudocounting

One problem with this approach is that it handles out-of-distribution inputs poorly. In particular, if feature \(k\) is never \(1\) among the positive training examples, the MLE gives:

\begin{equation} p\qty(x_{k} = 1 | y= 1) = 0 \end{equation}

for some \(k\). Then any input with \(x_{k} = 1\) zeroes out the entire product \(p\qty(x|y=1)\), no matter what the other features say, and the posterior can even become \(0/0\). So in practice we add pseudocounts when estimating the probabilities:

Laplace Smoothing

\begin{equation} \phi_{j|y=1} = p\qty(x_{j} = 1 | y=1)= \frac{1+\sum_{i=1}^{n}1 \qty {x_{j}^{(i)} =1, y^{(i)}= 1}}{2+\sum_{i=1}^{n} 1\qty {y^{(i)}=1}} \end{equation}

\begin{equation} \phi_{j|y=0} = p\qty(x_{j} = 1 | y=0)= \frac{1+ \sum_{i=1}^{n}1 \qty {x_{j}^{(i)} =1, y^{(i)}= 0}}{2+ \sum_{i=1}^{n} 1\qty {y^{(i)}=0}} \end{equation}
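The smoothed estimates are the same counting code with the pseudocounts added: \(+1\) in each numerator, \(+2\) in each denominator (as if we had observed each feature value once per class). A sketch, reusing the toy data layout from before (hypothetical helper name):

```python
import numpy as np

def fit_smoothed(X, y):
    # Laplace smoothing: +1 to each count, +2 to each total,
    # so no estimate can be exactly 0 or 1.
    phi_y = np.mean(y == 1)
    phi1 = (1 + X[y == 1].sum(axis=0)) / (2 + (y == 1).sum())
    phi0 = (1 + X[y == 0].sum(axis=0)) / (2 + (y == 0).sum())
    return phi_y, phi1, phi0

X = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
y = np.array([1, 1, 0, 0])
phi_y, phi1, phi0 = fit_smoothed(X, y)
# phi1 = [0.75, 0.5], phi0 = [0.25, 0.5]: the raw-count zeros are gone
```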