Houjun Liu

Bernoulli distribution

Consider a case where there is only a single trial with a binary outcome:

  • “success”, with probability \(p\)
  • “failure”, with probability \(1-p\)


\begin{equation} X \sim \mathrm{Bern}(p) \end{equation}


The probability mass function:

\begin{equation} P(X=k) = \begin{cases} p, & \text{if } k=1\\ 1-p, & \text{if } k=0 \end{cases} \end{equation}

Sadly, this piecewise form is awkward for Maximum Likelihood Parameter Learning, where we want a single differentiable expression. Therefore, we write:

\begin{equation} P(X=k) = p^{k} (1-p)^{1-k} \end{equation}

which matches the piecewise function at \(k=0\) and \(k=1\); since \(k\) never takes any other value, we don't care about the behavior anywhere else.
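As a quick sanity check (a minimal sketch; the value \(p=0.3\) is an arbitrary example), the two forms of the PMF agree at both support points:

```python
def pmf_piecewise(k, p):
    """Piecewise Bernoulli PMF: p if k == 1, else 1 - p."""
    return p if k == 1 else 1 - p

def pmf_smooth(k, p):
    """Differentiable form: p^k (1-p)^(1-k)."""
    return p ** k * (1 - p) ** (1 - k)

p = 0.3  # arbitrary example parameter
for k in (0, 1):
    # both forms compute exactly p (k=1) or exactly 1-p (k=0)
    assert pmf_piecewise(k, p) == pmf_smooth(k, p)
```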

We can then use this form directly in Maximum Likelihood Parameter Learning.

additional information

properties of Bernoulli distribution

Bernoulli as indicator

If there’s a series of events whose probabilities you are given, you can model each one with a Bernoulli indicator variable and add/subtract their expectations.
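For instance (a sketch with made-up probabilities), by linearity of expectation the expected number of events that occur is just the sum of the individual probabilities, since \(E[\mathrm{Bern}(p)] = p\):

```python
# Hypothetical event probabilities; each event X_i ~ Bern(p_i)
probs = [0.2, 0.5, 0.9]

# E[X_1 + X_2 + X_3] = E[X_1] + E[X_2] + E[X_3] = p_1 + p_2 + p_3
# (linearity of expectation holds even if the events are dependent)
expected_count = sum(probs)
print(expected_count)
```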

MLE for Bernoulli

\begin{equation} p_{MLE} = \frac{m}{n} \end{equation}

where \(m\) is the number of successes observed in \(n\) trials.
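A quick simulation sketch (the seed and true parameter \(p=0.7\) are arbitrary assumptions for the demo): draw \(n\) Bernoulli samples, count successes, and the MLE \(m/n\) lands close to the true \(p\).

```python
import random

random.seed(0)
p_true = 0.7      # assumed true parameter for the demo
n = 10_000        # number of trials

# draw n Bernoulli(p_true) samples as 0/1 indicators
samples = [1 if random.random() < p_true else 0 for _ in range(n)]

m = sum(samples)  # number of successes
p_mle = m / n     # the MLE: fraction of successes
```

With \(n = 10{,}000\) trials the standard error is about \(\sqrt{p(1-p)/n} \approx 0.005\), so the estimate should sit very close to \(0.7\).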