The probability of an event is the proportion of times the event occurs in many repeated trials. It is “our belief that an event \(E\) occurs”.

## Frequentist Definition of Probability

That is, it is a number between \(0\) and \(1\), given by:

\begin{equation} P(E) = \lim_{n \to \infty} \frac{n(E)}{n} \end{equation}

“frequentist definition of probability”

Probability is the ratio of the number of times \(E\) occurred, \(n(E)\), to the number of trials \(n\). This limit converges because of the law of large numbers.
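A quick simulation makes this concrete. This is a sketch using a fair coin as the event \(E\), so the true probability is \(0.5\); the ratio \(n(E)/n\) should settle toward it as \(n\) grows:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def estimate_probability(trials: int) -> float:
    # n(E)/n for the event "a fair coin lands heads"
    heads = sum(1 for _ in range(trials) if random.random() < 0.5)
    return heads / trials

# The estimates wander for small n but converge as n -> infinity
for n in (100, 10_000, 1_000_000):
    print(n, estimate_probability(n))
```

With small \(n\) the estimate is noisy; by a million flips it is within a fraction of a percent of \(0.5\).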

## uncertainty and probability

Say you are training some kind of model. When it outputs \(0.8\) for motorcycle, it’s not that there is an \(80\%\) chance that a motorcycle is there. It’s that the model is \(80\%\) confident that there’s a motorcycle.

**Probability can represent not only the world, but also our understanding of the world.**

## axioms of probability

- \(0 \leq P(E) \leq 1\)
- \(P(S) = 1\), where \(S\) is the sample space
- if \(E\) and \(F\) are mutually exclusive, \(P(E) + P(F) = P(E \cup F)\)

This last axiom can be chained across any number of mutually exclusive events.

These axioms give rise to three corollaries:

- \(P(E^{C}) = 1- P(E)\)

Proof: We know that \(E^{C}, E\) are mutually exclusive.

\begin{equation} P(E^{C} \cup E) = P(E) + P(E^{C}) \end{equation}

Now, recall that something happening OR not happening covers the entire sample space, so \(P(E^{C} \cup E) = P(S) = 1\), and therefore \(P(E^{C}) = 1 - P(E)\).

The other two corollaries:

- \(P(E \cup F) = P(E) + P(F) - P(E \cap F)\)
- if \(E \subset F\), \(P(E) \leq P(F)\)
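The second corollary (inclusion–exclusion) can be checked on a small sample space. A worked sketch with a fair six-sided die, where \(E\) = “roll is even” and \(F\) = “roll is greater than 3” (events chosen for illustration):

```python
from fractions import Fraction

S = set(range(1, 7))   # sample space of a fair die
E = {2, 4, 6}          # roll is even
F = {4, 5, 6}          # roll is greater than 3

def P(event):
    # probability under equally likely outcomes
    return Fraction(len(event), len(S))

lhs = P(E | F)                  # P(E ∪ F) computed directly
rhs = P(E) + P(F) - P(E & F)    # inclusion-exclusion
print(lhs, rhs)  # 2/3 2/3
```

Subtracting \(P(E \cap F)\) corrects for the outcomes \(\{4, 6\}\), which would otherwise be counted twice.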

## conditional probability

“What is the new belief that an event \(E\) happened, conditioned on the fact that we know that \(F\) has already happened?”

Written as: \(P(E|F)\).

Furthermore, we have:

\begin{equation} P (X, Y) = P(X\mid Y) \cdot P(Y) \end{equation}

In this case, we call \(Y\) the “evidence”. This allows us to find “what is the chance of \(x\) given \(y\)”.
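Rearranging gives \(P(X \mid Y) = P(X, Y) / P(Y)\), which can be computed directly from a joint table. The numbers and variable names below (weather vs. traffic) are hypothetical, chosen only to illustrate the formula:

```python
# Hypothetical joint distribution over (weather X, traffic Y)
joint = {
    ("rain", "jam"): 0.20,
    ("rain", "clear"): 0.10,
    ("sun", "jam"): 0.15,
    ("sun", "clear"): 0.55,
}

# The evidence P(Y = jam), obtained by marginalizing out X
p_jam = sum(p for (x, y), p in joint.items() if y == "jam")  # 0.35

# P(X = rain | Y = jam) = P(rain, jam) / P(jam)
p_rain_given_jam = joint[("rain", "jam")] / p_jam
print(round(p_rain_given_jam, 3))  # 0.571
```

Note how conditioning renormalizes: \(0.20\) of probability mass becomes \(0.20 / 0.35 \approx 0.571\) once we restrict attention to the worlds where the evidence holds.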

We can continue this to develop the probability chain rule:

\begin{equation} P(A_1, A_2 \dots, A_{n}) = P(A_{n} \mid A_1, A_2 \dots A_{n-1})P(A_1, A_2 \dots A_{n-1}) \end{equation}

and so:

\begin{equation} P(A_1, A_2, \dots, A_{n}) = P(A_1) \cdot P(A_2 \mid A_1) \cdot P(A_3 \mid A_1 A_2) \cdot P(A_4 \mid A_1 A_2 A_3) \cdots \end{equation}

and so on.
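A classic worked example of the chain rule: the probability of drawing three aces in a row from a standard 52-card deck without replacement, where each factor conditions on the previous draws having been aces:

```python
from fractions import Fraction

# P(A1 A2 A3) = P(A1) * P(A2 | A1) * P(A3 | A1 A2)
# 4 aces in 52 cards, then 3 in 51, then 2 in 50
p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)
print(p)  # 1/5525
```

Each conditional term is easy to write down on its own, which is exactly why the chain rule is useful: it breaks a hard joint probability into a product of simple conditionals.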

If you are performing the chain rule on something that’s already conditioned:

\begin{equation} P(X,Y|A) \end{equation}

you can break it up, remembering that \(A\) needs to be preserved as a condition, so:

\begin{equation} P(X,Y|A) = P(X|Y,A) P(Y|A) \end{equation}

Now:

\begin{equation} \sum_{x}^{} p(x \mid y) = 1 \end{equation}

because this is **still** a probability over \(x\).

## law of total probability

say you have two variables \(x, y\).

“what’s the probability of \(x\)?”

\begin{equation} P(x) = \sum_{y} P(x,y) \end{equation}

a.k.a.:

\begin{equation} p(x) = p(x|y_1)p(y_1) + \dots + p(x|y_{n})p(y_{n}) \end{equation}

by applying conditional probability formula upon each term

This is because:

\begin{align} p(x) &= p(x|y_1)p(y_1) + \dots + p(x|y_{n})p(y_{n}) \\ &= p(x, y_1) + \dots + p(x, y_{n}) \end{align}

This also holds written unconditionally, in terms of plain events; for any events \(A\) and \(B\):

\begin{equation} P(A) = P(AB) + P(AB^{C}) \end{equation}
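The law of total probability can be verified numerically. The numbers below are hypothetical: two causes \(y \in \{\text{jam}, \text{clear}\}\) with known \(p(y)\), and a conditional \(p(\text{rain} \mid y)\) for each:

```python
# Hypothetical prior over the evidence variable y
p_y = {"jam": 0.35, "clear": 0.65}

# Hypothetical conditionals p(rain | y)
p_rain_given_y = {"jam": 0.20 / 0.35, "clear": 0.10 / 0.65}

# p(rain) = sum over y of p(rain | y) * p(y)
p_rain = sum(p_rain_given_y[y] * p_y[y] for y in p_y)
print(round(p_rain, 2))  # 0.3
```

Each term \(p(x \mid y)\,p(y)\) collapses back to the joint \(p(x, y)\), so the sum is just the marginalization from the first equation in this section.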

## Bayes rule

See: Bayes Theorem

## independence

If \(X\) and \(Y\) are independent (written as \(X \perp Y\)), we know that \(P(x,y) = P(x)P(y)\) for all \(x, y\).

Formally:

\begin{equation} P(A) = P(A|B) \end{equation}

if \(A\) and \(B\) are independent. That is, \(P(AB) = P(A) \cdot P(B)\). You can check either of these statements (the latter is usually easier).

Independence is bidirectional. If \(A\) is independent of \(B\), then \(B\) is independent of \(A\). To show this, invoke the Bayes Theorem.

This is generalized:

\begin{equation} P(x_1, \dots, x_n) = P(x_1) \cdots P(x_{n}) \end{equation}

and this tells us that the \(x_{j}\) are mutually independent: any subset of them factorizes in the same way.
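A sketch of checking independence via the factorization \(P(x, y) = P(x)P(y)\), using two fair dice (whose joint is uniform over the 36 outcomes, so the check should pass for every pair):

```python
from itertools import product

# Joint distribution of two fair dice: uniform over 36 outcomes
joint = {(x, y): 1 / 36 for x, y in product(range(1, 7), repeat=2)}

# Marginals, computed by summing out the other variable
p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in range(1, 7)}
p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in range(1, 7)}

# Independence holds iff the joint equals the product of marginals everywhere
independent = all(abs(joint[(x, y)] - p_x[x] * p_y[y]) < 1e-12
                  for (x, y) in joint)
print(independent)  # True
```

If even one pair \((x, y)\) failed the check, the variables would not be independent; the factorization must hold for all values, not just some.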