A probability distribution “assigns probabilities to outcomes”.
\(X\) follows distribution \(D\): \(X\) is a “\(D\) random variable”, where \(D\) is some distribution (uniform, Gaussian, etc.)
syntax: \(X \sim D\).
Each distribution has three properties:
- variables (what is being modeled)
- values (what values can they take on)
- parameters (how many degrees of freedom do we have)
Methods of Compressing the Parameters of a Distribution
So, for instance, for a joint distribution over \(n\) binary variables about which we know nothing, we have:
\begin{equation} 2^{n} - 1 \end{equation}
parameters (\(2^{n}\) possible combinations of values, minus \(1\) because the probabilities must add up to \(1\), so the last one is not free)
assuming independence
HOWEVER, if the variables are independent, this becomes much easier. Because the variables are independent, the joint distribution factors into a product of marginals, so only \(n\) parameters are needed (one per variable):
\begin{equation} p(x_{1\dots n}) = \prod_{i=1}^{n} p(x_{i}) \end{equation}
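The parameter savings can be sketched in a few lines. This is a minimal illustration, not a definitive implementation; the function names and the example marginals `[0.2, 0.5, 0.9]` are made up for the example.

```python
from itertools import product


def full_joint_param_count(n: int) -> int:
    """Free parameters of an unrestricted joint over n binary variables."""
    return 2 ** n - 1


def independent_param_count(n: int) -> int:
    """Free parameters under independence: one P(x_i = 1) per variable."""
    return n


def independent_joint(p_one):
    """Build the full joint table from the marginals P(x_i = 1),
    using p(x_1..n) = prod_i p(x_i)."""
    joint = {}
    for assignment in product([0, 1], repeat=len(p_one)):
        prob = 1.0
        for value, p in zip(assignment, p_one):
            prob *= p if value == 1 else 1.0 - p
        joint[assignment] = prob
    return joint


# Hypothetical marginals for three independent binary variables.
joint = independent_joint([0.2, 0.5, 0.9])
print(full_joint_param_count(3), independent_param_count(3))  # 7 3
print(len(joint), sum(joint.values()))  # 8 table entries that sum to ~1
```

Three numbers are enough to regenerate all eight entries of the joint table, which is the compression that independence buys.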
decision tree
For instance, you can have a decision tree in which you selectively ignore some combinations of variables.
In this example, we ignore \(z\) if both \(x\) and \(y\) are \(0\).
Bayesian networks
see Bayesian Network
types of probability distributions
distributions of note
- uniform distribution
- gaussian distributions
uniform distribution
\begin{equation} X \sim Uni(\alpha, \beta) \end{equation}
\begin{equation} f(x) = \begin{cases} \frac{1}{\beta -\alpha }, & \alpha \leq x \leq \beta \\ 0, & \text{otherwise} \end{cases} \end{equation}
\begin{equation} E[X] = \frac{1}{2}(\alpha +\beta) \end{equation}
\begin{equation} Var(X) = \frac{1}{12}(\beta -\alpha )^{2} \end{equation}
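As a sanity check, the mean and variance formulas can be compared against a numerical integral of the density. This is only a sketch: the bounds \(\alpha = 2\), \(\beta = 5\) and the helper `midpoint_integral` are assumptions for illustration.

```python
def uniform_pdf(x, alpha, beta):
    """Density of Uni(alpha, beta): constant inside the bounds, zero outside."""
    return 1.0 / (beta - alpha) if alpha <= x <= beta else 0.0


def midpoint_integral(g, lo, hi, steps=100_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(g(lo + (k + 0.5) * width) for k in range(steps)) * width


alpha, beta = 2.0, 5.0  # hypothetical bounds

# E[X] = integral of x f(x); Var(X) = integral of (x - E[X])^2 f(x)
mean = midpoint_integral(lambda x: x * uniform_pdf(x, alpha, beta), alpha, beta)
var = midpoint_integral(
    lambda x: (x - mean) ** 2 * uniform_pdf(x, alpha, beta), alpha, beta
)

print(mean, 0.5 * (alpha + beta))         # numerical vs. closed form
print(var, (beta - alpha) ** 2 / 12.0)    # numerical vs. closed form
```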
Gaussian Things
Truncated Gaussian distribution
Sometimes, we don’t want to use a Gaussian distribution for values above or below a threshold (say if they are physically impossible). In those cases, we use a truncated Gaussian:
\begin{equation} X \sim N(\mu, \sigma^{2}, a, b) \end{equation}
bounded within the interval \((a,b)\). The PDF of this distribution is given by:
\begin{equation} N(\mu, \sigma^{2}, a, b) = \frac{\frac{1}{\sigma} \phi \qty(\frac{x-\mu }{\sigma })}{\Phi \qty(\frac{b-\mu }{\sigma }) - \Phi \qty(\frac{a-\mu}{\sigma})} \end{equation}
where:
\begin{equation} \Phi(x) = \int_{-\infty}^{x} \phi (x') \dd{x'} \end{equation}
and where \(\phi\) is the standard normal density function.
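The truncated density can be sketched directly from this formula. The code below is an illustration under stated assumptions: \(\Phi\) is computed through the identity \(\Phi(z) = \frac{1}{2}\left(1 + \text{erf}(z/\sqrt{2})\right)\), the density is taken to be \(0\) outside \((a, b)\), and the parameter values are made up.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_gaussian_pdf(x, mu, sigma, a, b):
    """Gaussian density restricted to (a, b) and renormalized."""
    if not (a < x < b):
        return 0.0
    normalizer = Phi((b - mu) / sigma) - Phi((a - mu) / sigma)
    return phi((x - mu) / sigma) / sigma / normalizer


# Hypothetical parameters: standard normal truncated to (-1, 2).
mu, sigma, a, b = 0.0, 1.0, -1.0, 2.0

# Midpoint-rule check that the truncated density integrates to ~1 over (a, b).
steps = 20_000
width = (b - a) / steps
total = sum(
    truncated_gaussian_pdf(a + (k + 0.5) * width, mu, sigma, a, b)
    for k in range(steps)
) * width
print(total)
```

Dividing by \(\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)\) rescales the surviving mass, so inside the interval the truncated density is always higher than the untruncated one.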
Gaussian mixture model
Gaussian models are unimodal, meaning they have one peak (the density increases up to that peak and decreases after it).
Therefore, in order to model something more complex with multiple peaks, we take a weighted average of multiple Gaussian models:
\begin{equation} p(x | \dots ) = \sum_{i=1}^{n}p_i \mathcal{N}(x | \mu_{i}, {\sigma_{i}}^{2}) \end{equation}
whereby the weights \(p_{i}\) are nonnegative and sum to \(1\).
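A two-component mixture can be sketched with the standard library's `statistics.NormalDist`. The weights, means, and standard deviations below are hypothetical values chosen only to produce two visible peaks.

```python
from statistics import NormalDist


def mixture_pdf(x, weights, means, sigmas):
    """Weighted average of Gaussian densities evaluated at x."""
    return sum(
        w * NormalDist(m, s).pdf(x) for w, m, s in zip(weights, means, sigmas)
    )


# Hypothetical mixture: weights sum to 1, components centered at -2 and 3.
weights = [0.3, 0.7]
means = [-2.0, 3.0]
sigmas = [0.5, 1.0]

left_peak = mixture_pdf(-2.0, weights, means, sigmas)
valley = mixture_pdf(0.5, weights, means, sigmas)
right_peak = mixture_pdf(3.0, weights, means, sigmas)
print(left_peak, valley, right_peak)  # the valley is lower than both peaks

# Midpoint-rule check that the mixture is still a valid density.
lo, hi, steps = -10.0, 12.0, 44_000
width = (hi - lo) / steps
total = sum(
    mixture_pdf(lo + (k + 0.5) * width, weights, means, sigmas)
    for k in range(steps)
) * width
print(total)
```

Because the weights sum to \(1\) and each component integrates to \(1\), the mixture integrates to \(1\) as well.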
three ways of analysis
probability density function
A PDF is a function that maps each value of a continuous random variable to the corresponding probability density. Probabilities come from integrating it:
\begin{equation} P(a < X < b) = \int_{x=a}^{b} f(X=x)\dd{x} \end{equation}
note: \(f\) is no longer in units of probability!!! It is in units of probability per unit of \(X\). That is, a density is the DERIVATIVE of a probability (specifically, of the cumulative distribution function), so the units of \(f\) are \(\frac{prob}{unit\ X}\). As a result, it can be greater than \(1\).
We have two important properties:
- if you integrate a probability density function over any bounds, you get a probability
- if you integrate it over the whole real line, \((-\infty, \infty)\), the result should be \(1\)
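Both properties, and the fact that a density can exceed \(1\), can be checked numerically. This sketch assumes a narrow Gaussian with \(\sigma = 0.1\) (a hypothetical choice) and uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

X = NormalDist(mu=0.0, sigma=0.1)  # hypothetical narrow Gaussian


def midpoint_integral(g, lo, hi, steps=20_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(g(lo + (k + 0.5) * width) for k in range(steps)) * width


density_at_peak = X.pdf(0.0)                      # a density, not a probability
prob_ab = midpoint_integral(X.pdf, -0.1, 0.1)     # P(-0.1 < X < 0.1)
prob_all = midpoint_integral(X.pdf, -1.0, 1.0)    # +/- 10 sigma, ~ all the mass

print(density_at_peak)  # greater than 1
print(prob_ab)          # a valid probability between 0 and 1
print(prob_all)         # approximately 1
```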
getting exact values from PDF
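There is a calculus definition for \(P(X=x)\), if absolutely needed: treat it as the probability of landing in a window of infinitesimally small width \(\epsilon\) around \(x\), so that
\begin{equation} P(X=x) = \epsilon f(x) \end{equation}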
mixing discrete and continuous random variables
Let’s say \(X\) is continuous, and \(N\) is discrete.
We desire:
\begin{equation} P(N=n|X=x) = \frac{P(X=x|N=n)P(N=n)}{P(X=x)} \end{equation}
now, to get a specific value for \(P(X=x)\), we can just multiply its PDF by a small epsilon:
\begin{align} P(N=n|X=x) &= \lim_{\epsilon \to 0} \frac{\epsilon f(X=x|N=n)P(N=n)}{\epsilon f(X=x)} \\ &= \frac{f(X=x|N=n)P(N=n)}{f(X=x)} \end{align}
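The final expression can be evaluated directly, since the \(\epsilon\) terms cancel and only densities remain. The sketch below assumes a hypothetical setup: \(N \in \{0, 1\}\) with a made-up prior, and \(X\) given \(N=n\) Gaussian with a made-up mean per class. The denominator \(f(X=x)\) is obtained by summing \(f(X=x|N=n)P(N=n)\) over \(n\) (law of total probability).

```python
from statistics import NormalDist

# Hypothetical prior P(N=n) and class-conditional densities f(X=x | N=n).
prior = {0: 0.6, 1: 0.4}
likelihood = {0: NormalDist(0.0, 1.0), 1: NormalDist(2.0, 1.0)}


def posterior(x):
    """P(N=n | X=x) = f(X=x | N=n) P(N=n) / f(X=x), using densities throughout."""
    joint = {n: likelihood[n].pdf(x) * prior[n] for n in prior}
    evidence = sum(joint.values())  # f(X=x), by total probability
    return {n: value / evidence for n, value in joint.items()}


print(posterior(1.0))  # halfway between the means: posterior equals the prior
print(posterior(3.0))  # an observation near 2 favors N=1
```

At \(x = 1\) both class densities are equal, so the observation carries no information and the posterior falls back to the prior.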
this same trick works pretty much everywhere—whenever we need to get the probability of a continuous random variable with
cumulative distribution function
What is the probability that a random variable takes on a value less than \(x\)?
\begin{equation} \text{cdf}_{X}(x) = P(X<x) = \int_{-\infty}^{x} p(x') \dd{x'} \end{equation}
sometimes written as:
\begin{equation} F(x) = P(X < x) \end{equation}
Recall that, with
quantile function
\begin{equation} \text{quantile}_{X}(\alpha) \end{equation}
is the value \(x\) such that:
\begin{equation} P(X \leq x) = \alpha \end{equation}
That is, the quantile function is the inverse of the cumulative distribution function: it returns the minimum value of \(x\) at which the desired cumulative probability \(\alpha\) is reached.
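The relationship between the CDF and the quantile function can be sketched by inverting a CDF with bisection. This is an illustration, not the only way to do it: the helper `quantile` and its search bounds are assumptions, and the result is compared against `statistics.NormalDist.inv_cdf` for a standard normal.

```python
from statistics import NormalDist


def quantile(cdf, alpha, lo, hi, tol=1e-10):
    """Smallest x in [lo, hi] with cdf(x) >= alpha, found by bisection.
    Assumes cdf is nondecreasing and cdf(lo) < alpha <= cdf(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid   # not enough cumulative probability yet
        else:
            hi = mid   # target reached, try a smaller x
    return hi


X = NormalDist()  # standard normal, used only as an example
alpha = 0.975

x_bisect = quantile(X.cdf, alpha, -10.0, 10.0)
x_builtin = X.inv_cdf(alpha)

print(x_bisect, x_builtin)  # both approximate the same quantile
print(X.cdf(x_bisect))      # round trip: back to ~alpha
```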
adding uniform distributions
For independent \(X, Y \sim Uni(0, 1)\), the PDF of the sum \(X + Y\) at a value \(a\) is:
\begin{equation} f(X+Y = a) = \begin{cases} a, & 0 < a < 1, \\ 2-a, & 1 < a < 2, \\ 0, & \text{otherwise} \end{cases} \end{equation}
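This triangular shape can be checked against a numerical convolution, \(f_{X+Y}(a) = \int f_{X}(x) f_{Y}(a - x) \dd{x}\). The sketch assumes both variables are independent and uniform on \((0, 1)\); the function names are made up for the example.

```python
def uniform01_pdf(x):
    """Density of Uni(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def sum_pdf_closed_form(a):
    """Piecewise formula for the density of X + Y.
    The value at the single point a = 1 does not affect any probability."""
    if 0.0 < a < 1.0:
        return a
    if 1.0 <= a < 2.0:
        return 2.0 - a
    return 0.0


def sum_pdf_by_convolution(a, steps=10_000):
    """Midpoint-rule convolution: integrate f_X(x) * f_Y(a - x) over x in [0, 1].
    f_X(x) is 1 on that range, so only f_Y(a - x) appears in the sum."""
    width = 1.0 / steps
    return sum(uniform01_pdf(a - (k + 0.5) * width) for k in range(steps)) * width


for a in (0.25, 0.5, 1.5, 1.75, 2.5):
    print(a, sum_pdf_closed_form(a), sum_pdf_by_convolution(a))
```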