probability density function
A PDF is a function that maps values of a continuous random variable to the corresponding probability density.
\begin{equation} P(a < X < b) = \int_{x=a}^{b} f(X=x)\dd{x} \end{equation}
note: \(f\) is no longer in units of probability!!! it is in units of probability scaled by units of \(X\). That is, it is a DERIVATIVE of probability, so the units of \(f\) are \(\frac{\text{prob}}{\text{unit } X}\). Hence, it can be greater than \(1\).
We have two important properties:
- \(f(x) \geq 0\) for all \(x\)
- \(\int_{-\infty}^{\infty} f(x) \dd{x} = 1\)
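A minimal numerical sketch of these facts, assuming scipy and numpy are available; the narrow normal (\(\sigma = 0.1\)) is an arbitrary illustrative choice:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

dist = norm(loc=0.0, scale=0.1)

# the density is NOT a probability: here f(0) is about 3.99 > 1
print(dist.pdf(0.0))

# P(a < X < b) = integral of f from a to b (the equation above)
p_ab, _ = quad(dist.pdf, -0.2, 0.2)
print(p_ab, dist.cdf(0.2) - dist.cdf(-0.2))  # the two agree

# the two properties: f(x) >= 0 everywhere, and f integrates to 1
total, _ = quad(dist.pdf, -np.inf, np.inf)
print(total)  # ~1.0
```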
probability distribution
A probability distribution “assigns probability to outcomes”.
\(X\) follows distribution \(D\). \(X\) is a “\(D\) random variable”, where \(D\) is some distribution (normal, binomial, etc.)
syntax: \(X \sim D\).
Each distribution has three properties:
- variables (what is being modeled)
- values (what values can they take on)
- parameters (how many degrees of freedom do we have)
Types of Distribution
discrete distribution
- described by PMF
continuous distribution
- described by PDF
parametrized distribution
We often represent a probability distribution using a set of parameters \(\theta_{j}\). For instance, a normal distribution is given by \(\mu\) and \(\sigma\), and a PMF is given by the probability mass for each outcome.
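As a quick sketch (assuming scipy), fixing the parameters \(\theta\) pins down the distribution completely; the particular numbers here are just illustrative:

```python
from scipy.stats import norm, rv_discrete

# normal distribution: parametrized by theta = (mu, sigma)
theta = {"mu": 2.0, "sigma": 0.5}
gaussian = norm(loc=theta["mu"], scale=theta["sigma"])

# categorical PMF: the parameters are the masses themselves
masses = rv_discrete(values=([0, 1, 2], [0.2, 0.5, 0.3]))

print(gaussian.pdf(2.0), masses.pmf(1))
```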
probability mass function
A PMF is a function that maps possible outcomes of a discrete random variable to the corresponding actual probabilities.
For random variable \(Y\), we have:
\begin{equation} f(k) = P(Y=k) \end{equation}
and \(f\) is the PMF: it maps each value the random variable can take on to the probability that the random variable takes on that value.
Shorthand
\begin{equation} P(Y=k) = p(y), \text{ where } y=k \end{equation}
The lowercase \(y\) represents a realization of \(Y\), i.e., a case where \(Y=y\).
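A minimal sketch (assuming scipy; the binomial is just an example of a discrete distribution with a known PMF):

```python
from scipy.stats import binom

Y = binom(n=10, p=0.3)  # Y ~ Binomial(10, 0.3)

f = Y.pmf               # f(k) = P(Y = k)
print(f(3))             # probability that Y takes on the value 3

# a PMF returns actual probabilities, so they sum to 1 over all outcomes
print(sum(f(k) for k in range(11)))  # ~1.0
```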
Probability of Failure
\begin{align} p_{\text{fail}} &= \mathbb{E}_{\tau \sim p\qty(\cdot)} \qty[\mathbb{1} \qty{\tau \not\in \psi}] \\ &= \int \mathbb{1} \qty{\tau \not\in \psi} p\qty(\tau) \dd{\tau} \end{align}
that is, the Probability of Failure is just the normalizing constant of the Failure Distribution. Like the Failure Distribution itself, computing this quantity exactly is quite intractable. We have a few methods to estimate it (the first two are sketched in code after the list below), namely:
- direct estimation: directly approximate your failure probability from nominal distribution \(p\) — \(\tau_{i} \sim p\qty(\cdot)\), \(\hat{p}_{\text{fail}} = \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\qty{\tau_{i} \not\in \psi}\)
- Importance Sampling: design a distribution to probe failure, namely proposal distribution \(q\), and then reweight by how different it is from \(p\) — \(\tau_{i} \sim q\qty(\cdot)\), \(\hat{p}_{\text{fail}} = \frac{1}{m}\sum_{i=1}^{m} w_{i} \mathbb{1} \qty{\tau_{i}\not\in \psi}\), where \(w_{i} = \frac{p\qty(\tau_{i})}{q\qty(\tau_{i})}\) (the “importance weight”)
- adaptive importance sampling
- multiple importance sampling
- sequential monte-carlo
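A toy sketch comparing direct estimation and importance sampling (assuming numpy/scipy; the nominal \(p = \mathcal{N}(0,1)\), the failure event \(\tau \not\in \psi\) taken as \(\tau > 3\), and the proposal \(q = \mathcal{N}(3,1)\) are all illustrative assumptions, not anything fixed by the notes):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m = 10_000
p, q = norm(0, 1), norm(3, 1)
is_failure = lambda tau: tau > 3.0          # stand-in for "tau not in psi"

# direct estimation: sample tau_i ~ p, average the failure indicator
tau_p = p.rvs(size=m, random_state=rng)
p_fail_direct = np.mean(is_failure(tau_p))

# importance sampling: sample tau_i ~ q, reweight by w_i = p(tau_i)/q(tau_i)
tau_q = q.rvs(size=m, random_state=rng)
w = p.pdf(tau_q) / q.pdf(tau_q)
p_fail_is = np.mean(w * is_failure(tau_q))

print(p_fail_direct, p_fail_is, 1 - p.cdf(3.0))  # truth ~1.35e-3
```

Because failures are rare under \(p\), the direct estimate is noisy at this sample size, while the reweighted proposal samples land in the failure region far more often.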
How do you pick a proposal distribution? See proposal distribution.
probabilistic models
multinomial distribution
A probability distribution for modeling counts of specific outcomes, like the binomial distribution but with multiple outcome categories.
As with the binomial distribution, we have to assume independent trials and the same outcome probabilities per trial.
“what’s the probability that you get some set of assignments \(X_{j}=c_{j}\)”:
\begin{equation} P(X_1=c_1, X_2=c_2, \dots, X_{m}=c_{m}) = {n \choose c_1, c_2, \dots, c_{m} } p_{1}^{c_1} \cdot \dots \cdot p_{m}^{c_{m}} \end{equation}
where the big choose is a multinomial coefficient, \(n = \sum_{j} c_{j}\) is the total number of trials, \(m\) is the number of different outcomes, and \(p_{j}\) is the probability of the \(j\)th outcome.
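A minimal sketch of this formula (assuming scipy; the counts and probabilities are made-up numbers), once via scipy and once by hand with the multinomial coefficient:

```python
from math import factorial, prod
from scipy.stats import multinomial

c = [3, 5, 2]         # counts c_j per outcome, summing to n = 10 trials
p = [0.2, 0.5, 0.3]   # one probability p_j per outcome

# scipy evaluates the multinomial PMF directly
print(multinomial.pmf(c, n=sum(c), p=p))

# same thing by hand: multinomial coefficient times product of p_j^{c_j}
coeff = factorial(sum(c)) // prod(factorial(k) for k in c)
print(coeff * prod(pj**cj for pj, cj in zip(p, c)))
```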
