Posts

topological sort

Last edited: August 8, 2025

A topological sort of a directed graph is an ordering of its vertices such that, for every edge \(A \to B\), \(A\) comes before \(B\) in the ordering.

For directed acyclic graphs, a topological sort always exists.
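Kahn's algorithm makes this definition constructive: repeatedly emit a vertex with no remaining incoming edges. A minimal sketch in Python (the node names are made up for illustration):

```python
from collections import deque

def topological_sort(graph):
    """Kahn's algorithm: repeatedly emit a node whose remaining
    in-degree is zero.  `graph` maps each node to its out-neighbors."""
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] = indegree.get(v, 0) + 1
    ready = deque(u for u, d in indegree.items() if d == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in graph.get(u, ()):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(indegree):
        raise ValueError("cycle detected: no topological sort exists")
    return order

# A -> B, A -> C, B -> D, C -> D: "A" must come first, "D" last.
order = topological_sort({"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []})
```

If the graph has a cycle, some vertex never reaches in-degree zero, which is exactly why a topological sort only exists for acyclic graphs.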

Train a Bert

Last edited: August 8, 2025

So you want to train a Bert? Either your name is David or you stumbled upon this from the internet. Well, hello. I'm tired and it's 1 AM, so I can't vouch for how accurate any of this is. Oh, and if you are reading this as an NLPer, I apologize for the notation; again, it's 1 AM.

CLS Tokens

A Bert is a bi-directional transformer encoder model. A Transformer encoder takes a sequence of tokenized input text (each token represented as an embedding) and produces a dense embedding for each token. Hence, for word vectors \(w \in W \subset \mathbb{R}^{n}\), a Bert \(B\) applies a mapping in \(\mathcal{L}\qty(W, \mathbb{R}^{m})\) to each input token.
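To make the shapes concrete, here is a toy sketch, not a real Bert: the "encoder" below is a single random linear map applied to every token (a real Bert stacks self-attention and MLP layers), just to show the per-token \(n\)-dim-in, \(m\)-dim-out mapping, with position 0 conventionally playing the role of the CLS token. All sizes are made up.

```python
import random

random.seed(0)
n, m = 4, 3  # input / output embedding dims (made-up sizes)

# Stand-in "encoder": one linear map applied to every token.
# A real Bert uses stacked self-attention + MLP layers instead.
W_map = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]

def encode(tokens):
    """Map each n-dim token embedding to an m-dim output embedding."""
    return [[sum(w * x for w, x in zip(row, tok)) for row in W_map]
            for tok in tokens]

# A sequence of 3 token embeddings; position 0 plays the role of [CLS].
sequence = [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]
outputs = encode(sequence)
cls_embedding = outputs[0]  # often used as a pooled sentence representation
```

The point is the type signature: one output vector per input token, and the CLS position's output is what classifiers typically read off.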

Train-Test Split

Last edited: August 8, 2025
  • training data: to train the network
  • validation data: to select the best model/fitting strategy
  • test data: held out to report final error without biasing any modeling choices (it tells you how the model actually performs)
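A minimal sketch of making such a three-way split (the fractions and seed here are illustrative, not a recommendation):

```python
import random

def train_val_test_split(data, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once, then carve off the test and validation sets."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_test = int(len(data) * test_frac)
    n_val = int(len(data) * val_frac)
    test = data[:n_test]                 # only touched once, at the very end
    val = data[n_test:n_test + n_val]    # used to pick models / hyperparameters
    train = data[n_test + n_val:]        # used to fit parameters
    return train, val, test

train, val, test = train_val_test_split(range(100))
```

The discipline matters more than the code: any choice informed by the test set silently turns it into a second validation set.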

Training Data Sourcing

Last edited: August 8, 2025

Finding training data for AI is hard. So instead:

Intentional training data

  • curated for training data
  • Spent time thinking about bias, control, etc.

Training set of convenience

  • Dataset that just comes about
  • Problematic:

Accidentally introduces bias into the data: for a while, Googling images of CEOs, which is convenient, returned almost exclusively white men.

Training Helpful Chatbots

Last edited: August 8, 2025

“What we have been building since ChatGPT at H4.”

  • No pretraining in any way

Basic Three Steps

Goal: “helpful, harmless, honest, and huggy” bots.

  1. Pretraining step: large-scale next-token prediction
  2. In-context learning: few-shot learning without updating parameters
  3. “Helpful” steps
    1. Taking supervised data to perform supervised fine tuning
  4. “Harmless” steps
    1. Training a classifier for result ranking
    2. RLHF
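The "classifier for result ranking" in the harmless step is typically a reward model trained with a pairwise (Bradley–Terry-style) loss: it should score the human-preferred response above the rejected one. A toy sketch with scalar scores (the function name and numbers are made up for illustration):

```python
import math

def pairwise_ranking_loss(score_chosen, score_rejected):
    """-log(sigmoid(r_chosen - r_rejected)): small when the preferred
    response outscores the rejected one by a wide margin."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy reward-model outputs for two (chosen, rejected) pairs:
good_margin = pairwise_ranking_loss(2.0, -1.0)  # chosen clearly preferred
bad_margin = pairwise_ranking_loss(-1.0, 2.0)   # model ranks them backwards
```

The resulting reward model then supplies the training signal for the RLHF step.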

Benchmarking

Before we start training, we have a problem. Most benchmarks test generic reasoning, which only evaluates steps 1) and 2). Therefore, we need new metrics for steps 3) and 4).