## information

*Information is the amount of surprise a message provides.*

Shannon (1948): *A Mathematical Theory of Communication*.

### information value

The information value, or entropy, of an information source is the probability-weighted average surprise over all possible outcomes:

\begin{equation} H(X) = \sum_{x \in X} s(P(X=x)) \, P(X=x) \end{equation}
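This definition translates directly into code. A minimal sketch (the function name `entropy` is our own):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the probability-weighted average surprise."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# a fair coin carries exactly one bit of information
print(entropy([0.5, 0.5]))  # → 1.0
```

The `if p > 0` guard uses the standard convention that outcomes with probability zero contribute nothing to the sum.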

#### properties of entropy

- **entropy is positive**: \(H(X) \geq 0\)
- **entropy of uniform**: for \(M \sim CatUni[1, …, n]\), \(p_{i} = \frac{1}{|M|} = \frac{1}{n}\), and \(H(M) = \log_{2} |M| = \log_{2} n\)
- **entropy is bounded**: \(0 \leq H(X) \leq H(M)\) where \(|X| = |M|\) and \(M \sim CatUni[1, …, n]\) (“the uniform distribution has the highest entropy”); the upper bound is reached IFF \(X\) is uniformly distributed.
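The uniform-entropy and boundedness properties can be checked numerically. A quick sketch (the `entropy` helper and the example distributions are our own):

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

n = 4
uniform = [1 / n] * n          # CatUni[1, ..., 4]
skewed = [0.7, 0.1, 0.1, 0.1]  # same support size, non-uniform

print(entropy(uniform))  # → 2.0, i.e. log2(4)
print(entropy(skewed))   # strictly less than 2.0
```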

#### binary entropy function

For a binary outcome \(X \in \{1, 2\}\), let \(P(X=1) = p_1\) and \(P(X=2) = 1 - p_1\). We can write:

\begin{equation} H_{2}(p_1) = p_1 \log_{2} \frac{1}{p_1} + (1-p_1) \log_{2} \frac{1}{1-p_1} \end{equation}

If you plot this out, we get a cap-shaped curve: \(H_{2}(0) = 0\) and \(H_{2}(1) = 0\), but \(H_{2}(0.5) = 1\). Information sources are most effective when what’s communicated is ambiguous.
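The binary entropy function and its endpoints can be sketched as follows (the name `h2` is our own; the endpoint values follow the convention \(0 \log \frac{1}{0} = 0\)):

```python
import math

def h2(p):
    """Binary entropy in bits; defined as 0 at p = 0 and p = 1."""
    if p in (0, 1):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

print(h2(0.0))  # → 0.0  (certain outcome, no surprise)
print(h2(0.5))  # → 1.0  (maximum ambiguity, one full bit)
print(h2(1.0))  # → 0.0
```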

### information source

We model an information source as a random variable: before you GET the information, it can take on any of a range of values; once you get it, you observe an exact value. Each source has a range of possible values it can communicate (which is the support of the random variable representing the information source).

We will then define the surprise of a piece of information as a decreasing function of the probability corresponding to the event of receiving that information.

### surprise

IMPORTANTLY: this class uses base \(2\); the choice of base only changes the units, not the theory.

\begin{equation} s(p) = \log_{2} \frac{1}{p} \end{equation}

#### Properties of Surprise

- log-base-2 surprise has units “bits”
- \(s(1) = 0\)
- \(p \to 0, s(p) \to \infty\)
- \(s(p) \geq 0\)
- “joint surprise” of independent events: \(s(pq) = s(p) + s(q)\)
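These properties can be verified directly. A small sketch of the surprise function (the name `surprise` is our own):

```python
import math

def surprise(p):
    """Surprise in bits of an event with probability p."""
    return math.log2(1 / p)

p, q = 0.5, 0.25
# certain events carry zero surprise
print(surprise(1.0))            # → 0.0
# for independent events, joint surprise adds
print(surprise(p * q))          # → 3.0, equal to surprise(p) + surprise(q)
```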

#### Facts about Surprise

- surprise should probably decrease with increasing \(p\)
- surprise should be continuous in \(p\)
- for two independent events with probabilities \(p_i\) and \(q_{j}\), the “joint surprise” satisfies \(s(p_{i} q_{j}) = s(p_{i}) + s(q_{j})\)
- surprise matches intuition at the extremes: if something with probability \(0\) happens, we should be infinitely surprised; if something happens with increasingly high probability, surprise should be low

The surprise function is the **unique** function (up to the base of the logarithm) which satisfies all of the above properties.

### additional information

**information is relative to the domain one is attempting to communicate**