Universal Hash Family

\begin{equation} H = \qty {h: \qty {0,1}^{h} \to \qty {0,1}^{k}} \end{equation}

properties of pairwise independence:

  1. 1 wise independent: \(\forall y, \forall a\), \(P_{h \sim H}\qty [h\qty(y) = a] = 2^{-k}\)
  2. 2 wise independent: \(\forall y \neq y’\), \(P_{h \sim H}\qty [h\qty(y) = h\qty(y’)] = 2^{-k}\)

You will notice that this is not full randomness, just that there are some amount of randomness.

requirements

For all \(u_{i}, u_{j} \in U\) in the universe, \(n\) buckets in terms of desired randomness, with \(u_{i} \neq u_{j}\), then:

\begin{equation} P_{h\in H} \qty {h\qty(u_{i}) = h\qty(u_{j})} \leq \frac{1}{n} \end{equation}

A universal hash family is 2-wise indpendent

additional information

example of universal hash family

Pick prime \(p \geq M\) (for size of universe \(M = |U|\)); define:

\begin{align} f_{a,b}\qty(x) = ax + b\ \text{mod}\ p \end{align}

\begin{equation} h_{a,b}\qty(x) = f_{a,b}\qty(x)\ \text{mod}\ n \end{equation}

Our hash family is determined by:

\begin{equation} H = \qty {h_{a,b}\qty(x) \mid a \in \qty {1, \dots, p-1}, b \in \qty {0, \dots, p-1}} \end{equation}

The above hash family takes \(O\qty(\log M)\) to store

We have \(p\qty(p-1)\) choices for \(a,b\), and since \(p \geq M\), we have \(|H| = O\qty(M^{2})\). Storing \(a,b\) takes \(\log \qty(p)\) bits to store, which gives \(2\log \qty(p) = O\qty(\log\qty(M))\) bits to store if \(p \sim M\). Note also that \(h\) is decently fast to evaluate.

This gives \(O\qty(n)\) buckets, \(O\qty(n)\) items with \(\log \qty(M)\) bits per item (because we need to ID each item wrt universe), and \(O\qty(\log M)\) to store the hash function.

example of 1/2-wise independence

Let’s pick \(k\) strings \(r^{(1)} … r^{(k)} \sim \qty {0,1}^{n}\) uniform random things, and \(k\) bits \(b_{1}, …, b_{k} \sim \qty {0,1}\) uniform random bits. This is \(O\qty(kn)\) bits of randomness.

\begin{equation} h\qty(y) = \qty(r^{(1)}y + b_1, r^{(2)} y + b_2, \dots, r^{(k)} y + b_{k}) \end{equation}

This is 1-wise random due to the randomness of \(b_{j}\), since will shift into \(2^{-k}\) random. For 2 wise independenec