Motivating Questions
Can we do better than \(O\qty(n \log n)\) sort?
- What’s our model of computation?
- What are some reasonable models of computation?
comparison based sorting
merge sort, quicksort, insertion sort.
- you can’t access values directly
- you can only compare two items (i.e. if something is better / smaller than another)
Any deterministic, comparison-based sorting algorithm must take \(\Omega\qty(n \log\qty(n))\) steps of computation.
Proof idea—any comparison based sorting algorithm gives rise to a decision tree. The decision tree must have this many steps. The structure of the tree is as follows: nodes are each comparison decision, each nodes has 2 children (one for each ordering between two items), leaf nodes represents the output. The deepest leaf gives the lower bound.
For input with length \(n\), it should have \(n!\) leafs depending on the choice of input ordering. The shallowest such tree would have \(\log \qty(n!)\) depth (i.e. a perfectly balanced tree). \(n!\) is about \(\qty(\frac{n}{e})^{n}\). Taking the log of that gives that our depth is likely \(n \log \qty(n)\).
heuristic based sorting
Many times the content of the thing we are sorting has well ordering (i.e. they are numbers). There are then \(O\qty(n)\) sorting algorithms.
counting sort
Find the maximum of input list is \(O\qty(n)\). Then allocate \(1…\max\) buckets, each a linked list. FIFO insert items into the buckets by iterating through the list again. Then dump the buckets in sorted order.
This is rather silly if you have a very large number (so you would initialize way to many sparsely-populated buckets).
radix sort
Insight: numbers that are “bigger” than each other has each digit equal or bigger than the other digit.
- run counting sort on the least significant digit
- run counting sort on the second least significant digit
- …. repeat ad nausium ….
- your list is sorted
And so, proof:
- IH: after the kth iteration, the array is sorted by the first k significant digits.
- base case: after 0th digit
- induction: for two elements \(x,y \in A\) such that \(x<y\), we want to show that up to \(k\) digit is sorted
- case 1: \(x_1 < y_1\), we desire \(x_2x_1 < y_2y_1\) (x would be in an earlier bucket than y after counting sort, so pulling out the list is just good)
- case 2: \(x_1 = y_1\), we desire \(x_2 x_1 < y_2 y_1\) (because of FIFO condition, if the first digit was inserted based on \(x_1 < y_1\), then it will be pulled out ditto)
improving efficiency
You can just start encoding the inputs in increasingly smaller bases.
general running time
For \(n\) integers, \(M\) maximum value in input list, is base \(r\).
- number of iterations: \(d = \lfloor \log_{r}\qty(M) \rfloor + 1\)
- time per iterations \(O\qty(n+r)\) (\(r\) buckets, \(n\) items into each)
- total time: \(O\qty(d\cdot\qty(n+r)) = O\qty(\qty(\lfloor \log_{r}\qty(M)\rfloor +1)\cdot\qty(n+r))\)
We typically choose \(r = n\), which gives:
\begin{equation} O\qty(n \cdot\qty( \lfloor \log_{n}\qty(M) \rfloor +1 )) \end{equation}
- if \(M \leq n^{c}\) for constant \(c\), this would be \(O\qty(n)\)
- if \(M = 2^{n}\), this is \(O\qty(n^{2})\)