\begin{equation} \text{SELECT}\qty(A,k) \end{equation}
returns the kth smallest element of \(A\).
additional information
properties of k-select
- \(\text{SELECT}\qty(A,1)\) is the smallest
- \(\text{SELECT}\qty(A, \frac{n}{2})\) is the median
- \(\text{SELECT}\qty(A,n)\) is the maximum
a naive solution
Naively, \(\text{SELECT}\qty(A,C)\) for any constant \(C\) is in \(O\qty(n)\) because you can just keep track of the \(C\) smallest array. But turns out, something like the median \(\text{SELECT}\qty(A, \frac{n}{2})\) is basically insertion sort, so \(O\qty(n^{2})\).
an actual linear-time solution
Proof idea: we want to pick a “pivot” value which is in the list; then partition the array into things that are less than that value, or more than that value.
Suppose \(L\) is the length of the list less than the pivot value, and \(k = \text{len}\qty(L)+1\), then we can just return the pivot value.
If \(k < \text{len}\qty(L)+1\), then we return \(\text{SELECT}\qty(L, k)\), if \(k > \text{len}\qty(L)+1\), \(\text{SELECT}\qty(L, k-\qty(len\qty(L)+1))\).
def SELECT(A,k):
####
# if the list is too small, sort and return
# this is fine asymptotically since list size is constant
# and we do this to avoid dealing with base case
###
D = getPivot(A)
L,pVal,R = Partition(A,p)
if len(L) = k-1: return pVal
if len(L) > k-1: return SELECT(L,k)
if len(L) < k-1: return SELECT(L,k-len(L)-1)
picking a pivot
Intuitively we want the pivot to be in the “middle.” Ideally, this means picking a pivot that divides the list down the median. We can’t get here, but we can approximate it by:
\begin{equation} \frac{3n}{10} < \text{len}\qty(L) < \frac{7n}{10} \end{equation}
\begin{equation} \frac{3n}{10} < \text{len}\qty( R) < \frac{7n}{10} \end{equation}
If we could just pick \(\frac{7n}{10}\) for the pivot location, the master theorem still says \(O\qty(n)\) runtime. Turns out, we (approximately) can!
Let’s break our list into literally \(5\) pieces, then:
def choosepivot(A):
split = A.split(ceil(len(A)/5))
p_i = []
for i in 1...len(split):
# since each of the split groups have length
# of exactly 5, this takes O(n)
p_i.append(find_median(split[i]))
p = SELECT(p_i, len(split)/2) # recursively call select
return p
If we choose the pivots like this, then \(|L|, |R| \leq \frac{7n}{10} + 5\).
The overall runtime, then:
\begin{equation} T\qty(n) = O\qty(n) + T\qty(\frac{7n}{10}+5) + T\qty(\frac{n}{5}) \end{equation}
where the middle is to recurse over
correctness
runtime
The time spent in not in recursive calls is \(O\qty(n)\) to partition the list before and after the pivot.
So the recursion is:
\begin{equation} T\qty(n) = \begin{cases} O\qty(n), \text{len}\qty(L) = k-1 \\ T\qty(|L|) +O\qty(n), \text{len}\qty(L) > k-1 \\ T\qty(|R|) + O\qty(n), \text{len}\qty(L) < k-1 \end{cases} + O\qty(n) \end{equation}
Using the master theorem, suppose our pivot divided the list evenly in half, we obtain
\begin{equation} T\qty(n) = T\qty(\frac{n-1}{2}) + O\qty(n) \end{equation}