k-select

\begin{equation} \text{SELECT}\qty(A,k) \end{equation}

returns the kth smallest element of \(A\).

additional information

properties of k-select

\(\text{SELECT}\qty(A,1)\) is the smallest
\(\text{SELECT}\qty(A, \frac{n}{2})\) is the median
\(\text{SELECT}\qty(A,n)\) is the maximum

a naive solution

Naively, \(\text{SELECT}\qty(A,C)\) for any constant \(C\) is in \(O\qty(n)\) because you can just keep track of the \(C\) smallest array. But turns out, something like the median \(\text{SELECT}\qty(A, \frac{n}{2})\) is basically insertion sort, so \(O\qty(n^{2})\).

an actual linear-time solution

Proof idea: we want to pick a “pivot” value which is in the list; then partition the array into things that are less than that value, or more than that value.

Suppose \(L\) is the length of the list less than the pivot value, and \(k = \text{len}\qty(L)+1\), then we can just return the pivot value.

If \(k < \text{len}\qty(L)+1\), then we return \(\text{SELECT}\qty(L, k)\), if \(k > \text{len}\qty(L)+1\), \(\text{SELECT}\qty(L, k-\qty(len\qty(L)+1))\).

def SELECT(A,k):
    ####
    # if the list is too small, sort and return
    # this is fine asymptotically since list size is constant
    # and we do this to avoid dealing with base case
    ###
    D = getPivot(A)
    L,pVal,R = Partition(A,p)
    if len(L) = k-1: return pVal
    if len(L) > k-1: return SELECT(L,k)
    if len(L) < k-1: return SELECT(L,k-len(L)-1)

picking a pivot

Intuitively we want the pivot to be in the “middle.” Ideally, this means picking a pivot that divides the list down the median. We can’t get here, but we can approximate it by:

\begin{equation} \frac{3n}{10} < \text{len}\qty(L) < \frac{7n}{10} \end{equation}

\begin{equation} \frac{3n}{10} < \text{len}\qty( R) < \frac{7n}{10} \end{equation}

If we could just pick \(\frac{7n}{10}\) for the pivot location, the master theorem still says \(O\qty(n)\) runtime. Turns out, we (approximately) can!

Let’s break our list into literally \(5\) pieces, then:

def choosepivot(A):
    split = A.split(ceil(len(A)/5))

    p_i = []
    for i in 1...len(split):
        # since each of the split groups have length
        # of exactly 5, this takes O(n)
        p_i.append(find_median(split[i]))

    p = SELECT(p_i, len(split)/2) # recursively call select

    return p

If we choose the pivots like this, then \(|L|, |R| \leq \frac{7n}{10} + 5\).

The overall runtime, then:

\begin{equation} T\qty(n) = O\qty(n) + T\qty(\frac{7n}{10}+5) + T\qty(\frac{n}{5}) \end{equation}

where the middle is to recurse over

correctness

runtime

The time spent in not in recursive calls is \(O\qty(n)\) to partition the list before and after the pivot.

So the recursion is:

\begin{equation} T\qty(n) = \begin{cases} O\qty(n), \text{len}\qty(L) = k-1 \\ T\qty(|L|) +O\qty(n), \text{len}\qty(L) > k-1 \\ T\qty(|R|) + O\qty(n), \text{len}\qty(L) < k-1 \end{cases} + O\qty(n) \end{equation}

Using the master theorem, suppose our pivot divided the list evenly in half, we obtain

\begin{equation} T\qty(n) = T\qty(\frac{n-1}{2}) + O\qty(n) \end{equation}