Posts

column space

Last edited: August 8, 2025

combination

Last edited: August 8, 2025

A combination is a selection of \(k\) items out of \(n\) in which order does not matter.

\begin{equation} \mqty(n \\k) = \frac{n!}{k!(n-k)!} = n! \times 1 \times \frac{1}{k!} \times \frac{1}{(n-k)!} \end{equation}

This can be shown as follows: we first permute the whole group of \(n\) people (\(n!\) ways); we take the first \(k\) of them (only 1 choice); we divide out the overcounted orderings within the chosen \(k\) subset (\(\frac{1}{k!}\)); and we divide out the overcounted orderings within the remaining \(n-k\) subset (\(\frac{1}{(n-k)!}\)).
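As a sanity check on the counting argument, here is a minimal Python sketch. The `people` list and the `choose` helper are made up for illustration. It enumerates every permutation, keeps only the set formed by the first \(k\) entries, and confirms that each subset is overcounted exactly \(k!(n-k)!\) times.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial

def choose(n: int, k: int) -> int:
    """n! / (k! (n-k)!), computed directly from the formula."""
    return factorial(n) // (factorial(k) * factorial(n - k))

people = ["a", "b", "c", "d", "e"]  # hypothetical group, n = 5
n, k = len(people), 2

# Permute everyone, then look only at WHO is in the first k slots.
# Each distinct subset shows up once per ordering of the chosen k
# and once per ordering of the remaining n - k.
times_seen = Counter(frozenset(p[:k]) for p in permutations(people))

overcount = factorial(k) * factorial(n - k)
num_subsets = len(times_seen)
by_formula = choose(n, k)
```

Here `num_subsets`, `by_formula`, and the standard library's `math.comb(n, k)` should all agree, and every value in `times_seen` should equal `overcount`.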

Combinator Calculus

Last edited: August 8, 2025

Combinator calculus is a variable-free programming language; it is a Turing-complete computational formalism.

  • this is a language of functions
  • it is extremely minimalist
    • clear away the complexity of a real language
    • allows for illustration of ideas

combinator

a combinator is a function with no free variables

Why do we care?

  • no variables! it's entirely compositional
  • all computations are rewrite rules => making proofs of properties like confluence easier
  • it's functional: we don’t reason about individual data accesses, which is a natural fit for bulk and parallel data
    • …variables are often a problem in parallel computation

Why do we not care?

Duplication is really hard in SKI: we have to use \(S\), and possibly a \(K\), to get multiple things passed along. This is basically the only way to pass information around; we have to drill any data all the way down with \(S\) until it is consumed.
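To make the role of \(S\) concrete, here is a small sketch that emulates the three combinators as curried Python functions. Python lambdas do bind variables, so this only mimics the rewrite rules \(I\,x \to x\), \(K\,x\,y \to x\), \(S\,f\,g\,x \to f\,x\,(g\,x)\); it is not a variable-free implementation. The `pair` helper is hypothetical and sits outside the SKI basis.

```python
# Curried combinators: each takes one argument at a time.
I = lambda x: x
K = lambda x: lambda y: x
S = lambda f: lambda g: lambda x: f(x)(g(x))

# S K K behaves like I:  S K K x -> K x (K x) -> x
skk = S(K)(K)

# Duplication: S is what hands the SAME x to both f and g.
# `pair` is a hypothetical helper used only to observe the result.
pair = lambda a: lambda b: (a, b)
dup = S(pair)(I)  # dup(x) = pair(x)(I(x)) = (x, x)

# Discarding: K throws away its second argument.
keep_first = K("kept")("dropped")
```

Note that the only place `x` is used twice is inside `S`, which is why any value needed in more than one place has to be threaded through `S`.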

Common Spark Actions

Last edited: August 8, 2025
  • collect(): return all elements of the RDD to the driver
  • count(): get a count of the elements in the RDD
  • countByValue(): count how many times each value appears
  • reduce(func): the reduce part of MapReduce
  • first(), take(n): return some number of elements
  • top(n): return the highest n values in the RDD
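To show what each action returns, here is a plain-Python analogue on a small list. This is not Spark code: the list `data` stands in for an RDD's elements, and each line mirrors the result of the action named in its comment.

```python
from collections import Counter
from functools import reduce
import heapq

data = [3, 1, 2, 3, 5]  # stand-in for the elements of an RDD

collected = list(data)                    # collect()
n_elements = len(data)                    # count()
value_counts = dict(Counter(data))        # countByValue()
total = reduce(lambda a, b: a + b, data)  # reduce(func), with func = addition
first_elem = data[0]                      # first()
first_two = data[:2]                      # take(2)
top_two = heapq.nlargest(2, data)         # top(2), largest first
```

In Spark itself, the function passed to `reduce` should be associative and commutative, since partial results from different partitions are combined in no fixed order.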

Common Spark Transformations

Last edited: August 8, 2025
  • map(func): apply a function to every element
  • filter(func): filter based on function
  • flatMap(func): flatten returned lists into one giant list
  • union(rdd): create a union of multiple RDDs
  • subtract(rdd): remove the elements that also appear in the other RDD
  • cartesian(rdd): cartesian product of rdd
  • parallelize(list): make an RDD from a list (called on the SparkContext rather than on an RDD)
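The transformations above can be sketched with plain Python lists. Again, this is an analogue rather than Spark code: the lists `a` and `b` stand in for two RDDs, and real Spark evaluates transformations lazily and does not guarantee this element order.

```python
from itertools import chain, product

a = [1, 2, 3]  # stand-in for one RDD
b = [3, 4]     # stand-in for another RDD

mapped = [x * 10 for x in a]                          # map(func)
filtered = [x for x in a if x % 2 == 1]               # filter(func)
flat = list(chain.from_iterable([x, x] for x in a))   # flatMap(func)
unioned = a + b                                       # union(rdd), duplicates kept
subtracted = [x for x in a if x not in b]             # subtract(rdd)
crossed = list(product(a, b))                         # cartesian(rdd)
```

The difference between `map` and `flatMap` shows up in `flat`: the function returns a list per element, and the results are concatenated into one flat list instead of a list of lists.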

Special transformations for Pair RDDs

  • reduceByKey(func): merge the values of each key using func
  • groupByKey(): group the values of each key together
  • sortByKey(): sort the pairs by key
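A plain-Python sketch of what the three pair operations compute on a list of `(key, value)` tuples. The helpers `reduce_by_key` and `group_by_key` are hypothetical names written for this illustration, not Spark functions.

```python
from collections import defaultdict

pairs = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]  # stand-in for a Pair RDD

def reduce_by_key(func, kv):
    """Fold the values of each key together with func."""
    out = {}
    for k, v in kv:
        out[k] = func(out[k], v) if k in out else v
    return out

def group_by_key(kv):
    """Collect the values of each key into a list."""
    groups = defaultdict(list)
    for k, v in kv:
        groups[k].append(v)
    return dict(groups)

summed = reduce_by_key(lambda x, y: x + y, pairs)  # like reduceByKey
grouped = group_by_key(pairs)                      # like groupByKey
by_key = sorted(pairs, key=lambda kv: kv[0])       # like sortByKey
```

When only an aggregate per key is needed, folding values as they arrive (as `reduce_by_key` does) avoids materializing every value per key the way grouping does.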

See also Database “Join”