Common Spark Actions
Last edited: August 8, 2025
Common Spark Transformations
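Last edited: August 8, 2025
- map(func): apply a function to every element
- filter(func): keep only the elements for which the function returns true
- flatMap(func): apply a function that returns a list per element, then flatten the returned lists into one giant list
- union(rdd): create a union of multiple RDDs
- subtract(rdd): remove the elements that also appear in another RDD
- cartesian(rdd): Cartesian product with another RDD
- parallelize(list): make an RDD from a list (a SparkContext method rather than a transformation on an existing RDD)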
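To make the list above concrete, here is a minimal sketch of what each transformation computes, written against plain Python lists rather than real RDDs. It is an analogy only: nothing here is distributed or lazy, and the variable names (`nums`, `other`) are made up for illustration.

```python
from itertools import chain, product

# Plain lists standing in for two RDDs (illustrative data only).
nums = [1, 2, 3, 4]
other = [3, 4, 5]

# map(func): apply a function to every element.
mapped = [x * 10 for x in nums]

# filter(func): keep elements for which the predicate is true.
evens = [x for x in nums if x % 2 == 0]

# flatMap(func): func returns a list per element; the lists are flattened.
flat = list(chain.from_iterable([x, x] for x in nums))

# union(rdd): concatenate both collections (duplicates are kept).
unioned = nums + other

# subtract(rdd): drop elements that also appear in the other collection.
subtracted = [x for x in nums if x not in other]

# cartesian(rdd): every pairing of an element from each collection.
pairs = list(product(nums[:2], other[:2]))
```

In Spark itself these would be method calls on an RDD, and nothing would be computed until an action runs.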
Special transformations for Pair RDDs
- reduceByKey(func): merge the values of each key using func
- groupByKey(): group the values of each key into one collection
- sortByKey(): sort the pairs by key
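The pair-RDD transformations can be sketched the same way on a plain list of `(key, value)` tuples. The helper names below (`group_by_key`, `reduce_by_key`, `sort_by_key`) are hypothetical stand-ins that only mimic the semantics; they are not Spark's API.

```python
from collections import defaultdict
from functools import reduce
from operator import add

def group_by_key(pairs):
    """Collect all values that share a key into one list."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_by_key(func, pairs):
    """Merge the values of each key with a binary function."""
    return {k: reduce(func, vs) for k, vs in group_by_key(pairs).items()}

def sort_by_key(pairs):
    """Sort the pairs by their key (stable for equal keys)."""
    return sorted(pairs, key=lambda kv: kv[0])

data = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]
grouped = group_by_key(data)
summed = reduce_by_key(add, data)
ordered = sort_by_key(data)
```

In Spark, the function passed to reduceByKey should be associative and commutative, because values are merged in whatever order the partitions deliver them.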
See also Database “Join”
Communication Complexity (Chapter)
Last edited: August 8, 2025
Communication complexity models one aspect of distributed computing: how much the parties must communicate to compute something together.
Let us consider two parties—Alice and Bob. They want to compute some function:
\begin{equation} f: \qty{0,1}^{*} \times \qty{0,1}^{ *} \to \qty{0,1} \end{equation}
on two inputs held by Alice and Bob respectively, \(x \in \qty{0,1}^{*}\) and \(y \in \qty{0,1}^{*}\), where \(|x| = |y| = n\) and \(n\) is very large (i.e. sending all of \(x\) over to the other party and computing \(f\) on one end is too expensive).
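As a toy illustration of the setup, the sketch below compares the trivial protocol (Alice ships all \(n\) bits of \(x\) to Bob) with a cheaper protocol for one specific function, the parity of all bits in \(x\) and \(y\). The choice of parity and the function names are assumptions made for the example; the cost counted is simply the number of bits Alice sends.

```python
def parity(x, y):
    """The function f to compute: parity of all bits in x and y."""
    return (sum(x) + sum(y)) % 2

def trivial_protocol(f, x, y):
    """Alice sends every bit of x; Bob evaluates f locally. Cost: n bits."""
    message = list(x)          # the bits Alice transmits
    return f(message, y), len(message)

def parity_protocol(x, y):
    """Alice sends only the parity of x; Bob folds in y. Cost: 1 bit."""
    alice_bit = sum(x) % 2     # the single bit Alice transmits
    return (alice_bit + sum(y)) % 2, 1

x = [1, 0, 1, 1]
y = [0, 1, 1, 0]
answer_trivial, cost_trivial = trivial_protocol(parity, x, y)
answer_cheap, cost_cheap = parity_protocol(x, y)
```

Both protocols agree on the answer, but the second one's cost does not grow with \(n\). Not every function admits such a shortcut.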
commutativity
Last edited: August 8, 2025
commutativity means that the operands of an operation can be reordered without changing the result.
That is:
\begin{equation} ABC = ACB \end{equation}
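A quick way to get a feel for this is to test whether swapping two operands changes the result on a handful of values. The helper `commutes_on` is a hypothetical name, and passing the check on sample values is evidence, not a proof.

```python
import operator

def commutes_on(op, values):
    """Check op(a, b) == op(b, a) for every pair drawn from values."""
    return all(op(a, b) == op(b, a) for a in values for b in values)

samples = [1, 2, 3]
addition_commutes = commutes_on(operator.add, samples)
subtraction_commutes = commutes_on(operator.sub, samples)
concat_commutes = commutes_on(operator.add, ["ab", "cd"])
```

Addition passes, while subtraction and string concatenation fail, since `1 - 2 != 2 - 1` and `"ab" + "cd" != "cd" + "ab"`.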
comparison function
Last edited: August 8, 2025
- return < 0 if the first value should come before the second value
- return > 0 if the first value should come after the second value
- return 0 if the first and second values are equivalent
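The contract above can be sketched in Python, where `functools.cmp_to_key` adapts a comparison function for use with `sorted`. The comparison chosen here (ordering words by length) and the sample data are assumptions for illustration.

```python
from functools import cmp_to_key

def compare_by_length(a, b):
    """Negative if a is shorter, positive if a is longer, 0 if equal length."""
    return len(a) - len(b)

words = ["pear", "fig", "banana", "kiwi"]

# sorted() is stable, so words of equal length keep their original order.
sorted_words = sorted(words, key=cmp_to_key(compare_by_length))
```

Here "pear" and "kiwi" compare as equivalent (the function returns 0), so their relative order is left as it was in the input.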
