_index.org

Common Spark Transformations

Last edited: August 8, 2025
  • map(func): apply a function on all functions
  • filter(func): filter based on function
  • flatMap(func): flatten returned lists into one giant list
  • union(rdd): create a union of multiple RDD0
  • subtract(rdd): subtract RDDs
  • cartesian(rdd): cartesian product of rdd
  • parallelize(list): make an RDD from list

Special transformations for Pair RDDs

  • reduceByKey(func): key things
  • groupByKey(func): key things
  • sortByKey(func): key things

See also Database “Join”

Communication Complexity (Chapter)

Last edited: August 8, 2025

Communication Complexity tries to model one aspect of distributed computing.

See Communication Complexity

Let us consider two parties—Alice and Bob. They want to compute some function:

\begin{equation} f: \qty{0,1}^{*} \times \qty{0,1}^{ *} \to \qty{0,1} \end{equation}

against two inputs held by Alice and Bob respectively, \(x \in \qty{0,1}^{*}\) and \(y \in \qty{0,1}^{ *}\), where \(|x| = |y| = n\), where \(n\) is very large (i.e. just sending all of \(x\) over to the other party and compute \(f\) on one end isn’t good).

commutativity

Last edited: August 8, 2025

commutativity means that the same operation can be ran in any order.

That is:

\begin{equation} ABC = ACB \end{equation}

comparison function

Last edited: August 8, 2025
  1. return < 0 if first value should come before second value
  2. return > 0 if first value should come AFTEr second value
  3. 0 if the first and second value are equivalent

Compilers Index

Last edited: August 8, 2025

cs143.stanford.edu

Lectures

Lexing

Logistics

  • programming assignments: myth
  • psets: gradescope
  • labs: myth.stanford.edu
    • /afs/ir/class/cs143
  • finals and midterms

Textbook: the purple dragonbook

Structure

  • written assignments (2.5*4 = 10%)
  • programming assignments (10+10+15+15 = 50%); 4 of them
    • lexer
    • parser
    • semantic analysis parse
    • code generator
  • midterm (15%)
  • final (25%)

Compiler we will Build

  • lexer: strings -> tokens
    • built with Flex FSTs
    • smallest part of the code
  • parser: tokens -> AST
    • built with Bison CFGs
    • linear time parsing!
  • semantic analysis: AST
    • C++
    • check types + properties
    • fill in types for expressions in the tree
  • [PA5: optimization - TBD]
  • code generation: AST -> MIPS
    • C++
    • write MIPS
    • yay!

Emphasis: correctness over performance.