Houjun Liu

Context-Free Grammar

We use CFGs to highlight the recursive structure in expressions.

A CFG consists of

  • Non-Recursed Terminals (if, else, etc.) \(T\)
  • Non-Terminals (expr, stmt, etc.) \(N\)
  • Start symbol (i.e., program) \(S \in N\)
  • Set of productions

generative semantics

Let \(G\) be an CFG with start symbol \(S\); the language of \(G\) is…

\begin{equation} \qty {a_1, \dots, a_{n} | S \to^{*} a_{1}, a_{n}} \end{equation}

with each \(a_{j}, \forall j\) being terminals.

That is, the “language of a CFG” is just the set of languages that could be produced by as many moves as needed until we obtain the output.

productions

\begin{equation} X \to Y_1, Y_2, \dots, Y_{n} \end{equation}

whereby \(X \in N\), \(Y_{i} \in T \cup N \cup \qty {\varepsilon}\). Notably, we could replace a non-terminal with nothing.

derivation

a derivation is a sequence of productions which leads to a string of only terminals

A derivation can be drawn as a tree—

  • start symbol is the root
  • for production \(X \to Y_1, …, Y_{n}\), add children \(Y_1, …, Y_{n}\) to note \(X\)

stratification

stratification allows us to deal with the PEMDAS (/associativity) of the operations. Consider an addition and multiplication world:

E = E+E | E*E | (E) | id

how do we parse id * id + id.

We can just have a “multiplication world” E’ language, whereby after en counting an multiplication, we are not allowed to do addition anymore unless we encounter a parentheses.

E = E' + E | E'
E' = id * E' | id | (E) * E' | (E)

dangling if problem

Consider the following ambiguous grammar:

E -> if E then E
   | if E then E else E
   | ....

Consider:

if something then if something then something else something

Is the else the first or the second expression?~:w

to fix this, we need to make a world of “matched if expression”

E -> MIF -- all "then" are matched
   | UIF -- some "then" is unmatched

UIF -> if E then E
     | if E then MIF else UIF

{- "we don't allow unmatched if expressions in then branches" -}
MIF -> if E then MIF else MIF
     | OTHER

Most tools, like flex/bison, will allow you to control precedence.

conventions

  • non-terminals in upper case

  • terminals in lower case

  • start symbol is the non-terminal of the first production

  • membership in a language is a “yes”/“no”

  • we must handle errors gracefully

  • bison could help us actually implement the parser

example

EXPR -> if EXPR then EXPR else EXPR fi
      | while EXPR loop EXPR pool
      | id