Houjun Liu

SU-CS143 APR242025

Bottom-Up Parsing

Bottom-Up Parsing is more general than top-down parsing (i.e., it doesn’t require left factorization, etc.)

This is generally the preferred method. Key insight: we reduce strings into increasingly higher level symbols

Sketch

Because of the Properties of Bottom-Up Parsing, we can….

initialization

  • split the string into two pieces
  • right substring has terminals, not yet examined
  • left substring has terminals and non-terminals

We begin by splitting to the left since we begin with no non-terminals.

x1 x2 x3 …. xn

moves

Shift / Reduce

  • shift: move | one place to the right — shifts a terminal to the left string
  • reduce: apply an inverse production at the right end of the left string (i.e., take a substring closest to the bar, and reduce it)

Think: left string is a stack. top: is the |; push: the “shift” operation to push a terminal onto the stack; reduce: pop something onto the stack.

conflict

When Shift/Reduce are both legal, that’s a “shift reduce” conflict; if multiple reductions are possible, that’s a “reduce reduce” conflict.

deciding moves

  • handle

    We want to reduce only if the result can still be reduced to the starting symbol.

    A handle is a single production in a legal sequence coming from \(S\) which can be reduced into a single nonterminal. Consider:

    \begin{equation} S \to^{*} \alpha X \omega \to \alpha \beta \omega \end{equation}

    then \(X \to \beta\) in the position after \(\alpha\) is a handle of \(\alpha \beta \omega\). We reduce at handles.

    The handles are always top of the stack, otherwise it’d not be a rightmost derivation in reverse.

  • SLR

    Simple Left-to-right Rightmost derivation is a class of CFGs for which we have simple heuristics for detecting handles.

  • heuristics

    • item

      An item is a production with “.” somewhere on the RHS, denoting a focus point. For instance, \(T \to .(E)\).

      The only item for \(X \to \epsilon\) is $X → .$

      We also call these LR(0) items.

    • viable prefix

      \(\alpha\) is a “viable prefix” if there exists \(\omega\) such that \(\alpha | \omega\) is a state of a shift-reduce parser.

      • a viable prefix doesn’t extend beyond the right end of handle
      • its a variable prefix because its a possible prefix of a handle
      • as long as a parser has viable prefixes on the stack, no parsing error
    • detecting viable prefixs

      We us an NFA!

      1. add a new start production \(S’ \to S\) to the grammar
      2. the NFA states are items of the grammar
      3. For item \(E \to \alpha . X \beta\) add the transition \([E \to \alpha . X \beta] \to^{x} [E \to \alpha X . \beta]\)
      4. For item \(E \to \alpha . X \beta\) and production \(X \to \gamma\), add the transition \([E \to \alpha . X \beta] \to^{\epsilon} [X \to . \gamma]\)

      every state is an accepting state, start state is \(S’ \to .S\)

    • LR(0) Parsing

      Suppose stack contains \(\alpha\), next input is \(t\). DFA on input \(\alpha\) terminated it state \(s\).

      • Reduce by \(X \to \beta\) if \(s\) contains item $X → β .$
      • Shift if \(s\) contains item \(X \to \beta .t\omega\) (that is, there’s a transition labeled \(t\)

      We will get reduce/reduce conflicts if one state contains $X → β .$ and $Y → ω .$; we will get shift/reduce conflict if $X → β.$ and \(Y \to \omega .t \gamma\)

    • LR(0) to SLR

      Exactly the same process, but!

      Suppose stack contains \(\alpha\), next input is \(t\). DFA on input \(\alpha\) terminated it state \(s\).

      • Reduce by \(X \to \beta\) if \(s\) contains item $X → β. $; and if \(t \in \text{Follow}\qty(X)\)
      • Shift if \(s\) contains item \(X \to \beta .t\omega\) (that is, there’s a transition labeled \(t\)

      If there are conflicts under these rules, then you don’t have an SLR grammar.

      Remember to keep (symbol, DFA state) on the stack.

Properties of Bottom-Up Parsing

  1. bottom-up parser traces a rightmost derivation in reverse
  2. in shift-reduce strategy, handles appear only at the top of the stack, never inside the stack
  3. for any SLR grammar, the set of viable prefixes is a regular language!
  4. Let \(\alpha \beta \omega\) be a step of bottom up parse, assume the next reduction is \(X \to \beta\), then \(\omega\) should already have been a string of terminals (since we go rightmost first)

Example

int * int + int | T -> int
int * T + int | T -> int * T
T + int | T -> int
T + T | E -> T
T + E | E -> T + E
E

Weird! It somehow starts in the middle.

Generally, the parsing strategy expands non-terminals bottom-up in the rightmost first.