Bottom-Up Parsing
Bottom-Up Parsing is more general than top-down parsing (i.e., it doesn’t require left factorization, etc.)
This is generally the preferred method. Key insight: we reduce strings into increasingly higher level symbols
Sketch
Because of the Properties of Bottom-Up Parsing, we can….
initialization
- split the string into two pieces
- right substring has terminals, not yet examined
- left substring has terminals and non-terminals
We begin by splitting to the left since we begin with no non-terminals.
x1 x2 x3 …. xn |
moves
Shift / Reduce
- shift: move | one place to the right — shifts a terminal to the left string
- reduce: apply an inverse production at the right end of the left string (i.e., take a substring closest to the bar, and reduce it)
Think: left string is a stack. top: is the |; push: the “shift” operation to push a terminal onto the stack; reduce: pop something onto the stack.
conflict
When Shift/Reduce are both legal, that’s a “shift reduce” conflict; if multiple reductions are possible, that’s a “reduce reduce” conflict.
deciding moves
handle
We want to reduce only if the result can still be reduced to the starting symbol.
A handle is a single production in a legal sequence coming from \(S\) which can be reduced into a single nonterminal. Consider:
\begin{equation} S \to^{*} \alpha X \omega \to \alpha \beta \omega \end{equation}
then \(X \to \beta\) in the position after \(\alpha\) is a handle of \(\alpha \beta \omega\). We reduce at handles.
The handles are always top of the stack, otherwise it’d not be a rightmost derivation in reverse.
SLR
Simple Left-to-right Rightmost derivation is a class of CFGs for which we have simple heuristics for detecting handles.
heuristics
item
An item is a production with “.” somewhere on the RHS, denoting a focus point. For instance, \(T \to .(E)\).
The only item for \(X \to \epsilon\) is $X → .$
We also call these LR(0) items.
viable prefix
\(\alpha\) is a “viable prefix” if there exists \(\omega\) such that \(\alpha | \omega\) is a state of a shift-reduce parser.
- a viable prefix doesn’t extend beyond the right end of handle
- its a variable prefix because its a possible prefix of a handle
- as long as a parser has viable prefixes on the stack, no parsing error
detecting viable prefixs
We us an NFA!
- add a new start production \(S’ \to S\) to the grammar
- the NFA states are items of the grammar
- For item \(E \to \alpha . X \beta\) add the transition \([E \to \alpha . X \beta] \to^{x} [E \to \alpha X . \beta]\)
- For item \(E \to \alpha . X \beta\) and production \(X \to \gamma\), add the transition \([E \to \alpha . X \beta] \to^{\epsilon} [X \to . \gamma]\)
every state is an accepting state, start state is \(S’ \to .S\)
LR(0) Parsing
Suppose stack contains \(\alpha\), next input is \(t\). DFA on input \(\alpha\) terminated it state \(s\).
- Reduce by \(X \to \beta\) if \(s\) contains item $X → β .$
- Shift if \(s\) contains item \(X \to \beta .t\omega\) (that is, there’s a transition labeled \(t\)
We will get reduce/reduce conflicts if one state contains $X → β .$ and $Y → ω .$; we will get shift/reduce conflict if $X → β.$ and \(Y \to \omega .t \gamma\)
LR(0) to SLR
Exactly the same process, but!
Suppose stack contains \(\alpha\), next input is \(t\). DFA on input \(\alpha\) terminated it state \(s\).
- Reduce by \(X \to \beta\) if \(s\) contains item $X → β. $; and if \(t \in \text{Follow}\qty(X)\)
- Shift if \(s\) contains item \(X \to \beta .t\omega\) (that is, there’s a transition labeled \(t\)
If there are conflicts under these rules, then you don’t have an SLR grammar.
Remember to keep (symbol, DFA state) on the stack.
Properties of Bottom-Up Parsing
- bottom-up parser traces a rightmost derivation in reverse
- in shift-reduce strategy, handles appear only at the top of the stack, never inside the stack
- for any SLR grammar, the set of viable prefixes is a regular language!
- Let \(\alpha \beta \omega\) be a step of bottom up parse, assume the next reduction is \(X \to \beta\), then \(\omega\) should already have been a string of terminals (since we go rightmost first)
Example
int * int + int | T -> int
int * T + int | T -> int * T
T + int | T -> int
T + T | E -> T
T + E | E -> T + E
E
Weird! It somehow starts in the middle.
Generally, the parsing strategy expands non-terminals bottom-up in the rightmost first.