Its optimization time!!
Program Optimization
“maximum benefit for minimal cost”
When should optimization happen?
AST?
- pro: machine independent
- con: no notion registers, not language dependent
Assembly?
- pro: many optimization opportunities
- con: machine dependent
And so, we really should be doing this with some IR.
Three-address intermediate code
Each instruction is of the form:
x := y op z
x := op z
where x
, z
are registers, constants, etc.
basic block
A basic block is a sequence with no labels (except in the first point) an no jumps (except in the last point). So when we are inside a basic block, we can optimize the code inside sans worry about control flow.
control-flow graph
a control-flow graph is a directed graph with….
- basic blocks as nodes
- as edge from \(A\) to \(B\) if an execution can pass from the last instruction in \(A\) to first instruction in \(B\)
Three Levels of Optimizations
- local optimizations: apply to a basic block in isolation
- global optimizations: apply to a control-flow graph
- inter-procedural optimizations: optimizations between graphs, such as inclining
some stuff you should run
- basic block
- dataflow
- loop
- instruction optimization / peephole
- register
some special oop stuff
- function inclining
- method call targets
- class unboxing
some functional stuff
- tail recursion
- deforestation
basic block optimizations
algebraic simplification
x := x + 0 => x := x
x := x * 1 => x := x
y := y ** 2 => y := y*y
x := x * 8 => x := x << 3
on --ffast-math
-- only works on ints (since nan * 0 = nan)
x := x * 0 => x := 0
constant folding
x := y op z
with constant y
and z
, we can just fold it. so
x := 2+2 => x := 4
static single-assignment form
rewrite intermediate code in SSA form <= never reassign variables. Meaning, if two assignments end up with the same rhs, they compute the same value. Namely:
So in SSA,
x := y + z
...
w := y + z
we know that x
and w
can’t be redefined in SSA; this means that we cloud write:
x := y + z
...
w := x
copy propagation
so we can keep writing using static single-assignment form replacements. In particular note that once we are just assigning a variable to another, the first variable can just be replaced with the second everywhere.
b := z + y
a := b
x := 2 * a
we can stick b into a
b := z + y
a := b
x := 2 * b
and then we see that `a := b` is deadcode.
b := z + y
x := 2 * b
dead code elimination
within a basic block, only one type of dead code
Suppose “w := rhs” appears in a basic block, but “w” doesn’t appear anywhere else in the program. We can say “w := rhs” is dead and can be eliminated.
peephole optimization
a “peephole” is a sequence of contiguous instructions; the optimizer replaces the sequence with another equivalent one, but faster.
global optimizations
How do we apply local optimizations to a global scope?
Generally, to prove postulate P
at any point requires knowing the entire function; so we either know X
is definitely true, or we say “we don’t know.”
example
Consider the branching order:
(X := 3, B > 0) => {(Y := Z + W), (Y := 0)} => (A := 2*x)
in this control flow graph, we never touch x
in the middle, therefore we can just propagate x
forward by replace it.
program point
every statement is associated with 2 program points:
- right before a statement
- right after a statement
global constant propagation
Let’s consider three states x
can be:
value | interpretation |
---|---|
\(\top\) | not constant |
\(c\) | constant, on every path |
\(\bot\) | unreachable code |
so by the time you hit a program point, if \(x=c\), then its constant. To perform this labeling, we must…
transfer function
for “in”(to a statement) and “out” (of a statement), we have, slides 21 and more https://web.stanford.edu/class/cs143/lectures/lecture15.pdf.
an algorithm
For every entry \(s_{0}\) to the program, we set:
\begin{equation} C\qty(s_{0},x,\text{in}) = \top \end{equation}
and set:
\begin{equation} C\qty(s \neq s_{0},x,\text{in}) = C\qty(s \neq s_{0},x,\text{out}) = \bot \end{equation}
everywhere else. An then we just apply the rules above.
convergence
This system converges because the rules have it such that \(\bot < c < \top\) (once something is top, it can’t go back.) So, \(\top\) is the “greatest” value, and \(\bot\) the “least”.
We can then lub the rules 1-4 from above into on rule:
\begin{equation} C\qty(s,x,\text{in}) = \text{lub} \qty {C\qty(p,x,\text{out}) | p\text{ predecessor } s} \end{equation}
So you can just see the lub of predecessor hierarchy of each statement.
dead code elimination
a variable \(x\) is “live” at statement \(s\) if…
- there’s a statement \(s’\) that uses \(x\)
- there’s a path from \(s\) to \(s’\)
- that path has no intervening assignment to \(x\)