A compiler compiles code!
parts of compilers
lexical analysis
turn the source text into tokens: “identifying words”
an example!
if x == y then z = 1; else z = 2;
- if: keyword
- " ": whitespace
- x: identifier
- ==: relation
… and so on
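The classification above can be sketched as a small regex-based tokenizer. This is only an illustration: the token names in `TOKEN_SPEC` and the `tokenize` helper are assumptions made for this example, not part of any particular compiler.

```python
import re

# Hypothetical token classes for the example statement; a real language
# would define many more. Order matters: keywords before identifiers,
# "==" before "=".
TOKEN_SPEC = [
    ("keyword",    r"\b(?:if|then|else)\b"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("number",     r"\d+"),
    ("relation",   r"=="),
    ("assign",     r"="),
    ("semicolon",  r";"),
    ("whitespace", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return (kind, text) pairs; raise if no rule matches at some position."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

tokens = tokenize("if x == y then z = 1; else z = 2;")
for kind, text in tokens[:5]:
    print(f"{text!r}: {kind}")
```

Whitespace is kept as a token here to mirror the list above; many lexers drop it before handing tokens to the parser.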
parsing
build an abstract syntax tree (AST) from the tokens: “identifying sentences”.
See parser.
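As a rough sketch of what "identifying sentences" means, here is a tiny recursive-descent parser for the if statement from the lexing example. The grammar, the tuple-shaped AST, and the `lex` and `Parser` names are all assumptions for illustration; operands are not validated, so this is far from a complete parser.

```python
import re

def lex(source):
    # Simplified lexer for this sketch: whitespace is dropped and each
    # remaining token is kept as plain text.
    return re.findall(r"==|[A-Za-z_]\w*|\d+|[=;]", source)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, text=None):
        """Consume the next token, optionally requiring an exact spelling."""
        token = self.peek()
        if token is None or (text is not None and token != text):
            raise SyntaxError(f"expected {text!r}, got {token!r}")
        self.pos += 1
        return token

    def statement(self):
        # statement := "if" operand "==" operand "then" statement "else" statement
        #            | operand "=" operand ";"
        if self.peek() == "if":
            self.expect("if")
            left = self.expect()
            self.expect("==")
            right = self.expect()
            self.expect("then")
            then_branch = self.statement()
            self.expect("else")
            else_branch = self.statement()
            return ("if", ("==", left, right), then_branch, else_branch)
        target = self.expect()
        self.expect("=")
        value = self.expect()
        self.expect(";")
        return ("assign", target, value)

ast = Parser(lex("if x == y then z = 1; else z = 2;")).statement()
print(ast)
# prints: ('if', ('==', 'x', 'y'), ('assign', 'z', '1'), ('assign', 'z', '2'))
```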
semantic analysis
optimization
make the IR faster: “editing”.
goals
- run faster
- use less memory
- generally, conserve resources
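One simple optimization that serves these goals is constant folding: evaluating at compile time any operation whose operands are already known. A minimal sketch, assuming a hypothetical IR made of nested `(op, left, right)` tuples with integer or variable-name leaves:

```python
import operator

# Hypothetical IR: nested tuples (op, left, right); leaves are ints
# (constants) or strings (variable names).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Replace operations on two integer constants with their result."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    return (op, left, right)

folded = fold(("+", "x", ("*", 2, ("+", 3, 4))))
print(folded)
# prints: ('+', 'x', 14)
```

The folded tree does less work at run time, and the rewrite is safe because only integer constants are touched.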
tricky tricky!
Can this be optimized to x = 0?
x = y * 0
It's tricky: y may not be numeric, or y may be a float, in which case nan * 0 = nan.
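Python happens to show both problems directly, so it makes a convenient illustration. The `safe_to_rewrite_as_zero` helper is a hypothetical check invented for this sketch; it is deliberately conservative.

```python
import math

def safe_to_rewrite_as_zero(y):
    # Assumption for this sketch: only plain Python ints are treated as
    # safe, because int * 0 is always exactly 0.
    return type(y) is int

nan_product = float("nan") * 0   # NaN propagates, so the result is not 0
inf_product = float("inf") * 0   # infinity times zero is also NaN
string_product = "ab" * 0        # sequence repetition: an empty string, not 0

print(math.isnan(nan_product), math.isnan(inf_product), repr(string_product))
# prints: True True ''
```

An optimizer can only rewrite `y * 0` to `0` when it can prove something like `safe_to_rewrite_as_zero` holds for every value `y` may take.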
code generation
generate machine code: “translation”. Things to consider: register layout, etc.
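To give a feel for the translation step, the sketch below emits instructions for a hypothetical stack machine rather than real machine code. The instruction names (`PUSH`, `LOAD`, `ADD`, `MUL`) and the `generate` and `run` helpers are invented for this example. A stack machine sidesteps register concerns entirely, which is exactly the part a real backend has to solve.

```python
def generate(node):
    """Emit stack-machine instructions via a post-order walk of the tree."""
    if isinstance(node, int):
        return [("PUSH", node)]
    if isinstance(node, str):
        return [("LOAD", node)]
    op, left, right = node
    mnemonic = {"+": "ADD", "*": "MUL"}[op]
    return generate(left) + generate(right) + [(mnemonic,)]

def run(code, env):
    """Tiny interpreter for the hypothetical machine, used to check the output."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()

code = generate(("+", "x", ("*", 2, 3)))
print(code)
# prints: [('LOAD', 'x'), ('PUSH', 2), ('PUSH', 3), ('MUL',), ('ADD',)]
print(run(code, {"x": 4}))
# prints: 10
```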
other stuff
intermediate representation
compilers typically translate between multiple intermediate languages.
- all but the first and last representations are called intermediate representations
- IRs are generally ordered in descending levels of abstraction
digraph {
rankdir=LR;
graph [bgcolor=transparent];
node [fontcolor=white, color=white, shape=none];
edge [fontcolor=white, color=white];
ir1 [label="ir"]
source -> ir -> "..." -> ir1 -> {assembly, ir1}
}
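One step down this chain might look like the following: lowering a tree-shaped representation into a flatter, less abstract one. Three-address code is used here as the lower-level IR purely as an example; the `lower` function and the `t1`, `t2` temporary names are assumptions for this sketch.

```python
import itertools

def lower(node, code, temps):
    """Lower a nested (op, left, right) tree into three-address instructions.

    Appends instructions to `code` and returns the name holding the result.
    """
    if not isinstance(node, tuple):
        return str(node)
    op, left, right = node
    a = lower(left, code, temps)
    b = lower(right, code, temps)
    dest = f"t{next(temps)}"
    code.append(f"{dest} = {a} {op} {b}")
    return dest

code = []
result = lower(("+", "x", ("*", "y", 2)), code, itertools.count(1))
print(code, result)
# prints: ['t1 = y * 2', 't2 = x + t1'] t2
```

Each instruction now has at most one operator, which is closer to what assembly can express than the nested tree was.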
issues
many pitfalls:
- your compiler may be slow
- it may not handle erroneous inputs gracefully
- language design is important: it determines what is ambiguous (hence what is easy / hard to compile)
- there are tradeoffs in language design