EECS 322: Compiler Construction
In this course you will build a compiler for a simple (but illustrative) programming language, taking programs all the way down to code that runs on an x86 processor. The course explains the standard structure of a compiler: a front end (parsing, type-checking), a middle end (optimization and other transformations), and a back end (code generation).
|Lecture||Tech LG52; TTh 12:30-1:50pm|
|Recommended Texts (none are required)||Modern Compiler Implementation in ML by Andrew W. Appel; Engineering a Compiler by Keith Cooper & Linda Torczon|
Packrat parsing. Probably best to start with Brian Ford's master's thesis.
Producing Wrong Data Without Doing Anything Obviously Wrong! by Mytkowicz et al.
This paper shows why performance evaluation can be super tricky.
Racket is available online and in /home/software/racket-5.3.3/ on the tlab machines.
The file 322-interps.tar.gz contains an interpreter for each of the various languages you'll be compiling this quarter as well as run-test-fests.
The file runtime.c contains the implementation of our GC and printing routines, as well as a main() function your compiler could use.
|Lc Speed Test|
See the final results of the speed test
Also, the speed-test.tar.gz file contains the programs used.
lecture12.txt CPS conversion & lambda lifting
lecture11.txt Some L5 optimizations
lecture10.txt L5 to L4
lecture08.txt L3 to L2
lecture07.txt tail calls
lecture06.pdf register allocation, iii
lecture05.pdf register allocation, ii
lecture04.pdf intro to register allocation & spilling
lecture03.txt from L1 to x86
Students are encouraged (but not required) to work in pairs. Pair programming is not team programming, however. That is, pairs must promise (in writing) that they will always sit together when working on the assignments, never separately. If this is too much of a burden, work alone.
Presenter(s): When presenting a codewalk:
- Concisely restate the task and say which parts you completed (if not all of them).
- Provide an overview of your solution. Depending on what is going on in your code, this should consist of some diagrams: diagrams explaining data structures (class hierarchies, if you used classes, or similar data-definition diagrams), and/or diagrams explaining typical interactions between components in your code (interaction diagrams).
- Present the components in a top-down manner, no matter how you designed and implemented them. Be prepared to defend your code's organization and explain how it matches (or why it fails to match) your organization. When presenting the code, you should, in general, be able to refer to some spot in an earlier diagram to explain the context of how the code is used.
In general, be prepared to figure out in real time how changes to the data structures and organization of your program would affect the code.
When evaluating a codewalk, we look at three things:
- The quality of your presentation
- The ability to focus in on specific lines of code in response to a comment
- The ability to think through specific issues brought up by the class or the panel
Panelists: As a panelist, you will have one of three different roles:
- Manager: the first reader/analyst of the code, with the responsibility to keep the codewalk on track
- Second manager: the second reader, who helps the first reader
- Secretary: keeps notes on the codewalk: weaknesses in the code, questions that came up, etc.
The secretary is responsible for supplying a copy of the written notes to Robby by 5am the morning after the codewalk. If the notes are acceptable, they will be forwarded to the presenter(s); if not, edits will be requested.
When evaluating the managers, we are looking for the ability to identify solid problems with the code and articulate them well. While this may appear to depend on the quality of the code, in practice all code has issues and places where it could be improved. The secretary is evaluated on the quality of the notes produced.
|Programming Language||Students are free to use any programming language. As a general guideline, I recommend a programming language that is both safe and has garbage collection. These two features make building software easier (and the second often improves performance). Also, you will need a simple parser for a parenthesis-based language; PLAI provides one for free, so you may want to just use it.|
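For students who choose a language without a built-in reader, the "simple parser for a parenthesis-based language" can be quite small. The following is a minimal sketch in Python, not the course's reference parser: the function names (`tokenize`, `parse`) and the choice to represent lists as Python lists, integers as `int`, and everything else as strings are assumptions made for illustration, and the concrete syntax of the course languages is not taken from this page.

```python
import re

# Hypothetical token rule: a parenthesis, or a run of characters that is
# neither whitespace nor a parenthesis.
TOKEN = re.compile(r"[()]|[^\s()]+")
INTEGER = re.compile(r"-?[0-9]+")


def tokenize(text):
    """Split the input text into a flat list of token strings."""
    return TOKEN.findall(text)


def parse(text):
    """Parse exactly one expression and return it as nested Python data."""
    tokens = tokenize(text)
    expr, pos = _read(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return expr


def _read(tokens, pos):
    """Read one expression starting at tokens[pos]; return (expr, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        items = []
        pos += 1
        # Collect sub-expressions until the matching close parenthesis.
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = _read(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, pos + 1  # skip the ')'
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    return _atom(tok), pos + 1


def _atom(tok):
    """Integers become int; every other token stays a string (a symbol)."""
    if INTEGER.fullmatch(tok):
        return int(tok)
    return tok
```

For example, `parse("(+ 1 (* 2 3))")` returns `["+", 1, ["*", 2, 3]]`. The recursive reader returns the next position alongside each expression, which keeps the code free of shared mutable state.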
Grades in the course are based on passing each of the programming assignments, the speed test (on the final day of class when Lc is due), and your codewalks, for a total of up to 13 opportunities to pass.
To pass one of the programming assignments (1, spill, liveness, graph, 2, 3, 4, or 5), you must either pass 75% of the test cases in the initial test fest, submit a test suite that finds a bug in every (other) submission in the initial test fest, or pass 98% of the test cases in a later test fest.
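The three alternative ways to pass an assignment can be written as a single predicate. This is only a sketch of the rule as stated above, not the course's actual grading script: the function name and parameters are hypothetical, pass rates are assumed to be fractions between 0 and 1, and the thresholds are assumed to be inclusive ("at least 75%", "at least 98%").

```python
def passes_assignment(initial_rate, breaks_every_other_submission, later_rates=()):
    """Return True if any one of the three passing conditions holds.

    initial_rate: fraction of test cases passed in the initial test fest.
    breaks_every_other_submission: True if the submitted test suite found a
        bug in every other submission in the initial test fest.
    later_rates: fractions of test cases passed in any later test fests.
    """
    if initial_rate >= 0.75:  # assumed inclusive threshold
        return True
    if breaks_every_other_submission:
        return True
    # A later test fest requires the stricter 98% threshold.
    return any(rate >= 0.98 for rate in later_rates)
```

Under these assumptions, a submission that passes 50% of the initial tests and then 98% in a later test fest passes, while one that only ever reaches 97% in later test fests does not.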
To pass the speed test (Lc), your compiler must generate a binary that produces the correct output for each of the submitted speed test programs.
The winner of the speed test, and anyone who beats Racket on all programs, gets a free pass to be used on any one assignment. Note that while Racket has had 15 years of continuous development, which gives it a fair edge over your 10 or so weeks' worth of effort, it is at a significant disadvantage because its versions of the primitive operations are more complex and have more error checking. Overall, this should make it a fair fight. (Put another way, getting performance in the face of all the details that go into a full-fledged, safe language is not easy.)
You may resubmit any version of any assignment at any time up to the last day of finals. If you do not pass your codewalk, you may request a private codewalk (on the same assignment or a different one).
Your code will be scrutinized for plagiarism and other forms of cheating and, if any is discovered, you will be punished to the furthest extent possible.