Thursday, July 5, 2012

Finished Coursera Compiler course

Today I gave the final exam for compiler course and managed to get a decent score.

Course website says, "This course will discuss the major ideas used today in the implementation of programming language compilers, including lexical analysis, parsing, syntax-directed translation, abstract syntax trees, types and type checking, intermediate languages, dataflow analysis, program optimization, code generation, and runtime systems". And, the course was really able to cover the said topics in decent detail.

But, real learning of the course was not in the videos or quizzes or exams. It was in the optional project, where you were expected to code 4 major phases of a compiler namely lexical analysis, parsing, semantic analysis and assembly code generation. So, essentially, I wrote a fully functional compiler for COOL language that has many features found in Java and C++.

Programming assignment#1 was to write the lexical analyzer for COOL, which will take a COOL program and produce the Tokens. We were expected to use lexer generator, JLex, where you can give your token specifications using regexps and java code that creates required token on regex match. Though lectures taught you how actually regexes can be implemented by converting them to NFAs, which can be simulated directly or can be converted to DFAs and can have a table driven implementation, but use of JLex did not let me code that. Because of time pressure I could only do the assignment with JLex. I would like it even more if assignment is modeled in a way so that you don't use JLex but code the regex implementation by hand.

Programming assignment#2 was to write the parser that would generate the AST from the tokens returned by lexer from previous assignment. Again, we used a parser generator called Java CUP, that would generate the parser given the CFG grammar specification with associativity and precedence rules.Though you learn the theory of parsing(from lecture videos and readings), but assignment does not really need you to hand code one. I would like it even more if I hand code the parser as this is an academic exercise(however, I have earlier written a recursive-descent parser).

The fun part begins now..

Programming assignment#3 is when you have no tools. You have an AST and need to hand code all the semantic rules of the language. Most of the code you write here is to do static type checking. You traverse the whole AST, I used visitor pattern, at every node(which are different kind of expressions supported by the language, for example method dispatch, if expression, case expression, assignment etc). You ensure that types declared and inferred from the expression are in agreement with the type rules defined in the language. COOL supports SELF TYPES and that adds to the type checker complexity.

Programming assignment#4 gets even better. After you're done with compile time semantic rules check, you generate MIPS assembly code for the given COOL program. This is the place where you become one with the machine. Here you have to carefully convert all the high level expressions to assembly code. You never realize while coding in a high level language that when it comes to assembly then processor, mostly, only knows how to go about blindly executing the instruction sequence laid out in memory and occasionally make jumps to other memory addresses. It is just an eye opener to learn the tricks of translating high level language constructs to simple load, move, jump and branch instructions with few registers and memory.

While doing these assignment, I had to spend a lot of time with paper and pen to think hard and form a solution.. for example how and what should go in a stack frame when a method is called, how you locate its correct implementation and jump back to the return address after completion.
Another eye opener was that, once the assembly code is generated, then there are no "identifiers", all you have is the memory addresses in the form of labels and offsets to locations in your stack frame :).

Another side thing I noticed is that It is an essential skill to draw yourself away from the laptop, think in isolation and be able to write code on paper in a not really compilable version of your choice of implementation language.

Besides keeping me super busy outside day job, This compiler project has enlightened me. And, I am really really thankful to coursera. Having done this compiler for a statically typed language(and a interpreter for a lisp language, Scheme), after so many years of coding... Now, I feel like, I can "see through" the languages. And, The feeling is just amazing.

No comments:

Post a Comment