Friday, July 6, 2012

Assembly Code Generation - part#1

I recently finished doing the Compiler course from Coursera. In this series of posts I am trying to take some notes explaining how high level programming language expressions might be translated to assembly code. They are not accurate descriptions but more to get the idea.

In general, on any machine, you have a few registers and memory allocated to your process. Initially, execution of a program is under the control of the OS. When the process is started, OS allocates memory to the process and jumps to the entry point(e.g. "main"). Memory(from here on, memory means the memory allocated to your process) is typically divided in 2 sections, namely code and data area. Your instructions are stored in code area while the data is stored in the data area. It is responsibility of the compiler to generate code and orchestrate data area.

Usually, assembly code is generated by recursively traversing the AST. Assume every node in the AST implements a function called cgen() that is responsible for generating assembly code for the expression for that node in the AST.

In this post I will explain how a "+" expression might be implemented.

In general, when we generate code, we need to have some invariants so that various expression's generated code doesn't modify the memory or registers in an unexpected way.

For this post, following invariants are sufficient.

1. Any expression's generated assembly code puts the pointer to the result of the expression in accumlulator register, \\$a0.
2. Register \\$sp (stack pointer) always contains address of next free word where data can be written.
3. Any expression't generated code ensures that it preserves \\$sp value that is at the end of expression evaluation code \\$sp gets the same value back that it had when that expression's code execution started.

cgen(e1 + e2):

#generate code to evaluate e1, this should
#put the result of e1 evaluation in \\$a0
cgen(e1)

#now that \\$a0 has value of e1 evaluation, let us
#store it on the stack
sw \\$a0 0(\\$sp)
addiu \\$sp \\$sp -4 #decreasing stack pointer as we "pushed" 4 bytes of data

#now generate code to evaluate e2, that should
#put the result of e2 evaluation in \\$a0
cgen(e2)

#load the value of e1 evaluation in a temporary register \\$t1
lw \\$t1 4(\\$sp)
addiu \\$sp \\$sp 4  #increasing stack pointer as we "popped" 4 bytes of data
add \\$a0 \\$a0 \\$t1  #add the values in \\$a0 and \\$t1, put the result in \\$a0

#finally at this point \\$a0 holds the sum of results of e1 and e2 as required