This is the third post in the series on assembly code generation.
Part - I
Part - II
In this post, I will describe how an object-oriented language might implement the "new" operator for creating new instances of a class.
I will describe here how the COOL compiler that I wrote during the compiler course worked.
Some background on inheritance in COOL: a class Foo can inherit from another class Bar. All attributes defined in Bar are also visible in Foo and cannot be redefined. Methods defined in Bar are also visible in Foo, but Foo can override them.
New object creation works by cloning a "prototype object" for the type. The code generated by the COOL compiler lays out one prototype object for each class defined in the program being compiled (a program may contain multiple class definitions), and at runtime new objects of a class are created by cloning that class's prototype object.
Here is how a COOL object is laid out in memory.
----------------------- offset = 0
class-tag
----------------------- offset = 4
total-object-size
----------------------- offset = 8
ptr-to-dispatch-table
----------------------- offset = 12
inherited-attr-1
-----------------------
inherited-attr-2
-----------------------
..
-----------------------
attribute-1
-----------------------
attribute-2
-----------------------
..
-----------------------
The first 4 bytes contain a unique integer tag assigned to this class by the compiler. It identifies the class of the object and is used in various places, for example when checking object equality or when checking whether two objects are of the same type.
The next 4 bytes contain the total size of this object, counted in 4-byte words.
The next 4 bytes contain a pointer to the dispatch table (a table of the methods defined in this class and its parents; we will talk more about this in a separate post).
The remaining bytes contain all the attributes, including inherited ones. Let's say the class hierarchy is:
B inherits A
C inherits B
A is at the top of the hierarchy and C is at the bottom. Then C's prototype object will first contain all the attributes of A, then those of B, and then those defined in C itself.
Here are some of the things you should notice about this object layout.
- The object is laid out in contiguous memory.
- The offset of an attribute remains the same in a class and all of its subclasses because of the way we order attributes in the object.
This kind of layout lets a subclass extend the layout of its base class rather than change it fundamentally. Because of this, it is simple to retrieve the value of an attribute from a fixed offset in the object without knowing the concrete dynamic type of the object at runtime.
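The fixed-offset property can be checked with a small model. The following is only a sketch in Python, not code from the compiler: the `CLASSES` table, the attribute names `a1`, `b1`, `b2`, `c1`, and both helper functions are hypothetical, and it assumes every attribute slot is one 4-byte word placed after the 12-byte header.

```python
# Hypothetical class table: name -> (parent name or None, attributes declared in the class).
CLASSES = {
    "A": (None, ["a1"]),
    "B": ("A", ["b1", "b2"]),
    "C": ("B", ["c1"]),
}

HEADER_BYTES = 12  # class tag, object size, dispatch table pointer (4 bytes each)
WORD_BYTES = 4     # assumption: every attribute slot is one 4-byte word


def all_attributes(name):
    """Attributes in layout order: the topmost ancestor's first, the class's own last."""
    parent, own = CLASSES[name]
    inherited = all_attributes(parent) if parent is not None else []
    return inherited + own


def attribute_offsets(name):
    """Map every attribute visible in `name` to its byte offset inside the object."""
    return {
        attr: HEADER_BYTES + i * WORD_BYTES
        for i, attr in enumerate(all_attributes(name))
    }


offsets_b = attribute_offsets("B")
offsets_c = attribute_offsets("C")
```

In this model `a1` sits at offset 12 in A, B and C alike, so code compiled against A can read it from a B or C object without knowing which one it was handed.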
Let us see the prototype objects generated for a simple hierarchy in COOL.
class A inherits Object {
x:Int;
};
class B inherits A {
y:Bool;
};
The generated code will look something like the following:
Object_protObj: #label to refer to this object
.word 0 #write word 0, class tag of Object class, to memory
.word 3 #write size of this object in memory words
.word Object_dispTab #write address of dispatch table of Object type in 1 word of memory
A_protObj:
.word 1
.word 4
.word A_dispTab
.word int_const0 #default value of x, pointer to integer object 0
B_protObj:
.word 2
.word 5
.word B_dispTab
.word int_const0 #default value of x, pointer to integer object 0
.word bool_const0 #default value of y, pointer to boolean object false
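To make the connection between the class definitions and the listing explicit, here is a rough Python sketch of how such prototype objects could be emitted. It is not the actual compiler code: the `CLASSES` and `DEFAULTS` tables and the `emit_prototype` helper are hypothetical, and class tags are simply assumed to follow declaration order.

```python
# Hypothetical class table for the example: name -> (parent, [(attribute, type)]).
CLASSES = {
    "Object": (None, []),
    "A": ("Object", [("x", "Int")]),
    "B": ("A", [("y", "Bool")]),
}

# Labels of the default-value objects, taken from the listing above.
DEFAULTS = {"Int": "int_const0", "Bool": "bool_const0"}

HEADER_WORDS = 3  # class tag, size, dispatch table pointer


def attributes(name):
    """Inherited attributes first, then the ones the class declares itself."""
    parent, own = CLASSES[name]
    return (attributes(parent) if parent else []) + own


def emit_prototype(name, tag):
    """Return the assembly text of the prototype object for class `name`."""
    attrs = attributes(name)
    lines = [
        f"{name}_protObj:",
        f".word {tag}",                        # class tag
        f".word {HEADER_WORDS + len(attrs)}",  # object size in words
        f".word {name}_dispTab",               # dispatch table pointer
    ]
    lines += [f".word {DEFAULTS[typ]}" for _, typ in attrs]  # default attribute values
    return "\n".join(lines)


# Assumption: tags are assigned in declaration order (Object = 0, A = 1, B = 2).
listing = "\n".join(emit_prototype(name, tag) for tag, name in enumerate(CLASSES))
```

The size word is just the three header words plus one word per attribute, which is why A's prototype has size 4 and B's has size 5.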
Now we are ready to understand the code generated for the expression new A.
cgen(new A)
#load address of label A_protObj in register \$t1
la \$t1 A_protObj
#emit instructions to copy the
#object at address \$t1, and to put
#address to newly cloned object in \$a0
And you're done: a new object of type A has been created and a pointer to it is placed in \$a0 :).
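The copy step itself is left abstract above. As a rough illustration of what "cloning the prototype" means, here is a Python sketch over a toy word-addressed memory. The `Memory` class and the `clone` function are hypothetical stand-ins, not the COOL runtime; the only fact taken from the object layout is that the size, in words, lives at offset 4.

```python
WORD = 4  # bytes per memory word


class Memory:
    """Toy memory: a growing list of words, addressed in bytes."""

    def __init__(self):
        self.words = []

    def alloc(self, n_words):
        """Reserve n_words fresh words and return the byte address of the first."""
        addr = len(self.words) * WORD
        self.words.extend([0] * n_words)
        return addr

    def load(self, addr):
        return self.words[addr // WORD]

    def store(self, addr, value):
        self.words[addr // WORD] = value


def clone(mem, proto_addr):
    """Copy the object at proto_addr word by word and return the copy's address."""
    size = mem.load(proto_addr + WORD)  # object size in words, stored at offset 4
    new_addr = mem.alloc(size)
    for i in range(size):
        mem.store(new_addr + i * WORD, mem.load(proto_addr + i * WORD))
    return new_addr


mem = Memory()
a_prot = mem.alloc(4)
# Contents of A_protObj from the listing; labels are kept as plain strings here.
for i, word in enumerate([1, 4, "A_dispTab", "int_const0"]):
    mem.store(a_prot + i * WORD, word)

obj = clone(mem, a_prot)  # models "new A"
```

Because the clone is a separate block of memory, later assignments to the new object's attributes leave the prototype untouched, so the next new A starts from the defaults again.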
BONUS Reading:
COOL supports SELF TYPEs too.
Basically, when we execute x.someMethod(..), inside the definition of someMethod there is a variable visible (called "self" in COOL and "this" in Java) that refers to the object referenced by x at runtime. new SELFTYPE is supposed to return a new object whose type is the runtime type of the object referenced by "self".
At this point I should declare one more invariant: the code generated for a method call/dispatch always ensures that register \$s0 holds the pointer to the "self" object.
To support new SELFTYPE, the COOL compiler always generates a label called class_objTab that holds pointers to all the prototype objects, in order of their class tags. So, for the example above, that label will look something like the following:
class_objTab: #the class_objTab label
.word Object_protObj #first pointer to Object_protObj, as its class tag is 0
.word A_protObj #next is A as its class tag is 1
.word B_protObj #next is B as its class tag is 2
Now we are ready to see the code generated for new SELFTYPE.
cgen(new SELFTYPE)
#load the address of label class_objTab in
#temporary register \$t1
la \$t1 class_objTab
#load the integer class-tag of self object
#referenced by \$s0, class-tag is stored at
#offset 0
#class-tag is stored in temp register \$t2
lw \$t2 0(\$s0)
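#class_objTab holds one 4-byte pointer per class, in
#class-tag order, so multiply the class-tag in \$t2 by 4
#to turn it into a byte offset
sll \$t2 \$t2 2
#add that byte offset to \$t1 to get the address of the
#table entry for the self object's type
addu \$t1 \$t1 \$t2
#load the prototype object pointer stored in that entry
lw \$t1 0(\$t1)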
#at this point \$t1 has the pointer to prototype object
#of self object's type
#we can emit code that copies the prototype object and
#puts the pointer in \$a0
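The table lookup can be traced with a small Python model. This is only a sketch under stated assumptions: memory is a dictionary from byte address to word, the addresses 100, 112, 128, 200 and 300 are made up, and each class_objTab entry is assumed to be one 4-byte word, so the entry for class tag t sits at byte offset 4*t and the prototype pointer is the word stored there.

```python
WORD = 4
memory = {}  # byte address -> word (toy model, not a real memory image)


def place(addr, words):
    """Store consecutive words starting at byte address addr."""
    for i, word in enumerate(words):
        memory[addr + i * WORD] = word


# Made-up addresses for the prototype objects and the table.
OBJECT_PROT, A_PROT, B_PROT = 100, 112, 128
CLASS_OBJTAB = 200

place(OBJECT_PROT, [0, 3, "Object_dispTab"])
place(A_PROT, [1, 4, "A_dispTab", "int_const0"])
place(B_PROT, [2, 5, "B_dispTab", "int_const0", "bool_const0"])
place(CLASS_OBJTAB, [OBJECT_PROT, A_PROT, B_PROT])  # indexed by class tag


def prototype_of(self_addr):
    """Find the prototype object for the runtime class of the object at self_addr."""
    tag = memory[self_addr]            # class tag is the word at offset 0
    entry = CLASS_OBJTAB + tag * WORD  # scale the tag to a byte offset into the table
    return memory[entry]               # the entry holds the prototype's address


# A hypothetical B instance living at address 300 plays the role of "self".
SELF = 300
place(SELF, [2, 5, "B_dispTab", "int_const0", "bool_const0"])
found = prototype_of(SELF)
```

Here `found` is the address of B's prototype even though nothing in `prototype_of` mentions B; the class is discovered entirely from the tag stored in the object.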