12.6. Implementing Polymorphism


The compiler can establish all non-polymorphic function bindings at compile time. However, binding polymorphic function calls to the correct function requires dynamic data structures that operate and evolve during program execution. Although we can use polymorphism without knowing how C++ implements it, understanding its implementation demystifies some of its requirements and behaviors. Furthermore, polymorphism's wonderfully elegant algorithms are worth studying if only to illustrate the beauty of a well-designed problem solution.

Inheritance and constructors. For every class that has a superclass, the first operation of its constructors is to call one of its superclass's constructors.
  1. A tall inheritance hierarchy: class A is at the top, B is a subclass of A, C is a subclass of B, and D, at the bottom, is a subclass of C.
  2. Constructor calls (the red arrows on the right) begin with the instantiated class and continue upward to the top of the hierarchy: D calls C, C calls B, and B calls A. Constructor bodies run (the blue arrows pointing downward) from the top of the hierarchy down to the instantiated class: A, then B, then C, then D.
  3. An instance of class D, with instances of its superclasses nested inside: a C object inside the D object, a B object inside the C object, and an A object inside the B object.

Inheritance And Constructor Execution

Polymorphism begins with constructors. Whenever a program instantiates a class, it simultaneously instantiates all its superclasses, nesting the superclass objects inside the subclass objects like Chinese boxes, one inside another. Although nested, they are still objects, and the program initializes or constructs them with constructor functions. The compiler calls default constructors automatically, and programmers explicitly call non-default superclass constructors through a subclass constructor's initializer list. In either case, chained constructor calls occur before the constructors' bodies execute.

For example, suppose that a program creates an instance of class D as illustrated in the UML diagram at the right. The first operation of the class D constructor is to call the class C constructor, which calls the class B constructor, which calls the class A constructor. Class A does not have a superclass, so its constructor runs to the end and then returns to the class B constructor. The class B constructor runs to the end and then returns to the class C constructor, which runs and returns to the class D constructor.

This "call upwards and execute downwards" pattern occurs regardless of where the program begins construction in the inheritance hierarchy. For example, if the program creates an object from class C, the calls climb from C through B to A, and execution descends from A, through B, and ends with C. The pattern also holds when a class has multiple constructors: for each object, the program calls and executes only one of them.

C++ uses this "call upwards and execute downwards" pattern to initialize some dynamic data structures necessary for polymorphism. The compiler adds the initialization code to the constructors. If a class doesn't have a programmer-defined constructor, the compiler creates a simple default constructor to run the initialization code.

Virtual Tables And Pointers

Programmers activate polymorphism by adding the virtual keyword to a member function. Once a class declares a virtual function, the compiler automatically generates the supporting data structures. First, it creates a table of function pointers called the virtual function table (or vtable or vtab for short). The compiler creates a vtable for every polymorphic class (i.e., for every class with one or more virtual functions). Second, the compiler adds a pointer, called the vpointer (or vptr for short), to every object instantiated from a polymorphic class. Together, these structures allow a program to bind a function call to the correct function body dynamically, at runtime rather than at compile time. (The C++ standard does not require vtables, but mainstream compilers implement virtual functions this way.)

When the compiler translates functions to machine code, it saves the memory address of each function in its symbol table. Whenever the compiler processes a non-polymorphic function call, it binds the call to the saved function address (i.e., it generates a call to that address). However, a polymorphic function call can match two or more overriding functions, and which one should run depends on the class of the object the program uses at runtime. Because the compiler cannot determine which function to bind the call to, it stores the address of each class's version of the function in that class's vtable and generates code that retrieves the correct address when the call executes.

The vtables are organized by class, one table for each polymorphic class. Each vtable consists of a list of function pointers, one for each virtual function in the class. Every object instantiated from a polymorphic class contains a "hidden" pointer variable, the vpointer, which the compiler adds to the class's layout. To call a virtual function, the program follows the object's vpointer to its class's vtable, retrieves the pointer to the corresponding function, and runs that function. (Compilers typically assign each virtual function a fixed position in the vtable, so this lookup is a direct index rather than a search.) But how does the compiler get the vpointer to point to the correct vtable?

Constructors are responsible for initializing the vpointers. As described above, chained constructor calls climb to the top of the inheritance hierarchy and then execute downward, ending with the constructor for the instantiated class. As it runs, each constructor sets the object's vpointer to point to the vtable of its own class. For example, if a program creates an instance of class D, the class A constructor runs first and sets the vpointer to the address of the A vtable. The B constructor runs next and overwrites the address previously saved in the vpointer with the address of the B vtable. The C constructor does the same but with the address of the C vtable. Finally, the D constructor is the last one to run, and it sets the object's vpointer to its final value, the address of the D vtable. Although the repeated assignments are a little wasteful, they are fast pointer assignments that have little impact on the program's run time.

Returning to the shape classes (Figure 2), if we instantiate a Circle, the Circle constructor calls the Shape constructor. Shape is at the top of the inheritance hierarchy, so the Shape constructor sets the object's vpointer to point to the Shape vtable (Figure 3), but then the Circle constructor runs and updates the vpointer to point to the Circle vtable. The same sequence holds for objects instantiated from Rectangle and Triangle.

Four shape classes. Shape declares two operations, +draw() : void and +erase() : void. Its subclasses, Circle, Rectangle, and Triangle, each declare the same two operations.
The Shape inheritance hierarchy. The draw and erase functions in each subclass override the functions in the Shape class. The UML does not provide a notation for designating a function as virtual; making them virtual in this example satisfies three of polymorphism's requirements: inheritance, function overriding, and virtual functions. Note that the vpointers and vtables operate the same way for concrete and abstract classes and functions.
Four shape objects, instances of Shape, Circle, Rectangle, and Triangle, respectively, represented as rectangles. Each object has a 'hidden' pointer named vptr. The vptr in each object points to the vtable associated with each class. Each vtable has two pointers that point to the virtual functions draw and erase.
Implementing polymorphism. Every polymorphic class has a table of pointers called a vtable that points to the class's virtual functions. Every polymorphic object contains a pointer called a vptr that points to its class's vtable.

Polymorphic Function Binding

The vpointer and the vtable are only used to locate virtual functions. When all the prerequisites for polymorphism are in place and a program calls a virtual function, the program locates and dispatches (calls) the correct function through a well-defined sequence of steps.

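Shape* c = new Circle;   // instantiate a Circle and upcast its address to Shape*
	.
	.
	.
c->draw();               // dispatched at runtime to the Circle draw function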
Dynamic dispatch: calling a polymorphic function. The first statement instantiates a Circle object and upcasts it to a Shape pointer named c. The Shape hierarchy described in Figure 2 satisfies three of polymorphism's requirements, and the instantiation statement satisfies the final two: a pointer variable and an upcast. Therefore, the program processes the draw function call polymorphically:
  1. The program follows c to the Circle object and retrieves the address stored in its vpointer.
  2. It follows that address to the Circle vtable.
  3. It locates the entry for the draw function in the Circle vtable.
  4. The program dispatches, or runs, the Circle draw function.
The program dynamically determines the correct function to call at the time of the call rather than at compile time.

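Although undeniably elegant, polymorphism does entail some overhead: memory for one vtable per polymorphic class and one vpointer per polymorphic object, and time on every virtual call to follow the vpointer and locate the function in the vtable.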

You might think polymorphism's overhead would make it undesirable. But the space requirements are relatively small, pointer operations are typically fast, and locating a function in a vtable takes little time. Furthermore, any non-polymorphic solution will likely have a similar overhead while sacrificing the elegance and the automatic, compiler-generated data structures and operations that polymorphism affords.