12.2.1. Casting And Member Variables

Review

Every class can have member variables and member functions. We can further classify member functions as either non-polymorphic ("regular") or polymorphic. The compiler uses different algorithms to locate member variables, non-polymorphic functions, and polymorphic functions. Polymorphism is nothing more than another way of binding a function to a function call. We take incremental steps toward understanding polymorphism, beginning with how the compiler locates member variables in an object. Understanding how the compiler finds member variables will also help us understand some casting operations that appear in object-oriented programs and why they are needed.

Locating By Offset

Programs locate data stored inside an object (an instance of a class or struct) by calculating an offset from the object's address in memory. The idea of locating something by adding an offset to an address might seem a bit daunting at first, but it's something that most of us have done in some way in the past.

Houses along a street used as a metaphor for locating data with an offset calculation. We begin with a base address or location, like a notable house - a pink house in the picture. We can locate another house by saying, 'We're the third house past the pink house.'
Locating a house with an offset. Imagine that you are visiting a new friend but that a tree has grown to cover up the address on the house. So your friend gives you directions: "I live three houses beyond the pink house." The pink house forms a base address to which you add an offset of three houses to find the final destination. The black and orange house, the one we're looking for, is offset by three houses from the pink house or base address.

This may seem like a difficult calculation, but it's quite easy. It is so easy that the compiler can automatically generate the code to do the calculation. We need only name the object, which supplies the base address, and the member variable, which determines the offset; the compiler does the calculation. Let's take the example one more step and see what the compiler must do to locate member variables in an object stored in memory.

Locating Member Data

In the street-and-house example above, a house is located by starting with a base address (315 Elm or the pink house) and then adding an offset (three houses in this example) to the base address. The compiler does the same address arithmetic when it locates member variables inside an object. In this case, the base address is the address of the object itself, and the offset is the number of bytes needed to store the preceding member variables (plus any alignment padding the compiler inserts between them, which the figure below omits for simplicity).

C++ | Abstract Memory Layout | Member Variable Locations In Memory
class Person
{
    private:
	string	name;
	double	height;
	int	weight;
};

Person* student
	= new Person(...);

Person* instructor
	= new Person(...);
Two instances of the Person class. Each instance has three member variables: name, height, and weight. The member variables in one object are separate from the members in the other object.
student;

student + sizeof(name);

student + sizeof(name) + sizeof(height);


instructor;

instructor + sizeof(name);

instructor + sizeof(name) + sizeof(height);
(a)(b)(c)
Calculating offsets: Locating data in a Person object.
  1. The class Person specification; the focus is only on member variables, so functions are not included in the class. (The ellipses denote detail elided for simplicity.) student and instructor are pointers to two Person objects.
  2. Each large, outer rectangle represents an instance of the Person class, each with three distinct member variables.
  3. The calculation for each member variable appears opposite the variable. For example, the address of student's weight is the base address, student, plus the size of name plus the size of height.

Inheritance and Member Variables

The compiler needs to "know" several values to calculate the offset amount. It needs to know the variables a class contains and their order in the class, the data type of each variable (from which it derives the variable's size), and the class's superclass (if it has one). The compiler stores this information in its symbol table while compiling a program. When the compiler needs the information for code generation, it searches for the class's name in the symbol table. For example, Figure 2(a) defines two variables: Person* student and Person* instructor. Whenever the compiler generates code for student or instructor, it searches for "Person" in its symbol table to find information about the Person class.

Two classes: Person and Employee, where Employee is a subclass of Person.

Person
----------------------
-name : string
-height : double
-weight : int
----------------------

Employee : public Person
----------------------
-id : int
-phone : string
----------------------
class Person
{
    private:
	string	name;
	double	height;
	int	weight;
};


class Employee : public Person
{
    private:
	int	id;
	string	phone;
};
An abstract representation of two entries in a compiler's symbol table. Each entry has the class's name, a pointer to its superclass (nullptr if the class doesn't have a superclass), and information about the class's members. In the picture, the program instantiates an Employee object named 'ceo' that has a part created from its superclass, Person, and a part created from Employee. Each part is described by its respective class. The compiler locates a member by adding the sizes of the preceding members to the address represented by the name 'ceo.'
(a)(b)(c)
 Employee* ceo = new Employee(...); 
(d)
Accessing member variables in classes related by inheritance.
  1. UML class diagram: an Employee is a Person
  2. C++ source code: in a C++ program, inheritance is denoted by : public Person
  3. Symbol table and memory layout: the symbol table (left-hand rectangles) contains information about each class's member variables and about any inheritance relationships. An Employee object (right-hand rectangle) consists of a Person embedded in an Employee object.
  4. Object instantiation: ceo is an Employee object instantiated on the heap
    • When the compiler accesses member variables, it uses the information stored in the symbol table
    • The compiler searches for the class name data type (highlighted) in the symbol table
    • In this example, ceo is an Employee; so, when calculating the offsets needed to access the ceo's member variables, the compiler begins searching with the Employee symbol table entry. Inheritance is unidirectional, from the subclass (Employee) to the superclass (Person); so the Employee symbol table entry points to the Person symbol table entry, allowing the compiler to access all of the information needed to locate any member variable in the ceo object.
The key concept is that the compiler locates information based on the class name used to declare the variable, that is, the type that appears on the left-hand side of the assignment operator.

Inheritance and Casting

Now that we understand that the compiler locates information based on the variable type, we're ready to take the next step. How does the compiler locate information in the case of inheritance and upcasting? To get a better idea of what the question means, compare Figure 3(d) to Figure 4(a).

 Person* ceo = new Employee(...); 
When the compiler searches for data members, it begins searching in the part of its symbol table that corresponds with a variable's data type. In the picture, 'ceo' is a Person, so the compiler searches in the Person symbol table entry, where it finds name, height, and weight. Inheritance is a one-way relationship, so the compiler can't navigate to Employee to locate id or phone. To access those members, we must downcast 'ceo' from Person to Employee.
(a)
 Employee* temp = (Employee *)ceo; 
(b)(c)
Downcasting to reach subclass member variables.
Upcasting frequently takes place when a program calls a function (see Figure 4(b)). Perhaps the most unexpected consequence of upcasting is that it often results in member variables that exist in an object but are unreachable through the upcast variable! Programmers must use a downcast operation to access the "unreachable" variables.