12.2.1. Casting And Member Variables

Review

Classes typically have both member variables and member functions. Two instances of the same class can have distinct variables, saving different values, but share the class's functions, suggesting that the compiler uses different algorithms to locate member variables and functions. Furthermore, some functions participate in polymorphism while others do not. Therefore, the compiler uses three different algorithms to locate the different members. We continue our incremental steps toward polymorphism by exploring how the compiler locates member variables in an object. Polymorphism requires inheritance, so we also explore its effect on locating member variables, drawing on and extending the previous casting discussion.

Locating By Offset

Programs locate data stored inside an object (an instance of a class or struct) by calculating an offset from the object's address in memory. The idea of locating something by adding an offset to an address may seem daunting at first, but it's something that most of us have done in some way in the past.

Figure 1. Locating a house with an offset. Houses along a street serve as a metaphor for locating data with an offset calculation. Imagine that you are visiting a new friend but that a tree has grown to cover up the address on the house. So your friend gives you directions: "I live three houses beyond the pink house." The pink house forms a base address to which you add an offset of three houses to find the final destination: the black and orange house.

This may seem like a difficult calculation, but it's easy enough that the compiler can automatically generate the code to do it. We provide the base address (the object) and the name of a member variable, and the compiler does the arithmetic. The next example shifts from houses on a street to objects in memory.

Locating Member Data

In the street-and-house example above, a house is located by starting with a base address (the pink house at 315 Elm) and then adding an offset (three houses in this example) to the base address. The compiler does the same address arithmetic when it locates member variables inside an object. In this case, the base address is the address of the object itself, and the offset is the number of bytes needed to store the preceding member variables (plus any padding the compiler inserts to keep members aligned).

(a) C++
class Person
{
    private:
	string	name;
	double	height;
	int	weight;
};

Person* student
	= new Person(...);

Person* instructor
	= new Person(...);

(b) Abstract Memory Layout
Two instances of the Person class. Each instance has three member variables: name, height, and weight. The member variables in one object are separate from the members in the other object.

(c) Member Variable Locations In Memory
name:   student;
height: student + sizeof(name);
weight: student + sizeof(name) + sizeof(height);

name:   instructor;
height: instructor + sizeof(name);
weight: instructor + sizeof(name) + sizeof(height);

Figure 2. Calculating offsets: Locating data in a Person object.
  1. The class Person specification; the focus is only on member variables, so the example omits functions from the class, and the ellipses in the constructor calls stand for omitted arguments. student and instructor are pointers to two Person objects.
  2. Each large, outer rectangle represents an instance of the Person class, each with three distinct member variables.
  3. The calculation for each member variable appears opposite the variable. For example, the address of student's weight is the base address, student, plus the size of name plus the size of height.
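
The expressions in the figure are conceptual: they count bytes and ignore alignment padding. As a minimal sketch, the following measures the real offsets at run time. It assumes a hypothetical version of Person with public members (so the code can take each member's address); the helper names byteOffset and weightByOffset are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical public-member version of Person; the class in the text
// keeps these members private.
struct Person
{
    std::string name;
    double      height;
    int         weight;
};

// Distance in bytes from the start of the object to one of its members.
std::size_t byteOffset(const Person& object, const void* member)
{
    const char* base   = reinterpret_cast<const char*>(&object);
    const char* target = static_cast<const char*>(member);
    assert(target >= base && target < base + sizeof(Person));
    return static_cast<std::size_t>(target - base);
}

// Reads weight the way generated code might: base address plus offset.
int weightByOffset(const Person& object)
{
    const char* base   = reinterpret_cast<const char*>(&object);
    std::size_t offset = byteOffset(object, &object.weight);
    return *reinterpret_cast<const int*>(base + offset);
}
```

Every Person object yields the same offsets because they depend on the class, not on the instance. The exact numbers vary with the compiler and library, and the offset of weight may be larger than sizeof(name) + sizeof(height) when padding is present.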

Inheritance and Member Variables

The compiler needs several values to calculate the offset amount. It needs to know the variables a class contains and their order in the class, the data type of each variable (from which it derives the variable's size), and the class's superclass (if it has one). The compiler stores this information in its symbol table while compiling a program. When the compiler needs the information for code generation, it searches for the class's name in the symbol table. For example, Figure 2(a) defines two variables: Person* student and Person* instructor. Whenever the compiler generates code for student or instructor, it searches for "Person" in its symbol table to find information about the Person class.
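
To make the lookup concrete, here is a toy model of a symbol table entry and an offset search. It is only a sketch under stated assumptions: MemberInfo, ClassInfo, classSize, and findOffset are hypothetical names, the model ignores alignment padding, and a real compiler's symbol table is far more elaborate.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One member variable as a compiler might record it (hypothetical).
struct MemberInfo
{
    std::string name;
    std::size_t size;   // bytes, derived from the member's data type
};

// One class entry: its name, its superclass (nullptr if it has none),
// and its member variables in declaration order.
struct ClassInfo
{
    std::string             name;
    const ClassInfo*        superclass;
    std::vector<MemberInfo> members;
};

// Bytes occupied by a class, inherited part included (padding ignored).
std::size_t classSize(const ClassInfo* entry)
{
    if (entry == nullptr) return 0;
    std::size_t total = classSize(entry->superclass);
    for (const MemberInfo& m : entry->members) total += m.size;
    return total;
}

// Searches the entry and then its superclasses for a member. On success,
// stores the sum of the sizes of all preceding members in offset.
bool findOffset(const ClassInfo* entry, const std::string& member,
                std::size_t& offset)
{
    for (const ClassInfo* c = entry; c != nullptr; c = c->superclass)
    {
        std::size_t running = classSize(c->superclass);
        for (const MemberInfo& m : c->members)
        {
            if (m.name == member) { offset = running; return true; }
            running += m.size;
        }
    }
    return false;
}
```

The search only follows superclass pointers, so it can move from a subclass entry to its superclass entry but never in the other direction.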

(a) UML class diagram
Two classes: Person and Employee, where Employee is a subclass of Person.

Person
----------------------
-name : string
-height : double
-weight : int
----------------------

Employee : public Person
----------------------
-id : int
-phone : string
----------------------

(b) C++
class Person
{
    private:
	string	name;
	double	height;
	int	weight;
};


class Employee : public Person
{
    private:
	int	id;
	string	phone;
};

(c) Symbol table and memory layout
An abstract representation of two entries in a compiler's symbol table. Each entry has the class's name, a pointer to its superclass (nullptr if the class doesn't have a superclass), and information about the class's members. In the picture, the program instantiates an Employee object named ceo with a part created from its superclass, Person, and a part created from Employee. The respective classes describe each part. The compiler locates a member by adding the sizes of the preceding members to the address represented by the name ceo.

(d) Object instantiation
Employee* ceo = new Employee(...);

Figure 3. Accessing member variables in classes related by inheritance.
  1. UML class diagram: an Employee is a Person
  2. C++ source code: in a C++ program, inheritance is denoted by : public Person
  3. Symbol table and memory layout: the symbol table (the rectangle on the left) contains information about each class and its member variables and about the inheritance relationship. An Employee object (the rectangle on the right) consists of a Person embedded in an Employee object
  4. Object instantiation: ceo is an Employee object instantiated on the heap
    • When the compiler accesses member variables, it uses the information stored in the symbol table
    • The compiler searches for the class name data type (highlighted) in the symbol table
    • In this example, ceo is an Employee; so, when calculating the offsets needed to access its member variables, the compiler begins searching with the Employee symbol table entry. Inheritance is unidirectional, from the subclass (Employee) to the superclass (Person); so the Employee symbol table entry points to the Person symbol table entry, allowing the compiler to access all of the information needed to locate any member variable in the ceo object.
The key concept is that the compiler locates information based on the declared type of the variable (the class name on the left-hand side of the assignment operator), not on the type of the object created on the right-hand side.
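
The idea of a Person part embedded at the start of an Employee object can be checked with a short sketch. It assumes hypothetical public-member versions of the two classes and an invented helper, offsetWithin. The C++ standard does not promise that the superclass part comes first, although common compilers lay out single, non-virtual inheritance that way.

```cpp
#include <cstddef>
#include <string>

// Hypothetical public-member versions of the two classes.
struct Person
{
    std::string name;
    double      height;
    int         weight;
};

struct Employee : public Person
{
    int         id;
    std::string phone;
};

// Distance in bytes from the start of an Employee to one of its members,
// inherited or not.
std::size_t offsetWithin(const Employee& object, const void* member)
{
    const char* base = reinterpret_cast<const char*>(&object);
    return static_cast<std::size_t>(static_cast<const char*>(member) - base);
}

// True when the inherited members sit before the Employee members.
bool personPartComesFirst(const Employee& e)
{
    return offsetWithin(e, &e.weight) < offsetWithin(e, &e.id)
        && offsetWithin(e, &e.id) < offsetWithin(e, &e.phone);
}
```

Because the compiler may place id in unused padding at the end of the Person part, sizeof(Employee) is not necessarily sizeof(Person) plus the sizes of the new members.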

Inheritance and Casting

Now that we understand that the compiler locates information based on the variable type, we're ready to take the next step. How does the compiler locate information in the case of inheritance and upcasting? To get a better idea of what the question means, compare Figure 3(d) to Figure 4(a).

(a) Instantiation and upcast
Person* ceo = new Employee(...);

(b) Downcasting
Employee* temp = (Employee *)ceo;

(c) Downcasting explained
When the compiler searches for data members, it begins searching in the part of its symbol table that corresponds with a variable's data type. In the picture, ceo is declared as a Person pointer, so the compiler searches in the Person symbol table entry, where it finds name, height, and weight. Inheritance is a one-way relationship, so the compiler can't navigate to Employee to locate id or phone. To access those members, we must downcast ceo from Person to Employee.

Figure 4. Downcasting to reach subclass member variables. Programs most often perform an upcast when they call a function, passing a subclass object to a superclass parameter (see Figure 4(b)). Perhaps the most unexpected consequence of upcasting is that it often results in member variables that exist in an object but that are unreachable! Programmers must use a downcast operation to access the "unreachable" variables.
  1. Instantiation and upcast: This statement, modified from Figure 3(d), instantiates an Employee object but upcasts it to a Person. The Employee object has id and phone member variables, but they are unreachable because ceo is a Person pointer.
  2. Downcasting: The only way to reach the member variables specified in the Employee class is to downcast (highlighted in green) ceo to an Employee.
  3. Downcasting explained: In this example, ceo is a Person pointer, so when the compiler calculates the offsets needed to access member variables through ceo, it begins at the Person entry in the symbol table, which allows it to access name, height, and weight. But because inheritance is unidirectional (from subclass to superclass), the compiler cannot get from the Person entry to the Employee entry, which means that it cannot access id or phone even though they are part of the original Employee object.
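
The upcast and downcast steps can be tried in a small, self-contained sketch. It assumes hypothetical public-member versions of Person and Employee and made-up values; the functions readId and demo are invented for illustration. It also uses static_cast in place of the C-style cast shown in the figure, which performs the same conversion here.

```cpp
#include <string>

// Hypothetical public-member versions of the two classes.
struct Person
{
    std::string name;
    double      height;
    int         weight;
};

struct Employee : public Person
{
    int         id;
    std::string phone;
};

// Receives a Person pointer: only name, height, and weight are
// reachable through it.
int readId(Person* ceo)
{
    // return ceo->id;   // would not compile: Person has no member id

    // Downcast. Valid only because the caller guarantees that ceo
    // really points to an Employee object.
    Employee* temp = static_cast<Employee*>(ceo);
    return temp->id;
}

// Creates an Employee, upcasts it, and reads id back through a downcast.
int demo()
{
    Employee* worker = new Employee();
    worker->name = "Wally";
    worker->id = 1234;

    Person* ceo = worker;    // implicit upcast, as in a function call
    int id = readId(ceo);

    delete worker;           // delete through the Employee pointer
    return id;
}
```

If the pointer passed to readId actually pointed to a plain Person, the downcast would compile but the access to id would be undefined behavior, which is why the downcast is the programmer's responsibility.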