5.3. Structures


Review

Programmers often use the term "structure" to name two related concepts. They form structure specifications with the struct keyword and create structure objects with a variable definition or the new operator. We typically call both specifications and objects structures, relying on context to clarify the specific usage.

Structures are an aggregate data type whose contained data items are called fields or members. Once specified, the structure name becomes a new data type or type specifier in the program. After the program creates or instantiates a structure object, it accesses the fields with one of the selection operators. Referring to the basket analogy suggested previously, a structure object is a basket, and the fields or members are the items in the basket. Although structures are more complex than the fundamental types, programmers use the same pattern when defining variables of either type.

int counter;

student s1;

Defining structure variables. Both statements define a variable: int and student are both type specifiers, and counter and s1 are variables. However, when we create data from structures (and later from classes), we call that data an object. So, we can also say that s1 is an object. The following sections add detail and illustrations to these introductory concepts.

Structure Specifications: Fields And Members

A program must specify or declare a structure before using it. The specification determines the number, the type, the name, and the order of the data items contained in the structure.

struct student
{
	int	id;
	string	name;
	double	gpa;
};

A structure specification. The struct keyword introduces a new structure specification, which includes the declarations of the structure fields or members (e.g., id, name, or gpa). Fields can be any valid data type; the only exception is that a field's type can't be the same as the surrounding struct. That isn't as complicated as it might sound: it means that you can't nest a structure inside itself. So, for example, none of the fields in student can be a student. We won't formally look at strings (the name field) for a couple of chapters, so don't worry too much about them now, but strings in C++ are similar to those used in Java.

Structure Definitions And Objects

We can use a blueprint as another analogy for a structure. Blueprints show essential details like the number and size of bedrooms, the location of the kitchen, etc. Similarly, structure specifications show the structure fields' number, type, name, and order. Nevertheless, a blueprint isn't a house; you can't live in it. However, given the blueprint, we can make as many identical houses as we wish, and while each one looks like the others, it has a different, distinct set of occupants and a distinct street address. Similarly, each object created from a structure specification looks like the others, but each can store distinct values and has a distinct memory address.

Three student structure objects represented as large rectangles corresponding to code (a). Each rectangle contains three smaller rectangles representing the three structure fields: id, name, and gpa. The example does not initialize the fields.

**Defining structure variables and objects**. After a program specifies a structure, it may use it to create any number of structure objects. In the same way that two integer variables can store different values, the fields in each object can store different values. The six definitions illustrated in (a) and (b) allocate enough memory to contain six `student` objects; each object has enough memory to hold three fields (one `int`, one `string`, and one `double`), but does not initialize any of the fields.

C++ code that defines three new `student` structure variables or objects named `s1`, `s2`, and `s3`. The program allocates the memory for the three objects on the stack.

The pointer variables `s4`, `s5`, and `s6` are defined and allocated automatically on the stack. However, the structure objects they point to are allocated dynamically on the heap.

An abstract representation of three `student` objects on the stack. Each object has three fields.

An abstract representation of three `student` pointer variables pointing to three `student` objects allocated dynamically on the heap with the `new` operator.

Three student structure objects represented as large rectangles. Each rectangle contains three smaller rectangles representing the three structure fields: id, name, and gpa. The example does not initialize any of the fields. This picture corresponds to code (b) and adds three small rectangles labeled s4, s5, and s6. Each labeled rectangle has an arrow pointing to one of the large rectangles.

Automatic Definition	Dynamic Allocation
student s1;	student* s4 = new student;
student s2;	student* s5 = new student;
student s3;	student* s6 = new student;
(a)	(b)

(c)

(d)

Initializing Structures And Fields

Figure 3 demonstrates that we are not required to initialize structure objects when we create them; perhaps the program will read the data from the console or a data file. However, we can initialize structures when convenient. C++ inherited an aggregate field initialization syntax from the C programming language. The 2011 standard (C++11) extended that syntax, bringing it closer to other major programming languages, and added default field initializers. More recently, the 2020 standard (C++20) added designated initializers. The following figures illustrate a few of these options.

Three structure objects represented as three rectangles. Each rectangle contains three smaller rectangles depicting the structure fields. The smaller rectangles contain the values from the initialization statements above. For example, s1.id = 123, s1.name = dilbert, and s1.gpa = 3.0.

**Aggregate initialization**. Aggregate initialization (aka list initialization) works well if you know the appropriate values at compile time, but not for data that the program reads or calculates while it runs.

The field order within the structure and the data order within the initialization list are significant and must match. The program saves the first list value in the first structure field, the second value in the second field, and so on through the list. If there are fewer initializers (the elements inside the braces) than there are fields, the remaining fields are initialized to their zero-equivalents. Having more initializers than fields is a compile-time error.

Similar to (a), but newer standards no longer require the `=` operator.

Historically, *en bloc* structure initialization was only allowed in the structure definition, but the 2011 standard (C++11) removed that restriction, allowing *en bloc* assignment to previously defined variables.

Abstract representations of the structure objects or variables created and initialized in (a), (b), and (c). Each object has three fields. Although every `student` must have these fields, the values stored in them may be different.

Another way to conceptualize structures and variables is as the rows and columns of a simple database table. Each column represents one field, and each row represents one structure object. The database schema fixes the number of columns, but the program can add any number of rows to the table.

struct student
{
	int	id = 789;
	string	name = "wally";
	double	gpa = 2.0;
};

Default field initialization. The syntax illustrated here is similar to the instance field initialization supported by Java and other languages. C++ added support for this notation with the 2011 standard (C++11).

student s3 = { .id = 789, .name = "wally", .gpa = 2.0 };	student s3 = { .id = 789, .gpa = 2.0 };
(a)	(b)
student s3; ... s3 = { .id = 789, .name = "wally", .gpa = 2.0 };	student s3; ... s3 = { .id = 789, .gpa = 2.0 };
(c)

Designated initializers. C++ added designated initializers with the 2020 standard (C++20). The syntax forms a designator with the dot operator, ., and a structure field name. Programmers can create an aggregate initializer list with a comma-separated list of designators enclosed in braces in the structure field order, possibly skipping some fields.

Initializing a structure object when defining it. The = operator is now optional.
Designated initializers are more flexible than the non-designated initializers illustrated in Figure 4, which are matched to structure fields by position (first initializer to the first field, second initializer to the second field, etc.). Designated initializers may skip a field: this example does not name the name field, so it receives its default value (an empty string unless the structure specifies a default).
Programmers may assign data to previously defined structure objects with designators, possibly skipping some fields. The assignment replaces the whole object, so skipped fields receive their default values rather than keeping their old ones.

The next step is accessing the individual fields within a structure object.

Field Selection: The Dot And Arrow Operators

A basket is potentially beneficial, but so far, we have only seen how to put items into it, and then only en masse. To achieve its full potential, we need some way to put items into the basket one at a time, and there must be some way to retrieve them. C++ provides two selection operators that join a specific structure object with a named field. The combination of an object and a field form a variable whose type matches the field type in the structure specification.

The Dot Operator

The C++ dot operator looks and behaves just like Java's dot operator and performs the same tasks in both languages.

object . field

(a)

`s1.id = 123;`	l-value
`cout << s2.name << endl;`	r-value
`double min_gpa = s3.gpa;`	r-value

(b)

	id	name	gpa
s1	123	dilbert	3.0
s2	456	alice	4.0
s3	789	wally	2.0

(c)

The dot operator. Although expressions formed with the dot operator might look complex, dot just forms an extended variable name. As described in Chapter 1, a variable has both a content and an address. Depending on where a program uses the name, it can represent either the variable's address (an l-value) or the variable's content (an r-value).

The general dot operator syntax. The left-hand operand is a structure object, and the right-hand operand is the name of one of its fields.
Examples illustrating the dot operator based on the student examples in Figure 3(a).
The dot operator's left operand names a specific basket or object, and the right-hand operand names the field. If we view the structure as a table, the left-hand operand selects the row, and the right-hand operand selects the column. So, we can visualize s2.name as the intersection of a row and column.

The Arrow Operator

Java only has one way of creating objects and only needs one operator to access its fields. C++ can create objects in two ways (see Figure 3 above) and needs an additional field access operator, the arrow: ->.

object -> field

(a)

`s4->id = 123;`	l-value
`cout << s5->name << endl;`	r-value
`double min_gpa = s6->gpa;`	r-value

(b)

The arrow operator. The arrow operator consists of two adjacent characters: the dash or minus and the greater-than or right angle bracket. The operator's unfamiliarity may make it look complex, but it behaves like the dot operator, forming an extended variable name.

The general arrow operator syntax. The left-hand operand is a pointer to a structure object, and the right-hand operand is the name of one of the object's fields.
Examples illustrate the arrow operator based on the student examples in Figure 3(b).

Choosing the correct selection operator

A picture illustrating the order (left to right) of a structure variable name, the location of the selection operator, and a structure field name.

Locate where a selection operator is needed. Both selection operators are binary: they require a left-hand and a right-hand operand. The left-hand operand is a structure (either an object or a pointer to an object), and the right-hand operand is a structure field name.

Choose the arrow operator if the left-hand operand is a pointer.
Otherwise, choose the dot operator.

Moving Structures Within A Program

A basket is a convenient way to carry and move many items by holding its handle. Similarly, a structure object is convenient for a program to "hold" and move many data items with a single (variable) name. Specifically, a program can assign one structure to another (as long as they are instances of the same structure specification), pass them as arguments to functions, and return them from functions as the function's return value.

Assignment

Assignment is a fundamental operation regardless of the kind of data involved. The C++ code to assign one structure to another is simple and should look familiar:

A new structure object, s4, with empty, uninitialized fields.

**The assignment operator works with structures**. Used with structure operands, the assignment operator copies the contents of the right-hand object, field by field, into the left-hand object.

C++ code defining a new structure object and copying an existing object to it by assignment.

An abstract representation of a new, empty structure.

An abstract representation of how a program copies the contents of an existing structure to another structure.

The assignment operator copies the contents (three fields) of the old structure into the new structure.

Function Argument

Once the program creates a structure object, passing it to a function looks and behaves like passing a fundamental-type variable. If we view the function argument as a new, empty structure object, we see that argument passing copies one structure object to another, behaving like an assignment operation.

Function Return Value

Just as a program can pass a structure into a function as an argument, so it can return a structure as the function's return value, as demonstrated by the following example:

An illustration depicting a function returning a structure. The function read defines a local structure object named temp, illustrated as a rectangle containing smaller rectangles to depict the structure fields. The return statement at the end of the function copies temp to variable s3.

**Returning a structure object from a function**. A function can only return a single value with the `return` statement. But it can return many values by "wrapping" them in a structure object.

C++ code defining a function named `read` that returns a structure. The program defines and initializes the structure object, `temp`, in local or function scope with a series of console read operations and returns it as the function return value. The assignment operator saves the returned structure in `s3`.

A graphical representation of the return process. Returning an object with the `return` statement copies the local object, `temp`, to the calling scope, where the program saves it in `s3`.