Real-world programs are often large, ranging from thousands to millions of lines of code, so managing them in a single file is impractical. Furthermore, keeping a large program in a single file forces programmers to duplicate code common to multiple programs (user interface code in office suites, for example). Our previous programs have been small and relied only on simple, fundamental data types like int, char, and double, which the compiler "knows" about intrinsically, making them appropriate for a single-file implementation. In this section, we take a transitional approach, beginning with structures that demonstrate the value of header files and concluding with small programs that illustrate the organization of multi-file programs.
Imagine that we can look into a computer's memory. We see electrical circuits at the most fundamental, physical level, which doesn't help us understand how programs organize structures in memory. Moving up to the next level of abstraction, we can see a vast sea of 1s and 0s, which is better but still not what we need. Continuing our climb to the next level of abstraction, we see a stream of bytes, which is a helpful level for our next task. A program makes sense of the bytes by partitioning them into the discrete units that form the backbone of a program. For example, on most contemporary systems, a program can assemble four bytes to form an int or a machine instruction and eight bytes to make a double. In the case of structures, the partitioning is more complex.
When data is copied or otherwise moved around in a program, the program treats it as a starting point in memory (i.e., an address) and a size (measured in bytes). The compiler already knows how many bytes its fundamental types occupy (typically four bytes in an int and eight bytes in a double), so partitioning those variables is relatively straightforward. But what about a structure? Two structures can have a different number of fields, different kinds of fields, or both. So, how big is a structure, and where are the boundaries between the fields? Figure 1 illustrates how a structure partitions an otherwise formless block of bytes in memory to form distinct fields.
```cpp
struct person
{
    int id;
    int year;
    int month;
    int day;
};
// . . .
person p = {123, 2015, 6, 25};
```
(a)
Now, imagine what happens if a program consists of two parts, each in a different file, and each "seeing" a different specification for the same structure. For example, in some parts of the world (and in computer programs), it is common to express a date as year, month, and day, while in other places (the U.S., for example), it is common to describe a date as month, day, and year. Two different programmers could easily write a date structure specification with the fields in different orders. The following program, which compiles and runs, illustrates what happens:
File1.cpp
```cpp
struct person
{
    int id;
    int year;
    int month;
    int day;
};

void print(person);    // declaration

int main()
{
    person p = {123, 2015, 6, 25};
    print(p);

    return 0;
}
```
(a)

person.cpp
```cpp
#include <iostream>
using namespace std;

struct person
{
    int id;
    int month;
    int day;
    int year;
};

void print(person temp)
{
    cout << "ID: " << temp.id << endl;
    cout << "Month: " << temp.month << endl;
    cout << "Day: " << temp.day << endl;
    cout << "Year: " << temp.year << endl;
}
```
(b)

```
ID: 123
Month: 2015
Day: 6
Year: 25
```
(c)
This two-file program compiles and runs but produces incorrect output. Although the output is not what the programmer wants or expects, the displayed values are still recognizable from the initial values, but only because the three date fields are the same type. If the fields were different types with different sizes (and therefore had different boundaries), the overlaid structure would not match the original field boundaries, and the error would garble the output entirely.
Original Structure
```cpp
struct person
{
    int id;
    int year;
    int month;
    int day;
};
```
(a)

Extracting Structure
```cpp
struct person
{
    int id;
    int month;
    int day;
    int year;
};
```
(b)
The likelihood of a transposition error, as illustrated above, increases with rising numbers of:
person.h
```cpp
struct person
{
    int id;
    int year;
    int month;
    int day;
};

void print(person);    // declaration / prototype
```

File1.cpp
```cpp
#include "person.h"

int main()
{
    person p = {123, 2015, 6, 25};
    print(p);

    return 0;
}
```

person.cpp
```cpp
#include <iostream>
#include "person.h"
using namespace std;

void print(person temp)
{
    cout << "ID: " << temp.id << endl;
    cout << "Month: " << temp.month << endl;
    cout << "Day: " << temp.day << endl;
    cout << "Year: " << temp.year << endl;
}
```
Using a header file reduces the tedium of copying and pasting the struct specification from one source code file to another. More importantly, it creates a single point of modification should you ever need to add or remove any fields or make any corrections. Imagine how easy it would be to miss updating a specification in one file if the program has tens or hundreds of copies of the structure spread throughout hundreds or thousands of source code files!
It is tempting to use the preprocessor to combine the source code files by #including one source code in another:
#include "person.cpp"
But this negates one of the advantages of spreading a program over multiple files and is rarely done in practice. Rather than #including one .cpp file inside another, let the linker or loader combine the files.
1 For those who are interested, the computer accesses each structure field by adding an offset to the object's beginning memory address. The compiler calculates the offset for field n by summing the sizes of fields 0 through n-1 (plus any alignment padding it inserts between them) and adding the sum to the object's address. For example, assuming the original structure's field order and letting p be an instance of the person structure, the compiler calculates the addresses for each field as: