The process of translating C++ source code into an executable program is called "compiling the program" or just "compiling." We usually view the compilation process as a single action and generally refer to it as such. Nevertheless, a modern C++ compiler consists of at least three separate programs: the preprocessor, the compiler component, and the linker.
Most of the time, we refer to all three programs collectively as "the compiler." When we say "the compiler," context is usually sufficient to show whether we mean the middle component specifically or all three components together. If the context is not enough, we must state our meaning explicitly to avoid confusion.
Modern integrated development environments (IDEs) such as Visual Studio bundle the preprocessor, compiler, and linker together so they work as a single unit and are operated through a single, common interface. Programmers initiate the compilation process with a single command (typically a single button press or menu item). Modern IDEs also include many additional tools, such as a source code editor and a debugger.
Although these features are controlled through the common interface, most are implemented as separate programs.
The preprocessor handles statements or lines of code that begin with the "#" character, which are called "preprocessor directives." Note that directives are not C++ statements (and therefore do not end with a semicolon) but rather instruct the preprocessor to carry out some action. The preprocessor reads and processes each file one at a time from top to bottom. It does not change the contents of any of the files that it processes but creates a temporary file that contains the processed code. The compiler component reads and translates the temporary file from C++ to machine code. When the compiler component finishes processing the code in the temporary file, it removes the file. Two of the most common directives, and the first that we will use, are #include and #define.
The #include directive consists of two parts, both on the same line and conventionally separated by a space: the #include keyword itself and the name of a header file.
When the preprocessor encounters the #include directive, it opens the header file and copies the contents into the temporary file. The symbols surrounding the name of the header file are important and determine where the preprocessor looks for the file.
You might see two kinds of system header files in a C++ program. Older system header files end with a ".h" extension: <name.h>. These header files were originally created for C programs but may also be used with C++. Newer system header files do not end with an extension: <name>. These header files may only be used with C++.
#include <iostream>
#include "person.h"
File names appearing between < and > refer to system header files; file names appearing between an opening and closing " refer to header files written by the programmer as part of the program.
The #include directive does not end with a semicolon. By convention, at least one space separates the directive from the file name, although compilers also accept the directive without it.
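As a minimal sketch of the two kinds of system header files described above, the following listing includes one older C-style header and one newer C++ header. The function names c_length and cpp_length are hypothetical and were chosen only for this illustration.

```cpp
#include <string.h>   // older, C-style system header (note the ".h" extension)
#include <string>     // newer system header (no extension), C++ only

// Hypothetical helper: counts the characters in a C-style string with
// strlen, which is declared in <string.h>.
size_t c_length(const char* text)
{
    return strlen(text);
}

// Hypothetical helper: counts the characters in a C++ string object,
// whose class is declared in <string>.
size_t cpp_length(const std::string& text)
{
    return text.length();
}
```

Both directives use angle brackets because both files are system header files; a header written by the programmer would appear between double quotation marks instead.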
The #define directive introduces a programming construct called a macro. A simple macro only replaces one string of characters with another string. We'll look at more complex, parameterized macros in chapter 6. The #define directive is one (old) way of creating a symbolic constant (also known as a named or manifest constant). The const and enum keywords are newer techniques for creating constants and are presented in more detail later. It is a well-accepted naming practice to write the names of symbolic constants with all upper-case characters (this provides a visual clue that the name represents a constant).
#define NAME_SIZE 25
The preprocessor replaces each occurrence of NAME_SIZE with the characters 25.
Symbolic/named/manifest constants are useful in two significant ways. First, a value that is used in many places can be changed by editing a single directive. Second, a name such as NAME_SIZE, viewed in the context of a specific program and the problem that it solves, likely conveys more meaning to a reader than does "25."
The #define directive does not end with a semicolon, and there must be at least one space between the directive and the identifier (i.e., name) and between the identifier and the defined value; the defined value (the third part of the directive) is optional.
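The following sketch places the #define constant from the example above next to the newer const and enum techniques. Apart from NAME_SIZE, every name here (VERBOSE, MAX_NAMES, LINE_SIZE, and the three functions) is hypothetical and exists only for illustration.

```cpp
#include <cstddef>

#define NAME_SIZE 25   // macro: the preprocessor replaces NAME_SIZE with 25
#define VERBOSE        // hypothetical macro with no defined value

const int MAX_NAMES = 100;   // hypothetical constant created with const
enum { LINE_SIZE = 80 };     // hypothetical constant created with enum

// Returns the size of an array whose length comes from the macro. After
// preprocessing, the compiler component sees "char name[25];".
std::size_t name_buffer_size()
{
    char name[NAME_SIZE];
    return sizeof(name);
}

// Combines the macro with the const and enum constants in one expression.
int total_bytes()
{
    return NAME_SIZE * MAX_NAMES + LINE_SIZE;
}

// Reports whether VERBOSE has been defined. The #ifdef directive only
// checks that the name exists, so the macro needs no value.
bool verbose_enabled()
{
#ifdef VERBOSE
    return true;
#else
    return false;
#endif
}
```

Changing the single line that defines NAME_SIZE changes every place the name is used, which is one reason a named constant is preferred over repeating the literal value.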
The compiler component, the middle program in the overall compiler system, is the largest and arguably the most critical part of the system. The compiler translates C++ source code into the machine code that a specific computer "understands" and can execute. The picture below suggests that a single program can consist of multiple source code files. Programs come in a vast range of sizes, from a few tens of lines of code to many millions of lines. It is both awkward and inconvenient to deal with large programs in a single source code file, and spreading them over multiple files has many advantages, such as allowing several programmers to work on different parts of a program at the same time and allowing only the files that have changed to be recompiled.
The preprocessor processes the source code files one at a time and produces one temporary file for each source code file. Similarly, the compiler processes the temporary files one at a time and produces one object file for each temporary file. Object files contain machine code and information that the linker uses to complete its tasks. (Note that "object" in this context has nothing to do with the objects involved in object-oriented programming.)
The compiler component also detects syntax errors and provides the diagnostic output programmers use to find and correct those errors. Despite all that the compiler does, its operation is transparent to programmers for the most part.
The linker is the third and last component of the full compiler system. It takes the object files created by the compiler and links them together, along with library code and a runtime file, to form a complete, executable program. The name of the executable file depends on the hosting operating system: On a Windows computer, the linker produces a file whose name ends with a ".exe" extension. On Linux, Unix, and macOS systems, the linker produces a file named a.out by default. Users may also specify a name that overrides the default (note that executable files do not have a standard extension on these systems).
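To make the division of labor between the compiler component and the linker more concrete, the sketch below shows, in a single listing, code that would normally be spread over two source code files. The file names room.cpp and area.cpp, the functions room_area and rectangle_area, and the g++ commands in the comments are all assumptions made for illustration (the commands assume the GNU compiler on a Linux system).

```cpp
// --- Would normally live in room.cpp (hypothetical file name) ---
// Declaration only: it tells the compiler component that the function
// exists somewhere, so the call below can be translated. The object file
// records that rectangle_area still needs to be found.
int rectangle_area(int width, int height);

int room_area()
{
    return rectangle_area(3, 4);
}

// --- Would normally live in area.cpp (hypothetical file name) ---
// Definition: the compiler component translates this into machine code
// stored in a second object file. The linker connects the call in
// room_area to this code.
int rectangle_area(int width, int height)
{
    return width * height;
}

// Assuming the GNU compiler, each file could be compiled separately:
//   g++ -c room.cpp   (produces the object file room.o)
//   g++ -c area.cpp   (produces the object file area.o)
// The linker would then join these object files with one that defines
// main, plus library code, to form the executable program.
```

If the definition of rectangle_area were missing from every object file, each source code file would still compile, and the error would be reported by the linker instead.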