Computer programs seem capable of representing an infinite variety of "things," from the windows and buttons in a graphical user interface (GUI) to the visually stunning locations in a game to the emerging atmospheric patterns used to predict tomorrow's weather. More incredible than the limitless number of things a computer program can represent is that the program creates all of these things from just a few kinds of simple numbers! This section presents the fundamental, built-in, or primitive data types that C++ programs use.
Whenever a C++ program uses data, the data must be typed. Typing data allows the computer to do two things. First, different types or kinds of data use different amounts of memory, and the data type tells the compiler how much memory to reserve or allocate to hold the data. For example, an int typically uses 4 bytes of memory, while a double typically uses 8 bytes. All data stored in memory is just a sequence of binary digits, 1s and 0s, and has no meaning until the code accessing it gives it meaning. So, the second thing that the data type tells the computer is how to interpret the binary digits. For example, the bit-pattern representing an int 2 is quite different from the bit-pattern representing a double 2.
A few factors affect the size of some data. The most significant factor is the hardware. The International Organization for Standardization (ISO) maintains the specification of the C++ programming language, which the American National Standards Institute (ANSI) adopts in the United States. For the best performance, the C++ standard intends an int to have the natural size suggested by the hardware architecture, but it only guarantees that an int is at least 16 bits. A word size of 32 bits was common for years, and although 64-bit hardware is now the norm, most 64-bit compilers keep int at 32 bits. The second factor is the compiler, which, on a 64-bit computer, can create 32- or 64-bit programs. The following table lists the most commonly supported data types, their typical sizes, and the range of values they can store.
C++ is an extensible language, meaning that programmers can create and use new data types in a program. Programmers create new data types with structures, classes, and enumerations, all covered in later chapters. The data types recognized and processed directly by the compiler are called fundamental, built-in, or primitive data types. Programmers don't create the fundamental data types - they are an intrinsic part of the C++ language.
Type | Size (bytes) | Value Range | Comments | Category |
---|---|---|---|---|
void | N/A | N/A | Used as a function return type and for typeless pointers (treated much later). | |
bool | 1 | false or true | In C/C++, zero is false, and non-zero is true. | integers |
char | 1 | Typically -128 to 127, but 0 to 255 on some systems | Wide characters, called wchar_t (typically 2 or 4 bytes), are also supported. | integers |
short | 2 | -32,768 to 32,767 | Formally short int | integers |
int | varies; 4 is typical | -2,147,483,648 to 2,147,483,647 (for 4 bytes) | The size and range are implementation-dependent. The C++ specification only requires that the size of short ≤ int ≤ long. | integers |
long | 4 or 8 | -2,147,483,648 to 2,147,483,647 (for 4 bytes) | Formally long int; 8 bytes on most 64-bit Unix-like systems | integers |
float | 4 | ±3.4028234 × 10^±38 | 6 - 7 significant digits | floating point |
double | 8 | ±1.79769313486231570 × 10^±308 | ~15 significant digits | floating point |
The modifiers signed and unsigned may be applied to the integer types as well. Signed types can represent positive and negative values; unsigned types only represent non-negative values. The integer types are signed by default, but the char type is ambiguous.
Whether a plain char is signed or unsigned depends on the implementation: char is signed on some systems but unsigned on others (i.e., the compiler and the target hardware determine the signedness of a char), with signed being the most common today. This ambiguity can cause problems, for example, when reading a file one character at a time. Some input functions return a symbolic constant named EOF (end-of-file) when there are no more characters to read. EOF is often just a -1, which an unsigned char cannot represent. To solve this problem, many library functions that deal with characters do so as integers, and C++ automatically converts between an int and a char (both directions) without programmer intervention.
Modern compilers are beginning to recognize additional types; for example, see the types recognized by Microsoft Visual Studio.
Constants are fixed values that do not change over time and are common in most programs. The compiler uses two simple rules and a few symbols to determine the data type of constants appearing in a program:
1. Numbers without a decimal point are type int by default.
2. Numbers with a decimal point are type double by default.
For example, 3.14 can fit into a float, but the compiler makes it a double; 13 will fit in a short, but the compiler makes it an int.
Example | Comment |
---|---|
10 | The compiler automatically treats numeric values that do not have a decimal point as type int |
10L | The L instructs the compiler to treat the constant as type long |
10U | The U instructs the compiler to treat the constant as unsigned |
10.0 | The compiler automatically treats numeric values that include a decimal point as type double |
10.0F | The F instructs the compiler to treat the constant as type float; the suffix is only valid on a constant with a decimal point or an exponent, so 10F does not compile |
0xAB | The 0x instructs the compiler to read the constant in hexadecimal (i.e., base 16); hexadecimal numbers can use the digits 0 - 9 and a - f or A - F; the constant is an int if the value fits, and otherwise the compiler selects an unsigned or larger integer type |
067 | The leading 0 instructs the compiler to read the constant in octal (i.e., base 8); octal numbers can only use the digits 0 - 7; the constant is an int if the value fits, and otherwise the compiler selects an unsigned or larger integer type |
'a' | The single quotation marks instruct the compiler to treat the constant as type char. Note that only one character may appear between single quotation marks (see "Escape Sequences" at the bottom of the next page) |
"hello" | The double quotation marks instruct the compiler to treat the constant as a string (called a string literal or string constant). Although a string containing only one character looks similar to a character constant (e.g., "x" and 'x'), their representation in a computer is very different. |
1.23e20 1.23e-20 | Scientific notation is used for very large and very small numbers. The examples mean 1.23 × 10^20 and 1.23 × 10^-20, respectively |
A variable is a named region of memory. The region's size (i.e., the amount of memory reserved) depends on the variable's data type, as illustrated in Figure 1. The contents of memory, and therefore the value stored in a variable, can change or vary over time. Variables have three essential characteristics: a name, an address, and the current content.
Executable machine code is based solely on the address of a variable in memory. As part of the compilation process, the compiler maps the variable name to the variable's address. When a program uses a variable, it may require either its content or its address in memory. When a variable name appears in a program, how does the compiler "know" which to use? The answer is that the correct value is determined by where the variable appears in the code.
Houses along a street provide a simple metaphor for variables. Each house has someone living in it, but the current occupants could move out, and new people could move in. Barring a catastrophe (like a tornado throwing the house over a rainbow into Oz), the address remains unchanged as people move in and out. Furthermore, the address of a given house is a function of where that house appears on a given street within a given city. Each house has a unique address. The addresses of adjacent houses differ only a little; the addresses of separated houses differ by a greater amount.
Programming languages allow programmers to name different programming elements or entities such as variables, functions, classes, etc. Scope is the location in a program where a specific name is visible and accessible. More formally, programs bind names to entities, and scope "is the region of a program where the binding is valid." For now, we focus on variable scope. Saying that a variable "comes into scope" or "goes out of scope" means that program execution enters or leaves the area in a program where the statements can use the variable - that is, program execution enters or leaves a variable's scope.
Figure 3 (above) illustrates the three variable characteristics: an address, a name, and the current content. Scope is that part or region of a program where the name and the address are bound or connected - that is, where the name maps to that specific address. The concept of binding a variable name to an address leads to two scoping rules:
Modern computers typically have many gigabytes of main memory. But not all that memory is available for a program to use: all programs running on a computer, including the operating system, must share the computer's memory. The operating system (OS) manages all a computer's resources, including memory. The OS manages memory by allocating it to programs as needed and deallocating it when it is no longer needed. Programs further manage the memory allocated to them by allocating and deallocating it for new variables as needed. Chapter 4 explores how a program manages its memory.
C++ distinguishes two ways of allocating and managing the memory needed to store a variable: automatic and static. Historically, the keywords auto and static selected between them, but variables are automatic (i.e., "auto") by default, so the auto keyword was rarely used in this context. Since C++11, auto instead implements type deduction (see below) and no longer serves as a storage modifier. Alternatively, the only way to make a local variable static is by including the keyword in the definition: static int counter;.
The computer allocates space for automatic variables¹ in its main memory or RAM. It allocates the memory automatically when the variable comes into scope and frees or deallocates it when it goes out of scope. Furthermore, C++ is a block-structured language, which means that a pair of opening and closing braces define a block. Each new block forms a new scope, and because blocks may be nested, scopes can be nested much like the layers of an onion. The program allocates memory for an automatic variable defined in a block when it enters and executes the code in the block. It deallocates the memory when execution moves outside the block, past the closing brace.
Alternatively, the memory needed to hold a static variable is allocated when the program is first loaded into memory and remains allocated throughout the program execution. Static variables retain their contents even when the name goes out of scope. So, the memory allocated for a static variable remains usable when the variable name is not in scope. The variable name always follows the scoping rules, but the memory allocation/deallocation rules for static and automatic variables differ. This observation has useful ramifications related to functions and pointers that we will explore later in Chapter 4.
Scope and memory allocation are related, but they are not the same. The connection between scope and memory allocation is very tight for automatic variables. The computer allocates the memory needed to store an automatic variable when the variable comes into scope and deallocates it when the variable goes out of scope. However, static variables illustrate the distinction between the concepts. The operating system allocates memory to store static variables when it loads the program into memory. While the program runs, static variables come in and go out of scope without deallocating their memory or losing their saved data. The program deallocates memory for static variables only when it terminates. We'll return to scope and memory allocation in chapters 6, 7, and 8.
Programmers often use static variables to create functions that retain or "remember" values from one call to the next. For example, imagine a function that defines a static variable. The variable name has local scope - the variable is only accessible inside the function. However, when the function returns, the program retains the variable's memory and saved value. So, if the function returns the address of the memory, the program can still access the stored data. We explore how and why we do this in chapter 6.
A variable is said to be "initialized" when assigned its first or initial value. Initialization occurs at three times or places:
Formally, the value stored in an uninitialized automatic variable is said to be indeterminate. Informally, an uninitialized variable is said to contain garbage (memory is never empty, so an uninitialized variable contains the unspecified or random bits already present in memory). It is possible, but not required, to initialize (i.e., assign a value to) a variable in the same statement that defines it. The following figure illustrates the three ways of simultaneously defining and initializing a variable.
int maximum = 100; | int maximum (100); | int maximum {100}; |
(a) | (b) | (c) |
int maximum; /* variable definition */ maximum = 100; /* variable initialization */ | int maximum = 100; |
(a) | (b) |
int minimum; cout << "Please enter the minimum: "; cin >> minimum; | int width; int height; int area; /* read in the values for width and height */ area = width * height; |
(c) | (d) |
Steps to using a variable:
You may replace the value in the variable and use the variable repeatedly.
Steps to getting a drink:
Analogies like this can help us visualize a problem and its solution. For example, imagine we have two variables and must swap the stored values. Now, visualize the same problem as two filled glasses. The following section describes the problem and its solution in greater detail.
Although programmers and a great deal of existing code continue using the #define directive to create symbolic constants, it is an older mechanism. The preprocessor implements the directive as a simple text substitution, bypassing the compiler component's syntax and type checks. C++ and ANSI C programmers can use the const keyword to implement fully checked symbolic constants.
const int MAX = 100; |
const double PI = 3.14159; |
const char DELIMITER = ':'; |
const string LABEL = "Exit"; |
Data types are fundamental to how a C++ program operates. From the above discussion, it's clear that constants and variables have types. But constants and variables are just specific kinds of expressions, and indeed, all expressions have a type. When operators and sub-expressions form an expression, the compiler generates code to automatically convert each sub-expression to the widest type in the expression, an operation called type promotion. The dynamic range of the type determines its width. For example, from Figure 1 above, long and float are both typically four bytes long, but a variable of type float can hold much larger and much smaller values than can a long, so a float has a wider dynamic range than a long.
2 * 3.14 | int counter(100); double avogadros { 6.022e23 }; avogadros / counter; |
(a) | (b) |
char
and int
.
(a) The constant 2 is type int, and 3.14 is type double. The compiler cannot directly operate on mixed types, so it promotes 2 to a double. The result of the multiplication operation is a double-valued expression.
(b) counter is defined as an int and constructor-initialized to 100; avogadros is defined as a double and is initialized to 6.022 × 10^23. The compiler promotes counter to type double before calculating the quotient. The result is a double-valued expression.
In much the same way that the compiler can determine the best data type to represent an expression accurately, it can also infer or deduce an appropriate type for a variable. The C++11 standard repurposed the auto keyword for this purpose and also added a new keyword, decltype, that deduces a variable's type with a different syntax.
int counter = 100; /* or: int counter(100); or: int counter{ 100 }; */ auto max = counter; | int counter = 100; /* or: int counter(100); or: int counter{ 100 }; */ decltype(counter) min; |
(a) | (b) |
An identifier is the name that a programmer gives to an element of a program. Variables are the only namable programming element introduced so far. Eventually, we'll study other namable elements, like functions and classes. Regardless of what the programmer is naming, the rules for creating legal identifiers or names are the same.
¹ The term automatic variable was introduced with the ALGOL programming language and is generally applied to all languages derived from it, including C++. A more common term is local variable; a less common term is stack variable. We'll cover the underlying principles justifying these terms in later chapters. (See Glossary: Automatic variables for more detail).