7.11. Arrays And Security


A full bottle of Diet Coke beside an empty glass, and a partially filled bottle next to a full glass that is too small to hold all of the Coke. **Array out-of-bounds error**. Indexing an array out of bounds, the error behind a buffer overflow or buffer overrun, often happens when a program tries to put more data into an array than it can hold, similar to trying to pour half a liter of Diet Coke into a 5-ounce glass. Valid array index values satisfy `0 ≤ index ≤ size-1`, so any index value < 0 or ≥ size causes an out-of-bounds error. (No Diet Coke was spilled while photographing this illustration.)

The behavior of a program indexing an array out of bounds is unpredictable and erratic. Three conditions associated with the memory accessed by the out-of-bounds indexing account for most of the program's arbitrary behavior:

Ownership: Whenever the operating system allocates memory to a process (i.e., a running program), the process "owns" the memory until it returns it to the OS. An out-of-bounds index operation accessing memory the process doesn't own causes an immediate runtime error, aborting the program. If the operation stays within the process's memory, it may corrupt adjacent data, return incorrect data, or "misplace" data it can't find later.
Content: Computer memory content is dynamic, varying with the processes (including the operating system) the computer runs and their execution order, which in turn depend on the users' tasks and whims (e.g., reading comics or watching amusing cat videos). Furthermore, memory retains the last saved value until something overwrites it, implying that a process can "inherit" memory contents from an unrelated process. How these obscure values affect a program depends on how it uses them.
Use: Apart from accessing unowned memory, how a process uses the data acquired by an out-of-bounds index operation has the greatest impact on its behavior. If the program only displays the data, it may not crash, but the displayed data is meaningless. Alternatively, using the data as a pointer or an index into another array will likely crash the program quickly. Corrupted data may only cause a problem later and then in a different part of the program. (For example, early in my software engineering career, I indexed an array out of bounds, corrupting a FILE variable. The C program consisted of 25 files; the corruption occurred in one file, and the failure, later in the program's execution, in a different file.)

Beyond creating a bug that is challenging to find, indexing an array out-of-bounds also creates a significant security risk. Several well-known computer viruses have exploited buffer overflow errors.

Remembering that C++ does not automatically test array indexes for out-of-bounds conditions is software developers' first step toward creating safe, secure, and robust code. Their next step is understanding when a program must test an index before using it and when a test is an unnecessary expense.
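For contrast, the standard library does offer an opt-in checked access. The following is a minimal sketch, assuming a C++11 or later compiler and `std::array` rather than the built-in arrays used in this section; `safe_read` is a hypothetical helper name, not something from the text. The `at` member function tests the index and throws `std::out_of_range`, while `operator[]` performs no test.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: copies values.at(index) into result and returns true,
// or returns false when the index is out of bounds.
bool safe_read(const std::array<int, 3>& values, int index, int& result)
{
	try
	{
		// at() checks the index; a negative int converts to a huge
		// unsigned value, so it is rejected as well.
		result = values.at(static_cast<std::size_t>(index));
		return true;
	}
	catch (const std::out_of_range&)
	{
		return false;
	}
}
```

The check costs a comparison on every access, which is the expense the surrounding text asks programmers to weigh.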

Whenever a program bases an array index on user input, it must verify that the final index value is valid before the indexing operation.
```
Door	doors[3];
int	door;

cout << "Choose a door: ";
cin >> door;

if (door > 0 && door <= 3)
	... doors[door - 1]...;
else
	cerr << "Valid doors are 1, 2, or 3" << endl;
```
Validating user input. A U.S. TV game show allowed contestants to choose one of three doors, keeping whatever was behind the selected door. The code fragment implements the selection operation. The doors are labeled "1" through "3," but an array of Door objects is indexed 0 through 2. The program adjusts the label value to the index value by subtracting 1. If the contestant's adjusted value is valid, the program uses the selected Door object in some way.

Add tests to prevent indeterminate loops from overrunning the end of the array:

```
int	scores[100];
int	score;
int	count = 0;

cout << "Enter a score (-1 to stop): ";

cin >> score;
while (score != -1 && count < 100)	// count < 100 guards the array
{
	scores[count++] = score;
	cin >> score;
}
```

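```
int	scores[100];
int	count = 0;

cout << "Enter a score (-1 to stop): ";

do
{
	cin >> scores[count++];
} while (scores[count - 1] != -1 && count < 100);	// count < 100 guards the array
if (scores[count - 1] == -1)
	count--;		// discard the -1, but keep a valid score that filled the array
```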
Guarding indeterminate loops. The two examples update loops from the previous section on Arrays And Loops. Each loop condition now includes the test `count < 100` to prevent the loop from overfilling the array.

Programmers must rigorously test calculations producing index values before deploying a program.
```
for (. . . i . . .)
	for (. . . j . . .)
		. . . array[i - j] . . .
```
Validating index calculations. This highly abbreviated example represents my array out-of-bounds error mentioned above. The code, which was mostly correct, was part of a multi-file program. Being dyslexic, I reversed the order of the two variables in the index expression `i - j`. The expression initially produced a value greater than 0, went to 0, and then became negative, causing the index error. The error corrupted a variable defined and used in another file but allocated in memory adjacent to the array.
Although it is possible to include an if-statement inside a loop to detect this kind of error, it incurs the expense of a needless test. The example illustrates a programmer-created logical error, which, when identified and corrected, will not cause further problems. Rather than adding a test, rigorously validate the code, using the debugger to locate and identify any errors.
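One possible middle ground, offered here as an assumption rather than the author's recommendation, is the standard `assert` macro from `<cassert>`. It tests the computed index during development and is compiled out entirely when the program is built with `NDEBUG` defined, so the released program pays nothing for it. The function `sum_upper` and its loop bounds are hypothetical, chosen only to give the index calculation a concrete shape.

```cpp
#include <cassert>

// Hypothetical example: adds table[j - i] for every pair 0 <= i <= j < size.
int sum_upper(const int table[], int size)
{
	int	total = 0;

	for (int i = 0; i < size; i++)
		for (int j = i; j < size; j++)
		{
			int index = j - i;	// reversing this to i - j goes negative
			assert(index >= 0 && index < size);	// debug-only test
			total += table[index];
		}

	return total;
}
```

With the operands reversed, the assertion would stop a debug build at the first negative index instead of letting the program silently corrupt adjacent memory.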

Pass arrays to functions as two arguments: the array itself and the array's size or capacity.

An array with a capacity of 8, but only the first 4 elements are filled. — **Arrays as function arguments**. Passing an array to a function requires the array and an upper bound for indexing into it (the lower bound is always zero).

An abstract representation of an array, the values characterizing it, and the C++ code implementing it.

Programs calling functions storing data in an array should pass the array's capacity as an argument.

Programs calling functions using array data should pass the array's size (or length).

If a function saves data in the array and uses data already saved in it, programs may need to pass both the size and capacity.
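The three cases above might look like the following sketch. The function names `append` and `sum` are hypothetical: `append` stores data, so it needs the capacity and updates the size, while `sum` only uses data, so the size alone is enough.

```cpp
// Hypothetical helper that stores data: it needs the capacity to know how
// many elements fit and the size to know where the next element goes.
bool append(int data[], int capacity, int& size, int value)
{
	if (size < 0 || size >= capacity)	// no room, or a corrupt size
		return false;

	data[size++] = value;
	return true;
}

// Hypothetical helper that uses data: the size is the only bound it needs.
int sum(const int data[], int size)
{
	int	total = 0;

	for (int i = 0; i < size; i++)
		total += data[i];

	return total;
}
```

For the array pictured above, a caller would declare `int data[8]; int size = 0;`, call `append(data, 8, size, value)` four times, and then call `sum(data, size)` so that only the four filled elements are read.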