7.9. Arrays And Security

[Image: A full bottle of Diet Coke beside an empty glass, and a partially filled bottle of Diet Coke next to a full glass that is too small to hold all of the Coke.]
Array out-of-bounds error. Indexing an array out of bounds, called a buffer overflow or buffer overrun when the program writes past the end of the array, often happens when a program tries to put more data into an array than it can hold, similar to trying to pour half a liter of Diet Coke into a 5-ounce glass. Recalling that valid array index values lie in the range 0 ≤ index ≤ size-1, it follows that any index value < 0 or ≥ size causes an out-of-bounds error. (No Diet Coke was spilled while photographing this illustration.)

The behavior of a program that indexes an array out of bounds is unpredictable and erratic; the C++ standard classifies it as undefined behavior. Three conditions associated with the memory that an out-of-bounds index accesses account for most of the program's arbitrary behavior:

  1. Ownership: Whenever the operating system allocates memory to a process (i.e., a running program), the process "owns" the memory until it returns it to the OS. An out-of-bounds index operation that accesses memory the process doesn't own typically causes an immediate runtime error (e.g., a segmentation fault or an access violation) that aborts the program. If the operation stays within the process's memory, it may corrupt adjacent data, return incorrect data, or "misplace" data so that the program can't find it later.
  2. Content: Computer memory content is dynamic, varying with the processes (including the operating system) the computer runs and their execution order, which in turn depend on the users' needs and whims (e.g., reading comics or watching amusing cat videos). Furthermore, memory retains the last value written to it until something overwrites it, so a program can "inherit" stale values left behind by earlier, unrelated code (for example, a previous function call that used the same stack memory). How these obscure values affect a program depends on how it uses them.
  3. Use: Apart from accessing unowned memory, how a process uses the data obtained by an out-of-bounds index operation has the greatest impact on its behavior. Displaying the data may not crash the program, but the displayed values are meaningless. Alternatively, using the data as a pointer or as an index into another array will likely crash the program quickly. Corrupted data may not cause a problem until later, and then in a different part of the program. (For example, early in my software engineering career, I indexed an array out of bounds, corrupting a FILE variable. The C program consisted of 25 files; the corruption occurred in one file, and the failure occurred later in the program's execution, in a different file.)
Beyond creating bugs that are challenging to find, indexing an array out of bounds also creates a significant security risk. Several well-known computer viruses and worms have exploited out-of-bounds indexing (see Buffer overflow).

The first step toward creating safe, secure, and robust code is remembering that C++ does not automatically test array indexes for out-of-bounds conditions. The next step is understanding when a program must test an index before using it and when a test is an unnecessary expense.