6.11. Avoiding Runtime Errors

Review

Incorrect programs can fail for a variety of reasons and in a variety of ways. As computer scientists, we are responsible for preventing these failures by designing and implementing correct problem solutions and rigorously testing our programs to validate their correct behavior over various situations. Paradoxically, correct programs can also fail - if we define "correct" as behaving as they should when given correct input and not facing any external (outside the program) anomalies. This definition suggests that, as computer scientists, our responsibilities go beyond writing and validating correct code: we must also make it secure and robust. Informally, programmers often describe secure, robust code as "bulletproof."

Before beginning, we must first address five fundamental prerequisite concepts about writing bulletproof code:

  1. Prevention. Although the distinction is subtle, we must distinguish between program and programmer errors. Neglecting to account for a possible condition in the original problem is one of the most common sources of program runtime errors. Programs must account for all possible problem situations and provide appropriate, predictable responses. These situations only become a program failure if programmers fail to recognize and allow for them.
  2. Cause. A logically correct program, written without coding errors, can still fail if the user input does not meet the program's requirements. The most common reason, the one considered here, is that a user enters incorrect data or data in an incorrect format. Sometimes, the incorrect input comes from hardware, another program, or another computer, but these advanced problems are beyond the scope of this text.
  3. Detection. Failing programs can manifest their failure in many ways. Surprisingly, we are better served when a program fails quickly - it "crashes" - than if it continues to run in an abnormal state. Crashing prevents the program from continuing and possibly doing further damage (corrupting a database or increasing the power of an already overheated nuclear reactor, for example). When a program crashes, it's too late for it to take corrective measures. Robust programs detect and handle errors before they crash the program or cause further damage. Although C++ provides a modern, object-oriented mechanism for error handling, we only consider the non-object-oriented techniques inherited from the C programming language. The text introduces exception handling, the newer object-oriented error-handling mechanism, in a subsequent chapter. Except for assertions, our detection strategy will primarily use if-statements or similar controls.
  4. Response. If a program detects an error before it crashes, it can act to prevent the crash and any collateral damage. If a user action, entering incorrect data or data in an incorrect format, caused the error, the program should display a diagnostic informing the user of the error and its cause. In the case of a simple program managing little data or being able to recover the data from files, a graceful and controlled shutdown following the diagnostic may be the best approach. "Graceful and controlled" implies that it should complete any appropriate cleanup - remove temporary files, close open databases and network connections, shut down physical machinery, etc. More complex programs managing a significant amount of user-entered data should loop, allowing the user to take remedial action (enter correct data, choose a different operation, etc.) and continue running. It's always best to handle errors as near their detection as possible. By "near," I mean "geographically" and temporally - the detection and response should appear together in the code, and the response should closely follow the detection in time.
  5. Audience. Consider who will read the program's diagnostics. Different audiences (software developers, end users, or other programs or computers) have different needs. Developers benefit by knowing where the error occurred and the program's state (the current values saved in important variables). In contrast, users need to know what action or input caused the problem and how to correct it. Users and developers gain more from a textual message, but sending a numeric error status to another program or computer is easier and faster. We'll focus our efforts on developers and end users. Detecting possible failures with if-statements or similar control logic allows programmers to tailor diagnostic messages for a specific audience.
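
The cause, detection, and response concepts can be sketched with a small input routine. The following is a minimal illustration, not a definitive implementation: readDouble is a hypothetical helper name, and the wording of the user-oriented diagnostic is only an assumption about what a program might say.

```cpp
#include <iostream>
#include <limits>
#include <sstream>   // std::istringstream, handy for trying the helper without a keyboard
#include <string>

// Hypothetical helper (not a library function): reads one double from "in".
// Returns true on success. On badly formatted input it writes a
// user-oriented diagnostic to "err", discards the offending line, and
// returns false so the caller can decide how to respond.
bool readDouble(std::istream& in, std::ostream& err, double& value)
{
    if (in >> value)
        return true;                 // detection: the extraction succeeded

    in.clear();                      // reset the stream's failure flags
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');   // skip the bad line

    // Response aimed at an end user: what went wrong and how to fix it.
    err << "That input is not a number. Please enter a value such as 2.5" << std::endl;
    return false;
}
```

A caller might pass cin and cerr for interactive use. Because the helper only reports the problem, the calling code remains free to shut down gracefully or to loop and ask again.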

Avoiding Program Failures

A concrete example illustrates the prerequisite concepts. As you study the example, try to identify the following in the code:

  I. the potential failure, and who or what causes it,
  II. where and how the potential failure is detected,
  III. the diagnostic's audience,
  IV. the response after displaying the diagnostic, and
  V. where, if undetected, the program fails.
(a) \[ax^2 + bx + c = 0\]
(b) \[x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\]
#include <iostream>
#include <cmath>
using namespace std;

int main()
{
    cout << "Please enter a, b, and c: ";

    double a;
    cin >> a;
    
    double b;
    cin >> b;

    double c;
    cin >> c;

    double x1 = (-b + sqrt(b*b - 4*a*c)) / (2*a);
    double x2 = (-b - sqrt(b*b - 4*a*c)) / (2*a);
    cout << "x1 = " << x1 << "  x2 = " << x2 << endl;

    return 0;
}
cin >> a;
if (a == 0.0)
{
    cerr << "\"a\" may not be 0" << endl;
    exit(1);
}

double discriminant = b*b - 4*a*c;
if (discriminant < 0)
{
    double real = -b / (2*a);
    double imag = sqrt(-discriminant) / (2*a);
    cout << "x1 = " << real << " + " << imag << "i" << endl;
    cout << "x2 = " << real << " - " << imag << "i" << endl;
}
else
{
    double x1 = (-b + sqrt(discriminant)) / (2*a);
    double x2 = (-b - sqrt(discriminant)) / (2*a);
    cout << "x1 = " << x1 << " x2 = " << x2 << endl;
}
(c)(d)
The quadratic formula demonstrates detecting and responding to runtime errors. Programs can read their data and perform their calculations correctly but still fail if programmers fail to account for all possible conditions. The parenthesized Roman numerals (I through V) refer, in order, to the five items listed above.
  1. Our initial problem is finding the roots of a quadratic equation with three real coefficients, a, b, and c.
  2. Programming the quadratic formula is the standard approach for finding the roots.
  3. The simple program correctly reads the coefficients and translates the quadratic formula into C++ code, but the programmer (I) overlooked two situations leading to two possible failure modes. First, if the user enters a 0 for coefficient a, the program divides by 0 in (2*a) (highlighted in red, V); with IEEE 754 doubles, the division typically yields Inf or NaN rather than a usable root. Second, if the discriminant, b^2 - 4ac, is less than 0, the square root function, sqrt (highlighted in blue, V), fails and returns an unusable value.
  4. Programmers can detect and avoid potential failures with simple control statements.
    • Adding the if-statement (in red) prevents the division by 0 failure (II). Rather than failing, the program displays a user-oriented diagnostic (III) and performs a graceful shutdown (IV). We could also use a loop, allowing the user to enter a different value for a.
    • A quadratic equation has one of three kinds of roots as determined by the discriminant. If the discriminant is greater than 0, the equation has two real roots; if 0, the equation has one real root (or two equal roots); and if it is less than 0, the equation has two complex roots. To prevent failure, we calculate the discriminant and add an if-statement (highlighted in blue, II) to calculate the complex roots. Please notice the "-" in the function call sqrt(-discriminant).
Secure, bulletproof programs always account for all possible program situations and act preemptively to prevent failure.
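
The loop mentioned as an alternative to shutting down can be sketched as follows. This is one possible arrangement under stated assumptions: readLeadingCoefficient is a hypothetical name, the prompts are illustrative, and the function gives up only when the input is exhausted.

```cpp
#include <iostream>
#include <limits>
#include <sstream>   // std::istringstream, useful for exercising the loop without a keyboard

// Hypothetical helper: keeps prompting until the user supplies a usable,
// non-zero value for the coefficient a. Returns false only if the input
// ends before a usable value arrives.
bool readLeadingCoefficient(std::istream& in, std::ostream& out, double& a)
{
    while (true)
    {
        out << "Please enter a (it may not be 0): ";

        if (!(in >> a))
        {
            if (in.eof())
                return false;        // nothing left to read: let the caller respond

            in.clear();              // badly formatted input: reset and skip the line
            in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
            out << "That was not a number; please try again." << std::endl;
        }
        else if (a == 0.0)
            out << "\"a\" may not be 0; please try again." << std::endl;
        else
            return true;             // a usable coefficient
    }
}
```

Compared with calling exit, the loop keeps the program running and keeps the detection and the response together, at the cost of a little more code.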

Functions Returning An Error Status

Library and API functions pervade general computing. They are necessarily very general because library programmers implement them before application programmers use them. How can such general functions "know" how to respond appropriately in the case of a failure? For example, terminating a program controlling an aircraft in flight or a running nuclear reactor is a bad choice. In these and many more common situations, failing functions must defer to a part of the program that does "know" what action is appropriate. So, many library functions report the failure but take no further action. One way that functions can report failure is through their return value.

Two examples demonstrate how library math functions report errors through their return values. From experience, we know that taking the square root of a negative number is an illegal operation, and we reinforce that understanding by looking at a graph of the square root function. We say the square root function's domain - its legal arguments - is all non-negative numbers. Similarly, the natural logarithm is undefined at x=0, approaching negative infinity as x approaches 0 from the right (see the graph of the natural logarithm). The sqrt and log functions report these extreme situations by returning carefully crafted values that aren't "real" numbers.

double e1 = sqrt(-2);

cout << e1 << endl;

if (isnan(e1))
	cerr << "It is a NaN" << endl;
else
	cerr << "It is NOT a NaN" << endl;
double e2 = log(0);

cout << e2 << endl;

if (isinf(e2))
	cerr << "It is Inf" << endl;
else
	cerr << "It is NOT Inf" << endl;
-nan
It is a NaN
-inf
It is Inf
(a)(b)
NaN and Inf: The IEEE 754 floating-point standard. Most modern computers use the IEEE 754 standard to encode floating-point numbers. The standard divides the bits of a floating-point number between a sign bit, an exponent, and a mantissa, plus an implied leading bit (often called the hidden bit) that is not stored. The compiler and hardware work together to hide these details from programmers. Curiously, some IEEE bit-patterns do not correspond to "real" numbers, and the standard uses them to represent different failure values: Inf, ‑Inf, NaN, and ‑NaN (Not a Number). Functions returning these values do not "crash," and the values propagate through later calculations: any arithmetic operation on a NaN results in a NaN, while operations on an Inf produce either another Inf (Inf + 1, for example) or a NaN (Inf - Inf, for example). If a program doesn't explicitly test for these values, the failure goes unnoticed until we look at the result.
  1. A NaN prints as some variation of "nan" (the exact output is system dependent), but programs can explicitly test for the value at any time with the isnan function.
  2. Similarly, programs can test for an Inf value with the isinf function.
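
How these special values behave in later calculations can be checked with a short sketch. The helper name classify is hypothetical and exists only for this illustration; isnan and isinf are the standard <cmath> functions used above.

```cpp
#include <cmath>
#include <limits>
#include <string>

// Hypothetical helper: names the kind of value stored in x so a program
// can test a result before using it or printing it.
std::string classify(double x)
{
    if (std::isnan(x))
        return "NaN";
    if (std::isinf(x))
        return x > 0 ? "+Inf" : "-Inf";
    return "finite";
}
```

With inf set to std::numeric_limits<double>::infinity(), classify(inf + 1.0) reports "+Inf" while classify(inf - inf) reports "NaN". A NaN also never compares equal to anything, including itself, which is why programs test for it with isnan rather than with ==.
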
char oldname[NAME_SIZE];
char newname[NAME_SIZE];
cout << "Please enter the old and new file names: ";

cin.getline(oldname, NAME_SIZE);	// reads a string from the console
cin.getline(newname, NAME_SIZE);

if (rename(oldname, newname) != 0)
{
	cerr << "File not renamed" << endl;
	exit(1);
}
System calls and error status. It's common for system calls to return an integer-encoded error status. They follow a typical protocol of returning 0 on success and -1 when they fail, but you should always check the "Return Value" section of the documentation. The example illustrates how programs can use the return value of the rename function, declared in <cstdio>. Most versions of Unix and Linux support rename, as does Windows. The call renames a file or directory (folder), as illustrated. It fails if oldname does not exist or if the file or directory is busy; on some systems, Windows among them, it also fails if a file or directory named newname already exists.
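
A function that reports a failure but leaves the response to its caller might look like the sketch below. renameFile is a hypothetical wrapper, not a library function. It also assumes that rename stores a reason in errno, which POSIX systems do but which the C standard does not guarantee, so the code prints the reason only when one is available.

```cpp
#include <cerrno>
#include <cstdio>    // std::rename
#include <cstring>   // std::strerror
#include <iostream>

// Hypothetical wrapper: returns true when the rename succeeds. On failure
// it writes a diagnostic and returns false, taking no further action.
bool renameFile(const char* oldname, const char* newname)
{
    errno = 0;                                   // clear any earlier status
    if (std::rename(oldname, newname) != 0)
    {
        const int status = errno;                // copy before other calls can change it
        std::cerr << "Could not rename \"" << oldname << "\" to \"" << newname << "\"";
        if (status != 0)                         // assumption: rename recorded a reason
            std::cerr << ": " << std::strerror(status);
        std::cerr << std::endl;
        return false;
    }
    return true;
}
```

The caller tests the Boolean result and chooses the response itself, which might be a graceful shutdown in a small utility or a new prompt in an interactive program.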

The above examples are based on library functions and system calls, but we can also use return values to indicate the success or failure of functions we write as part of an application. This technique is handy when the function otherwise has a void return type, but it works as long as there are at least two values we can use to signal the function's status. The following figure demonstrates two similar approaches using skeletonized code fragments.

bool function1(...)
{
	if (...)
	{
		...;
		return false;
	}
	if (...)
	{
		...;
		return false;
	}
	...;
	return true;
}
int function1(...)
{
	if (...)
	{
		...;
		return 1;
	}
	if (...)
	{
		...;
		return 2;
	}
	...;
	return 0;
}
(a)(b)
Application functions returning an error status. For generality, ellipses replace the parameters, branching conditions, and other statements. Using multiple return statements isn't necessary but is often convenient. The function returns an error status as soon as it detects a problem and only returns a success status when it completes its tasks.
  1. Functions can return a Boolean status when they only need to signal success or failure.
  2. Functions typically return an integer status when they can experience multiple failure modes. By themselves, the numbers "0," "1," and "2" convey little information to someone reading the source code. Programmers often use enumerations to eliminate these "magic numbers."
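
As a sketch of how an enumeration can replace the magic numbers, consider the following. The names ParseStatus and parsePercent are hypothetical, and the task (converting a string of digits to a percentage between 0 and 100) is chosen only to give the function several distinct failure modes.

```cpp
#include <string>

// Hypothetical status codes: each enumerator names one outcome.
enum class ParseStatus { Ok, Empty, NotANumber, OutOfRange };

// Hypothetical function: converts a string of digits to a percentage.
// It returns an error status as soon as it detects a problem and returns
// ParseStatus::Ok only after completing its task.
ParseStatus parsePercent(const std::string& text, int& percent)
{
    if (text.empty())
        return ParseStatus::Empty;

    int value = 0;
    for (char ch : text)
    {
        if (ch < '0' || ch > '9')
            return ParseStatus::NotANumber;
        value = value * 10 + (ch - '0');
        if (value > 100)                 // stop early, long before int can overflow
            return ParseStatus::OutOfRange;
    }

    percent = value;                     // only modified on success
    return ParseStatus::Ok;
}
```

A caller can then write a test such as status == ParseStatus::OutOfRange, which documents itself in a way that status == 3 does not.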

errno And perror: Error Statuses

The C programming language defines a global variable named errno, which C++ inherits. When a <cmath> function fails, most implementations save an error code in errno (the C standard ties this behavior to the math_errhandling macro, and some systems report math errors only through floating-point exceptions). A successful call leaves errno unchanged, and no library function ever resets it to 0, so programs should set errno to 0 before the call. A later library call may overwrite the status, so the program must also check the value immediately after the function returns. Currently, 78 integer status values are given symbolic or macro names in <cerrno>. The names begin with "E" (signifying an error) followed by a cryptic, abbreviated description. For example: EDOM (domain error), ERANGE (range error), or EADDRINUSE (address in use error). Please see the list of error constants (right-hand column) for more names.

The numeric values representing various error conditions are arbitrary and system dependent. For example, there is no intrinsic connection between the value 33 - the value of EDOM on the system used for the examples below - and a domain error. Application programs can use branching logic to test errno and display appropriate diagnostics. Alternatively, programmers can use another library function, perror, to implement general error reporting functions.

errno = 0;			// clear any earlier status
double e1 = sqrt(-2);

if (errno == 0)
	cerr << "sqrt okay" << endl;
else
	cerr << "sqrt failed: " << errno << endl;

if (errno == EDOM)
	cerr << "sqrt: Domain error" << endl;
errno = 0;			// clear any earlier status
double e2 = log(0);

if (errno == 0)
	cerr << "log okay" << endl;
else
	cerr << "log failed: " << errno << endl;

if (errno == ERANGE)
	cerr << "log: Range error" << endl;
sqrt failed: 33
sqrt: Domain error
log failed: 34
log: Range error
Using errno to detect and report errors. The primary weakness of the errno system is that programs must voluntarily and consistently check the saved value to determine the health of an associated function call. Exception handling, introduced in a later chapter, is an object-oriented mechanism addressing this weakness.
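
The clear, call, and check sequence can be packaged in a small wrapper so that it is harder to forget. The sketch below is one possible arrangement: checkedLog is a hypothetical name, and testing the returned value as well as errno is a design choice made here because some systems do not set errno for math errors.

```cpp
#include <cerrno>
#include <cmath>
#include <iostream>

// Hypothetical wrapper: computes log(x) and returns true only when the
// result is an ordinary number. It treats any NaN or Inf result as a failure.
bool checkedLog(double x, double& result)
{
    errno = 0;                         // clear any stale status first
    result = std::log(x);
    const int status = errno;          // copy before another call can change it

    if (status == EDOM || std::isnan(result))
    {
        std::cerr << "log: argument outside the function's domain" << std::endl;
        return false;
    }
    if (status == ERANGE || std::isinf(result))
    {
        std::cerr << "log: result out of range" << std::endl;
        return false;
    }
    return true;
}
```

The wrapper converts the errno protocol into a Boolean return status, so the caller still has to test the result, but the bookkeeping with errno happens in one place.
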
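#include <cstdio>	// perror
#include <cmath>	// sqrt, log
using namespace std;

// Prints a description of the error code currently saved in errno
void error()
{
	perror("Program error");
}

int main()
{
	double e1 = sqrt(-1);
	error();

	double e2 = log(0);
	error();

	return 0;
}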
Program error: Domain error
Program error: Result too large
Reporting errors with perror. The perror function has one string parameter and a void return type. When it runs, it prints a simple diagnostic to stderr, the standard error stream that cerr also writes to. The diagnostic message consists of the function's parameter, a colon, and a brief description of the error code currently saved in errno. The wording of the description is system dependent.

Assertions

Assertions produce diagnostics exclusively targeting software developers. When they detect a failure, their diagnostics state the file's name and the line number where the error occurred. Although this information is meaningless to an end user, it helps developers quickly locate the error and is especially useful when debugging programs with many large files.

Programmers use the assertion mechanism to "assert a precondition" that must be true before an operation occurs. They state the precondition as an expression the program evaluates before the operation. The assertion does nothing if the expression evaluates to non-zero (true). Otherwise, it displays the diagnostic and calls abort, terminating the program.

Assertions have two major advantages over programmer-instrumented code, both gained by implementing assertions as parameterized macros. The preprocessor reads the source code, expands each assert macro into code that tests the expression and reports a failure, and passes the modified code to the compiler component. Translating the modified source code to machine code loses much of the textual information available to the preprocessor. The first assertion advantage, the automatic inclusion of the assertion expression, file name, and line number, relies on information available to the preprocessor. The second advantage is that programmers can leave assertions in place and activate or deactivate them as needed. If we deactivate an assertion, the preprocessor strips it out of the source code so it doesn't slow the executable or increase its size.
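
The preprocessor features behind the first advantage can be seen in a small macro. The sketch below is for illustration only: SOFT_CHECK and describeFailure are hypothetical names, and unlike assert, this macro builds a message instead of terminating the program. It is not how any particular library implements assert.

```cpp
#include <sstream>
#include <string>

// Builds the diagnostic text from the pieces the preprocessor supplies.
std::string describeFailure(const char* expression, const char* file, int line)
{
    std::ostringstream message;
    message << "Check failed: " << expression
            << ", file " << file << ", line " << line;
    return message.str();
}

// SOFT_CHECK is a hypothetical macro, not part of any library.
//   #expr     turns the expression's source text into a string literal
//   __FILE__  expands to the name of the current source file
//   __LINE__  expands to the line number where the macro is used
// It yields an empty string when expr is true and a diagnostic otherwise.
#define SOFT_CHECK(expr) \
    ((expr) ? std::string() : describeFailure(#expr, __FILE__, __LINE__))
```

An ordinary function could not produce the same message because, by the time it runs, the text of the expression and the location of the call are no longer available.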

double arg1 = -2;
assert(arg1 >= 0);
double e1 = sqrt(arg1);
double arg2 = 0;
assert(arg2 > 0);
double e2 = log(arg2);
(a)(b)
Assertion failed: arg1 >= 0, file assert.cpp, line 10
Assertion failed: arg2 > 0, file assert.cpp, line 14
(c)
#define NDEBUG
#include <cassert>
g++ -o assert -DNDEBUG assert.cpp	// Unix/Linux/macOS
cl /DNDEBUG assert.cpp			// Windows
(d)(e)
Assertion system examples. The assert macro looks like a function call requiring an integer-valued expression. When the program evaluates the expression, it treats a 0-value as a failure, triggering assert to display a diagnostic message and abort the program. The program treats a non-0-value as success, and assert does nothing. Programs using the assertion mechanism must #include <cassert>.
  1. An assertion example ensuring a variable is non-negative before taking its square root.
  2. An assertion example ensuring a variable is greater than zero before calculating its natural logarithm.
  3. The diagnostic is displayed when the assertion fails. The message includes the failed assertion expression (as a string of characters). It also includes the file name and line number where the assertion failed - information allowing programmers to locate the error quickly.
  4. Programmers deactivate assert macros left in the code by defining NDEBUG before the #include <cassert> directive.
  5. Programmers or build masters can also deactivate assert macros by defining NDEBUG with a command-line option ("-D" on Unix/Linux systems and "/D" on Windows) when compiling source code files or with a setting in an IDE. The macros remain in the source code, where they can be reactivated and used in the future.
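
Assertions are often placed at the top of a function to state its preconditions. The following sketch assumes a hypothetical function named rootOf; the string literal in the assertion is a common idiom for adding a readable note to the diagnostic.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical function: the assertion documents and enforces the
// precondition that the argument is non-negative.
double rootOf(double value)
{
    // A string literal always counts as true, so "&&" leaves the test
    // unchanged while making the note appear in the diagnostic text.
    assert(value >= 0 && "rootOf requires a non-negative argument");
    return std::sqrt(value);
}
```

Because a deactivated assertion is stripped from the code, its expression is never evaluated when NDEBUG is defined. Assertion expressions should therefore be free of side effects, such as incrementing a variable or reading input, that the rest of the program relies on.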

Runtime Error Conclusions

The first example demonstrates the importance of anticipating all possible situations in a program. While this is part of all secure, robust programming, some programs are so large and complex that missing some situations is inevitable. We need a solution that offers at least some protection in unanticipated situations.

Returning an error status is sufficiently general that we can use it in many situations - with one caveat. Every function has a return type and a set of values it can return. At least one value must be unused and available to denote an error. For example, if a function returns a pointer, it can return nullptr to signal an error. Similarly, library functions inherited from C define a global variable named errno, where the functions save their error status. We can also use errno in our functions, circumventing the limitation imposed on a function's return value. However, the client program can fail if it doesn't anticipate or check the return value or errno for error conditions. We need a solution that programmers can't ignore or overlook.
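
The pointer case can be sketched with a simple search. The function name findNegative is hypothetical; the point is only that nullptr is a value the function would never return on success, so it is free to signal "nothing found."

```cpp
#include <cstddef>

// Hypothetical lookup: returns a pointer to the first negative element
// in the array, or nullptr when there is none.
const double* findNegative(const double* values, std::size_t count)
{
    for (std::size_t i = 0; i < count; i++)
        if (values[i] < 0)
            return &values[i];
    return nullptr;                  // the unused value that signals failure
}
```

The weakness described above still applies: a caller that dereferences the result without first comparing it to nullptr fails in exactly the way the status was meant to prevent.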

Assertions are a partial solution. They have the advantage that they can't be ignored or inadvertently overlooked. However, their diagnostics target a limited audience: software developers. Their messages are unusable by end users, so they are typically stripped out of released applications.

Although these techniques allow us to detect and prevent some potential failures, and provide some diagnostic feedback, they are fallible. We need a more robust technique that can't be ignored or overlooked. We need a technique that works best when we anticipate a possible failure but still works when we don't. We need a technique appropriate for software developers and end users alike. Exceptions and the structures that handle them are the object-oriented technique we need. Although exceptions can work with primitive data types and non-member functions, they are most flexible and commonly used with classes. We'll study them as part of exploring classes and objects later in the text.