8.2.2. C-String Documentation And Functions


C-strings are a fundamental or primitive data type that the compiler processes without additional information. That means that programmers can define C-strings and use the basic operators like indexing ([]), the inserter (<<), and argument passing without #including a header file for the C-strings themselves. However, a program must have prototypes for the C-string functions before using them. C++ inherits most C-string functions from the C programming language, so the prototypes are available in two header files: one written for C++ and one inherited from C. (Some systems may chain header files, making an explicit #include unnecessary, but explicitly including one will make your code more portable.)

The C-String header files: <cstring> and <string.h>. Programmers may use either header file in a C++ program, but <cstring> is tailored for C++ and is preferred. The names of the functions prototyped in <cstring> are a bit cryptic but suggest the services they provide and the kind of data they operate on. Functions whose names begin with "str" operate on C-strings, and their parameters are mostly char*. Functions whose names begin with "mem" perform similar tasks but with more general data; their parameters are mostly void* (see void Pointers later in this chapter). cplusplus.com provides a complete list of the <cstring> functions.

The header files contain the function prototypes for all the C-string functions. C++ documentation typically includes the same prototypes at the top of the function's description. The functions are a standard part of C++'s API, meaning the compiler system stores each function's machine instructions in a library, and the linker (or loader) extracts the instructions and incorporates them into a program as part of the compilation process. There are five prerequisite concepts that new C++ programmers must understand when learning to use the C-string documentation:

  1. The char data type and ASCII encoding
  2. The meaning of nullptr or NULL
  3. The size_t and errno_t types
  4. Getting the function arguments correct
  5. The Microsoft Visual Studio variations of many of these functions, which have slightly different argument lists and return values

ASCII Encoding

Internally, computers store textual data in a coded format. Unicode is common today (that's what Java uses). Unicode was originally designed around 2-byte integers, which allow for 64K different characters; it has since grown beyond that range, and Java's 2-byte char now holds one UTF-16 code unit. Although C++ also has a wide character type, wchar_t, we'll use the older, 1-byte character type char for our examples. The char data type uses the older encoding scheme known as the American Standard Code for Information Interchange (ASCII). ASCII is a 7-bit code that defines 128 characters, each stored in a 1-byte (or 8-bit) char that can hold 256 different values. The first 128 values are standardized, but the last 128 (the extended ASCII codes) are not always interpreted in the same way by all devices.

We can find the ASCII encoding of a specific character in any ASCII table (just Google "ASCII table" and you will see numerous examples). The first 32 characters (codes 0 through 31) are control characters used to control connected hardware devices. The remaining characters encode the symbols, punctuation marks, digits, and alphabetic characters that comprise most of the text on a computer screen. Interestingly, the digits '0' through '9' are not represented by the numerical values of 0-9. The digit '0' is encoded as the numeric value of 48 (decimal or base-10), while the character '9' is encoded as 57. ASCII encodes all the digits in a contiguous range from 48 to 57. The alphabetic characters also occupy a contiguous range, but the upper case letters (65-90) are separate from and come before the lower case letters (97-122).

nullptr And Empty C-Strings

nullptr was introduced in chapter 4 as a special pointer that indicates "pointing to nothing." Chapter 4 also suggested that we can test the content of a pointer to see if it is nullptr or not. But for this test to be meaningful, we must initialize the pointer before doing the test. A pointer variable definition such as char* p; allocates the memory needed to store a pointer but does not initialize the value stored in the pointer. An uninitialized pointer variable does not store nullptr or any valid address.

(a)
char* p = nullptr;
	. . .
if (p == nullptr)
	. . .
if (p != nullptr)
	. . .

(b)
char	s[100] = "";

char	s[100];
s[0] = '\0';

if (s != nullptr) . . .	// is true

(c)
[Figure: A C-string represented as an array. The array is said to be empty because the first or 0-th element is \0 (the null terminator). Although the string is empty, it still has a capacity (i.e., unused elements are available in the character array).]
Null vs. empty C-Strings. A null C-string is a character pointer that does not point to anything; an empty C-string is a pointer or an array containing no character data (i.e., the memory is allocated but does not contain data).
  (a) Programs should initialize pointers at or near their definition. If a suitable address is unavailable, the program should initialize a pointer to nullptr to facilitate later testing. Uninitialized pointers contain "garbage" (i.e., their contents are unspecified), which is not the same as nullptr. Uninitialized pointers are not secure and can represent potential bugs that can cause program failures.
  (b) An empty string has memory to store data but does not have any data. In the case of an empty C-string, the first character in the string is the null-termination character. Programs can make C-strings empty in two ways. Initializing the array with "" in its definition has the advantage that it only takes one statement. Assigning '\0' to s[0] has the advantage that the program can make the C-string empty whenever convenient (e.g., emptying a string that once contained data). Remember that the name of an array is a constant pointer, so the test s != nullptr is always true.
  (c) An abstract representation of an empty C-string in memory. The null-termination character \0 marks the end of the string (i.e., the end of the data). So, while memory for index values ≥ 1 may contain characters from previous operations, they do not contain useful data now, and s is logically empty.
nullptr is preferred, but programmers may replace it in any of the above examples with NULL or with 0 (zero); in C++, NULL is a macro or symbolic constant that represents 0.

In the C-string library functions, nullptr may appear as either an argument or as a return value. But what it means depends on the specific function. For example, when a function returns nullptr, it may indicate an error has occurred; if the function is processing data, it may mean that all data is processed; or in the case of a searching function, it may indicate the search didn't find what it was looking for. nullptr can also be used as a function argument, but how a function interprets it varies from one function to another. For example, the strtok function starts searching a new C-string when its first argument is not nullptr and continues searching the previously provided C-string when its first argument is nullptr.

C-String Function Documentation

The C-string documentation uses previously discussed concepts while introducing new ones. Four example functions illustrate additional documentation features. They also demonstrate that programmers must carefully read multiple documentation sections as each contributes information vital for correct usage.

char* strcpy(char* dest, const char* src);
char* strncpy(char* dest, const char* src, size_t num);
(a)(b)
void* memcpy(void* dest, const void* src, size_t num);
void* memmove(void* dest, const void* src, size_t num);
(c)(d)
Example C-string function documentation. Function prototypes characterize the functions, illustrating three kinds of arguments. Programs pass textual arguments as character pointers, char*, numeric arguments as the size_t type alias, and deliberately ambiguous arguments as void pointers, void*.

Void pointers are C++'s most generic data type, similar to Java's Object class. When the program passes a pointer argument to a void-pointer parameter, the compiler automatically converts it to void*. To use a void pointer, programmers must explicitly cast it back to a "real" pointer - a pointer to a known type. Fortunately, these functions only need an address, allowing us to delay a more detailed examination to the chapter's end (see Searching and Sorting).

Four documentation sections, "function," description (derived from the function name), "Parameters," and "Return Value" provide the following function details:

Misunderstanding the four prototypes can lead to a frequent programming error.

Each of the example functions copies src to dest, copying from the right operand to the left. The functions operate independently of how the program allocates the src and dest memory, assuming that it allocates them correctly. However, the appearance of src and dest in the prototypes is often confusing to new programmers. The following figure uses the strcpy function to illustrate and clarify the problem.

(a)
char*	dest;	// Wrong!
	. . .
strcpy(dest, src);

(b)
char	dest[100];
	. . .
strcpy(dest, src);

(c)
char*	dest = new char[100];
	. . .
strcpy(dest, src);

(d)
char	dest[100];
strcpy(dest, "Hello world");
Reading C-string documentation: A common error. Specifying dest as a pointer in the prototype sometimes leads new programmers to define it as a pointer in the client code.
  (a) Wrong! The pointer dest is uninitialized, so strcpy treats whatever value it happens to contain as an address and copies the src string to that location, likely crashing the program or corrupting memory. If the code represented by the ellipsis assigns memory to dest (either by pointing it to an existing array or by assigning to it memory allocated with new), then the copy is safe.
  (b) The easiest way to allocate memory for a string is with a character array. The copy function will automatically add the null-termination character at the end of dest as a part of the copy operation. However, the string concatenation function strcat requires that the destination is null-terminated before the function call.
  (c) It is also possible to allocate memory for a C-string dynamically with the new operator. Doing this has the advantage that the program can specify the array's size with a variable whose value is either entered by a user or calculated during program execution.
  (d) The compiler allows src to be a string literal or constant, but not dest: strcpy writes to dest, so dest must refer to modifiable memory.
The "100" in the above examples is arbitrary - large enough for a simple demonstration. Programmers must ensure that dest is large enough to hold all the characters in src, plus the null termination character, without overflowing.

Typical Return Types

The prototype given above indicates that strcpy returns a character pointer, but the examples entirely ignore the return value. Programs pass arguments to strcpy by pointer. The second argument is a const, preventing strcpy from modifying it. The first argument is not passed as a const, allowing strcpy to modify it, which makes the first argument an input/output argument. In addition, strcpy returns the address of the first or destination argument. When a function provides the same information in multiple ways, we call it a convenience feature. Returning the first argument (with the return keyword) allows us to use the function call as an expression:

char	s1[100];
const char*	s2 = "Hello, world!";	// a string literal is constant, so the pointer must be const char*
cout << strcpy(s1, s2) << endl;

Which, of course, prints Hello, world!

Visual Studio Return Types

Alternatively, the Microsoft documentation for the similar function, strcpy_s, gives the prototype:

errno_t strcpy_s(char *strDestination, size_t numberOfElements, const char *strSource);

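Here, size_t is a type alias for an unsigned integer type large enough to hold the size of any object, and errno_t is a type alias for int that the function uses to return an error code, with 0 indicating success. Not all documentation uses size_t. Some documentation will use either int or long, but the functions behave the same regardless of the specific data type used to denote the size.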

Microsoft Secure C-String Functions

Microsoft describes the Security Features in the CRT saying that "Many old CRT functions have newer, more secure versions. If a secure function exists, the older, less secure version is marked as deprecated and the new version has the _s ('secure') suffix." Being "marked as deprecated" means that the compiler flags each use of a deprecated function with warning C4996; Visual Studio projects that enable the SDL checks treat that warning as an error and cease code generation. It is possible to override this behavior (that is, to use the original CRT functions) by placing the following directive at the top of the program, before any #include directives:

#define _CRT_SECURE_NO_WARNINGS

The secure versions of the functions typically add one more argument to the function parameter list: the maximum size of the target C-string or "the size of the string buffer." Even more confusing, at least one secure Microsoft function adds an "n" to the middle of the function name; the "n" also indicates that the function takes the extra size argument. However, some variations of the "standard" functions also use the same naming convention. The secure functions detect when an operation would overflow the destination (i.e., when the data is too large for destination to hold) and call an error handler function to prevent and report the error. The following two function prototypes illustrate the differences:

char* strcpy(char* destination, const char* source);
errno_t strcpy_s(char* strDestination, size_t numberOfElements, const char* strSource);

Complete C-String Function List

There are many C-string functions; exploring each would take too much time without benefiting us. Furthermore, memorizing the details of each one doesn't help us either. So, we can best use our time learning how to read and understand each function's documentation. Two steps are sufficient for successfully using a large set of API functions such as those available for C-strings:

  1. Knowing in general the kinds of operations (i.e., functions) available in the API
  2. Knowing where you can find the details when you need to use one of the operations or functions

The four textbook sections that follow illustrate four frequently used C-string functions and how to use them. The links below will take you to more extensive lists of C-string functions and examples.

Standard C-String Functions

C-string functions prototyped in <cstring>

C-String → Number Functions

C-string conversion functions

I recommend creating a bookmark for these pages in your web browser.