8.2.2. C-String Documentation And Functions


C-strings are a fundamental or primitive data type that the compiler processes without additional information. That means that programmers can define C-strings and use the basic operators like indexing ([]), the inserter (<<), and argument passing without #including a C-string header file. However, a program must have prototypes for the C-string functions before using them. C++ inherits most C-string functions from the C programming language, so their prototypes are available in two header files.

The C-string header files: <cstring> and <string.h>. Programmers may use either header file in a C++ program, but the first, <cstring>, is tailored for C++, making it the preferred choice. The C-string function names are terse and cryptic, but they suggest the operations the functions provide and the kind of data they operate on. Functions whose names begin with "str" operate on C-strings, and their parameters are mostly character pointers. Functions whose names begin with "mem" perform similar tasks but with more general data. With some compilers, frequently used header files (e.g., <iostream>) include the C-string header file themselves, making an explicit #include <cstring> unnecessary. Nevertheless, explicitly including the C-string header file makes programs more portable.
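The naming convention can be sketched with one "str" function and one "mem" function. This is a minimal illustration, not part of the original text; text_length and same_bytes are hypothetical helper names chosen for the example.

```cpp
#include <cstring>   // prototypes for strlen, memcmp, and the other C-string functions
#include <cstddef>   // std::size_t

// Hypothetical helper: counts the characters stored before the null terminator.
std::size_t text_length(const char* s)
{
    return std::strlen(s);              // "str" function: stops at '\0'
}

// Hypothetical helper: reports whether the first n bytes of two buffers match.
bool same_bytes(const void* a, const void* b, std::size_t n)
{
    return std::memcmp(a, b, n) == 0;   // "mem" function: needs an explicit byte count
}
```

The "str" function finds the end of its data by looking for the null terminator, while the "mem" function accepts any kind of data and therefore relies on the caller to supply the size in bytes.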

The header files contain the function prototypes for all the C-string functions. C++ documentation typically includes the same prototypes at the top of the function's description. The functions are a standard part of C++'s API, meaning the compiler system stores each function's machine instructions in a library, and the linker (or loader) extracts the instructions and incorporates them into a program as part of the compilation process. New C++ programmers need to understand four prerequisite concepts when learning to use the C-string documentation:

  1. The char data type and ASCII encoding
  2. The nullptr and the difference between null and empty strings
  3. Peculiar datatypes in C-string function documentation
  4. The Microsoft Visual Studio secure variations

ASCII Encoding

Internally, computers store textual data in a coded format. Unicode is common today (that's what Java uses). The first version of Unicode represented characters as 2-byte integers, allowing for 64K distinct characters; the current standard defines many more, and encodings such as UTF-16 use one or two 2-byte units per character. Although C++ also provides wide character types, we'll use the older, 1-byte character type char for our examples. On most systems, char data uses the older American Standard Code for Information Interchange (ASCII) encoding scheme. ASCII is a 7-bit code that standardizes 128 characters. A 1-byte (8-bit) char can store 256 different values, but the last 128 (the extended ASCII codes) are not always interpreted in the same way by all devices.

We can find the ASCII encoding of a specific character in any ASCII table (a web search for "ASCII table" will provide numerous examples). The first 32 characters (codes 0-31) are control characters used to control connected hardware devices. The remaining characters encode the symbols, punctuation marks, digits, and alphabetic characters that comprise most of the text on a computer screen. Interestingly, the digits '0' through '9' are not represented by the numbers 0-9. The digit '0' is encoded as the numeric value 48 (decimal or base-10), while the digit '9' is encoded as 57. ASCII encodes all the digits in the contiguous range 48-57. The alphabetic characters also occupy contiguous ranges, but the uppercase letters (65-90) are separate from and precede the lowercase letters (97-122).
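Because the digits and letters occupy contiguous ranges, simple arithmetic converts between characters and numbers. The following sketch assumes an ASCII-compatible character set; digit_value and to_upper_ascii are hypothetical helpers written for illustration (the standard library offers toupper in <cctype> for real programs).

```cpp
// Hypothetical helper: converts a digit character to its numeric value.
// Assumes c is in the range '0'..'9'.
int digit_value(char c)
{
    return c - '0';     // e.g., '7' (55) - '0' (48) = 7
}

// Hypothetical helper: converts a lowercase ASCII letter to uppercase.
// Characters outside 'a'..'z' are returned unchanged.
char to_upper_ascii(char c)
{
    if (c >= 'a' && c <= 'z')
        return static_cast<char>(c - ('a' - 'A'));   // the ranges are 32 apart in ASCII
    return c;
}
```

Writing '0' and 'a' - 'A' rather than the literal numbers 48 and 32 lets the compiler supply the encodings and keeps the intent visible.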

nullptr And Empty C-Strings

nullptr, introduced in Chapter 4, is a special pointer value that indicates "pointing to nothing." Chapter 4 also suggested that we can test a pointer's value to see whether it is nullptr. But for this test to be meaningful, we must initialize the pointer before running it. A pointer variable definition, such as char* p;, allocates the memory needed to store a pointer but does not always initialize the value stored in the pointer. An uninitialized pointer variable does not store nullptr or any valid address.

(a)
char* p = nullptr;
	. . .
if (p == nullptr)
	. . .
if (p != nullptr)
	. . .

(b)
char	s[100] = "";	// (i)

char	s[100];		// (ii)
s[0] = '\0';

if (s != nullptr) . . .	// is true

(c)
[Figure: A C-string represented as an array. The array is said to be empty because the first or 0-th element is \0 (the null terminator). Although the string is empty, it still has a capacity (i.e., unused elements are available in the character array).]

Null vs. empty C-Strings. A null C-string is a character pointer that does not point to anything; that is, a character pointer storing the nullptr value. Conversely, an empty C-string is an array, or a pointer to an array, that doesn't contain any character data - the program has allocated the string's memory and put a null termination character as its first element.
  (a) Programs should initialize pointers at or near their definition. If a suitable address is unavailable, the program should initialize a pointer to nullptr to facilitate later testing. Uninitialized pointers contain "garbage" (i.e., their contents are unspecified), which is not the same as nullptr. Uninitialized pointers are insecure and can lead to bugs that cause program failures.
  (b) An empty string has memory to store data, but does not have any data. In the case of an empty C-string, the first character in the string is the null-termination character. Programs can make C-strings empty in two ways. (b.i) defines and initializes the string in the same statement; its advantage is that the initialization is incorporated with the definition, requiring only a single statement. (b.ii) requires two statements but allows the program to "empty" the C-string whenever necessary (e.g., emptying a string that once contained data). Remember that the name of an array is a constant pointer, which is why the test s != nullptr is always true.
  (c) An abstract representation of an empty C-string in memory. The null-termination character \0 marks the end of the string (i.e., the end of the data). So, while memory for index values ≥ 1 may contain characters from previous operations, they do not contain useful data now, and s is logically empty.
Although nullptr is the preferred way to represent a null-pointer value, C++ still recognizes 0 (zero) or the older NULL macro.

In the C-string library functions, nullptr may appear as either an argument or as a return value. But what it means depends on the specific function. For example, when a function returns nullptr, it may indicate an error has occurred; if the function is processing data, it may mean that all data is processed; or in the case of a searching function, it may indicate the search didn't find what it was looking for. nullptr can also be used as a function argument, but how a function interprets it varies from one function to another. For example, the strtok function uses nullptr and not-nullptr to either continue searching a previously provided C-string, or to start searching a new C-string, respectively.
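The strtok function illustrates both uses of nullptr, as an argument and as a return value. The sketch below assumes a hypothetical helper named count_words and a single space as the delimiter; note that strtok modifies the C-string it scans, so the argument must be a writable array, not a string literal.

```cpp
#include <cstring>

// Hypothetical helper: counts the space-separated words in a writable C-string.
// strtok overwrites each delimiter it finds with '\0', so text is modified.
int count_words(char* text)
{
    int count = 0;

    // A non-null first argument starts searching a new C-string.
    char* token = std::strtok(text, " ");

    while (token != nullptr)        // a nullptr return value means no tokens remain
    {
        count++;
        // A nullptr first argument continues searching the previous C-string.
        token = std::strtok(nullptr, " ");
    }

    return count;
}
```

For example, given char line[] = "see the quick fox"; the call count_words(line) returns 4.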

Peculiar Datatypes In C-String Function Documentation

The parameters and return types of the C-string functions generally reflect the behavior their name (cryptically) suggests. However, the function prototypes in their documentation often use peculiar and unexpected datatypes. Furthermore, the datatypes can vary between document versions. Type aliases account for some of the peculiarities, while others are "normal" C++ types not yet covered. Four example functions introduce some of the datatypes and demonstrate the importance of carefully reading multiple documentation sections, as each contributes information necessary to use the functions correctly.

char* strcpy(char* dest, const char* src);
char* strncpy(char* dest, const char* src, size_t num);
void* memcpy(void* dest, const void* src, size_t num);
void* memmove(void* dest, const void* src, size_t num);
Example C-string function documentation. Function prototypes summarize functions, concisely conveying the information a client needs to use them. The illustrated prototypes demonstrate many of the C-string functions' parameter and return-value datatypes. Prototypes don't require parameter names, but documentation typically includes them to facilitate describing the function's behavior. Four sections in each function description contribute vital information about the functions. The documentation explicitly names three sections, but derives the fourth name from the function itself.
"function"
    The function prototype: char* strcpy(char* dest, const char* src);
Description
    A descriptive name: Copy string
"Parameters"
    • Textual data: char* (The following figure explores character-pointer parameters in detail.)
    • Numeric data: size_t type alias
    • Typeless data: void* (i.e., void pointers. C++ converts any pointer argument passed to a void pointer parameter to a void pointer; programs must explicitly cast it back to a known type to access data through it. The text covers void pointers in detail later in the chapter.)
"Return Value"
    Figure 5 illustrates the purpose of returning a pointer.
Contemporary versions of the C++ cstring header file also include several "mem" functions that operate like their similarly named "str" functions. Older compilers may declare the "mem" functions in mem.h or memory.h.
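The size_t and void* parameters can be seen at work in a short sketch. The two helpers below, copy_truncated and copy_ints, are hypothetical names chosen for this example. The first relies on a documented detail that is easy to miss: strncpy does not add a null terminator when src holds num or more characters, so the caller must add one.

```cpp
#include <cstring>
#include <cstddef>   // std::size_t

// Hypothetical helper: copies at most destSize - 1 characters and always
// terminates the result, because strncpy alone does not when src is too long.
void copy_truncated(char* dest, const char* src, std::size_t destSize)
{
    if (destSize == 0)
        return;                             // no room for even the terminator
    std::strncpy(dest, src, destSize - 1);
    dest[destSize - 1] = '\0';
}

// Hypothetical helper: copies count integers. The int pointers convert to
// void pointers automatically, and the size is given in bytes, not elements.
void copy_ints(int* dest, const int* src, std::size_t count)
{
    std::memcpy(dest, src, count * sizeof(int));
}
```

The "str" function works in characters and watches for the null terminator, while the "mem" function is typeless, which is why its size argument must be scaled by sizeof.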

 

(a)
char*	dest;	// Wrong!
	. . .
strcpy(dest, src);

(b)
char	dest[100];
	. . .
strcpy(dest, src);

(c)
char*	dest = new char[100];
	. . .
strcpy(dest, src);

(d)
char dest[100];
strcpy(dest, "Hello world");

Reading C-string documentation: A common error. Many C-string functions copy the source parameter to the destination - from right to left - while performing some string operation. If the direction seems backwards, use the assignment operator as a mnemonic: destination = source. Programmers can create the parameters in various ways, but they must ensure the destination has sufficient space to store the results.
  (a) Wrong!! dest is an uninitialized pointer, so strcpy treats its garbage contents as an address and copies the src string to that location, causing undefined behavior (often a crash). (b) and (c) demonstrate two ways of correcting the problem.
  (b) A local or stack array provides the destination memory. The strcpy function overwrites any existing data and automatically adds the null-termination character.
  (c) The strcpy function treats a heap array the same as a stack array.
  (d) String literals are constant character arrays, which the C++ compiler converts to constant character pointers, allowing programs to use them as the source parameter but not the destination.
The "100" in the above examples is arbitrary - large enough for a simple demonstration. Programmers must ensure that dest is large enough to hold all the characters in src, plus the null termination character, without overflowing.
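One way to guarantee that the destination is large enough is to size it from the source. The following is a minimal sketch using a heap array; duplicate is a hypothetical helper name, and the caller is assumed to release the memory with delete[].

```cpp
#include <cstring>

// Hypothetical helper: returns a heap-allocated copy of src.
// The caller owns the returned array and must release it with delete[].
char* duplicate(const char* src)
{
    // strlen counts the characters but not the terminator, hence the + 1.
    char* dest = new char[std::strlen(src) + 1];
    std::strcpy(dest, src);     // copies the characters and the '\0'
    return dest;
}
```

Forgetting the + 1 is a common mistake: the array would hold every visible character, but strcpy would write the null-termination character one element past its end.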

 

char	s1[100];
const char*	s2 = "Hello, world!";	// string literals are constant
cout << strcpy(s1, s2) << endl;
C-String functions returning char*. As the strcpy prototype illustrates, many C-string functions return a character pointer, which programs often ignore. Pass-by-pointer implements an INOUT passing mechanism, allowing data to flow into the function and out through a parameter. So, C-string functions often pass their results back to a client through the destination parameter. The return value is often just a convenience, allowing clients to treat the function call as an expression embedded in statements, as illustrated.
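Because the returned pointer is the destination, one call can serve as the destination argument of another. The sketch below nests strcpy inside two strcat calls; join_words is a hypothetical helper, and it assumes the caller supplies a destination large enough for both words, the space, and the null terminator.

```cpp
#include <cstring>

// Hypothetical helper: builds "first second" in dest and returns dest.
// Assumes dest can hold both words, one space, and the null terminator.
char* join_words(char* dest, const char* first, const char* second)
{
    // strcpy returns dest, which becomes the destination of the inner strcat;
    // that call also returns dest, which the outer strcat appends to.
    return std::strcat(std::strcat(std::strcpy(dest, first), " "), second);
}
```

Nesting is compact, but three separate statements do the same work and are often easier to read and debug.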

Microsoft Secure C-String Functions

Microsoft describes the Security Features in the CRT, saying that "Many old CRT functions have newer, more secure versions. If a secure function exists, the older, less secure version is marked as deprecated and the new version has the _s ('secure') suffix."

(a) #define _CRT_SECURE_NO_WARNINGS
(b) errno_t strcpy_s(char *strDestination, size_t numberOfElements, const char *strSource);
Microsoft secure vs. insecure functions. The Microsoft secure functions perform the same basic operations as the ANSI C-string functions. Although an optional C standard covers these functions, they are not covered by a C++ standard. As they are not subject to a C++ standard, they may not be portable to non-Windows systems.
  (a) The Visual Studio compiler considers the ANSI C-string functions "deprecated" and flags their use as errors or warnings. Programmers can override this default behavior (that is, to use the original functions) by putting the illustrated definition at the top of the source code file before any include directive.
  (b) The Microsoft secure string copy function prototype illustrates several distinctions between the secure and ANSI functions:
    • The secure function names append an _s at the end
    • The secure functions typically have an additional parameter, numberOfElements, specifying the size of the destination. The functions transfer at most numberOfElements minus one element (reserving space for the null-termination character), preventing a buffer overrun
    • Some, but not all, documentation specifies the size of the destination with the size_t type alias; other documentation uses either int or long.
    • The errno_t type alias defines an integral type suitable for representing the error code the secure functions return.
The secure functions detect when an operation would overflow the destination (i.e., when the data is too large for destination to hold) and call an error handler function to prevent and report the error.
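The overflow check can be approximated in portable C++. The sketch below is not Microsoft's implementation: checked_copy is a hypothetical stand-in that borrows the parameter order of strcpy_s, returns an int in place of errno_t, and simply returns an error code where the real functions call an error handler. The choice of EINVAL and ERANGE as error codes is an assumption made for illustration.

```cpp
#include <cerrno>    // EINVAL, ERANGE
#include <cstring>
#include <cstddef>   // std::size_t

// Hypothetical portable stand-in for a "secure" copy; NOT Microsoft's strcpy_s.
// Returns 0 on success or a nonzero error code (an assumption for this sketch).
int checked_copy(char* dest, std::size_t numberOfElements, const char* src)
{
    if (dest == nullptr || numberOfElements == 0)
        return EINVAL;                      // no usable destination

    if (src == nullptr)
    {
        dest[0] = '\0';                     // leave dest as an empty C-string
        return EINVAL;
    }

    if (std::strlen(src) + 1 > numberOfElements)
    {
        dest[0] = '\0';                     // src and its terminator will not fit
        return ERANGE;
    }

    std::strcpy(dest, src);                 // safe: the size was checked above
    return 0;
}
```

The extra size parameter is what makes the check possible: a plain strcpy receives only two pointers and has no way to learn the destination's capacity.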

Complete C-String Function List

There are many C-string functions, too many to explore each in detail, and too many to remember all their parameters and return values. It's more productive to learn the general operations the C-string library provides and how to use the library documentation. The following sections demonstrate the use of the C-string documentation for some of the most used functions and elaborate on their implementation to help users understand how C-strings work. Follow the links in the highlighted box to excellent C-string documentation.

Standard C-String Functions

C-string functions prototyped in <cstring>

C-String → Number Functions

C-string conversion functions

I recommend creating a bookmark for these pages in your web browser.