C-strings are a fundamental or primitive data type that the compiler processes without additional information. That means that programmers can define C-strings and use the basic operators like indexing ([]), the inserter (<<), and argument passing without #including a header file. However, a program must have prototypes for the C-string functions before using them. C++ inherits most C-string functions from the C programming language, so the prototypes are available in two header files: <cstring> (the C++ name) and <string.h> (the original C name). (Some systems may chain header files, making an explicit #include unnecessary, but explicitly including one will make your code more portable.)
The header files contain the function prototypes for all the C-string functions. C++ documentation typically includes the same prototypes at the top of the function's description. The functions are a standard part of C++'s API, meaning the compiler system stores each function's machine instructions in a library, and the linker (or loader) extracts the instructions and incorporates them into a program as part of the compilation process. There are five prerequisite concepts that new C++ programmers must understand when learning to use the C-string documentation:
Internally, computers store textual data in a coded format. Unicode is common today (Java, for example, uses it). Unicode originally represented each character as a 2-byte integer, which allows for 64K (65,536) different characters; the current standard defines far more, and encodings such as UTF-16 use one or two 2-byte units per character. Although C++ also has a wide character type, wchar_t (2 bytes on some systems and 4 bytes on others), we'll use the older, 1-byte character type char for our examples. On most systems, the char data type uses the older encoding scheme known as the American Standard Code for Information Interchange (ASCII). ASCII is a 7-bit code that defines 128 characters. A 1-byte (or 8-bit) char can store 256 different values: the first 128 are the standardized ASCII characters, but the last 128 (the extended ASCII codes) are not always interpreted in the same way by all devices.
We can find the ASCII encoding of a specific character in any ASCII table (search the web for "ASCII table" and you will see numerous examples). The first 32 characters (codes 0-31, typically the first column of a table) are control characters used to control connected hardware devices. The remaining characters encode the symbols, punctuation marks, digits, and alphabetic characters that comprise most of the text on a computer screen. Interestingly, the digits '0' through '9' are not represented by the numerical values of 0-9. The digit '0' is encoded as the numeric value of 48 (decimal or base-10), while the character '9' is encoded as 57. ASCII encodes all the digits in a contiguous range from 48 to 57. The alphabetic characters also occupy contiguous ranges, but the upper case letters (65-90) are separate from and come before the lower case letters (97-122).
nullptr was introduced in chapter 4 as a special pointer value that indicates "pointing to nothing." Chapter 4 also suggested that we can test the content of a pointer to see if it is nullptr or not. But for this test to be meaningful, we must initialize the pointer before doing the test. A pointer variable definition such as char* p; allocates the memory needed to store a pointer but does not initialize the value stored in the pointer. An uninitialized local pointer variable holds an indeterminate value: it is not guaranteed to store nullptr or any valid address.
(a)
    char* p = nullptr;
    . . .
    if (p == nullptr) . . .
    if (p != nullptr) . . .
(b)
    char s[100] = "";
    // or
    char s[100];
    s[0] = '\0';
    if (s != nullptr) . . .    // is true
The null terminator \0 marks the end of the string (i.e., the end of the data). So, while memory for index values ≥ 1 may contain characters from previous operations, they do not contain useful data now, and s is logically empty.
In the C-string library functions, nullptr may appear as either an argument or as a return value. But what it means depends on the specific function. For example, when a function returns nullptr, it may indicate an error has occurred; if the function is processing data, it may mean that all data is processed; or in the case of a searching function, it may indicate the search didn't find what it was looking for. nullptr can also be used as a function argument, but how a function interprets it varies from one function to another. For example, the strtok function starts searching a new C-string when its first argument is not nullptr and continues searching the previously provided C-string when it is nullptr.
The C-string documentation uses previously discussed concepts while introducing new ones. Four example functions illustrate additional documentation features. They also demonstrate that programmers must carefully read multiple documentation sections as each contributes information vital for correct usage.
(a) char* strcpy(char* dest, const char* src);
(b) char* strncpy(char* dest, const char* src, size_t num);
(c) void* memcpy(void* dest, const void* src, size_t num);
(d) void* memmove(void* dest, const void* src, size_t num);
Void pointers are C++'s most generic pointer type, loosely similar to a reference to Java's Object class. When the program passes a pointer argument to a void-pointer parameter, the compiler automatically converts it to void*. To use the data that a void pointer points to, programmers must explicitly cast it back to a "real" pointer - a pointer to a known type. Fortunately, these functions only need an address, allowing us to delay a more detailed examination to the chapter's end (see Searching and Sorting).
Four documentation sections, "function," description (derived from the function name), "Parameters," and "Return Value" provide the following function details:
Each of the example functions copies src to dest - copying from the right operand to the left. The functions operate independently of how the program allocates the src and dest memory, assuming that it allocates them correctly. However, the appearance of src and dest in the prototypes is often confusing to new programmers. The following figure uses the strcpy function to illustrate and clarify the problem.
(a)
    char* dest;    // Wrong!
    . . .
    strcpy(dest, src);
(b)
    char dest[100];
    . . .
    strcpy(dest, src);
(c)
    char* dest = new char[100];
    . . .
    strcpy(dest, src);
(d)
    char dest[100];
    strcpy(dest, "Hello world");
If dest is an array or points to allocated memory (for example, memory allocated with new), then everything will be okay. In (c), the program allocates the destination with the new operator. Doing this has the advantage that the program can specify the array's size with a variable whose value is either entered by a user or calculated during program execution.
The prototype given above indicates that strcpy returns a character pointer, but the examples ignore the return value entirely. Programs pass arguments to strcpy by pointer. The second parameter is a pointer to const, preventing strcpy from modifying the source. But the first parameter is not a pointer to const, allowing strcpy to modify the destination - making the first argument an input/output argument. But strcpy also returns the address of the first or destination argument. When a function provides the same information in multiple ways, we call it a convenience feature. Returning the first argument (with the return keyword) allows us to use the function call as an expression:
char s1[100];
const char* s2 = "Hello, world!";
cout << strcpy(s1, s2) << endl;
Which, of course, prints Hello, world!
Alternatively, Microsoft's documentation for the similar function, strcpy_s, gives this prototype:
errno_t strcpy_s(char *strDestination, size_t numberOfElements, const char *strSource);
Not all documentation uses size_t. Some documentation will use either int or long, but the functions behave the same regardless of the specific data type used to denote the size.
Microsoft describes the Security Features in the CRT saying that "Many old CRT functions have newer, more secure versions. If a secure function exists, the older, less secure version is marked as deprecated and the new version has the _s ('secure') suffix." Being "marked as deprecated" means that the compiler flags each use of a deprecated function with a diagnostic; depending on the project settings, the compiler may treat that diagnostic as an error and cease code generation. It is possible to override this default behavior (that is, to use the original CRT functions) by including the following directive at the top of the program, before any #include directives:
#define _CRT_SECURE_NO_WARNINGS
The secure versions of the functions typically add one more argument to the function parameter list: the maximum size of the target C-string or "the size of the string buffer." Even more confusing, at least one secure Microsoft function adds an "n" to the middle of the function name; the "n" also indicates that the function takes an extra size argument. However, some variations of the "standard" functions, such as strncpy, also use the same naming convention. The secure functions detect when an operation would overflow the destination (i.e., when the data is too large for the destination to hold) and call an error handler function to prevent and report the error. The following two function prototypes illustrate the differences:
char* strcpy (char* destination, const char* source );
errno_t strcpy_s(char* strDestination, size_t numberOfElements, const char* strSource);
There are many C-string functions; exploring each would take too much time without benefiting us. Furthermore, memorizing the details of each one doesn't help us either. So, we can best use our time learning how to read and understand each function's documentation. Two steps are sufficient for successfully using a large set of API functions such as those available for C-strings:
The four textbook sections that follow illustrate four frequently used C-string functions and how to use them. The links below will take you to more extensive lists of C-string functions and examples.
I recommend creating a bookmark for these pages in your web browser.