C-strings are character arrays with one additional feature: they mark the end of the string text with a special character called the null termination character. The null termination character or null terminator is the character equivalent of zero, which C++ denotes with the escape sequence '\0' (where the escaped character is the digit zero). In general, a special value that delimits data in a data structure is called a sentinel, so the null termination character is a specific example of a sentinel.
Using a null termination character to mark the end of textual data in a C-string implies that the maximum length of a C-string is one less than the size of the character array forming the C-string. C++ arrays, including those forming C-strings, are zero-indexed, so C-strings always begin at index location 0. The null terminator can appear anywhere in the array, partially filling it if the terminator is not the last array element. The C-string functions ignore all array elements following the null terminator.
The name of an array, without any trailing brackets, is the array's address. So, C++ often represents a C-string as a character pointer that points to an array. String constants or string literals, like "hello world", are also C-strings. When the compiler processes a string literal, it adds the null termination character at the end of the quoted characters, stores them in memory, and generates code based on the string's address. Note that the addresses of string literals and character arrays are constant, so programs cannot change them.
(a) | char s1[] = { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' }; s1 is a C-string: a character array initialized with an aggregate initializer list (which must include the null terminator). The program can change the contents of the array (e.g., |
(b) | char s2[] = "Hello world"; char* s3 = s2; s2 is a C-string created by a character array initialized with a string literal. The compiler automatically adds the null terminator at the end of the literal and copies it to s2. s3 is a character pointer initialized to point to s2 (s2 is the name of an array without brackets, so it is an address). The program can change the contents of s2, but it cannot change the address represented by s2. The program can also change the contents of the array using s3 (e.g. |
(c) | const char* s4 = "Hello world";
Modifying a string literal (such as "Hello world") has always been risky: the result is undefined behavior. Since C++11, the language requires pointers to literals to be const, which prevents the program from modifying the literal (e.g., |
(d) | char* s5 = new char[15] { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd' }; C++ has always allowed programmers to create character arrays with the new operator, but until the C++11 standard, programmers could not use aggregate initializations with new. Notice that '\0' is not included in the initializer list; when the array size is part of the definition and is larger than the number of listed characters, the remaining elements are set to zero, which supplies the null termination character. |
(e) | char* s6 = new char[15] { "Hello world" }; Similar to (c), but the "const" keyword is not required. In (c), s4 points to a string literal (i.e., a string constant), but s6 points to memory allocated with the |
(f) | char s7[15] = { 'E', 'x', 'a', 'm', 'p', 'l', 'e' }; Compare to (a): when the size of the character array is part of the C-string definition and is larger than the number of listed characters, the remaining elements are set to zero, which supplies the null termination character. It's a compile-time error if the array size is less than the number of characters in the initializer list. |
(g) | char s8[15] = "Example"; or, equivalently, char s8[15] = { "Example" }; Similar to (b), but creates an array larger than the initializing string literal. |
Use delete[] (the array form of delete) to deallocate C-strings created with new when they are no longer needed to avoid making a memory leak.
(d), (e), (f), and (g) demonstrate that it is possible to have an array that is longer than the stored string. The null terminator marks the end of the data, and the C-string functions ignore the array elements following it. In these examples, the initializers set the unused elements to zero; in general, the elements following the terminator may contain unknown (i.e., "garbage") values. Each of these C-strings can hold a string 14 characters long plus the null termination character.
Take care when changing the contents of a C-string to not overflow it by adding characters beyond the end of the character array.
The previous examples notwithstanding, programmers can create empty character arrays that will later become C-strings. However, programmers shouldn't generally use the arrays as C-strings until they add the null terminator.
Text processing (i.e., manipulating strings) is a central task of many computer programs. For example, the compiler processing your programs reads a text file and somehow converts the text into a running program! We must do much more with strings beyond just defining and initializing them. The following sections explore how to print, read, and manipulate C-strings.
C++ arrays and C-strings are simple, fundamental data types. As such, programmers CANNOT use .length with C++ arrays or .length() with C-strings.