8.2. C-Strings

C-strings are character arrays with one additional feature: they mark the end of the string text with a special character called the null termination character. The null termination character, or null terminator, is the character whose numeric value is zero, which C++ denotes with the escape sequence '\0' (where the escaped character is the digit zero, not the letter O). In general, a special value that delimits data in a data structure is called a sentinel, so the null termination character is a specific example of a sentinel.
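To make the sentinel idea concrete, the following sketch counts the characters in a C-string by scanning until it reaches '\0'. The function name `countChars` is hypothetical and chosen only for illustration; the standard library function `std::strlen` does the same job.

```cpp
#include <cstddef>

// Hypothetical helper: counts the characters that precede the null terminator.
// Assumes s points to a properly terminated C-string.
std::size_t countChars(const char* s)
{
    std::size_t n = 0;
    while (s[n] != '\0')    // stop when the sentinel is reached
        ++n;
    return n;
}
```

For example, `countChars("hello world")` returns 11, and `countChars("")` returns 0 because the first character of an empty string is the terminator itself.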

Using a null termination character to mark the end of textual data in a C-string implies that the maximum length of a C-string is one less than the size of the character array forming the C-string. C++ arrays, including those forming C-strings, are zero-indexed, so C-strings always begin at index location 0. The null terminator can appear anywhere in the array, partially filling it if the terminator is not the last array element. The C-string functions ignore all array elements following the null terminator.
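A minimal sketch of these two points, using the library function `std::strlen` to measure the text. The function names `fullLength` and `truncatedLength` are hypothetical, introduced only for this illustration.

```cpp
#include <cstddef>
#include <cstring>

// A 6-element array can hold at most 5 characters of text.
std::size_t fullLength()
{
    char word[6] = "hello";      // 'h','e','l','l','o','\0' fill all 6 elements
    return std::strlen(word);    // 5: one less than the array size
}

// Writing a terminator earlier in the array shortens the string.
std::size_t truncatedLength()
{
    char word[6] = "hello";
    word[2] = '\0';              // the C-string is now "he"
    return std::strlen(word);    // 2: the elements after index 2 are ignored
}
```

In `truncatedLength`, the array still has six elements, but the C-string functions see only the two characters in front of the first terminator.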

The name of an array, without any trailing brackets, is the array's address. So, C++ often represents a C-string as a character pointer that points to an array. String constants or string literals, like "hello world", are also C-strings. When the compiler processes a string literal, it adds the null termination character at the end of the quoted characters, stores them in memory, and generates code based on the string's address. Note that the addresses of string literals and character arrays are constant, so programs cannot change them.
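The relationship between array names, pointers, and string literals can be sketched as follows. The function names `sameAddress` and `literalLength` are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <cstring>

// The array name, used without brackets, yields the address of the first element.
bool sameAddress()
{
    char text[] = "hello world";
    char* p = text;              // p points to text[0]
    return p == &text[0];        // true
}

// A string literal is already a C-string, so a pointer to it can be passed
// directly to the C-string functions.
std::size_t literalLength()
{
    const char* greeting = "hello world";   // address of the literal's first character
    return std::strlen(greeting);           // 11; the compiler supplied the '\0'
}
```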

The examples in this section notwithstanding, programmers can create empty character arrays that will later become C-strings. However, programmers generally should not use the arrays as C-strings until they add the null terminator; without it, the C-string functions cannot tell where the text ends.
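A minimal sketch of turning an empty character array into a C-string. The function name `buildString` is hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>
#include <cstring>

std::size_t buildString()
{
    char buffer[10];              // uninitialized: not yet a valid C-string
    buffer[0] = 'H';
    buffer[1] = 'i';
    buffer[2] = '\0';             // the terminator makes buffer the C-string "Hi"
    return std::strlen(buffer);   // safe only because the terminator is in place
}
```

Calling `std::strlen(buffer)` before the line that stores '\0' would be a bug, because the function would keep reading until it happened to find a zero somewhere in memory.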

Text processing (i.e., manipulating strings) is a central task of many computer programs. For example, the compiler processing your programs reads a text file and somehow converts the text into a running program! We must do much more with strings beyond just defining and initializing them. The following sections explore how to print, read, and manipulate C-strings.

Caution

C++ arrays and C-strings are simple, built-in data types rather than objects, so they have no member variables or member functions. As such, programmers CANNOT use .length with C++ arrays or .length() with C-strings.
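As a sketch of what can be used instead: `std::strlen` reports the number of characters before the terminator, and the `sizeof` operator can compute the number of elements in an array. The function names `textLength` and `arrayCapacity` are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Number of characters in front of the null terminator.
std::size_t textLength()
{
    char s[15] = "Example";
    return std::strlen(s);              // 7
}

// Number of elements in the array, regardless of its contents.
std::size_t arrayCapacity()
{
    char s[15] = "Example";
    return sizeof(s) / sizeof(s[0]);    // 15
}
```

Note that the `sizeof` calculation works only where the array itself is visible. Applied to a `char*` that points to the array, `sizeof` yields the size of the pointer, not the size of the array.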

(a)	`char s1[] = { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' };` `s1` is a C-string formed by a character array initialized with an aggregate initializer list (which must include the null terminator because the array size is not specified). The program can change the contents of the array (e.g., `s1[0] = 'h';`) but not the address represented by the name `s1` (e.g., `s1 = ...;` is a compile-time error).
(b)	`char s2[] = "Hello world"; char* s3 = s2;` `s2` is a C-string created by a character array initialized with a string literal. The compiler automatically adds the null terminator at the end of the literal and copies it to `s2`. `s3` is a character pointer initialized to point to `s2` (`s2` is the name of an array without brackets, so it is an address). The program can change the contents of `s2`, but it cannot change the address represented by `s2`. The program can also change the contents of the array using `s3` (e.g., `s3[0] = 'h';`), and it can also change `s3` (e.g., `s3 = s1;`).
(c)	`const char* s4 = "Hello world";` Modifying a string literal (such as "Hello world") has always been risky. Since the C++11 standard, pointers to string literals must be `const` (although some compilers still accept a non-const pointer with a warning), which prevents the program from modifying the literal (e.g., `s4[0] = 'h';` is a compile-time error). Programs may change the pointer itself: `s4 = ...;`.
(d)	`char* s5 = new char[15] { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd' };` C++ has always allowed programmers to create character arrays with the `new` operator, but until the C++11 standard, programmers could not use aggregate initializations with `new`. Notice that `'\0'` is not included in the initializer list; when the array size is part of the definition and is larger than the number of initializers, the compiler fills the remaining elements with zeros, which supplies the null termination character.
(e)	`char* s6 = new char[15] { "Hello world" };` Similar to (c), but the `const` keyword is not required. In (c), `s4` points to a string literal (i.e., a string constant), but `s6` points to memory allocated with the `new` operator, and this memory is changeable.
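(f)	`char s7[15] = { 'E', 'x', 'a', 'm', 'p', 'l', 'e' };` Compare to (a): when the size of the character array is part of the C-string definition and is larger than the number of initializers, the compiler fills the remaining elements with zeros, which supplies the null termination character. It's a compile-time error if the array size is less than the number of characters in the initializer list; if the two are equal, no room remains for the terminator and the array is not a C-string.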
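(g)	`char s8[15] = "Example";` or, equivalently, `char s8[15] = { "Example" };` Similar to (b), but creates an array larger than the initializing string literal. The two statements are alternative ways of writing the same definition, so a program uses one or the other, not both.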
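The sketch below gathers several of the definitions above into small functions to confirm that they produce equivalent C-strings. The function names `sameText`, `heapLength`, and `paddedForms` are hypothetical, and the code assumes a compiler that supports C++11 or later.

```cpp
#include <cstddef>
#include <cstring>

// Forms (a), (b), and (c) hold the same text.
bool sameText()
{
    char s1[] = { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' };  // (a)
    char s2[] = "Hello world";                                                     // (b)
    const char* s4 = "Hello world";                                                // (c)
    return std::strcmp(s1, s2) == 0 && std::strcmp(s2, s4) == 0;
}

// Form (d): the unlisted elements of the heap array are zero, so the
// text is terminated even though '\0' is not in the initializer list.
std::size_t heapLength()
{
    char* s5 = new char[15] { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd' };
    std::size_t n = std::strlen(s5);    // 11
    delete[] s5;                        // release memory obtained with new[]
    return n;
}

// Forms (f) and (g): both arrays have 15 elements and hold "Example"
// followed by zeros.
bool paddedForms()
{
    char s7[15] = { 'E', 'x', 'a', 'm', 'p', 'l', 'e' };   // (f)
    char s8[15] = "Example";                               // (g)
    return std::strcmp(s7, s8) == 0 && s7[14] == '\0' && sizeof(s8) == 15;
}
```

`std::strcmp` returns 0 when two C-strings contain the same characters, so `sameText` and `paddedForms` both return `true`.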