Comparing or ordering two C-strings, especially for equality, can be confusing. Nevertheless, these operations are crucial for many text-processing programs. I believe there are three main reasons for the confusion. First, C++ has two kinds of strings, each with a different way of comparing or ordering them. The C++ string class supports the full set of C++ relational operators, ==, !=, <, <=, >, and >=, but C-strings do not. Second, we must use the strcmp function to compare C-strings, and its behavior can be hard to understand. Finally, neither C nor C++ has an "equals" function or operator for comparing C-strings for equality.
string class vs. C-strings. The C++ string class supports the same relational operators as the fundamental or primitive data types. However, C-strings do not support these operators in any useful way, which is confusing because C-strings are built into the language itself: applied to C-strings, the operators compile, but they compare addresses rather than text.
When comparing instances of the string class, the relational operators evaluate to a boolean value: true or false.
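As a minimal sketch of this behavior, the two helper functions below wrap == and < for string objects. The names same_text and orders_before are hypothetical, chosen only for illustration; they are not part of any library.

```cpp
#include <string>

// Hypothetical helper: true when a and b hold exactly the same characters.
bool same_text(const std::string& a, const std::string& b)
{
    return a == b;      // compares the characters, not the objects' addresses
}

// Hypothetical helper: true when a orders before b.
bool orders_before(const std::string& a, const std::string& b)
{
    return a < b;       // character-by-character (lexicographic) comparison
}
```

Both functions return a bool, so a call such as same_text("hello", "hello") can appear directly in an if-statement.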
The strcmp function, declared in <cstring>, is the standard way to compare the text of two C-strings, and it returns a negative value, 0, or a positive value. We will explore the meaning of the returned value in greater detail shortly. The strcmp function calls in column (b) correspond to the operations on the same row in column (a).
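The correspondence between the relational operators and strcmp can be sketched as six small wrappers, one per operator. The cs_ names are hypothetical; each function compares strcmp's result to 0 with the same operator that we would use on two string objects.

```cpp
#include <cstring>

// Hypothetical wrappers: each one shows the strcmp test that plays the
// role of the relational operator named in the trailing comment.
bool cs_equal(const char* s1, const char* s2)         { return std::strcmp(s1, s2) == 0; }  // s1 == s2
bool cs_not_equal(const char* s1, const char* s2)     { return std::strcmp(s1, s2) != 0; }  // s1 != s2
bool cs_less(const char* s1, const char* s2)          { return std::strcmp(s1, s2) <  0; }  // s1 <  s2
bool cs_less_equal(const char* s1, const char* s2)    { return std::strcmp(s1, s2) <= 0; }  // s1 <= s2
bool cs_greater(const char* s1, const char* s2)       { return std::strcmp(s1, s2) >  0; }  // s1 >  s2
bool cs_greater_equal(const char* s1, const char* s2) { return std::strcmp(s1, s2) >= 0; }  // s1 >= s2
```

In practice, most programs write the comparison inline, for example if (strcmp(s1, s2) == 0), rather than wrapping it in a function.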
The equality operator compares the addresses saved in s1 and s2, not the text "hello" to which they point. So, the if-statement prints NOT equal to the console. We can use the equality operator to determine if two pointers point to the same C-string but not to determine if strings contain the same text.
All relational operators operate on the addresses saved in C-strings, not their contents!
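The difference between comparing addresses and comparing text can be sketched with two character arrays that hold the same characters. The function names below are hypothetical. Arrays are used instead of pointers to string literals because a compiler may store identical literals at a single address, which would hide the problem.

```cpp
#include <cstring>

// Hypothetical helper: true only when both pointers hold the same address.
bool same_address(const char* s1, const char* s2)
{
    return s1 == s2;
}

// Hypothetical helper: true when the two C-strings contain the same text.
bool holds_same_text(const char* s1, const char* s2)
{
    return std::strcmp(s1, s2) == 0;
}

// Two separate arrays with identical contents: the text matches,
// but the arrays live at different addresses.
const char* compare_by_address()
{
    char a[] = "hello";
    char b[] = "hello";
    const char* s1 = a;
    const char* s2 = b;
    return (s1 == s2) ? "equal" : "NOT equal";
}
```

Here compare_by_address returns "NOT equal" even though both arrays spell hello, while holds_same_text reports that the contents match.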
strcmp is an ordering function: given two strings, it imposes an order on them, specifying which one comes first. The ordering is based on the ASCII collating sequence, which means that the characters are ordered based on their ASCII encoding (this ordering sequence is sometimes informally called asciibetical order). As a result, digits order before uppercase letters, and uppercase letters order before lowercase letters. strcmp manages to provide for C-strings all of the functionality represented by the string class relational operators above.
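One way to see the ASCII ordering is to sort a few C-strings with strcmp as the comparison. The sketch below assumes an ASCII-based system; sort_cstrings is a hypothetical helper name, and std::sort with a lambda is just one convenient way to apply strcmp repeatedly.

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

// Hypothetical helper: sorts C-strings into the order strcmp imposes.
void sort_cstrings(std::vector<const char*>& words)
{
    std::sort(words.begin(), words.end(),
              [](const char* a, const char* b) {
                  return std::strcmp(a, b) < 0;   // true when a orders before b
              });
}
```

Sorting the words apple, Zebra, 42, and banana this way yields 42, Zebra, apple, banana: the digit '4' has a smaller ASCII code than 'Z', which in turn has a smaller code than any lowercase letter.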
strcmp: String-Compare
strcmp(s1, s2)
Comparing C-Strings. The strcmp function compares two C-strings character-by-character, from left to right, until reaching the end of one or both, or the compared characters are different. strcmp returns an integer value as follows:
Return Values
s1 comes before s2: returns a negative value (-1 in some implementations)
s1 and s2 are the same: returns 0
s1 comes after s2: returns a positive value (1 in some implementations)
Examples
Both strings contain exactly the same characters, so strcmp returns a 0, which means that s1 and s2 are equal
The characters are different; 'W' < 'w', so strcmp returns a positive value (s2 comes before s1)
The strings are the same until s1 ends, so strcmp returns a negative value (the rule is that nothing comes before something); s1 comes before s2
strcmp return values. The strcmp function returns one of three kinds of values (negative, zero, or positive) to inform a calling program which C-string comes first or that the two order the same.
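The three cases can be checked with a short sketch that reduces strcmp's result to its sign. The helper sign_of_strcmp is a hypothetical name, and the sample strings in the note that follows are illustrative choices, not necessarily the ones used in the examples above.

```cpp
#include <cstring>

// Hypothetical helper: reduces strcmp's result to -1, 0, or 1 so that only
// the sign, the part every implementation agrees on, is visible.
int sign_of_strcmp(const char* s1, const char* s2)
{
    int result = std::strcmp(s1, s2);
    return (result > 0) - (result < 0);
}
```

With this helper, sign_of_strcmp("hello", "hello") is 0 because the strings match, sign_of_strcmp("world", "World") is 1 because 'w' comes after 'W', and sign_of_strcmp("hell", "hello") is -1 because the first string ends before the second.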
Possible Implementations
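int strcmp(const char* s1, const char* s2)
{
    int i = 0;
    // Advance while neither string has ended and the characters match.
    while (s1[i] != '\0' && s2[i] != '\0' && s1[i] == s2[i])
        i++;
    // Compare as unsigned char, as the library strcmp does.
    unsigned char c1 = s1[i];
    unsigned char c2 = s2[i];
    if (c1 < c2)
        return -1;
    else if (c1 > c2)
        return 1;
    return 0;
}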
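int strcmp(const char* s1, const char* s2)
{
    // Advance both pointers while neither string has ended and the characters match.
    for (; *s1 && *s2 && *s1 == *s2; s1++, s2++)
        ;
    // Compare as unsigned char, as the library strcmp does.
    unsigned char c1 = *s1;
    unsigned char c2 = *s2;
    if (c1 < c2)
        return -1;
    else if (c1 > c2)
        return 1;
    return 0;
}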
Two implementations of strcmp. The C and C++ standards specify only the sign of strcmp's result: less than 0, 0, or greater than 0. Some library versions return a signed magnitude (e.g., -5 or 10) computed by taking the difference of the first two characters that don't match, which is advantageous because subtraction is a fast, efficient operation; the magnitude itself is typically irrelevant. Other versions, like the two shown here, return -1, 0, and 1 based on control structures rather than arithmetic. Because either behavior is allowed, programs should test only the sign of the result and never compare it to -1 or 1. Note that strcmp always compares single-byte characters; strings of wide characters, which support languages with larger character sets, are compared with a separate function, wcscmp.
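For comparison, a difference-based version can be sketched as follows. This is an illustration of the subtraction technique only, under the hypothetical name diff_strcmp; it is not the source code of any particular library.

```cpp
// Hypothetical sketch of a difference-based comparison. The result's sign
// gives the ordering; its magnitude is the distance between the first two
// characters that don't match.
int diff_strcmp(const char* s1, const char* s2)
{
    // If s1 has not ended and the characters match, s2 has not ended either.
    while (*s1 != '\0' && *s1 == *s2)
    {
        s1++;
        s2++;
    }
    // Convert to unsigned char before subtracting so that characters with
    // the high bit set are not treated as negative values.
    return static_cast<unsigned char>(*s1) - static_cast<unsigned char>(*s2);
}
```

For example, diff_strcmp("a", "c") returns -2 because 'a' is two positions before 'c', while a version that returns only -1, 0, or 1 would report -1 for the same strings. Both results are negative, which is all a calling program should rely on.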
The loops in both versions examine the strings' characters one at a time from left to right. The order of the sub-expressions in the test is significant. The tests for the strings' end (s1[i] != '\0' && s2[i] != '\0' in the first version, *s1 && *s2 in the second) occur before the equality test (s1[i] == s2[i] or *s1 == *s2), and short-circuit evaluation skips the equality test once either string has ended, so the loops never advance past a null-terminator and index the strings out of bounds. The loops end when they reach the end of either string or find two corresponding characters that don't match. The if-statement compares the two characters that caused the loop to end. If the strings are different lengths, then one of the characters is the null-terminator, which is the character equivalent of zero and less than any other ASCII character. The if-else ladder returns a value indicating the strings' ordering.
The first version uses array indexing notation, which might be clearer than the second version's pointer notation. We can drop the explicit != '\0' comparisons because C and C++ treat 0 (the null-terminator) as false and any other character as true (please see Boolean Type and Values).
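A sketch of the indexing version with the explicit comparisons removed might look like the following. It is renamed index_strcmp, a hypothetical name, so that it does not clash with the library function, and it compares the final characters as unsigned values, as the library strcmp does.

```cpp
// Hypothetical variant of the indexing version: s1[i] and s2[i] are used
// directly as conditions because the null-terminator converts to false.
int index_strcmp(const char* s1, const char* s2)
{
    int i = 0;
    while (s1[i] && s2[i] && s1[i] == s2[i])
        i++;

    // Compare the characters that stopped the loop as unsigned values.
    unsigned char c1 = static_cast<unsigned char>(s1[i]);
    unsigned char c2 = static_cast<unsigned char>(s2[i]);
    if (c1 < c2)
        return -1;
    else if (c1 > c2)
        return 1;
    return 0;
}
```

The loop test now has the same shape as the test in the pointer version; only the notation used to reach each character differs.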
The second version uses pointer operators and arithmetic. The dereference operator (e.g., *s1 and *s2) extracts a single character from each string. The increment operator (e.g., s1++ and s2++) advances the pointers to the next character in each string. Finally, the comma operator allows us to increment both pointers in the third for-loop expression.
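One way to gain confidence in a hand-written comparison is to check that its sign agrees with the library's strcmp. The sketch below repeats the pointer-based loop under the hypothetical name ptr_strcmp; sign_of and agrees_with_library are also hypothetical helpers introduced only for this check.

```cpp
#include <cstring>

// Pointer-based comparison, renamed so it does not clash with the library.
int ptr_strcmp(const char* s1, const char* s2)
{
    for (; *s1 && *s2 && *s1 == *s2; s1++, s2++)
        ;
    unsigned char c1 = static_cast<unsigned char>(*s1);
    unsigned char c2 = static_cast<unsigned char>(*s2);
    return (c1 > c2) - (c1 < c2);       // -1, 0, or 1
}

// Reduces any int to -1, 0, or 1.
int sign_of(int value)
{
    return (value > 0) - (value < 0);
}

// True when ptr_strcmp and the library strcmp report the same ordering.
bool agrees_with_library(const char* s1, const char* s2)
{
    return ptr_strcmp(s1, s2) == sign_of(std::strcmp(s1, s2));
}
```

Only the signs are compared because the library is free to return any negative or positive value. Trying pairs such as "hell" and "hello", or strings containing characters outside the ASCII range, exercises both the end-of-string case and the unsigned comparison.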