Comparing or ordering two C-strings, especially testing for equality, can be confusing. Nevertheless, these operations are crucial for many text-processing programs. Having two string representations, each with its own comparison method, adds to the confusion. The C++ string class supports the full set of C++ relational operators, ==, !=, <, <=, >, and >=. In contrast, programs must use the strcmp function to compare C-strings, and its behavior, especially when testing for equality, is not straightforward.
string class vs. C-strings.
The C++ string class supports the same relational operators as the fundamental or primitive data types. However, C-strings do not support these operators for comparing text, which is confusing because C-strings are built directly from fundamental types: they are arrays of char accessed through pointers.
When comparing instances of the string class, the relational operators evaluate to a boolean value: true or false.
The strcmp function is the standard way to compare the text of two C-strings, and it returns a negative value, 0, or a positive value. The following figure explores the meaning of the returned value in detail. Columns (a) and (b) compare corresponding operations across the two string types.
If-statements that apply the relational operators directly to C-strings (for example, s1 == s2) compile and run, but they fail to compare the textual data. C-strings are pointers - they store memory addresses. When applied to C-strings, all relational operators operate on the addresses, not on the referenced data.
strcmp is an ordering function - given two strings, it imposes an order on them - specifying which one comes or orders first. The ordering is based on the ASCII collating sequence, meaning that the characters are ordered based on their ASCII encodings (this ordering sequence is sometimes informally called asciibetical order).
strcmp: String-Compare
strcmp(s1, s2)
Comparing C-Strings.
The strcmp function compares two C-strings character-by-character, from left to right, until it reaches the end of one or both strings or finds two characters that differ. strcmp returns an integer value as follows:
Return Values
s1 comes before s2: returns a negative value (for example, -1)
s1 and s2 are the same: returns 0
s1 comes after s2: returns a positive value (for example, 1)
Examples
Both strings contain exactly the same characters, so strcmp returns a 0, which means that s1 and s2 are equal
The characters are different; 'W' < 'w', so strcmp returns a positive value (s2 comes before s1)
The strings are the same until s1 ends, so strcmp returns a negative value (the rule is that nothing comes before something); s1 comes before s2
strcmp return values.
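The strcmp function returns one of three kinds of values (negative, zero, or positive) to inform a calling program which C-string comes first or whether the two are equal (i.e., have the same order). Programs should test the sign of the result rather than look for a specific value such as -1 or 1.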
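#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <cstring>
using namespace std;

int main()
{
    // String literals are constant, so the pointers must be const char*
    const char* s1 = "HELLO WORLD";
    const char* s2 = "HELLO WORLD";

    // strcmp returns 0 when the strings are equal
    if (strcmp(s1, s2) == 0)
        cout << "They are equal\n";
    else
        cout << "They are NOT equal\n";

    // 0 converts to false, so !strcmp(s1, s2) is true for equal strings
    if (!strcmp(s1, s2))
        cout << "They are equal\n";
    else
        cout << "They are NOT equal\n";

    return 0;
}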
Output:
They are equal
They are equal
Equal strings.
C++ doesn't have a function specifically designed for testing two C-strings for equality, so programs use the strcmp function. Notice the NOT operator, !, in the second if-statement: strcmp returns 0 for equal strings, C++ treats 0 as false, and ! inverts that result to true.
Ordering strings with different lengths.
The strcmp function must handle the special case arising when one string is a prefix of the other. The two strings begin with the same characters, but one ends before a mismatch occurs. In this case, the shorter string orders before the longer one. The comparing rule is that nothing comes before something.
Possible strcmp Implementations
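(a)
int strcmp(const char* s1, const char* s2)
{
    int i = 0;
    // Advance while neither string has ended and the characters match
    while (s1[i] != '\0' && s2[i] != '\0' && s1[i] == s2[i])
        i++;
    // Order the characters that ended the loop as unsigned char values,
    // as the library strcmp does, so the null-terminator is always smallest
    unsigned char c1 = s1[i];
    unsigned char c2 = s2[i];
    if (c1 < c2)
        return -1;
    else if (c1 > c2)
        return 1;
    return 0;
}

(b)
int strcmp(const char* s1, const char* s2)
{
    // Advance both pointers while neither string has ended and the characters match
    for (; *s1 && *s2 && *s1 == *s2; s1++, s2++)
        ;
    // Order the characters that ended the loop as unsigned char values
    unsigned char c1 = *s1;
    unsigned char c2 = *s2;
    if (c1 < c2)
        return -1;
    else if (c1 > c2)
        return 1;
    return 0;
}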
Two implementations of strcmp. Older versions of strcmp returned a signed magnitude: <0, 0, or >0 (e.g., -5 or 10), but the magnitude was typically irrelevant. This scheme enabled comparison of two characters by taking their difference, which is advantageous because subtraction is a fast, efficient operation. However, supporting many languages with varying encodings requires a wider, 2-byte character, and it's not always possible to order two wide characters by taking their difference due to different character encodings. So, some newer strcmp versions return -1, 0, and 1 based on control structures rather than arithmetic. The language standard guarantees only the sign of the result, so portable programs test it with < 0, == 0, or > 0.
The loops examine the strings' characters one at a time, from left to right. The order of the sub-expressions in the test is significant. The tests for the strings' end (s1[i] != '\0' && s2[i] != '\0' in the first version, *s1 && *s2 in the second) occur before the equality test, and short circuit evaluation skips the equality test once either string has ended. Because the loops stop at the first null-terminator, they never advance past the end of either string. The loops end when they reach the end of either string or encounter two mismatched characters. The if-statement compares the two characters that caused the loop to end. If the strings have different lengths, then one of the characters is the null-terminator, which is the character equivalent of zero, and is less than any other character when characters are compared as unsigned values. The return statements return a value indicating the strings' ordering.
(a) Character access uses the familiar array indexing notation. The explicit != '\0' comparisons are optional because C++ treats the null-terminator, 0, as false (please see Boolean Type and Values).
(b) Character access uses pointer operators and arithmetic. The dereference operator (e.g., *s1 and *s2) extracts a character from each string. The auto-increment operator (e.g., s1++ and s2++) advances the pointers to the next character, altering the local variables s1 and s2. Finally, the comma operator combines the increment operations into a single expression.