8.2.2.4. strcmp

Comparing or ordering two C-strings, especially for equality, can be confusing. Nevertheless, these operations are crucial for many text-processing programs. I believe there are three main reasons for the confusion. First, C++ has two kinds of strings, each with a different way of comparing or ordering them. The C++ string class supports the full set of C++ relational operators, ==, !=, <, <=, >, and >=, but C-strings do not. Second, we must use the strcmp function to compare C-strings, and its behavior can be hard to understand. Finally, neither C nor C++ has an "equals" function or operator for comparing C-strings for equality.

(a) String Class Relationship    (b) C-String strcmp Functions
s1 == s2                         strcmp(s1, s2) == 0    or    !strcmp(s1, s2)
s1 != s2                         strcmp(s1, s2) != 0    or    strcmp(s1, s2)
s1 < s2                          strcmp(s1, s2) < 0
s1 <= s2                         strcmp(s1, s2) <= 0
s1 > s2                          strcmp(s1, s2) > 0
s1 >= s2                         strcmp(s1, s2) >= 0

(c) A Common Error
const char* s1 = "hello";
const char* s2 = "hello";

if (s1 == s2)	// compares the two addresses, not the text
	cout << "Equal\n";
else
	cout << "NOT equal\n";
string class vs. C-strings. The C++ string class supports the same relational operators as the fundamental or primitive data types. Applied to C-strings, however, these operators do not compare text, which is confusing because C-strings are built from a fundamental type, char.
  1. When comparing instances of the string class, the relational operators evaluate to a boolean value: true or false.
  2. For C-strings, the strcmp function is the standard way to make the same comparisons, and it returns a negative value, 0, or a positive value. We will explore the meaning of the returned value in greater detail shortly. The strcmp function calls in column (b) correspond to the operations on the same row in column (a).
  3. The equality operator compares the addresses saved in s1 and s2, not the text "hello" to which they point. When the two pointers hold different addresses, the if-statement prints NOT equal even though the text is identical. (A compiler is allowed to store identical string literals once, in which case the pointers happen to match, so the result of this test is not something a program can rely on.) We can use the equality operator to determine if two pointers point to the same C-string but not to determine if two strings contain the same text.
All relational operators operate on the addresses saved in C-strings, not their contents!
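
The difference between comparing addresses and comparing text can be sketched with two small helpers. The names same_address and same_text are hypothetical, introduced only for illustration; they are not part of any library.

```cpp
#include <cstring>

// Hypothetical helper: true when both pointers hold the same address.
// This is all that == does when applied to two C-strings.
bool same_address(const char* s1, const char* s2)
{
    return s1 == s2;
}

// Hypothetical helper: true when both C-strings hold the same text.
bool same_text(const char* s1, const char* s2)
{
    return std::strcmp(s1, s2) == 0;
}
```

Given two separate arrays, char a[] = "hello"; and char b[] = "hello";, same_text(a, b) is true while same_address(a, b) is false, because the arrays occupy different locations in memory.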

strcmp is an ordering function: given two strings, it imposes an order on them, specifying which one comes first. The ordering is based on the ASCII collating sequence, which means that the characters are ordered by their ASCII encoding (this ordering is sometimes informally called asciibetical order). One consequence is that every upper-case letter orders before every lower-case letter. Combined with a comparison against 0, strcmp provides for C-strings all of the functionality of the C++ relational operators above.
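
Asciibetical order differs from dictionary order in ways that are easy to overlook. The following minimal sketch assumes an ASCII execution character set; orders_before is a hypothetical helper name, not a library function.

```cpp
#include <cstring>

// Hypothetical helper: true when s1 orders strictly before s2,
// the C-string counterpart of s1 < s2 for the string class.
bool orders_before(const char* s1, const char* s2)
{
    return std::strcmp(s1, s2) < 0;
}
```

Under ASCII, orders_before("Zebra", "apple") is true because 'Z' (90) precedes 'a' (97), and orders_before("10", "9") is true because '1' (49) precedes '9' (57), even though 10 is the larger number.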

strcmp: String-Compare

Two C-strings, s1 and s2, both contain 'HELLO WORLD\0'. The strcmp function begins with the left-most characters, s1[0] and s2[0], and compares them. They are the same character in the same case, so the comparison advances to the next pair of characters: s1[1] and s2[1]. The strcmp function continues to compare pairs of characters as the index values increase. The two C-strings are identical in this example, so strcmp does not find a mismatched pair.
(a)
Two C-strings, s1 contains 'HELLO world\0' and s2 contains 'HELLO WORLD\0'. The strcmp function begins with the left-most characters with the lowest index values, s1[0] and s2[0], and compares them. They are the same character, so the comparison advances to the next pair of characters: s1[1] and s2[1]. The strcmp function continues to compare pairs of characters until it reaches index 6: s1[6] is 'w' and s2[6] is 'W' - the first character is in lower case while the second is in upper case - so the characters do not match, and the function ends.
(b)
Two C-strings, s1 contains 'HELLO\0' and s2 contains 'HELLO WORLD\0'. The strcmp function begins with the left-most characters s1[0] and s2[0] and compares them. They are the same character, so the comparison advances to the next pair of characters: s1[1] and s2[1]. The strcmp function continues to compare pairs of characters until it reaches index 5, where s1 ends. Or, think of the situation as a character mismatch: s1[5] is '\0' and s2[5] is ' '. Either way, the function ends.
(c)

strcmp(s1, s2)

Comparing C-Strings. The strcmp function compares two C-strings character-by-character, from left to right, until reaching the end of one or both, or the compared characters are different. strcmp returns an integer value as follows:

Return Values
  < 0    s1 comes before s2
  0      s1 and s2 are equal
  > 0    s1 comes after s2

Examples
  1. Both strings contain exactly the same characters, so strcmp returns a 0, which means that s1 and s2 are equal
  2. The characters are different; 'W' < 'w', so strcmp returns a positive value (s2 comes before s1)
  3. The strings are the same until s1 ends, so strcmp returns a negative value (the rule is that nothing comes before something); s1 comes before s2

Header File:
#include <cstring>
Standard Prototype:
int strcmp(const char* s1, const char* s2);
Please see strcmp for more information
Microsoft Prototype:
N/A

Examples

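strcmp Function Calls         Return Values
strcmp("apple", "zebra")      -1
strcmp("zebra", "apple")      1
strcmp("apple", "apple")      0
strcmp("APPLE", "apple")      -1
strcmp("apple", "APPLE")      1
strcmp("app", "apple")        -1
strcmp("apple", "app")        1
strcmp return values. The strcmp function returns a negative value, 0, or a positive value to inform a calling program which C-string comes first or that the two order the same. The table shows -1 and 1, but the standard guarantees only the sign of the result, so some implementations return other negative or positive values. Test the result against 0 rather than against -1 or 1.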
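
When a program needs exactly -1, 0, or 1 (for example, to use the result in a switch statement), it can normalize whatever strcmp returns. This is a minimal sketch; compare_sign is a hypothetical name introduced for illustration.

```cpp
#include <cstring>

// Hypothetical helper: collapse strcmp's result to exactly -1, 0, or 1,
// regardless of which negative or positive value the library returns.
int compare_sign(const char* s1, const char* s2)
{
    int result = std::strcmp(s1, s2);

    // Each comparison yields 0 or 1, so the difference is -1, 0, or 1.
    return (result > 0) - (result < 0);
}
```

With this helper, compare_sign("apple", "zebra") is -1 and compare_sign("apple", "app") is 1 on every conforming implementation.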
#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <cstring>
using namespace std;

int main()
{
	const char* s1 = "HELLO WORLD";	// string literals are const in C++
	const char* s2 = "HELLO WORLD";

	if (strcmp(s1, s2) == 0)
		cout << "They are equal\n";
	else
		cout << "They are NOT equal\n";

	if (!strcmp(s1, s2))
		cout << "They are equal\n";
	else
		cout << "They are NOT equal\n";

	return 0;
}

Output:

They are equal
They are equal
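Equal strings. C++ doesn't have a function specifically designed for testing two C-strings for equality, so we must use the strcmp function. Notice the NOT operator, !, in the second if-statement: strcmp returns 0 for equal strings, C++ treats 0 as false, and the NOT operator turns that result into true.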
#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <cstring>
using namespace std;

int main()
{
	const char* s1 = "HELLO world";	// string literals are const in C++
	const char* s2 = "HELLO WORLD";

	if (strcmp(s1, s2) <= 0)
		cout << "s1, s2\n";
	else
		cout << "s2, s1\n";

	return 0;
}

Output:

s2, s1
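Ordering strings with different content. The strings first differ at index 6, where s1 has 'w' and s2 has 'W'. Because 'W' (87) precedes 'w' (119) in ASCII, strcmp returns a positive value and the program prints s2, s1. Ordering C-strings, as illustrated here, is one step in text-sorting algorithms.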
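
To show how ordering fits into sorting, the following sketch hands strcmp to the standard std::sort algorithm. The helper name sort_words is hypothetical, and the lambda expression assumes C++11 or later.

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

// Hypothetical helper: sort C-strings into the order that strcmp imposes.
void sort_words(std::vector<const char*>& words)
{
    std::sort(words.begin(), words.end(),
              [](const char* a, const char* b) {
                  // std::sort asks "does a come before b?", so convert
                  // strcmp's three-way result into a true/false answer.
                  return std::strcmp(a, b) < 0;
              });
}
```

Sorting the words "pear", "Apple", "apple", and "app" this way yields "Apple", "app", "apple", "pear" under ASCII: the upper-case 'A' orders first, and "app" precedes "apple" because the shorter string ends first.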
#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <cstring>
using namespace std;

int main()
{
	const char* s1 = "HELLO";	// string literals are const in C++
	const char* s2 = "HELLO WORLD";

	if (strcmp(s1, s2) < 0)
		cout << "s1, s2\n";
	else
		cout << "s2, s1\n";

	return 0;
}

Output:

s1, s2
Ordering strings with different lengths. Remember the rule that nothing comes before something. The rule means that when two strings are the same to the point where one ends, the shorter one comes before the longer one.

Possible Implementations

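int strcmp(const char* s1, const char* s2)
{
    // Advance while neither string has ended and the characters match.
    int i = 0;
    while (s1[i] != '\0' && s2[i] != '\0' && s1[i] == s2[i])
        i++;

    // Compare the pair that stopped the loop as unsigned char values,
    // which is how the standard strcmp orders characters.
    unsigned char c1 = s1[i];
    unsigned char c2 = s2[i];
    if (c1 < c2)
        return -1;
    else if (c1 > c2)
        return 1;

    return 0;
}
(a)

int strcmp(const char* s1, const char* s2)
{
    // Advance both pointers while neither string has ended
    // and the characters match; the loop body is empty.
    for (; *s1 && *s2 && *s1 == *s2; s1++, s2++)
        ;

    // Compare the pair that stopped the loop as unsigned char values.
    unsigned char c1 = *s1;
    unsigned char c2 = *s2;
    if (c1 < c2)
        return -1;
    else if (c1 > c2)
        return 1;

    return 0;
}
(b)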
Two implementations of strcmp. The standard specifies only the sign of the value strcmp returns: negative, 0, or positive. Many implementations compute the result by subtracting the two mismatched characters, so they return a signed magnitude (e.g., -5 or 10), but the magnitude is typically irrelevant. Subtraction is a fast, efficient operation, but it does not carry over cleanly to wider character types, where the difference of two characters can overflow the return type. The versions shown here use control structures rather than arithmetic, so they always return -1, 0, or 1.
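
The difference-based scheme can be sketched as follows. This is an illustration under stated assumptions, not the code of any particular library; the name strcmp_diff is hypothetical and was chosen so the function does not clash with the library's strcmp.

```cpp
// Hypothetical variant that returns the difference of the first
// mismatched pair of characters instead of -1, 0, or 1.
int strcmp_diff(const char* s1, const char* s2)
{
    // Advance while s1 has not ended and the characters match.
    // If they match and *s1 is not '\0', then *s2 is not '\0' either.
    while (*s1 != '\0' && *s1 == *s2) {
        s1++;
        s2++;
    }

    // Convert to unsigned char before subtracting so the sign is correct
    // even on systems where plain char is signed.
    return static_cast<unsigned char>(*s1) - static_cast<unsigned char>(*s2);
}
```

Under ASCII, strcmp_diff("HELLO world", "HELLO WORLD") returns 32 (119 for 'w' minus 87 for 'W'), and strcmp_diff("HELLO", "HELLO WORLD") returns -32 (0 for the null-terminator minus 32 for the space). Only the sign of these values is meaningful to a caller.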

The loops in both versions examine the strings' characters one at a time from left to right. The order of the sub-expressions in the test is significant. The tests for the strings' end (s1[i] != '\0' && s2[i] != '\0' in (a), *s1 && *s2 in (b)) occur before the equality test (s1[i] == s2[i] in (a), *s1 == *s2 in (b)), and short-circuit evaluation skips the equality test once either string has ended. Stopping at the null-terminator also keeps the loops from reading past the end of either string. The loops end when they reach the end of either string or find two corresponding characters that don't match. The if-else ladder compares the two characters that caused the loop to end. If one string is a prefix of the other, then one of those characters is the null-terminator, which is the character equivalent of zero and orders before every other character. The ladder returns a value indicating the strings' ordering.

  1. The first version uses array indexing notation, which might be clearer than the second version's pointer notation. We can drop the != '\0' test because C and C++ treat 0 (the null-terminator) as false (please see Boolean Type and Values).
  2. The second version uses pointer operators and arithmetic. The dereference operator (e.g., *s1 and *s2) extracts a single character from each string. The auto-increment operator (e.g., s1++ and s2++) advances the pointers to the next character in each string. Finally, the comma operator allows us to increment both pointers in the third for-loop expression.
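
As a sketch of the first note, here is the array-indexing version with the explicit != '\0' comparisons dropped. The name compare_indexed is hypothetical, used so the function does not clash with the library's strcmp.

```cpp
// Hypothetical variant of the array-indexing version that relies on
// '\0' being treated as false.
int compare_indexed(const char* s1, const char* s2)
{
    int i = 0;

    // s1[i] and s2[i] are false exactly when they are the null-terminator.
    while (s1[i] && s2[i] && s1[i] == s2[i])
        i++;

    // Order the pair that stopped the loop as unsigned char values.
    unsigned char c1 = s1[i];
    unsigned char c2 = s2[i];
    if (c1 < c2)
        return -1;
    else if (c1 > c2)
        return 1;

    return 0;
}
```

The loop behaves exactly as before; only the spelling of the end-of-string tests has changed.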