8.2.2.4. strcmp


Comparing or ordering two C-strings, especially testing for equality, can be confusing. Nevertheless, these operations are crucial for many text-processing programs. Having two string representations, each with its own comparison method, adds to the confusion. The C++ string class supports the full set of C++ relational operators, ==, !=, <, <=, >, and >=. In contrast, programs must use the strcmp function to compare C-strings, and its behavior, especially when testing for equality, is not straightforward.

(a) C++ string Operators    (b) C-String strcmp Functions
s1 == s2                    strcmp(s1, s2) == 0    or    !strcmp(s1, s2)
s1 != s2                    strcmp(s1, s2) != 0    or    strcmp(s1, s2)
s1 < s2                     strcmp(s1, s2) < 0
s1 <= s2                    strcmp(s1, s2) <= 0
s1 > s2                     strcmp(s1, s2) > 0
s1 >= s2                    strcmp(s1, s2) >= 0

(c) A Common Error

char s1[100] = "apple";
const char* s2 = "apple";
const char* s3 = "zebra";

if (s1 == s2)
	cout << "Equal\n";
else
	cout << "NOT equal\n";

if (s1 < s3)
	cout << "Before\n";
else
	cout << "After\n";

if (s1 > s3)
	cout << "After\n";
else
	cout << "Before\n";
string class vs. C-strings. The C++ string class supports the same relational operators as the fundamental or primitive data types. However, C-strings do not support these operators, which is confusing because C-strings are built into the language rather than defined by a class.
  1. When comparing instances of the string class, the relational operators evaluate to a boolean value: true or false.
  2. The strcmp function is the standard way to compare C-strings, and it returns a negative value, 0, or a positive value rather than true or false. The following figure explores the meaning of the returned value in detail. Columns (a) and (b) compare corresponding operations across the two string types.
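  3. The if-statements compile and run, but they fail to compare the textual data. In these expressions, s1 (an array) and s2 and s3 (pointers) all evaluate to memory addresses, so the relational operators compare the addresses, not the characters stored there. Here, s1 == s2 is false even though both hold "apple", and ordering unrelated addresses with < or > produces an unspecified result.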

strcmp is an ordering function: given two strings, it imposes an order on them, specifying which one comes (or orders) first or whether the two are equal. The ordering is based on the ASCII collating sequence, meaning that characters are ordered by their numeric ASCII codes (this ordering sequence is sometimes informally called asciibetical order).

strcmp: String-Compare

(a) Two C-strings, s1 and s2, both contain 'HELLO WORLD\0'. The strcmp function begins with the left-most characters, s1[0] and s2[0], and compares them. They are the same character in the same case, so the comparison advances to the next pair of characters: s1[1] and s2[1]. The strcmp function continues to compare pairs of characters as the index values increase. The two C-strings are identical in this example, so strcmp never finds a mismatched pair; it stops when it reaches the null terminators at index 11.

(b) Two C-strings: s1 contains 'HELLO world\0' and s2 contains 'HELLO WORLD\0'. The strcmp function begins by comparing the left-most characters, s1[0] and s2[0]. They are the same character, so the comparison proceeds to the next pair: s1[1] and s2[1]. The strcmp function continues to compare pairs of characters until it reaches index 6, where s1[6] is 'w' and s2[6] is 'W'. The first character is lowercase while the second is uppercase, so the characters do not match, and the function ends.

(c) Two C-strings: s1 contains 'HELLO\0' and s2 contains 'HELLO WORLD\0'. The strcmp function begins by comparing the left-most characters, s1[0] and s2[0]. They are the same character, so the comparison proceeds to the next pair: s1[1] and s2[1]. The strcmp function continues to compare pairs of characters until it reaches index 5, where s1 ends. Alternatively, think of the situation as a character mismatch: s1[5] is '\0' and s2[5] is a space. Either way, the function ends.

strcmp(s1, s2)

Comparing C-Strings. The strcmp function compares two C-strings character by character, from left to right, until it reaches the end of one or both strings or finds a pair of characters that differ. strcmp returns an integer value as follows:

Return Values
  negative (< 0): s1 comes before s2
  0: s1 and s2 are equal
  positive (> 0): s1 comes after s2

Examples
  1. Both strings contain exactly the same characters, so strcmp returns 0, which means that s1 and s2 are equal.
  2. The characters at index 6 differ; because 'W' < 'w', strcmp returns a positive value (s2 comes before s1).
  3. The strings are the same until s1 ends, so strcmp returns a negative value (the rule is that nothing comes before something); s1 comes before s2.
Header File:
#include <cstring>
Standard Prototype:
int strcmp(const char* s1, const char* s2);
Please see strcmp for more information
Microsoft Prototype:
N/A

strcmp Examples

strcmp Function Call          Return Value
strcmp("apple", "zebra")      -1
strcmp("zebra", "apple")       1
strcmp("apple", "apple")       0
strcmp("APPLE", "apple")      -1
strcmp("apple", "APPLE")       1
strcmp("app", "apple")        -1
strcmp("apple", "app")         1
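strcmp return values. The strcmp function returns a negative value, zero, or a positive value to inform a calling program which C-string comes first or whether the two are equal (i.e., have the same order). The C and C++ standards specify only the sign of the result: the -1 and 1 shown here are typical of implementations that normalize their result, while other libraries may return other negative or positive values. Programs should therefore test the sign (< 0, == 0, > 0) rather than compare the result with -1 or 1.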

 

#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <cstring>
using namespace std;

int main()
{
	// String literals are read-only in C++, so the pointers must be const char*
	const char* s1 = "HELLO WORLD";
	const char* s2 = "HELLO WORLD";

	if (strcmp(s1, s2) == 0)
		cout << "They are equal\n";
	else
		cout << "They are NOT equal\n";

	if (! strcmp(s1, s2))
		cout << "They are equal\n";
	else
		cout << "They are NOT equal\n";

	return 0;
}

Output:

They are equal
They are equal
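Equal strings. C++ doesn't have a function specifically designed for testing two C-strings for equality, so programs use the strcmp function. Notice the NOT operator, !, in the second if-statement: strcmp returns 0 for equal strings, and C++ treats 0 as false, so !strcmp(s1, s2) is true exactly when the strings are equal.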

 

#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <cstring>
using namespace std;

int main()
{
	const char* s1 = "HELLO world";
	const char* s2 = "HELLO WORLD";

	if (strcmp(s1, s2) <= 0)
		cout << "s1, s2\n";
	else
		cout << "s2, s1\n";

	return 0;
}

Output:

s2, s1
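Ordering strings with different content. Ordering C-strings, as illustrated here, is one step in text-sorting algorithms. The strings first differ at index 6, where 'w' (ASCII 119) follows 'W' (ASCII 87), so strcmp returns a positive value and the program prints s2 first.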

 

#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <cstring>
using namespace std;

int main()
{
	// String literals are read-only in C++, so the pointers must be const char*
	const char* s1 = "HELLO";
	const char* s2 = "HELLO WORLD";

	if (strcmp(s1, s2) < 0)
		cout << "s1, s2\n";
	else
		cout << "s2, s1\n";

	return 0;
}

Output:

s1, s2
Ordering strings with different lengths. The strcmp function must handle the special case arising when one string is a prefix of the other. The two strings begin with the same characters, but one ends before a mismatch occurs. In this case, the shorter string orders before the longer one. The comparing rule is that nothing comes before something.

Possible strcmp Implementations

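(a)
int strcmp(const char* s1, const char* s2)
{
    int i = 0;
    // Advance while neither string has ended and the characters match
    while (s1[i] != '\0' && s2[i] != '\0' && s1[i] == s2[i])
        i++;

    // Compare the characters that stopped the loop as unsigned char values,
    // as the standard strcmp does (plain char may be signed)
    unsigned char c1 = static_cast<unsigned char>(s1[i]);
    unsigned char c2 = static_cast<unsigned char>(s2[i]);

    if (c1 < c2)
        return -1;
    else if (c1 > c2)
        return 1;

    return 0;
}

(b)
int strcmp(const char* s1, const char* s2)
{
    // Advance both pointers while neither string has ended and the characters match
    for (; *s1 && *s2 && *s1 == *s2; s1++, s2++)
        ;   // empty body: all the work happens in the loop header

    unsigned char c1 = static_cast<unsigned char>(*s1);
    unsigned char c2 = static_cast<unsigned char>(*s2);

    if (c1 < c2)
        return -1;
    else if (c1 > c2)
        return 1;

    return 0;
}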
Two implementations of strcmp. Many strcmp implementations, especially older ones, return the difference between the first pair of mismatched characters, so the result can be any negative value, 0, or any positive value (e.g., -5 or 10); only the sign is meaningful. Subtraction is a fast, efficient operation, which makes this scheme attractive. However, supporting many languages requires wider characters, and for wider character types the difference of two character codes does not always fit in an int. So some implementations return exactly -1, 0, or 1, computed with control structures rather than arithmetic, as the versions above do. Both styles conform to the C and C++ standards, which specify only the sign of the result.

The loops examine the strings' characters one at a time, from left to right, and end when they reach the end of either string or encounter two mismatched characters. The end-of-string tests are essential: without them, two equal strings would match '\0' with '\0', and the loop would continue past the end of both arrays. The if-statement compares the two characters that caused the loop to end. If one string is a prefix of the other, one of those characters is the null terminator, which is the character equivalent of zero and, when characters are compared as unsigned char values (as the standard strcmp does), is less than any other character. The return statements return a value indicating the strings' ordering.

  1. Character access uses the familiar array indexing notation. The explicit != '\0' tests are optional: C++ treats the null terminator, 0, as false (please see Boolean Type and Values), so s1[i] && s2[i] would work equally well, as version (b) shows.
  2. Character access uses pointer operators and arithmetic. The dereference operator (e.g., *s1 and *s2) extracts a character from each string. The increment operator (e.g., s1++ and s2++) advances the pointers to the next character, altering the local variables s1 and s2; the caller's pointers are unaffected because they are passed by value. Finally, the comma operator combines the increment operations into a single expression.