At their most fundamental level, computers operate on integers, and that includes textual or character-based operations. Computers encode characters, meaning they store and manipulate them, as integers of various sizes. The ASCII character set was an early 7-bit encoding suited to the limited capabilities of non-graphic hardware such as teletypes and CRT displays (see The Computer Console). However, ASCII's 128 codes cannot represent the many alphabets in use today. Modern, graphics-capable computer systems use Unicode, which can represent characters from many alphabets as well as emoji. Unicode assigns each character an integer code point between 0 and 0x10FFFF, a range too large for 16 bits, and stores text as sequences of 8-, 16-, or 32-bit code units (the UTF-8, UTF-16, and UTF-32 encodings).
Programs performing textual operations often must classify characters by a common characteristic. For example, a program may validate numerical input by verifying that each character it reads is a digit. Both character encodings assign the digit characters consecutive integer codes ('0' through '9' are 48 through 57 in both ASCII and Unicode), which lets programmers write simple range tests. Using the Hindu-Arabic numerals for simplicity and assuming that the character is stored in the variable c, we can write the test as follows:
if (c >= '0' && c <= '9') ...
Including hexadecimal digits adds two more ranges to the if-statement:
if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f')) ...
Testing for punctuation characters is more cumbersome still, because they are scattered across several separate ranges of the encodings rather than grouped in one block.
While we can always classify characters with if-statements and switches, these operations occur so often that C++ provides a library of classification functions, inherited from the C standard library and declared in the <cctype> header. Implementations typically rely on table lookups and bitwise operations, which makes the functions fast, so it's unlikely that we can outperform them with hand-written if-statements or switches.
| Function | Description |
| --- | --- |
| #include <cctype> | Header file required to use the cctype library |
| int isalnum(int c) | Returns true if the character c is alphanumeric (isalpha(c) or isdigit(c) is true), else returns false |
| int isalpha(int c) | Returns true if the character c is an alphabetic letter, else returns false; which characters count as letters depends on the current locale (the "C" locale unless the program changes it): in the "C" locale, isalpha(c) is true exactly when isupper(c) or islower(c) is true |
| int isascii(int c) | Returns true if the character c is in the range 0 to 0x7F, else returns false (an obsolete POSIX extension, not part of standard C++) |
| int isblank(int c) | Returns true if the character c is a blank character (in the "C" locale, a space or a horizontal tab '\t'), else returns false |
| int iscntrl(int c) | Returns true if the character c is a control character, else returns false |
| int isdigit(int c) | Returns true if the character c is a decimal digit, else returns false; isdigit(c) returns true exactly when c is in the range '0' to '9' |
| int isgraph(int c) | Returns true if the character c has a graphical representation (it is printable and not a space), else returns false |
| int islower(int c) | Returns true if the character c is a lowercase letter, else returns false; which characters count as letters depends on the current locale: in the "C" locale, islower(c) returns true exactly when c is in the range 'a' to 'z' |
| int isprint(int c) | Returns true if the character c is printable (including the space character), else returns false |
| int ispunct(int c) | Returns true if the character c is a punctuation character (printable, but neither a space nor alphanumeric), else returns false |
| int isspace(int c) | Returns true if the character c is a whitespace character (in the "C" locale: space, '\t', '\n', '\v', '\f', or '\r'), else returns false |
| int isupper(int c) | Returns true if the character c is an uppercase letter, else returns false; which characters count as letters depends on the current locale: in the "C" locale, isupper(c) returns true exactly when c is in the range 'A' to 'Z' |
| int isxdigit(int c) | Returns true if the character c is a hexadecimal digit, else returns false; isxdigit(c) returns true exactly when c is in one of the ranges '0' to '9', 'A' to 'F', or 'a' to 'f' |
| int tolower(int c) | If c is an uppercase letter, returns the corresponding lowercase letter; otherwise returns c unchanged |
| int toupper(int c) | If c is a lowercase letter, returns the corresponding uppercase letter; otherwise returns c unchanged |

The classification functions return an int rather than a bool: any nonzero value means true and 0 means false. Every argument must be representable as an unsigned char or be equal to EOF; passing any other value, such as a negative char, is undefined behavior.
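(a) Print only the alphabetic characters:

#include <iostream>
#include <cctype>
#include <string>
using namespace std;
int main()
{
    string text = "Hello, World!";    // unlike the raw literal, a string's range excludes the terminating '\0'
    for (char c : text)
        if (isalpha(static_cast<unsigned char>(c)))    // keep letters only
            cout << c;
    cout << endl;
    return 0;
}

Output:

HelloWorld

(b) Convert every character to lowercase:

#include <iostream>
#include <cctype>
#include <string>
using namespace std;
int main()
{
    string text = "Hello, World!";
    for (char c : text)
        cout << char(tolower(static_cast<unsigned char>(c)));    // tolower returns an int; convert it back to char
        // or cout << (char)tolower(static_cast<unsigned char>(c));
    cout << endl;
    return 0;
}

Output:

hello, world!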