3.9. Character Classification and Conversion Functions

At their most fundamental level, computers operate on integers, and textual or character-based operations are no exception. Computers encode characters as integers of various sizes, then store and manipulate them in that form. The ASCII character set was an early, 7-bit encoding appropriate for the limited capabilities of non-graphic hardware such as teletype and CRT displays (see The Computer Console). However, ASCII defines only 128 characters and cannot support the numerous alphabets used today. Modern, graphic-capable computer systems now use Unicode, which assigns each character a code point in the range 0 to 0x10FFFF and stores the code points with encodings such as UTF-8, UTF-16, and UTF-32. Unicode is capable of representing many alphabets and emojis.

Programs performing textual operations often must classify or categorize characters based on a common characteristic. For example, a program may validate numerical input by verifying that each character read represents a digit. Both character encodings represent the digit characters as a contiguous, ascending sequence of integers, allowing programmers to write simple range tests. Using the Hindu-Arabic numerals for simplicity and assuming that the character is stored in the variable c, we can write the test as follows:

if (c >= '0' && c <= '9') ...

Including hexadecimal digits adds two more ranges to the if-statement:

if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f')) ...

Testing for punctuation characters is still more cumbersome because they are scattered throughout the encodings.

While we can always use if-statements and switches to classify characters, these operations occur so often that C++ provides a library of classification functions, inherited from C. Implementations typically build the functions on table lookups and bitwise operations, making them very fast, and it's unlikely we can outperform them with if-statements or switches.

CCType Library Functions

#include <cctype> Header file required to use the cctype library
int isalnum(int c) Returns true if the character c is alphanumeric (isalpha(c) || isdigit(c)), else return false
int isalpha(int c) Returns true if the character c is an alphabetic letter, else return false; what characters the function considers to be alphabetic letters depends on the current locale: in the default "C" locale, isalpha(c) is true exactly when isupper(c) || islower(c) is true
int isascii(int c) Returns true if the character c is in the range 0 to 0x7F, else return false (a deprecated POSIX extension, not part of standard C++)
int isblank(int c) Returns true if the character c is a blank (a space or a horizontal tab in the default "C" locale), else return false
int iscntrl(int c) Returns true if the character c is a control character, else return false
int isdigit(int c) Returns true if the character c is a decimal digit, else return false; isdigit(c) returns true whenever c is in the range '0' to '9'
int isgraph(int c) Returns true if the character c has a graphical representation, else return false
int islower(int c) Returns true if the character c is a lowercase letter, else return false; which characters the function considers as letters depends on the current locale: in the default "C" locale, islower(c) returns true whenever c is in the range 'a' to 'z'
int isprint(int c) Returns true if the character c is printable (including the space character), else return false
int ispunct(int c) Returns true if the character c is a punctuation character, else return false
int isspace(int c) Returns true if the character c is a white-space character, else return false
int isupper(int c) Returns true if the character c is an uppercase letter, else return false; which characters the function considers as letters depends on the current locale: in the default "C" locale, isupper(c) returns true whenever c is in the range 'A' to 'Z'
int isxdigit(int c) Returns true if the character c is a hexadecimal digit, else return false; isxdigit(c) returns true whenever c is in one of the ranges '0' to '9', 'A' to 'F', or 'a' to 'f'
int tolower(int c) Returns the lowercase equivalent of c if c is an uppercase letter; otherwise, returns c unchanged
int toupper(int c) Returns the uppercase equivalent of c if c is a lowercase letter; otherwise, returns c unchanged
cctype: the character classification functions. Although the cctype functions deal with characters, all function arguments and return values are type int. C++ automatically converts between int and char without an explicit type cast. Returning an int reflects the library's origin in C, which originally did not provide a Boolean data type. For every function except the last two, programs interpret the returned value as a Boolean: 0 represents false and any non-0 value represents true. The last two, tolower and toupper, return a character code instead.

CCType Examples

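// (a) Print only the alphabetic characters
#include <iostream>
#include <cctype>
#include <string>
using namespace std;

int main()
{
	// Iterating over a string visits only the text, not the literal's terminating '\0'
	for (char c : string("Hello, World!"))
		if (isalpha(c))
			cout << c;
	cout << endl;

	return 0;
}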
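// (b) Convert every character to lowercase
#include <iostream>
#include <cctype>
#include <string>
using namespace std;

int main()
{
	// Iterating over a string (rather than the literal's array) avoids printing the terminating '\0'
	for (char c : string("Hello, World!"))
		cout << char(tolower(c));
		// or cout << (char)tolower(c);
	cout << endl;

	return 0;
}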
Output:
(a) HelloWorld
(b) hello, world!
cctype function examples.
  (a) Only prints alphabetic characters. Strips out spaces, punctuation, digits, and control characters.
  (b) Converts uppercase characters to lowercase. Non-alphabetic characters are "passed through" unchanged.