"An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once." The process ignores spaces, punctuation characters, and character cases (upper or lower). For example, the letters in the phrase, "See the quick red fox jump over the lazy brown dog," can be rearranged to form the rather bland anagram "abcddeeeeeefghhijklmnoooopqrrrsttuuvwxyz." Cleaver anagrams are more challenging, forming a valid word or statement that is often a humorous comment on the first phrase. For example, an anagram for "Dormitory" is "Dirty Room." Some anagram aficionados have way too much time on their hands, as the following clever anagram illustrates:
The current section develops a program that determines if one phrase is an anagram of another, focusing on developing and representing a solution before programming it. It decomposes the overall problem into smaller subproblems and develops algorithms to solve each of them. The final program demonstrates strings, arrays, and functions operating on them.
Designing and implementing a program to verify two phrases form an anagram requires detail beyond the initial definition or general problem statement. In an authentic situation, software developers often create multiple views of a solution using various notations. Some views express an overall solution, others focus on sub-problems, and each may target a different audience or serve a different purpose. Similarly, depending on a program's complexity, software developers may represent it at various levels of abstraction or degrees of specificity. Higher levels of abstraction allow developers to focus on what the program does, while lower levels focus on how it does it.
Most audiences can follow a solution stated in prose (1(a)), but the complexity of natural languages often makes these solutions ambiguous and subject to various interpretations. Structuring the prose, such as with a hierarchical list, increases precision and narrows the interpretations.
Pseudocode is a description of the steps in an algorithm using a mix of conventions of programming languages (like assignment operator, conditional operator, loop) with informal, usually self-explanatory, notation of actions and conditions. Although pseudocode shares features with regular programming languages, it is intended for human reading rather than machine control. Pseudocode typically omits details that are essential for machine implementation of the algorithm, meaning that pseudocode can only be verified by hand.
Pseudocode is an intermediate abstract representation that is generally more precise than prose while remaining high enough to retain a focus on logic without becoming mired in detail. As such, pseudocode is an ideal notation for developing and conveying algorithms. The following examples use pseudocode to express the algorithms for solving the anagram problem. This approach allows developers to refine algorithms before spending time programming them.
Comparing the two candidate strings that potentially form an anagram is easier if they are in the same format. The first algorithm normalizes the strings. To normalize the strings means "to make [them] conform to or reduce [them] to a norm or standard" form. For the anagram program, a convenient normalized form eliminates spaces and punctuation, and converts the remaining letters to the same case - the algorithm arbitrarily chooses lowercase. If implemented as a function, the algorithm can process both candidate strings.
define the variable phrase and initialize it to empty
for each character, c, in the input
{
if c is an alphabetic letter
{
make c lowercase
append c to phrase
}
}
tobeornottobethatisthequestionwhetheritsnoblerinthemindtosuffertheslingsandarrowsofoutrageousfortune
inoneofthebardsbestthoughtoftragediesourinsistentherohamletqueriesontwofrontsabouthowlifeturnsrotten
Once the previous algorithm has removed the spaces and punctuation characters from the candidate phrases and converted all the letters to the same (lower) case, the next step is counting the occurrences of each letter: the number of a's, b's, ..., and the z's. Consequently, to test for anagrams in English, the algorithm requires 26 counters - one counter for each letter in the (English) alphabet.
define and initialize 26 counters:
a_count = 0, b_count = 0, ..., z_count = 0
for each letter, c, in a normalized string
{
if (c == 'a')
a_count++;
else if (c == 'b')
b_count++;
. . . .
else
z_count++;
} |
define an array: int count[26]
initialize each array element to 0
for each letter, c, in a normalized string
{
if (c == 'a')
count[0]++;
else if (c == 'b')
count[1]++;
. . . .
else
count[25]++;
} |
| (a) | (b) |
If both candidate phrases have the same number of each letter, they form an anagram. Therefore, comparing pairs of letter counts is sufficient for assessing the candidates. All twenty-six counts must be equal to assert that the candidates form an anagram. However, a single differing count indicates that they are not an anagram.
if (a_count1 != a_count2)
return false;
else if (b_count1 != b_count2)
return false;
. . .
else if(z_count1 != z_count2)
return false;
else
return true; |
if (a_count1 == a_count2 && b_count1 == b_count2 &&
. . . && z_count1 == z_count2)
cout return true
else
cout return false |
for (int i = 0; i < 26; i++)
if (count1[i] != count2[i])
return false;
return true; |
| (a) | (b) | (c) |
Discrete counters clearly match features of the original problem, making them easy to understand. However, each algorithm that uses them suffers from large, awkward programming structures whose very size allows programmers to introduce trivial but difficult-to-find errors. Replacing the discrete counters with two arrays, one for each set of counters, allows programmers to replace the if-ladders and long compound expressions with compact loops. However, the array-based algorithm used to generate the initial counts, shown in Figure 3(b), still relies on an extensive if-else ladder. To take full advantage of the array-based solution requires an easy way to map a letter to an appropriate array index. An efficient mapping allows programmers to collapse the if-else ladder to a single operation.
To implement the itoa function, the text used ASCII codes to convert an integer into digits by recognizing that the codes for the digits '0' through '9' are contiguous. The numeric value of a specific digit, d, added to the ASCII code for the digit '0,' is the ASCII code for the digit: d + '0' = ASCII(d). The ASCII codes for letters form two contiguous integer ranges: 'A' = 65 to 'Z' = 90, and 'a' = 97 to 'z' = 122. The normalizing algorithm (Figure 2) ensures that the counting algorithm only needs to handle lowercase letters.
| f('a') = 0 f('b') = 1 . . . f('z') = 25 |
f(c) = c - 'a' |
'a' - 'a' = 97 - 97 = 0 'b' - 'a' = 98 - 97 = 1 'c' - 'a' = 99 - 97 = 2 . . . 'z' - 'a' = 122 - 97 = 25 |
| (a) | (b) | (c) |
Reading the initial strings and printing the final message are now familiar operations presented with little detail. Program input and output take place in main as illustrated in the following figure:
int main()
{
string phrase1 = input(1); // (a)
string phrase2 = input(2); // (a)
if (phrase1.length() != phrase2.length()) // (b)
{
cout << "Phrases are NOT an anagram\n";
exit(1);
}
if (is_anagram(phrase1, phrase2)) // (c)
cout << "Phrases are an anagram\n"; // (d)
else
cout << "Phrases are NOT an anagram\n"; // (d)
return 0;
}
string input(int n)
{
string input; // (a)
cout << "Please enter phrase " << n << ": "; // (b)
getline(cin, input); // (c)
return normalize(input); // (d)
}
return operator returns input by value.
string normalize(const string& s) // (a)
{
string normalized; // (b)
for (size_t i = 0; i < s.length(); i++) // (c)
if (isalpha(s[i])) // (d)
normalized += tolower(s[i]); // (e)
return normalized;
}
void count(const string& s, int* counts) // (a)
{
for (size_t i = 0; i < s.length(); i++) // (b)
counts[s[i] - 'a']++; // (c)
}
bool is_anagram(const string& s1, const string& s2)
{
int count1[26] {}; // (a)
count(s1, count1); // (b)
int count2[26] {}; // (a)
count(s2, count2); // (b)
for (int i = 0; i < 26; i++) // (c)
if (count1[i] != count2[i]) // (d)
return false; // (e)
return true; // (f)
}
The text introduced for-range loops in Chapter 3, but has ignored them since. The neglect isn't due to a lack of usefulness but from the kind of data used throughout the first half of the text. For-range loops operate on data like string objects that are "iterable," meaning that they define the begin and end functions (see ipalnumber.cpp). For-range loops are especially useful when translating algorithms that use the phrase "for each."
for (variable-definition : range) { statements; }
The text described two algorithms, normalization and letter counting, Figures 2 and 3, respectively, saying for each letter, c, in .... This description suggests that the implementations of these algorithms may be amenable to using for-range loops.
string normalize(const string& s)
{
string normalized;
for (char c : s)
if (isalpha(c))
normalized += tolower(c);
return normalized;
}
|
void count(const string& s, int* counts)
{
for (char c : s)
counts[c - 'a']++;
}
|
| View | Download | Comments |
|---|---|---|
| anagram.cpp | anagram.cpp | string class version |
| canagram.cpp | canagram.cpp | C-string version |