The wc example relies on concepts introduced earlier in this chapter; please review them as needed.
The Unix and Linux operating systems provide a utility named wc (short for word count) that counts the words, lines, and characters in a list of files. The utility prints the counts for each file and the totals for all files in the list. We'll write a simplified version of the wc utility to demonstrate a more complex switch statement. However, the program would be uninteresting and difficult to test if it could take input only from the console, so we'll jump ahead and see how to read input from a file. Like the original wc program, our version counts words, lines, and characters. Unlike the original, our version reads data from only one file and lacks the command-line options (or switches) that alter the counts.
Our first task is to create a test case: simple data that we can use to verify that the program produces the correct counts. Creating a test case, or even thinking about testing a program before writing it, may seem odd. However, in organizations with separate development and quality assurance teams, it's common for the teams to work in parallel and cooperate. Focusing on the test case data also helps guide us through the steps to solve the problem: the representation and the algorithmic steps developed from it bridge the problem and the programmed solution.
Figure 1. Test case: See the\nquick red\nfox
Typically, writing a program to solve a problem requires solving many sub-problems before programming can begin. Even after we simplify the program, it has five sub-problems that we must solve. The first sub-problem is defining what we mean by a "word." We must develop algorithms for the next three sub-problems: counting characters, lines, and words. The last sub-problem, reading data from a file, is a special case because we do not formally learn how to do this until much later in the textbook.
For this program, a word is any sequence of characters other than the white-space characters (spaces, tabs, and new lines), so the text hello*@ 12#$ is counted as two words. The program will use an accumulator, words, to count the number of words in a file.

The program uses a second accumulator, chars, to keep a count of the number of characters in the input file. The program reads data from the file one character at a time, which makes counting the characters very easy: read a character and increment the count by one. Counting lines is similar: a third accumulator, lines, starts at 1 and increases by one each time the program reads a new-line character.

Counting words is more difficult and introduces a new problem that requires a new kind of variable. Words have at least one character, but they may be arbitrarily long, so as the program reads each character in a word, it must be careful not to count the word more than once. The program uses a flag, in_word, to "remember" whether it is currently in a word (i.e., reading characters that are part of a word). There are only two possibilities: either the program is in a word or it is not, so we can make the flag a variable of type bool. The program treats the white-space characters as word separators, so it sets the flag to false whenever it reads a white-space character and sets it to true whenever it reads a non-white-space character. The program counts a word only when it reads the word's first character from the file.
In this way, the program counts every character, but it only counts words when it reads the first character of the word.
So far, our programs have used cin and cout to read input from and write output to the console. The console is just a file (or, more accurately, many files: one for input and several for output). cin and cout are objects instantiated from two stream classes, istream and ostream, which are declared in <iostream>. Instances of ifstream (short for input file stream) behave very much like instances of istream but can also read from disk files:
- <fstream> is a header file that declares the classes needed for creating objects that can read and write files.
- One of the ifstream class's constructors takes the name of a file as an argument and opens the file for reading.
- get is a stream member function that reads one character from an input stream and returns it as an integer.
- EOF (short for end of file) is a symbolic constant, declared in <cstdio>, representing a special value that get returns when the program has read all the data in a file (i.e., when the read operation reaches the end of the file).

```cpp
#include <iostream>     // for console I/O
#include <fstream>      // for ifstream
#include <iomanip>      // for setw
#include <cstdio>       // for EOF
using namespace std;

int main()
{
    ifstream file("fox.txt");               // (a)

    int  chars = 0;                         // (b)
    int  lines = 1;
    int  words = 0;
    bool in_word = false;

    int c;
    while ((c = file.get()) != EOF)         // (c) & (d)
    {
        chars++;                            // (e)

        switch (c)                          // (f)
        {
            case '\n':                      // (g)
                lines++;
                // fall through
            case ' ':                       // (h)
            case '\t':
                in_word = false;
                break;

            default:                        // (i)
                if (! in_word)              // (j)
                {
                    in_word = true;
                    words++;
                }
                break;
        }
    }

    cout << setw(8) << lines << setw(8) << words << setw(8) << chars << endl;

    return 0;
}
```
Recall that each case label is an entry point into the switch, and that once execution begins, it continues until it processes a break or reaches the end of the switch statement.
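(a) Defines an ifstream object named file that can read the contents of a file named "fox.txt".
(b) Defines and initializes the three accumulators and the in_word flag.
(c) Calls get to read one character from the file and saves it in c (the inner, parenthesized expression).
(d) Compares c to EOF; the loop ends when the program reaches the end of the file.
(e) Counts every character.
(f) Selects a case based on the value of c.
(g) A new-line character increments lines. Notice that this case does not end with a break. Forgetting a break is a common programming error, so we add a comment stating that the absence of a break is a deliberate part of the program. In this way, the new-line character acts as both a line separator and a word separator (i.e., as white space).
(h) Spaces, tabs, and (by falling through) new lines set in_word to false.
(i) The default case corresponds to c != WS (c is not a white-space character) and processes all non-white-space characters.
(j) If the program is not already in a word, c is the first character of a new word, so the program sets in_word to true and counts the word.

Our next step is to create the file containing our test case. Contemporary operating systems (e.g., Windows, macOS, Unix, and Linux) use a tree-structured file system. Whenever a program runs, it does so in an environment provided by the host operating system. An essential part of that environment is a location in the file system, a directory or folder, called the current working directory (CWD). Because the program opens the file by the name "fox.txt" alone, without a directory path, it can only open and read the test file if the file is in the CWD.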
Other IDEs may have different conventions for locating the CWD. Furthermore, text editors may embed characters beyond those of the test case in the file, which changes the results from what we expect. For example, the vi text editor and its derivatives add an extra new-line character at the end of the entered text by default. We'll use Visual Studio to create the test file and run the program, which ensures that the test file is located correctly and contains no unexpected characters.
Enter the test case data exactly as it appears in Figure 1 into the newly created file. Be very careful as you enter the new-line characters: the \n escape sequence in Figure 1 makes the single new-line characters visible to us while we design the test case; in place of each \n, press the ENTER key. Also note that the last line does not end with a new line: leave the cursor after the final "x" and save the file. Now that the test case file is filled and positioned correctly, you may compile and run the program as you normally would.
Running the program on the test case:
(a) Program output: 3 5 21 (lines, words, and characters)
(b) Test case: See the\nquick red\nfox