To process something means "to subject [it] to or handle [it] through an established usually routine set of procedures." When we say that we are going to "process a file," we more accurately mean we are going to "process the data in the file." So, when we process a file, we subject the data that it contains to a set of procedures (i.e., we manipulate the data algorithmically), store the results of a set of procedures or algorithmic operations in a file, or do some combination of both. When we prepare to process a file, there are two independent concepts that we must consider. The first, called the access technique, is how we locate the position in the file where the read or write will take place. The second is how we read or write the file - which function(s) we use - and it determines how much data each I/O operation transfers between the file and the program.
Systems broadly provide at least two primary ways for programmers to access files. We can also synthesize a third technique using fundamental file processing operations. The problem the program solves dictates the access technique we follow. The access technique then dictates which is the best stream class to use.
Sequential access: each I/O operation advances the position pointer by n, the number of bytes transferred: position pointer += n;. Instances of the ifstream and ofstream classes perform sequential access.

Random/direct access (fstream class): In this context, "random" means that a program can access the data in any order, not just sequentially. And "direct" means that a program can access a specific data item using an index or record number - very much like an array. Direct access uses block I/O operations (introduced below) and adds a family of overloaded "seek" functions. See Random/Direct Access later in this chapter for details, illustrations, and an example.

Programs, especially those using sequential access, often use loops to process files. When the program writes data to a file, some condition related to the data source signals the program when there is no more data to write. When the program reads information from a file, the file itself (or, more accurately, the operating system) must signal the program when it has read all the data. In the last section, we learned that each stream object maintains a set of four one-bit flags that indicate the stream's current state or condition. One of those flags, eofbit, signals when the position pointer reaches the end of the file. C++ streams provide two ways for a program to detect the end of a file.
The eof, or end-of-file, function returns true after the position pointer reaches the end of file. It's easy to base loop statements that read and process the file contents on this function.
Unfortunately, the behavior of the eof function is not as straightforward as we might expect. The function does not actually test the file to see if there is more data to read. Instead, it returns the current value of the eofbit, which is set by the last read function called. So, the value eof returns depends on the outcome of a different, previous function call. This unexpected behavior is usually only a problem when reading single characters from a file - functions reading more complex data detect and set the eofbit as a part of the read operation.
(a)

    #include <iostream>
    #include <fstream>
    using namespace std;

    int main()
    {
        ifstream in("data.txt");
        char c;

        while (!in.eof())
        {
            in >> c;        // or: in.get(c);
            cout << '|' << c << '|' << endl;
        }

        return 0;
    }

Output of (a):

    |a|
    |b|
    |c|
    |d|
    |d|

(b)

    #include <iostream>
    #include <fstream>
    using namespace std;

    int main()
    {
        ifstream in("data.txt");

        while (!in.eof())
        {
            int c = in.get();
            cout << '|' << (char)c << '|' << endl;
        }

        return 0;
    }

Output of (b):

    |a|
    |b|
    |c|
    |d|
    | |
The eof function used in conjunction with three different read operations. The test data consists of a file with four characters on one line: abcd. The while-loops in both programs loop five times - one time too many. The output that each program produces is displayed below the programs. The '|' character is included as part of the output to make the space in the output of program (b) "visible."
(a) Two of the read operations: in >> c and in.get(c). The program exhibits the same flawed behavior regardless of which read function is used: the eofbit is set by the read function only after the loop begins its fifth and final iteration. During the final iteration of the loop, the read function attempts to read data, fails, and then sets the eofbit - the "processing" code (represented by the output operation) uses the character read by the previous iteration of the loop.
(b) The third read operation, the get function called with no argument, returns the character as an int, which must be cast to a char for output. The last read, which takes place during the final iteration, after the eof function is called, returns EOF, which produces an unprintable character when cast.

The overloaded operator bool() function is a conversion operator that is used in conjunction with input streams and some of their member functions to detect when the end of the file is reached. It's important to understand that programmers do not explicitly call the conversion function. Like a constructor, the conversion function is called automatically when the context requires a Boolean value where an input stream is used. The following code fragments demonstrate how to use the function.
(a)

    ifstream input(file_name);

(b)

    istream& get(char& c);

(c)

    do
    {
        ...
    } while (input);

(d)

    char c;
    while (input.get(c))
        . . .

(b) The prototype of the get function, illustrating that the function returns a stream reference. The function is declared in istream, which ifstream inherits, so the returned reference refers to the ifstream that called it.
(d) The get function reads one character from the input file, stores it in the variable c, and returns a reference to input. Using that reference as the loop condition automatically calls the conversion operator, which creates a Boolean value that drives the loop.

Once we have selected an access technique that matches the problem we are trying to solve, the next step is to determine the best way to read or write the file. Different I/O operations allow the program to read or write different amounts of data with each I/O operation. Like access techniques, these I/O operations must also match the problem that the program solves.
Three ways of reading or writing data are generally supported, with a fourth, very specialized way also provided. The three common techniques of processing files match the I/O operations to the natural data boundaries - that is, they read or write the number of bytes needed to form a complete data item: one byte for a character, typically four bytes for an integer, typically eight bytes for a double, etc. The fourth way accesses data along hardware-oriented boundaries, which dramatically limits further data processing.
The >> and << operators can be used to read and write data, respectively.

Each file stream holds a filebuf, a kind of streambuf object, by aggregation (Figure 2). The rdbuf function gets and sets the buffer, which allows rapidly processing data in limited situations.
The example that follows, mycopy.cpp, demonstrates some simple file processing operations.