14.7. Character I/O

It is often useful to process (read and write) the contents of a file one character or one byte at a time. The three I/O functions demonstrated in the mycopy.cpp example are sufficient for these tasks, and they work with both textual and binary data. Four additional "support" functions, also illustrated in mycopy, provide everything else needed. Three of them, eof, good, and the conversion operator bool, were described in detail previously. While operator>> and operator<< can read and write textual data, they should be avoided when processing binary data because they perform formatted I/O: operator>> skips whitespace by default, and both operators convert between text and internal values rather than transferring bytes unchanged.

| Function | Description | Stream |
|---|---|---|
| int get(); | Reads one character from the input stream and returns it as an integer. C++ converts between integers and characters without requiring a type cast. | ifstream |
| istream& get(char& c); | Reads one character from the input stream, which is passed back through the argument by reference. | ifstream |
| ostream& put(char c); | Writes one character or byte to the output stream. | ofstream |
| streamsize gcount(); | The "g" in the function name is short for "get," which is used as a synonym for read or input. The function returns a count of how many unformatted characters were read with the last input operation. Unformatted characters include the full line-separator sequences (for operating systems that use multi-character line separators). While this function can be used with character input, it is most useful with block reads. The streamsize return type is a portable type appropriate for storing a size value. | ifstream |
Character I/O functions. The table lists the most useful character I/O functions. Notice that the two overloaded versions of get (the first two functions) have different arguments and different return types.
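The following is a minimal sketch of the four functions in use. It assumes string streams (istringstream and ostringstream) in place of file streams so that it runs without any files; the functions behave the same way on ifstream and ofstream objects. The names ReadResult, read_one, and echo are hypothetical helpers invented for this illustration and do not appear in mycopy.

```cpp
#include <cstdio>    // EOF
#include <ios>       // std::streamsize
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of mycopy): reads one character with
// int get() and reports both the returned value and gcount().
struct ReadResult
{
	int value;               // the character as an int, or EOF
	std::streamsize count;   // characters read by the last get: 1 or 0
};

ReadResult read_one(std::istream& input)
{
	ReadResult r;
	r.value = input.get();       // int get()
	r.count = input.gcount();    // 1 if a character was read, 0 otherwise
	return r;
}

// Hypothetical helper: reads with get(char&) and writes each character
// with put(char), returning everything that was written.
std::string echo(std::istream& input)
{
	std::ostringstream output;
	char c;
	while (input.get(c))         // istream& get(char& c)
		output.put(c);           // ostream& put(char c)
	return output.str();
}
```

Calling read_one on a stream that holds the single character A yields the value 'A' with a count of 1; calling it a second time yields EOF with a count of 0.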

Common Programming Patterns

ifstream input("input.txt");
ofstream output("output.txt");
(a)
One-read Loops
int c;
while ((c = input.get()) != EOF)
{
	// process c
	output.put(c);
}
(b)
char c;
while (input.get(c))
{
	// process c
	output.put(c);
}
(c)
Two-read Loops
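int c = input.get();
// test before processing so that an empty file is handled correctly
while (! input.eof())
{
	// process the data
	c = input.get();
}
(d)
int c = input.get();
// test before processing so that an empty file is handled correctly
while (input)
{
	// process the data
	c = input.get();
}
(e)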
int c = input.get();
while (input.gcount() > 0)
{
	// process c
	output.put(c);
	c = input.get();
}
(f)
Character I/O examples. The simple code fragments illustrate the I/O function syntax and suggest different ways to read or write a file one character at a time. Depending on the problem the program solves, choose the best pattern or part of the best pattern to complete the program.
  (a) Defines the input and output stream objects. Replace the file names with files appropriate to a given problem or with variables.
  (b) This overloaded version of the get function reads one character from the input stream and returns it as the function return value. This example embeds the get function call in the while-loop control, which provides a compact and efficient mechanism that reads all data in a file. It's important to notice the grouping of the parentheses: The outer group is part of the while-loop syntax. The middle group, around the assignment c = input.get(), causes the read and assignment to take place first, followed by the test for inequality. The symbolic constant EOF, or end-of-file, is returned when there are no more data in the file. The innermost parentheses are the empty parameter list for the get function.
  (c) This overloaded version of the get function reads a single character from the input stream and passes it back through the parameter. The function returns the stream, and the stream's conversion operator bool provides the Boolean value needed to drive the loop.
  (d) The eof function returns true once a read operation has reached the end of the file. The pattern requires two reads because the eofbit is not set until a read operation detects the end of the file.
  (e) Like pattern (c), this version also relies on the conversion operator bool for the Boolean value needed to continue or end the loop. The pattern requires two reads because the eofbit is not set until a read operation detects the end of the file.
  (f) A while-loop based on the gcount function. This version requires two read operations. The first read, which takes place outside and above the loop, attempts to read the first character from the file; if a character was read, gcount returns 1 and the loop runs. The read inside and at the bottom of the loop attempts to read the next character; while data remains in the file, gcount returns a value greater than 0 and the loop continues.
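The loop patterns can be compared side by side with a short sketch. It assumes string streams in place of files, and the function names copy_b, copy_c, copy_d, and copy_f are hypothetical, chosen only to match the pattern labels. Pattern (d) is written here as a while-loop so that empty input copies nothing.

```cpp
#include <cstdio>    // EOF
#include <istream>
#include <sstream>
#include <string>

// Pattern (b): one read, int get() compared with EOF.
std::string copy_b(std::istream& input)
{
	std::ostringstream output;
	int c;
	while ((c = input.get()) != EOF)
		output.put(static_cast<char>(c));
	return output.str();
}

// Pattern (c): one read, get(char&) with the stream tested as a bool.
std::string copy_c(std::istream& input)
{
	std::ostringstream output;
	char c;
	while (input.get(c))
		output.put(c);
	return output.str();
}

// Pattern (d): two reads, loop controlled by eof().
std::string copy_d(std::istream& input)
{
	std::ostringstream output;
	int c = input.get();              // first read, above the loop
	while (!input.eof())
	{
		output.put(static_cast<char>(c));
		c = input.get();              // second read, bottom of the loop
	}
	return output.str();
}

// Pattern (f): two reads, loop controlled by gcount().
std::string copy_f(std::istream& input)
{
	std::ostringstream output;
	int c = input.get();
	while (input.gcount() > 0)
	{
		output.put(static_cast<char>(c));
		c = input.get();
	}
	return output.str();
}
```

Given the same input, all four functions return the same string, and each returns an empty string when the input is empty. The choice between them is therefore a matter of which loop shape best fits the surrounding program.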

Binary Data

The code fragment presented in Figure 2(b) raises the last question about character I/O that we must answer. The get function reads and returns a single character or byte from a file. In the code fragment, the byte is first stored in a variable and then compared to the value represented by the symbolic constant EOF, typically -1. This code pattern was one of two used in the mycopy example to read the contents of a binary file. If a binary file can contain any value, positive or negative, why doesn't the loop end when the program reads a data value of -1 from the file?

The answer lies in how computers store negative values in main memory and disk files. Each byte of data consists of 8 bits, which means there are 2^8 = 256 possible patterns of 1s and 0s. Two keywords, signed (able to store 0, positive, and negative values) and unsigned (only able to store 0 and positive values), modify how the bit-patterns are interpreted. C++ treats most integers as signed by default, but whether a plain char is signed or unsigned is left to the implementation (the compiler and the underlying platform). For example, char is typically signed on x86 systems but unsigned on many ARM systems.

Since we can't store a "-" sign in a byte, some bit patterns must be interpreted as negative values. The left-most bit, called the sign bit, indicates the sign of a signed value (0 = positive and 1 = negative), but it is included with the magnitude of an unsigned value. The key to understanding the behavior of the get function is appreciating that digital computers store all machine instructions and data as 1s and 0s, which are meaningless until they are grouped together and interpreted. In the case of data, the same bit pattern may be interpreted in more than one way. That is, data can be written as a signed value but read as an unsigned value.
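The idea that one bit pattern has two interpretations can be sketched in a few lines. The helper names as_signed and as_unsigned are hypothetical. The sketch assumes a two's-complement representation of signed values, which C++20 requires and which earlier compilers use in practice.

```cpp
#include <cstring>   // std::memcpy

// Hypothetical helper: interprets the 8 bits of a byte as a signed value.
// memcpy copies the bit pattern unchanged; only the interpretation differs.
int as_signed(unsigned char byte)
{
	signed char s;
	std::memcpy(&s, &byte, 1);
	return s;
}

// Hypothetical helper: interprets the same 8 bits as an unsigned value.
int as_unsigned(unsigned char byte)
{
	return byte;
}
```

Under that assumption, the pattern 11111111 (hexadecimal FF) gives as_unsigned(0xFF) == 255 and as_signed(0xFF) == -1, while a pattern whose sign bit is 0, such as 01111111 (hexadecimal 7F), gives 127 under both interpretations.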

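| Binary | Hexadecimal | Decimal ≥ 0 | Decimal < 0 |
|---|---|---|---|
| 00000000 | 0 | 0 | |
| 00000001 | 1 | 1 | |
| 00000010 | 2 | 2 | |
| 00000100 | 4 | 4 | |
| 00001000 | 8 | 8 | |
| 00001111 | F | 15 | |
| 00010000 | 10 | 16 | |
| 00011111 | 1F | 31 | |
| 00100000 | 20 | 32 | |
| 00110000 | 30 | 48 | |
| 00111111 | 3F | 63 | |
| 01000000 | 40 | 64 | |
| 01010000 | 50 | 80 | |
| 01100000 | 60 | 96 | |
| 01110000 | 70 | 112 | |
| 01111111 | 7F | 127 | |
| 10000000 | 80 | 128 | -128 |
| 10001111 | 8F | 143 | -113 |
| 10010000 | 90 | 144 | -112 |
| 10100000 | A0 | 160 | -96 |
| 10110000 | B0 | 176 | -80 |
| 11000000 | C0 | 192 | -64 |
| 11010000 | D0 | 208 | -48 |
| 11100000 | E0 | 224 | -32 |
| 11110000 | F0 | 240 | -16 |
| 11111111 | FF | 255 | -1 |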
Reading and writing selected bit-patterns. It's neither feasible nor useful to list all 256 possible bit-patterns here, so a few are selected to highlight how numbers are represented in a digital computer. (Please note that the computer's natural endianness also affects how multi-byte values are viewed and interpreted.) When the get function reads a byte with the sign bit set (the patterns from 10000000 through 11111111), it returns the value as a non-negative integer in the range 128 to 255. This behavior prevents a byte that represents -1 as a signed value (11111111) from being inadvertently interpreted as EOF.
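A short sketch can confirm this behavior. It assumes an istringstream holding raw bytes in place of a binary file; the helper names first_value and count_bytes are hypothetical.

```cpp
#include <cstddef>   // std::size_t
#include <cstdio>    // EOF
#include <sstream>
#include <string>

// Hypothetical helper: returns the int that get() produces for the
// first byte of data, or EOF if data is empty.
int first_value(const std::string& data)
{
	std::istringstream input(data);
	return input.get();
}

// Hypothetical helper: counts the bytes in data with the loop from
// pattern (b), storing each result of get() in an int.
std::size_t count_bytes(const std::string& data)
{
	std::istringstream input(data);
	std::size_t count = 0;
	int c;
	while ((c = input.get()) != EOF)
		++count;
	return count;
}
```

For a string containing the single byte FF, first_value returns 255 rather than -1, so the comparison with EOF does not end the loop early, and count_bytes counts every byte of a three-byte sequence such as 01 FF 02. Note that the result of get must be stored in an int, as in pattern (b), for the comparison with EOF to remain reliable.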