14.7. Character I/O

It is often useful to process (read and write) the contents of a file one character or one byte at a time. The three I/O functions demonstrated in the mycopy.cpp example are sufficient for completing these tasks, and they work with both textual and binary data. Four additional "support" functions, also illustrated in mycopy, provide everything else needed for the task. Three of them, eof, good, and the stream's Boolean conversion operator (operator bool), were described in detail previously. While operator>> and operator<< can be used to read and write textual data, they should be avoided when processing binary data: they perform formatted I/O (operator>>, for example, skips whitespace by default), so the bytes the program sees may not match the bytes stored in the file.
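The difference between formatted and character-at-a-time input is easy to see on a small sample. The following is a minimal sketch, not code from mycopy: the function names read_with_extractor and read_with_get are hypothetical, and an in-memory std::istringstream stands in for a file so the fragment needs no external data.

```cpp
#include <sstream>
#include <string>

// Collects characters with the formatted extractor (operator>>),
// which skips whitespace before each character by default.
std::string read_with_extractor(const std::string& data)
{
    std::istringstream input(data);   // stand-in for an ifstream (assumption)
    std::string result;
    char c;
    while (input >> c)
        result += c;
    return result;
}

// Collects characters with get, which hands back every character,
// including spaces and newlines.
std::string read_with_get(const std::string& data)
{
    std::istringstream input(data);   // stand-in for an ifstream (assumption)
    std::string result;
    char c;
    while (input.get(c))
        result += c;
    return result;
}
```

Given the text "a b" followed by a newline and "c", the first function drops the space and the newline, while the second returns the data unchanged.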

Function	Description	Stream
int get();	Reads one character from the input stream and returns it as an integer. C++ converts between integers and characters without requiring a type cast.	`ifstream`
istream& get(char& c);	Reads one character from the input stream, which is passed back through the argument by reference.	`ifstream`
ostream& put(char c);	Writes one character or byte to the output stream.	`ofstream`
streamsize gcount();	The "g" in the function name is short for "get," which is used as a synonym for read or input. The function returns a count of how many unformatted characters were read with the last input operation. Unformatted characters include the full line-separator sequences (for operating systems that use multi-character line separators). While this function can be used with character input, it is most useful with block reads. The `streamsize` return type is a portable type appropriate for storing a size value.	`ifstream`

Character I/O functions. The most useful character I/O functions. Notice that the two overloaded versions of get (the first two functions) have different arguments and different return types. For greater detail and additional functions, please see:
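The four functions in the table can be exercised together in a few lines. This is only a sketch under stated assumptions: the struct GetResult and the function try_character_io are hypothetical names, and string streams replace the file streams named in the table (they inherit the same member functions from istream and ostream).

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical record of what each call produced.
struct GetResult
{
    int first;               // value returned by int get()
    char second;             // value passed back by get(char&)
    std::streamsize count;   // gcount() after the second read
    std::string written;     // what put() wrote
};

GetResult try_character_io(const std::string& data)
{
    std::istringstream input(data);   // stand-in for an ifstream (assumption)
    std::ostringstream output;        // stand-in for an ofstream (assumption)
    GetResult r{};

    r.first = input.get();            // int get(): character returned as an int
    input.get(r.second);              // get(char&): character returned by reference
    r.count = input.gcount();         // characters read by the last unformatted read

    // put returns the stream, so calls can be chained.
    output.put(static_cast<char>(r.first)).put(r.second);
    r.written = output.str();
    return r;
}
```

Calling try_character_io("hi") should yield 'h' in first, 'i' in second, a count of 1, and "hi" in written.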

Common Programming Patterns

One-read Loops
(a)
    ifstream input("input.txt");
    ofstream output("output.txt");

(b)
    int c;
    while ((c = input.get()) != EOF)
    {
        // process c
        output.put(c);
    }

(c)
    char c;
    while (input.get(c))
    {
        // process c
        output.put(c);
    }

Two-read Loops

(d)
    int c = input.get();
    do
    {
        // process the data (runs once even if the file is empty)
        c = input.get();
    } while (!input.eof());

(e)
    int c = input.get();
    do
    {
        // process the data (runs once even if the file is empty)
        c = input.get();
    } while (input);

(f)
    int c = input.get();
    while (input.gcount() > 0)
    {
        // process c
        output.put(c);
        c = input.get();
    }

Character I/O examples. The simple code fragments illustrate the I/O function syntax and suggest different ways to read or write a file one character at a time. Depending on the problem the program solves, choose the best pattern or part of the best pattern to complete the program.

(a) Defines the input and output stream objects. Replace the file names with files appropriate to a given problem or with variables.
(b) This overloaded version of the get function reads one character from the input stream and returns it as the function return value. This example embeds the get function call in the while-loop control, which provides a compact and efficient mechanism that reads all data in a file. It's important to notice the grouping of the parentheses: The outer group is part of the while-loop syntax. The middle group, which encloses the assignment, causes the read and assignment to take place first, followed by the comparison with EOF. The symbolic constant EOF, or end-of-file, is returned when there are no more data in the file. The innermost parentheses are the empty parameter list for the get function.
(c) This overloaded version of the get function reads a single character from the input stream and passes it back through the parameter. The function returns the stream itself, and the stream's Boolean conversion operator (operator bool) provides the Boolean value needed to drive the loop: the stream converts to false once a read fails.
(d) The eof function returns true when the read operations have reached the end of the file. The pattern requires two reads because the eofbit is not set until a read operation detects the end of the file. Because a do-while loop tests its condition at the bottom, the body runs once even when the file is empty.
(e) Like pattern (c), this version relies on the stream's Boolean conversion operator for the value needed to continue or end the loop. The pattern requires two reads because the eofbit is not set until a read operation detects the end of the file. As with pattern (d), the body runs once even when the file is empty.
(f) A while-loop based on the gcount function. This version requires two read operations. The first read, which takes place outside and above the loop, attempts to read the first character from the file; if a character was read, gcount returns 1 and the loop runs. The read inside and at the bottom of the loop attempts to read the next character; while data remains in the file, gcount returns a value greater than 0 and the loop continues.
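The loop patterns become easier to compare when each is wrapped in a function that copies one stream to another. The following is a sketch, not the mycopy program itself: the function names are hypothetical, the parameters are the base classes std::istream and std::ostream so that either file streams or string streams can be passed, and the int returned by get is cast to char explicitly to avoid narrowing warnings.

```cpp
#include <cstddef>
#include <cstdio>    // EOF
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Pattern (b): get() returns an int that is compared against EOF.
std::size_t copy_with_int_get(std::istream& input, std::ostream& output)
{
    std::size_t copied = 0;
    int c;
    while ((c = input.get()) != EOF)
    {
        output.put(static_cast<char>(c));
        ++copied;
    }
    return copied;
}

// Pattern (c): get(char&) returns the stream, which tests false after a failed read.
std::size_t copy_with_char_get(std::istream& input, std::ostream& output)
{
    std::size_t copied = 0;
    char c;
    while (input.get(c))
    {
        output.put(c);
        ++copied;
    }
    return copied;
}

// Pattern (f): a priming read, then loop while the last read got a character.
std::size_t copy_with_gcount(std::istream& input, std::ostream& output)
{
    std::size_t copied = 0;
    int c = input.get();
    while (input.gcount() > 0)
    {
        output.put(static_cast<char>(c));
        ++copied;
        c = input.get();
    }
    return copied;
}

// Hypothetical convenience wrapper: runs pattern (c) on in-memory data.
std::string copy_string(const std::string& data)
{
    std::istringstream input(data);
    std::ostringstream output;
    copy_with_char_get(input, output);
    return output.str();
}
```

To copy real files, the same functions could be called with an ifstream and an ofstream, opened with std::ios::binary when the data is not text. All three copy functions leave an empty input untouched and return 0.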

Binary Data

The code fragment presented in Figure 2(b) raises the last question about character I/O that we must answer. The get function reads and returns a single character or byte from a file. In the code fragment, the byte is first stored in a variable and then compared to the value represented by the symbolic constant EOF, typically -1. This code pattern was one of two used in the mycopy example to read the contents of a binary file. If a binary file can contain any value, positive or negative, why doesn't the loop end when the program reads a data value of -1 from the file?

The answer lies in how computers store negative values in main memory and disk files. Each byte of data consists of 8 bits, which means there are 2⁸ = 256 possible patterns of 1s and 0s. Two keywords, signed (able to store 0, positive, and negative values) and unsigned (only able to store 0 and positive values), modify how the bit-patterns are interpreted. C++ treats most integers as signed by default, but whether a plain char is signed or unsigned is left to the implementation (the compiler and its target platform). It is signed on many common systems, such as x86 with mainstream compilers, but unsigned on others.

Since we can't store a "-" sign in a byte, some bit patterns must be interpreted as negative values. The left-most bit, called the sign bit, indicates the sign of a signed value (0 = positive and 1 = negative), but it is included with the magnitude of an unsigned value. The key to understanding the behavior of the get function is appreciating that digital computers store all machine instructions and data as 1s and 0s, which are meaningless until they are grouped together and interpreted. In the case of data, the same bit pattern may be interpreted in more than one way. That is, data can be written as a signed value but read as an unsigned value.
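The idea that one bit pattern has two readings can be checked directly. The sketch below is illustrative only: the function names as_unsigned and as_signed are hypothetical, and it assumes a platform that provides the fixed-width types std::uint8_t and std::int8_t. It copies the byte with memcpy so that the bits are reinterpreted rather than converted.

```cpp
#include <cstdint>
#include <cstring>

// Reads the 8-bit pattern as an unsigned value (0 to 255).
int as_unsigned(std::uint8_t bits)
{
    return bits;
}

// Reads the same 8-bit pattern as a signed value (-128 to 127)
// by copying the byte, unchanged, into a signed variable.
int as_signed(std::uint8_t bits)
{
    std::int8_t value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For the pattern 11111111 (hexadecimal FF), as_unsigned gives 255 and as_signed gives -1; for 01111111 (7F), both give 127 because the sign bit is clear.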

Binary	Hexadecimal	Decimal ≥ 0	Decimal < 0
00000000	0	0
00000001	1	1
00000010	2	2
00000100	4	4
00001000	8	8
00001111	F	15
00010000	10	16
00011111	1F	31
00100000	20	32
00110000	30	48
00111111	3F	63
01000000	40	64
01010000	50	80
01100000	60	96
01110000	70	112
01111111	7F	127
10000000	80	128	-128
10001111	8F	143	-113
10010000	90	144	-112
10100000	A0	160	-96
10110000	B0	176	-80
11000000	C0	192	-64
11010000	D0	208	-48
11100000	E0	224	-32
11110000	F0	240	-16
11111111	FF	255	-1

Reading and writing selected bit-patterns. It's neither feasible nor useful to list all 256 possible bit-patterns here, so a few patterns are selected to highlight how numbers are represented in a digital computer. (Please note that for values spanning more than one byte, the computer's natural endianness also affects how bit-patterns are viewed and interpreted.) When the get function reads a byte with the sign bit set (the rows with an entry in the last column), it returns the value as a non-negative int between 128 and 255. This behavior prevents a byte that would read as -1 when treated as signed (bit-pattern 11111111) from being mistaken for EOF.
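A short experiment confirms that a stored byte of 11111111 does not end the EOF-controlled loop. This is a minimal sketch with hypothetical function names (first_value_from_get and count_bytes), using an in-memory std::istringstream in place of a binary file.

```cpp
#include <cstddef>
#include <cstdio>    // EOF
#include <sstream>
#include <string>

// Returns what get() reports for the first byte, or EOF if there is none.
int first_value_from_get(const std::string& bytes)
{
    std::istringstream input(bytes);   // stand-in for a binary file (assumption)
    return input.get();
}

// Counts the bytes processed by the EOF-controlled loop of pattern (b).
std::size_t count_bytes(const std::string& bytes)
{
    std::istringstream input(bytes);   // stand-in for a binary file (assumption)
    std::size_t count = 0;
    int c;
    while ((c = input.get()) != EOF)
        ++count;
    return count;
}
```

For a string holding the single byte FF, first_value_from_get returns 255 rather than -1, and count_bytes on three such bytes returns 3, so the loop runs to the true end of the data.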