It is often useful to process (read and write) the contents of a file one character or one byte at a time. The three I/O functions demonstrated in the mycopy.cpp example are sufficient for these tasks, and they work with both textual and binary data. Four additional "support" functions, also illustrated in mycopy, provide everything else the task needs. Three of them, eof, good, and operator(), were described in detail previously. Although operator>> and operator<< can read and write textual data, they should be avoided when processing binary data because they perform formatted I/O (operator>>, for example, skips whitespace by default), and their behavior varies between systems.
| Function | Description | Stream |
|---|---|---|
| `int get();` | Reads one character from the input stream and returns it as an integer, or returns EOF if no character could be read. C++ converts between integers and characters without requiring a type cast. | ifstream |
| `istream& get(char& c);` | Reads one character from the input stream and passes it back through the reference argument c. | ifstream |
| `ostream& put(char c);` | Writes one character or byte to the output stream. | ofstream |
| `streamsize gcount();` | The "g" in the function name is short for "get," which is used as a synonym for read or input. The function returns a count of how many unformatted characters were read by the last input operation. Unformatted characters include the full line-separator sequences (for operating systems that use multi-character line separators). While this function can be used with character input, it is most useful with block reads. The streamsize return type is a portable type appropriate for storing a size value. | ifstream |
```cpp
ifstream input("input.txt");
ofstream output("output.txt");
```

(a)

One-read Loops

```cpp
int c;
while ((c = input.get()) != EOF)
{
    // process c
    output.put(c);
}
```

(b)

```cpp
char c;
while (input.get(c))
{
    // process c
    output.put(c);
}
```

(c)

Two-read Loops

```cpp
int c = input.get();
while (!input.eof())
{
    // process c
    output.put(c);
    c = input.get();
}
```

(d)

```cpp
int c = input.get();
while (input)
{
    // process c
    output.put(c);
    c = input.get();
}
```

(e)

```cpp
int c = input.get();
while (input.gcount() > 0)
{
    // process c
    output.put(c);
    c = input.get();
}
```

(f)
The code fragment presented in Figure 2(b) raises the last question about character I/O that we must answer. The get function reads and returns a single character or byte from a file. In the code fragment, the byte is first stored in a variable and then compared to the value represented by the symbolic constant EOF, typically -1. This code pattern was one of two used in the mycopy example to read the contents of a binary file. If a binary file can contain any value, positive or negative, why doesn't the loop end when the program reads a data value of -1 from the file?
The answer lies in how computers store negative values in main memory and disk files. Each byte of data consists of 8 bits, which means there are 2⁸ = 256 possible patterns of 1s and 0s. Two keywords, signed (able to store 0, positive, and negative values) and unsigned (only able to store 0 and positive values), modify how the bit patterns are interpreted. C++ treats the integer types as signed by default, but whether a plain char is signed or unsigned is left to the implementation, that is, to the compiler and the hardware it targets: char is signed on typical x86 systems but unsigned on many ARM systems.
Since we can't store a "-" sign in a byte, some bit patterns must be interpreted as negative values. The left-most bit, called the sign bit, indicates the sign of a signed value (0 = positive and 1 = negative), but it is included with the magnitude of an unsigned value. The key to understanding the behavior of the get function is appreciating that digital computers store all machine instructions and data as 1s and 0s, which are meaningless until they are grouped together and interpreted. In the case of data, the same bit pattern may be interpreted in more than one way. That is, data can be written as a signed value but read as an unsigned value.
Binary | Hexadecimal | Unsigned Decimal | Signed Decimal (negative only) |
---|---|---|---|
00000000 | 0 | 0 | |
00000001 | 1 | 1 | |
00000010 | 2 | 2 | |
00000100 | 4 | 4 | |
00001000 | 8 | 8 | |
00001111 | F | 15 | |
00010000 | 10 | 16 | |
00011111 | 1F | 31 | |
00100000 | 20 | 32 | |
00110000 | 30 | 48 | |
00111111 | 3F | 63 | |
01000000 | 40 | 64 | |
01010000 | 50 | 80 | |
01100000 | 60 | 96 | |
01110000 | 70 | 112 | |
01111111 | 7F | 127 | |
10000000 | 80 | 128 | -128 |
10001111 | 8F | 143 | -113 |
10010000 | 90 | 144 | -112 |
10100000 | A0 | 160 | -96 |
10110000 | B0 | 176 | -80 |
11000000 | C0 | 192 | -64 |
11010000 | D0 | 208 | -48 |
11100000 | E0 | 224 | -32 |
11110000 | F0 | 240 | -16 |
11111111 | FF | 255 | -1 |