14.7. Block I/O

Review

In one sense, block I/O operations are simple, requiring only an address in primary or main memory and the number of bytes to transfer. The address is where the read function stores the input data and where the write function gets the output data. Block I/O operations are also simple in the sense that they don't modify or interpret the data in any way; they only move it as a stream of bytes between the program and a file. However, the block read and write functions often rely on auxiliary operations or concepts, leading to potentially confusing syntax. For example, although their first parameter is a character pointer, the functions can operate on any kind of data with appropriate typecasts.

  • ostream& write(const char* buffer, streamsize nbytes);
  • istream& read(char* buffer, streamsize nbytes);
  • streamsize gcount() const;
A picture of an array named 'buffer' with 'nbytes' number of elements.
The primary block I/O functions. In this example, the variable buffer is the name of an array and, therefore, the address of a block of memory nbytes long. Programs typically create the buffer as a single variable, an array, or an object (an instance of a structure or class). streamsize is a signed integer type alias appropriate for representing the number of characters transferred in an I/O operation on a given system. The gcount function returns (i.e., "gets" the "count" of) the number of bytes extracted by the last unformatted input operation, such as read. For more details, please see:
Three images depicting how computers store data blocks in a file. In the top or (a) picture, the blocks are all a uniform size. Alternate blocks are different sizes in the second or (b) picture. And in the final or (c) picture, the data blocks can have any size, but the file stores the data block's size before the data.
Reading and writing blocks of data. The simplicity of block I/O operations makes them very flexible, albeit at the expense of additional effort on the programmer's part. Each time a program calls a read or write function, it must have some way of "knowing" how many bytes to read or write. There are three common ways of solving this problem:
  1. In the simplest and most common case, all the blocks are the same size. Each square in the picture may represent:
    1. a single byte or character
    2. a single integer or other simple type
    3. an object
  2. Programs can intermix blocks of different sizes by meeting two conditions:
    1. All blocks of the same kind are the same size; for example, all the blue blocks are the same size, and all the gold blocks are the same size.
    2. The different kinds of blocks are stored in a predictable, repeating pattern, allowing programmers to know how many bytes to read on each input operation.
  3. Programs can read and write blocks of varying sizes, but they must precede each block in the file with its size. The small purple rectangles represent the size of the following data block, painted green. The program "knows" the type of the small size blocks in advance, so it "knows" how many bytes to read for each one. The program then uses the retrieved size as the byte count for the next read function call.

Block I/O is flexible, powerful, and a mainstay of database computing. However, programs typically perform block I/O in binary mode, writing data in its raw in-memory representation, which depends on the system's byte order, type sizes, and structure padding. This makes moving the data between dissimilar systems challenging. Even on the same computer, different compiler settings, such as structure alignment or packing options, can make data written by one program difficult for another to read. The differences between various data representations often make it necessary to programmatically "massage" data when transferring it between systems. Alternatively, programs can export the data to more portable formats, such as the comma-separated values (CSV) format used by many spreadsheet and database programs, and import it on another system. Fortunately, moving binary data between incompatible systems happens infrequently.

Common Programming Patterns

Although the read and write functions' first parameter is a character pointer, they treat the data as typeless. When they read or write data, they process it as a stream or sequence of bytes, making its type irrelevant. So, with a combination of the address-of operator, typecasting, and the sizeof operator, the block I/O functions can support virtually any kind of data.

ifstream in(in_name, ios::binary);
ofstream out(out_name, ios::binary);
Block I/O streams. Programs typically perform block I/O on files opened in binary mode. The following examples use these stream objects.
int counter;
out.write((char *) &counter, sizeof(int));
in.read((char *) &counter, sizeof(int));
while (in.read((char *) &counter, sizeof(int)))
    ...
struct foo { . . . };
foo my_foo;
out.write((char *) &my_foo, sizeof(foo));
in.read((char *) &my_foo, sizeof(foo));
while (in.read((char *) &my_foo, sizeof(foo)))
    ...
Single element block I/O. Although the data types vary, the argument pattern within each function call is the same. The program gets the data's memory address with the address-of operator and casts it to a character pointer to match the functions' signatures, forming the first argument. The second argument is the number of bytes to read or write, found by applying the sizeof operator to the data's type. The first-row example reads and writes an integer, but the pattern works the same for double, float, or any fundamental type. Similarly, programmers can replace struct in the second row with class. However, while possible, reading and writing objects with pointer fields is quite challenging: write saves only the address stored in a pointer, not the data it points to, and that address is meaningless when another program, or a later run of the same program, reads it back.

Programs typically control output or write operations based on external events, for example, a user choosing to end data input. Conversely, input or read operations typically loop, alternately reading and processing data until reaching the end of the file. The read function in the third column returns a reference to the input stream, and the stream's operator bool converts it to a Boolean value denoting the stream's state: true after a successful read and false after a failed one, such as an attempt to read past the end of the file. That value drives the loop.

int array[100];
for (int i = 0; i < 100; i++)
    out.write((char *) &array[i], sizeof(int));
int i = 0;
while (i < 100 && in.read((char *) &array[i], sizeof(int)))	// stop before overflowing the array
    i++;	// and additional processing
struct foo { . . . };
foo my_foo[100];
for (int i = 0; i < 100; i++)
    out.write((char *) &my_foo[i], sizeof(foo));
int i = 0;
while (i < 100 && in.read((char *) &my_foo[i], sizeof(foo)))	// stop before overflowing the array
    i++;	// and additional processing
Arrays and sequential block I/O. Occasionally, programs write the data stored in an array to a file with sequential write operations or fill an array with data stored in a file. In both cases, programs treat each array element as an individual variable. Once the program selects an element with the index operator, the remaining syntax is identical to the previous figure. Therefore, programmers can replace int with any fundamental type and, with the stated caveat, struct with class. For-loops work well if the program "knows" the number of elements to read a priori, but typically the number is unknown, and programs use a while-loop. Incrementing the array index as part of the indexing operation produces compact but incorrect code:
while (in.read((char *) &my_foo[i++], sizeof(foo)))
Programs often use the index variable in two ways. First, as the term suggests, it serves as an array index stepping through the array. Second, subsequent operations often use it as a count of the number of objects read; b-rolodex2.cpp in the following section demonstrates a program doing this. The read function detects and reports reaching the end of a file by setting the stream's state flags on a failed read operation. Because i++ is evaluated before read runs, incrementing the index inside the read call also counts the final, failed read, leaving the index one greater than the number of objects actually read.
int array[100];
out.write((char *)array, 100 * sizeof(int));
in.read((char *)array, 100 * sizeof(int));
struct foo { . . . };
foo my_foo[100];
out.write((char *) my_foo, 100 * sizeof(foo));
in.read((char *) my_foo, 100 * sizeof(foo));
En bloc I/O. Programs can read or write an entire array en bloc with a single operation because the array elements form a contiguous block in primary or main memory. The program calculates the total number of bytes to transfer as the product of the total number of array elements and the size, measured in bytes, of each element.