14.9. Block I/O

Review

The block I/O functions carry out simple operations, leaving the programmer responsible for many additional data-processing steps. Some of the required steps are based on operations covered previously. Please review the following as needed:

Block I/O is simple in two senses. First, the two main block I/O functions require only two parameters: an address and the number of bytes to transfer. The address is where the read function stores the input data and where the write function finds the output data. Second, neither function formats or modifies the data in any way - the functions simply move data, as a stream of bytes, between the program and a file. Although the first argument of both functions is a character pointer (char* for read, const char* for write), we can use the functions to read and write many kinds of data; the type char is used here as a synonym for "byte."

The main block I/O functions. The variable buffer is the address of a block of memory. We can create the block as an array, as a single object (i.e., an instance of a class or a structure), or as simple multi-byte data (e.g., an int, which is typically four bytes). streamsize is a type alias for a signed integer type whose size is appropriate for a given system. Repeating the description of gcount from the Character I/O section, the "g" in the function name is short for "get," a synonym for read or input. The function returns the number of bytes or characters read by the last read function call. For more detail, please see:
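As a small illustration of read, write, and gcount working together, the sketch below uses a std::stringstream opened in binary mode in place of a file. That substitution is an assumption made so the example needs no files on disk; ifstream and ofstream provide the same read, write, and gcount members. The function name chunk_counts is hypothetical. It writes a string as one block, then reads it back in fixed-size chunks and records what gcount reports after each read.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical demo: write `text` as one block, then read it back in
// `chunk`-byte pieces, recording what gcount() reports after each read.
// A binary std::stringstream stands in for a file.
std::vector<std::streamsize> chunk_counts(const std::string& text, std::streamsize chunk)
{
    std::vector<std::streamsize> counts;
    if (chunk <= 0)
        return counts;                       // nothing sensible to read

    std::stringstream stream(std::ios::in | std::ios::out | std::ios::binary);
    stream.write(text.data(), static_cast<std::streamsize>(text.size()));

    std::vector<char> buffer(static_cast<std::size_t>(chunk));
    // read() requests `chunk` bytes; the last call may get fewer, which sets
    // eofbit and failbit, so gcount() is checked to keep that final partial block.
    // A read on a failed stream extracts nothing, so gcount() then returns 0.
    while (stream.read(buffer.data(), chunk) || stream.gcount() > 0)
        counts.push_back(stream.gcount());   // bytes actually transferred
    return counts;
}
```

For example, chunk_counts("hello, world", 5) yields the counts 5, 5, and 2: the 12 bytes do not divide evenly into 5-byte chunks, so the last read transfers only the 2 remaining bytes.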
Three images depicting how data blocks are stored in a file. In the top or (a) picture, the blocks are all a uniform size. In the second or (b) picture, alternate blocks have a different size. And in the final or (c) picture, the data blocks can have any size, but the data block size is stored in the file before the data.
Reading and writing blocks of data. The very simplicity of block I/O operations makes them very flexible, albeit at the expense of additional effort on the part of the programmer. Each time that a read or write function is called, the programmer must have some way of knowing how many bytes to read or write. There are three common ways of solving this problem:
  1. In the simplest (and most common) case, all of the blocks are the same size. Each square may represent:
    1. a single byte or character
    2. a single integer or other simple type
    3. an object
  2. It is possible to intermix blocks of different sizes as long as two conditions are met:
    1. each kind of block is the same size - e.g., all the blue blocks are the same size and all the gold blocks are the same size
    2. the different kinds of blocks are stored in a predictable, repeating pattern - the pattern allows programmers to know how many bytes to read on each input operation
  3. Blocks of varying sizes are possible, but the programmer must store the size of each block before storing the data. The small purple rectangles represent the size of the following data block, painted green.
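The third arrangement, picture (c), can be sketched as a pair of hypothetical helper functions, write_record and read_record. The sketch assumes each size prefix is a std::uint32_t stored in the machine's native byte order, so a file written this way is only readable on systems with the same byte order and integer layout; a std::string plays the role of the variable-size data block.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>   // in-memory stream used in the usage example
#include <string>

// Writes one variable-size block: first its size (the "purple" prefix),
// then the data bytes themselves (the "green" block).
// Assumption: the size is a std::uint32_t in native byte order.
void write_record(std::ostream& out, const std::string& data)
{
    std::uint32_t size = static_cast<std::uint32_t>(data.size());
    out.write(reinterpret_cast<const char*>(&size), sizeof(size));
    out.write(data.data(), size);
}

// Reads one size-prefixed block into `data`. Returns false at end of file,
// or if the size prefix or the data block is incomplete.
bool read_record(std::istream& in, std::string& data)
{
    std::uint32_t size = 0;
    if (!in.read(reinterpret_cast<char*>(&size), sizeof(size)))
        return false;                        // no more records
    data.resize(size);                       // make room for exactly `size` bytes
    if (size == 0)
        return true;                         // an empty block is still a record
    return static_cast<bool>(in.read(&data[0], size));
}
```

Because each prefix tells read_record exactly how many bytes follow, the reader never has to guess where one block ends and the next begins.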

Although block I/O is flexible, powerful, and the mainstay of some kinds of computing, there is a critical aspect we must keep in mind: data processed with block I/O functions are typically in a binary format and are often not portable between computers based on different hardware or running different operating systems. Even on the same computer, different compiler settings can make data written by one program unreadable by another. For example, Visual Studio's structure member alignment setting (/Zp) changes the padding placed between the fields of a structure, so a structure written by a program compiled with one setting may not match the layout expected by a program compiled with another. The differences between how various computers represent binary data (for example, byte order and the sizes of fundamental types) often make it necessary to programmatically "massage" data when transferring it between systems. Alternatively, we can export the data to a more portable format - for example, the comma-separated values or CSV format used by many spreadsheet and database programs - and then import it on the second system. Fortunately, moving binary data between incompatible systems happens infrequently.
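One common way to "massage" binary data is to fix the width and byte order of each value explicitly instead of writing the machine's native representation. The hypothetical functions write_u32_le and read_u32_le below store a 32-bit unsigned integer as exactly four bytes, least significant byte first, using shifts and masks, so the bytes in the file are the same whether the program runs on a little-endian or a big-endian computer. This is a sketch of one technique, not the only approach; signed integers and floating-point values need additional care.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>   // in-memory stream used in the usage example
#include <string>

// Writes `value` as four bytes in little-endian order (lowest byte first),
// independent of the host's native byte order.
void write_u32_le(std::ostream& out, std::uint32_t value)
{
    unsigned char bytes[4];
    for (int i = 0; i < 4; ++i)
        bytes[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
    out.write(reinterpret_cast<const char*>(bytes), 4);
}

// Reads four little-endian bytes and reassembles them into `value`.
// Returns false if fewer than four bytes were available.
bool read_u32_le(std::istream& in, std::uint32_t& value)
{
    unsigned char bytes[4];
    if (!in.read(reinterpret_cast<char*>(bytes), 4))
        return false;
    value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(bytes[i]) << (8 * i);
    return true;
}
```

Written this way, the value 0x12345678 always occupies the bytes 78 56 34 12, in that order, so a program on any other system can decode it with the same shifts.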

Common Programming Patterns

Although the first argument of both the read and write functions is a character pointer, the block I/O functions treat the data as essentially typeless. When the block I/O functions read or write data, they treat it as just a stream or sequence of bytes - the kind of data that the block represents is not significant. So, with typecasting, the block I/O functions can handle any data.
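The address-of, cast, and sizeof steps repeat in every example that follows, so they can be packaged in a pair of function templates. The names write_block and read_block are hypothetical, and the sketch uses reinterpret_cast, the C++-style equivalent of the (char *) casts in the examples. The static_assert rejects types such as std::string that are not trivially copyable, but it cannot catch a structure holding a raw pointer: such a type is still trivially copyable, and only the address would be written.

```cpp
#include <istream>
#include <ostream>
#include <sstream>   // in-memory stream used in the usage example
#include <type_traits>

// Writes the raw bytes of `value` as one block: address-of, cast to a
// character pointer, and sizeof supply write()'s two arguments.
template <typename T>
std::ostream& write_block(std::ostream& out, const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "block I/O copies raw bytes; T must be trivially copyable");
    return out.write(reinterpret_cast<const char*>(&value), sizeof(T));
}

// Reads sizeof(T) bytes from the stream directly into `value`.
template <typename T>
std::istream& read_block(std::istream& in, T& value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "block I/O copies raw bytes; T must be trivially copyable");
    return in.read(reinterpret_cast<char*>(&value), sizeof(T));
}
```

Because T can be an array type, write_block(out, array) writes an entire int array[100] in one call, just as the single-call version in example (d) does.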

  Data Block I/O
(a)
ifstream in(in_name, ios::binary);
ofstream out(out_name, ios::binary);
 
(b)
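char	block[512];
streamsize	count;	// gcount returns a streamsize
in.read(block, 512);
while ((count = in.gcount()) > 0)	// 0: nothing was read (end of file or an error)
{
	out.write(block, count);	// write only the bytes actually read
	in.read(block, 512);
}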
(c)
int counter;
out.write((char *) & counter, sizeof(int));

in.read((char *) & counter, sizeof(int));
(d)
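int array[100];
// write, option 1: one element at a time
for (int i = 0; i < 100; i++)
	out.write((char *) & array[i], sizeof(int));
// write, option 2: the whole array as one block
out.write((char *)array, 100 * sizeof(int));

// read, option 1: one element at a time
for (int i = 0; i < 100; i++)
	in.read((char *) & array[i], sizeof(int));
// read, option 2: the whole array as one block
in.read((char *)array, 100 * sizeof(int));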
(e)
struct foo { . . . };
foo my_foo;
out.write((char *) & my_foo, sizeof(foo));

in.read((char *) & my_foo, sizeof(foo));
(f)
struct foo { . . . };
foo my_foo[100];
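// write, option 1: one object at a time
for (int i = 0; i < 100; i++)
	out.write((char *) & my_foo[i], sizeof(foo));
// write, option 2: the whole array as one block
out.write((char *) my_foo, 100 * sizeof(foo));

// read, option 1: one object at a time
for (int i = 0; i < 100; i++)
	in.read((char *) & my_foo[i], sizeof(foo));
// read, option 2: the whole array as one block
in.read((char *) my_foo, 100 * sizeof(foo));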
Using various kinds of data with block I/O operations. By combining the address-of operator, typecasting, and the sizeof operator, the block I/O functions can support virtually any kind of data.
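  1. Defining the file stream objects appearing in the examples. Replace the first constructor parameter with an appropriate file name or a variable - either a string or a C-string. The ios::binary flag opens each file in binary mode, so the stream does not translate line-ending characters (as text mode does on Windows, for example).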
  2. The block I/O version of the mycopy example, repeated here to demonstrate a common pattern. The second parameter is the number of bytes to transfer between the array and the file. The read function requests 512 bytes but reads fewer if fewer than 512 bytes remain in the file. The gcount function returns the number of bytes read by the last read operation. Because this code fragment copies a file, the write function writes the actual number of bytes read, which preserves the correct file size.
  3. The two complementary I/O operations illustrated here typically appear in different parts of a program. This example illustrates how to read or write an integer as a contiguous block of bytes. The expression comprising the first argument gets the address of the integer, then casts it to a character pointer to match the function's argument list. The second argument gets the size of an integer, which is the number of bytes to read or write. We can replace int with double, float, etc. for other kinds of fundamental or primitive data types.
  4. Programmers have two options when reading and writing arrays: First, we may use a for-loop to access each element of the array and read or write the array one element at a time. Alternatively, we can calculate the total number of bytes to transfer. The total number is the product of the number of array elements and the size, measured in bytes, of each array element. The block I/O functions can transfer the entire array as a single block because the array elements form a contiguous block in memory.
  5. We can also read and write objects (instances of structures or classes). But there is one potential problem. If an object contains a pointer field, the I/O operations read or write the address in the pointer - not the data to which the pointer points. Programming a solution for this problem is possible but challenging.
  6. It's also possible to read and write arrays of objects based on the same reasoning as (d).
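Item 5 notes that writing an object with a pointer field saves only the address. A standard-library member such as std::string raises the same problem, because it manages its characters internally, so copying the object's raw bytes does not save the text in a usable form. One possible solution, sketched below with a hypothetical employee structure, writes the fixed-size fields as blocks and writes the string as a length followed by its characters, much like the size-prefixed blocks in picture (c). Native byte order and type sizes are assumed, so the portability limits discussed earlier still apply.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>   // in-memory stream used in the usage example
#include <string>

// Hypothetical record type. `name` manages its own character storage,
// so writing the whole struct with sizeof(employee) would not save the text.
struct employee
{
    int         id;
    double      salary;
    std::string name;
};

// Fixed-size fields go out as blocks; the string goes out as a
// length prefix followed by its characters.
void write_employee(std::ostream& out, const employee& e)
{
    out.write(reinterpret_cast<const char*>(&e.id), sizeof(e.id));
    out.write(reinterpret_cast<const char*>(&e.salary), sizeof(e.salary));
    std::uint32_t length = static_cast<std::uint32_t>(e.name.size());
    out.write(reinterpret_cast<const char*>(&length), sizeof(length));
    out.write(e.name.data(), length);
}

// Reads the fields back in the same order. Returns false if any part is missing.
bool read_employee(std::istream& in, employee& e)
{
    std::uint32_t length = 0;
    if (!in.read(reinterpret_cast<char*>(&e.id), sizeof(e.id)) ||
        !in.read(reinterpret_cast<char*>(&e.salary), sizeof(e.salary)) ||
        !in.read(reinterpret_cast<char*>(&length), sizeof(length)))
        return false;
    e.name.resize(length);   // allocate storage, then fill it from the stream
    return length == 0 || static_cast<bool>(in.read(&e.name[0], length));
}
```

The reading function must mirror the writing function exactly: the same fields, in the same order, with the same sizes.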