14.8. Random and Direct Access

[Figure: a file depicted as a contiguous sequence of blocks, numbered from 0 to n-1, where n is the total number of blocks in the file.]
A file consisting of n blocks. Whereas ifstream and ofstream objects maintain a single file position pointer, fstream objects maintain two, one each for input and output.

Random and direct access are synonyms, reflecting different aspects of the same I/O operations. As applied to file I/O, "random" doesn't imply "unpredictable"; instead, it means "in any order." "Direct" suggests that a program can "directly" read or write any block of data without moving through the file sequentially. Programs build random or direct operations on top of the block I/O functions.

It's convenient to visualize a file as a contiguous sequence of blocks, numbered or indexed sequentially, where each block represents a unit of data, often an instance of a structure or class. The block numbers are logical features of a problem that the program must map to the physical addresses the hardware "understands." The physical addresses are offsets, measured in bytes, from a specified file location:

offset = block number × block size

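The UML class diagram of the C++ I/O stream classes presented at the beginning of the chapter shows that the ios class has a buffer, which for file streams is a filebuf. All the leaf subclasses, ifstream, ofstream, and fstream, inherit the buffer. iostream, and therefore fstream, has two superclasses, istream and ostream, and inherits the "get" positioning functions from one and the "put" positioning functions from the other. Consequently, an fstream object presents two position pointers. Note that ios is a virtual base class, so both pointers refer to a single filebuf, and the C++ standard specifies that a filebuf keeps one joint file position for input and output. Moving one pointer therefore generally moves the other, which is why programs should seek explicitly before switching between reading and writing.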

Previous examples relied on streams that could read or write data, but not both. fstream objects can perform both operations, making it necessary for a program to specify carefully which position pointer it uses. Programs distinguish the two pointers as "get" (read or input) and "put" (write or output). In preparation for accessing a specific data item, the program moves the appropriate pointer within a file with one of four overloaded "seek" functions. The next I/O operation occurs at the new position in the file. Additionally, two "tell" functions report a stream's current position within a file. The name of each function ends with either a "p" or "g," denoting one of the pointers.

Function Description
istream& seekg(streampos pos);
ostream& seekp(streampos pos);
Moves the "get" (seekg) or "put" (seekp) stream position to an absolute location, pos, in the file. Absolute positions are offsets in bytes from the beginning of the file; pos must be ≥ 0.
istream& seekg(streamoff off, seekdir loc);
ostream& seekp(streamoff off, seekdir loc);
Moves the "get" (seekg) or "put" (seekp) stream position to a location relative to one of three file locations. The offset, off, is the number of bytes added to or subtracted from the specified location, loc, which is denoted by one of the symbolic constants:
  • ios::beg beginning of the file; off must be ≥ 0
  • ios::cur current position in the file
  • ios::end end of the file
Programs cannot seek past the beginning of the file, so
new position = loc + off ≥ 0.
It's possible but rarely useful to seek beyond the file's end, so in practice
new position = loc + off ≤ file size.
streampos tellg();
streampos tellp();
Returns the current position of the "get" (tellg) or "put" (tellp) position pointer within the file. The returned position is measured in bytes from the beginning of the file. Both functions return -1 on failure.
The "seek" and "tell" functions. The file positioning and locating functions allow programs to access data in any desired order, not just sequentially. As the order isn't predetermined, programmers often describe it as "random." A program can call the read or write functions repeatedly, resulting in sequential access. However, switching between reading and writing requires an intervening "seek" function call. streampos and seekdir are type aliases created in the stream classes. The first represents positions within long files and converts to and from the signed integral offset type streamoff, while the second denotes simple flags (perhaps implemented with an enumeration or typedef).
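As a minimal sketch of how the "seek" and "tell" functions cooperate, the following helper finds a file's size by seeking to the end, asking for the position there, and then restoring the original position. The name file_size and its -1 error convention are assumptions made for illustration; they are not part of the C++ library.

```cpp
#include <fstream>
#include <ios>
#include <istream>

// Hypothetical helper (not part of the C++ library): returns the size in
// bytes of the file behind an open input stream, or -1 if any step fails.
std::streamoff file_size(std::istream& in)
{
    const std::streampos saved = in.tellg();   // current "get" position
    if (saved == std::streampos(-1))
        return -1;                             // tellg signals failure with -1

    in.seekg(0, std::ios::end);                // relative seek: 0 bytes from the end
    const std::streampos end = in.tellg();     // position at the end = size in bytes
    in.seekg(saved);                           // absolute seek: restore the position

    if (!in || end == std::streampos(-1))
        return -1;
    return static_cast<std::streamoff>(end);
}
```

Dividing the returned size by the size of one record gives the number of records in a file of fixed-size blocks. The function accepts any istream, so it works with both ifstream and fstream objects opened in binary mode.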
Function Description
ostream& flush();
Forces the output stream to write any pending data in its buffer to the associated file.
void clear();
Resets eofbit and all the error flags to 0.
bool is_open() const;
Returns true if the file associated with the stream is open; returns false otherwise.
File I/O "housekeeping" functions. Although the common programming patterns presented below omit these functions, the text includes them here for easy reference because they play significant roles in the programming examples presented in the following section. Please see flush, clear, and is_open for more information.
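The following sketch suggests where the housekeeping functions typically fit. The record type chunk and the helpers append_record and count_records are hypothetical names chosen for illustration. Reading to the end of a file sets eofbit and failbit, so count_records calls clear before returning; without that call, later seeks and reads on the same stream would fail.

```cpp
#include <fstream>
#include <ios>

// Hypothetical record type, used only for illustration.
struct chunk { int id; double value; };

// Appends one record and flushes it so the bytes reach the file immediately.
bool append_record(std::fstream& data, const chunk& c)
{
    if (!data.is_open())
        return false;                          // nothing to write to
    data.clear();                              // drop any stale eofbit/failbit
    data.seekp(0, std::ios::end);
    data.write(reinterpret_cast<const char*>(&c), sizeof(chunk));
    data.flush();                              // push buffered bytes to the file
    return static_cast<bool>(data);
}

// Counts the records by reading to the end of the file; returns -1 if the
// stream is not open.
int count_records(std::fstream& data)
{
    if (!data.is_open())
        return -1;
    data.clear();
    data.seekg(0);                             // rewind
    chunk c;
    int count = 0;
    while (data.read(reinterpret_cast<char*>(&c), sizeof(chunk)))
        ++count;
    data.clear();                              // the final failed read set eofbit and failbit
    return count;
}
```

The sketch assumes the stream was opened for both input and output in binary mode, for example with ios::in | ios::out | ios::binary.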

Direct Access

Programs can group bytes together forming blocks or records and read and write block-oriented files sequentially. They implement direct access by combining block I/O with the file positioning functions to read and write records. Programs specify the "seek" functions' offset value in bytes. However, a file of records appears and behaves like an array of objects, so it's more convenient for programmers to think of positioning operations in the problem-oriented terms of block numbers or indexes. Programmers must map a record, block number, or index to a byte offset within the file. Unfortunately, C++ doesn't have an appropriate library function, forcing programmers to implement the operation themselves.

(a)
absolute address = record number × size of a record
record number = absolute address / size of a record
(b)
struct chunk { . . . };

streampos offset = record * sizeof(chunk);
int record = offset / sizeof(chunk);
(c)
offset = 7 * 10 = 70
record = 70 / 10 = 7
(d)
[Figure: the relationship between byte addresses and record numbers. Assuming that each record is ten bytes long, record number 5 is located at byte address 50.]
The relation between byte addresses and record numbers. Byte addresses and record numbers begin with zero and continue through the end of the file. The last valid byte address is the file's size minus one, while the last valid record number is the number of records minus one. That is, both addressing schemes are zero-indexed. A simple mapping function converts a record or block number into a byte address that the positioning functions can use. A similar mapping function converts from a byte address to a record number.
  (a) Pseudo code for the mapping operations. A record's absolute address (its byte offset from the beginning of the file) is the product of the record number and the record's size.
  (b) C++ code for the mapping functions. Replace chunk with a specific structure or class from the problem. The calculation of the record number relies on integer division.
  (c) A simple example of the mapping functions assuming, for simplicity, that the size of a chunk is 10 bytes.
  (d) An abstract representation of the example. Each rectangle is a record - an instance of the chunk structure. For ease of illustration, we assume that the size of each record is 10 bytes. The bottom rectangle or last record is located at offset 70 in the file, spanning bytes 70 through 79, and having record number 7.
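Since C++ provides no library function for the mapping, a program might wrap the two calculations in small helpers. The following is a minimal sketch; the names record_to_offset and offset_to_record are hypothetical, and the record size is passed as a parameter so the same functions serve any structure or class.

```cpp
#include <cstddef>
#include <ios>

// Hypothetical helper: byte offset, from the beginning of the file, of a record.
std::streamoff record_to_offset(std::streamoff record, std::size_t record_size)
{
    return record * static_cast<std::streamoff>(record_size);
}

// Hypothetical helper: number of the record containing a byte offset.
// Integer division discards the remainder, so any byte inside a record
// maps to that record's number.
std::streamoff offset_to_record(std::streamoff offset, std::size_t record_size)
{
    return offset / static_cast<std::streamoff>(record_size);
}
```

With 10-byte records, record_to_offset(7, 10) yields 70, and offset_to_record(70, 10) yields 7, matching the example above. In a program, the second argument would normally be sizeof applied to the record type.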
One of the hallmarks of direct access is that a program can seek to a specific record, read it, modify it, seek to the same record number again, and write the modified record back into the file without affecting any other record.
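The seek, read, modify, seek, write cycle can be sketched as a single function. The record layout (chunk with id and quantity fields) and the function name adjust_quantity are assumptions made for illustration, not part of the original examples.

```cpp
#include <fstream>
#include <ios>

// Hypothetical record type, used only for illustration.
struct chunk { int id; int quantity; };

// Adds delta to the quantity field of record number n, leaving every other
// record untouched. Returns false if any step fails; in that case the
// stream's error flags remain set, and the caller may reset them with clear().
bool adjust_quantity(std::fstream& data, int n, int delta)
{
    const std::streamoff offset =
        static_cast<std::streamoff>(n) * static_cast<std::streamoff>(sizeof(chunk));

    chunk c;
    data.seekg(offset);                        // move the "get" pointer to record n
    if (!data.read(reinterpret_cast<char*>(&c), sizeof(chunk)))
        return false;                          // record n does not exist

    c.quantity += delta;                       // modify the record in memory

    data.seekp(offset);                        // move the "put" pointer to record n
    data.write(reinterpret_cast<const char*>(&c), sizeof(chunk));
    data.flush();
    return static_cast<bool>(data);
}
```

The second seek is essential: after the read, the file position sits just past record n, so writing without seeking back would overwrite record n+1 instead.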

Common Programming Patterns

struct chunk { . . . };
chunk c;
int n;		// a block number / index
fstream data;
Specification and definitions for the examples.
data.seekp(0, ios::end);
data.write((char *) &c, sizeof(chunk));
Append a new record at the end of the file.
data.seekg(0);
data.read((char *) &c, sizeof(chunk));
Rewind to the beginning of the file and read the first record.
data.seekg(0);
while (data.read((char *) &c, sizeof(chunk)))
	....
Rewind to the beginning of the file and read all records in sequence.
// read a record (perhaps in a loop)
data.seekp(data.tellg() - (streampos)sizeof(chunk));
//data.seekp(-(streamoff)sizeof(chunk), ios::cur);  // alt: offset first, then location
data.write((char *) &c, sizeof(chunk));
Searching and replacing - seek to the position after the last read, back up one record, and overwrite the record that's already there. The illustrated typecasts are required.
data.seekg(n * sizeof(chunk));
data.read((char *) &c, sizeof(chunk));
// modify c
data.seekp(n * sizeof(chunk));
data.write((char *) &c, sizeof(chunk));
Updating a record or block. Seek (move) to record number n. Read and modify the record. Seek to the same number and write the modified record, overwriting the old version.
data.seekp(0, ios::end);		  // (a)
streampos pos = data.tellp();
data.write((char *) &c, sizeof(chunk));
		. . .
data.seekg(pos);			  // (b)
data.read((char *) &c, sizeof(chunk));
		. . .
data.seekp(pos);			  // (c)
data.write((char *) &c, sizeof(chunk));
Simple database operations. Groups of code are separated by time. The variable pos may be a field in another object.
  (a) Move to the end of the data file, save the position, and write a new record.
  (b) Move to the saved position and read the stored record.
  (c) Move to the saved position and replace (overwrite) the existing record.
Implementing direct access with positioning functions. The examples illustrate how programs implement various direct or random access operations by moving a stream's position pointers in a file and executing block I/O operations. They emphasize the syntax and sequence of the positioning and I/O function calls and omit the obligatory validation and safety tests. The examples in the following sections present direct access in a more authentic context.
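To suggest what the omitted validation might look like, the following sketch wraps the searching and replacing pattern in a function that checks the stream before and after the I/O calls. The record layout, the function name replace_by_id, and the -1 return convention are assumptions made for illustration.

```cpp
#include <fstream>
#include <ios>

// Hypothetical record type, used only for illustration.
struct chunk { int id; int quantity; };

// Scans the file sequentially for the first record whose id matches and
// overwrites it in place with replacement. Returns the record number, or -1
// if no record matches or an I/O operation fails.
int replace_by_id(std::fstream& data, int id, const chunk& replacement)
{
    if (!data.is_open())
        return -1;
    data.clear();                              // drop any stale error flags
    data.seekg(0);                             // rewind

    chunk c;
    int n = 0;
    while (data.read(reinterpret_cast<char*>(&c), sizeof(chunk))) {
        if (c.id == id) {
            // The position now sits just past the record that was read:
            // back up one record and overwrite it.
            data.seekp(data.tellg() - static_cast<std::streamoff>(sizeof(chunk)));
            data.write(reinterpret_cast<const char*>(&replacement), sizeof(chunk));
            data.flush();
            return data ? n : -1;
        }
        ++n;
    }
    data.clear();                              // the loop ended at end-of-file
    return -1;
}
```

Returning the record number lets the caller reuse it later with the updating pattern, avoiding a second sequential search.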