Although the three concepts presented in this section are independent of one another, they each play a role in how a file opened and accessed with a stream object behaves. In that sense, these prerequisite concepts lay a foundation upon which the remainder of the chapter depends.
From the formal definition of a file presented at the beginning of the chapter, we also know that every file has a name and that we can access the file's contents by its name. But there are many ways to express the name of a file. Modern general-purpose operating systems organize files and directories (also known as folders) into a hierarchy or tree. Unlike arborists, computer scientists organize their trees with the root at the top and the leaves at the bottom. Any directory in the tree may contain many subdirectories or files, but every name in a directory must be unique.
Figure (a), (b): The root directory is named with a backslash, `\`, on Windows computers and a forward-slash, `/`, on POSIX-compliant systems. A sub-directory may have any number of files and sub-directories.
A running program has a location or position in the file system tree, known as its current working directory (cwd). When the operating system runs a program, it sets the program's current working directory to one of the directories in the computer's file system. Most operating systems have two ways of naming or referring to a file in a program: a full (absolute) pathname, which starts at the root, and a relative pathname, which starts at the current working directory.
Example | Windows | POSIX |
---|---|---|
(a) | \Users\dab\My Music\Shilo.mp3 | /home/dab/Music/Shilo.mp3 |
(b) | My Music\Shilo.mp3 | Music/Shilo.mp3 |
(c) | ..\dab\Shilo.mp3 | ../dab/Shilo.mp3 |
(d) | Shilo.mp3 | Shilo.mp3 |

Windows uses the backslash `\` character as the name of root and as the file separator character. The other operating systems use the slash or forward-slash `/` character as root and the path separator. `..` represents the parent of, or one level up from, the current working directory. `.` represents the current working directory.

Whenever we open a file, we must choose to use either a full or a relative pathname. If we choose a relative name, then the file's location is determined by the running program's current working directory when the file is accessed. See Path for an interesting history of file system pathnames and additional detail.
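The effect of the current working directory on a relative pathname can be sketched with the C++17 `<filesystem>` library. This is a minimal illustration, not part of the chapter's own examples; the helper names `resolve` and `can_open` are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: a full pathname is returned unchanged, while a
// relative pathname is interpreted from the current working directory.
fs::path resolve(const fs::path& name) {
    return name.is_absolute() ? name : fs::current_path() / name;
}

// Hypothetical helper: try to open a file for reading. When the name is
// relative, the outcome depends on the program's cwd at the time of the call.
bool can_open(const std::string& name) {
    std::ifstream in(name);
    return in.is_open();
}
```

Calling `resolve("Music/Shilo.mp3")` from two different working directories yields two different full pathnames, which is why a program that opens files by relative name may succeed in one directory and fail in another.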
The C++ input/output system uses bit-vectors (integer-like values, often 32 bits wide, whose bits are treated individually) to control various aspects of how stream objects behave: how output is formatted or how input is interpreted. Each bit in a bit-vector is called a flag and represents a specific behavior or formatting feature. A flag set to 1 indicates that a feature is switched on or is active, while a flag set to 0 indicates that the feature is switched off or is inactive. The I/O system also defines a data type, called `fmtflags`, to represent formatting bit-vectors and bit-masks. Bit-masks are constant values that represent bit patterns, or, in this context, formatting flags. The individual bits or flags in a `fmtflags` variable may be set (set to 1) and unset (set to 0) with I/O system functions or directly with bitwise operations on the `fmtflags` variables.
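Both ways of changing flags can be sketched briefly. The following is an illustrative example rather than the chapter's own code; the function names are hypothetical, while `setf`, `unsetf`, and the `ios::hex`, `ios::basefield`, `ios::showbase`, `ios::showpos`, and `ios::skipws` constants are standard.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Switch flags on with the stream's member functions.
std::string to_hex(int value) {
    std::ostringstream out;
    out.setf(std::ios::hex, std::ios::basefield);  // clear the dec/oct/hex group, then set hex
    out.setf(std::ios::showbase);                  // switch on the base prefix
    out << value;
    return out.str();
}

// Switch a flag on and then off again before writing.
std::string to_plain_hex(int value) {
    std::ostringstream out;
    out.setf(std::ios::hex, std::ios::basefield);
    out.setf(std::ios::showbase);
    out.unsetf(std::ios::showbase);                // the prefix is switched off again
    out << value;
    return out.str();
}

// Change flags directly with bitwise operations on a fmtflags variable.
std::ios::fmtflags toggle_demo(std::ios::fmtflags flags) {
    flags |= std::ios::showpos;   // set: bitwise-OR with the mask
    flags &= ~std::ios::skipws;   // unset: bitwise-AND with the inverted mask
    return flags;
}
```

With these definitions, `to_hex(255)` returns `"0xff"` and `to_plain_hex(255)` returns `"ff"`.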
Figure: bitwise operations on bit-vectors.

Bitwise-OR: 1100 | 1001 = 1101
Bitwise-AND: 1100 & 1001 = 1000

(a), (b) Bitwise-OR, `|`, is used to switch on some bits in a bit-vector (data in the diagram), activating I/O behaviors and formatting options.
(c), (d) Bitwise-AND, `&`, is used to mask out some bits in a bit-vector (data in the diagram) to identify which bits are set.
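The two operations in the figure can be checked with a few lines of code. This is a small sketch using plain unsigned integers and C++14 binary literals; the function names are hypothetical.

```cpp
#include <cstdint>

// Switch on every bit of data that is 1 in mask (bitwise-OR).
constexpr std::uint32_t switch_on(std::uint32_t data, std::uint32_t mask) {
    return data | mask;
}

// Keep only the bits of data that are also 1 in mask (bitwise-AND).
constexpr std::uint32_t mask_out(std::uint32_t data, std::uint32_t mask) {
    return data & mask;
}

// True when at least one of the masked bits is set in data.
constexpr bool any_set(std::uint32_t data, std::uint32_t mask) {
    return (data & mask) != 0;
}

// The bit patterns from the figure, verified at compile time.
static_assert(switch_on(0b1100, 0b1001) == 0b1101, "bitwise-OR example");
static_assert(mask_out(0b1100, 0b1001) == 0b1000, "bitwise-AND example");
```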
The I/O system provides several named bit-masks to make it easier to work with the I/O flags. The constants are typically accessed through the `ios` class (even though they are defined in its base class, `ios_base`). The scope resolution operator, `::`, ties the class name `ios` on the left to the name of a bit-mask or symbolic constant on the right. Four file I/O bit-masks illustrate how bit-masks are used:
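 | Bit-Mask | Purpose |
---|---|---|
(a) | ios::in = 0x01 = 00000000000000000000000000000001 | Open the file for input / reading |
 | ios::out = 0x02 = 00000000000000000000000000000010 | Open the file for output / writing |
 | ios::app = 0x08 = 00000000000000000000000000001000 | Append new data at the end of the file |
 | ios::binary = 0x20 = 00000000000000000000000000100000 | Treat the file contents as binary data |

(b) Combine all of the behaviors: ios::openmode modes = ios::in | ios::out | ios::app | ios::binary;
(c) The result: modes = 0x2B = 00000000000000000000000000101011

The numeric values shown come from one implementation; the C++ standard leaves the actual values implementation-defined, so programs should always use the symbolic names. File-open modes are stored in their own bit-mask type, `ios::openmode`, while `fmtflags` holds the formatting flags.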
 | Code |
---|---|
(a) | fstream file(file_name, modes); |
(b) | if (modes & ios::binary) { /* process the file as binary */ } if (modes & ios::app) { /* append new data at the end of the file */ } |

(a) The `fstream` class and opening files are described in the next section. (b) If `ios::binary` had been left out of `modes`, then the expression `modes & ios::binary` would produce a 0 and the code for performing binary I/O would be skipped.

Bit-masks are used during the construction of stream objects, with the open function, with the `setiosflags` and `resetiosflags` manipulators, and with the `setf` and `unsetf` functions. A list of bit-mask constants may be found in the I/O Summary at the end of the chapter. The complete set of bitwise operators was presented in the Chapter 3 Supplemental section.
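Combining and testing open modes can be sketched as follows. The helper names and the file name used in the usage note are hypothetical; the mode constants, `ios::openmode`, and the stream constructors are standard.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Create (or empty) a file and write text to it.
bool write_fresh(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::out | std::ios::trunc);
    out << text;
    out.flush();
    return out.good();
}

// Add text at the end of a file: ios::app places every write at the end.
bool append_text(const std::string& path, const std::string& text) {
    const std::ios::openmode modes = std::ios::out | std::ios::app;  // bitwise-OR combines masks
    std::ofstream out(path, modes);
    out << text;
    out.flush();
    return out.good();
}

// Read the whole file back as a single string.
std::string read_all(const std::string& path) {
    std::ifstream in(path, std::ios::in);
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// Bitwise-AND isolates the masked bits so they can be tested.
bool has_mode(std::ios::openmode modes, std::ios::openmode mask) {
    return (modes & mask) == mask;
}
```

For example, calling `write_fresh("openmode_demo.txt", "first\n")` followed by `append_text("openmode_demo.txt", "second\n")` leaves both lines in the file, because the second call opens the file in append mode rather than truncating it.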
The formal definition also explains that "Data files may be numeric, alphabetic, alphanumeric, or binary." Numeric data is just a special case of binary data, and together, alphabetic and alphanumeric data form a more general class called textual data. So, for the remainder of this chapter, we focus our attention on two broad data classifications: textual and binary. We won't deal with textual and binary data combinations, but when a combination of data occurs in practice, the program typically treats it as a binary file.
The modern use of Unicode and wide (2-byte) characters to represent textual data has somewhat blurred the distinction between text and binary files. Nevertheless, the distinction remains important, and to help us understand the difference between text and binary files, we'll restrict our discussion to the older single-byte representation of textual data based on the ASCII encoding scheme.
The ASCII encoding scheme represents characters in the lower 7 bits of an 8-bit byte or character; the highest bit is always 0. This encoding can represent 128 characters with numeric values that range from 0 to 127. The lowest 32 characters or values (0 to 31) are control characters. Two control characters play an essential role in text files:
`\n`: line feed (value 10)
`\r`: carriage return (value 13)
These characters are used in text files to mark the end of one line of text and the beginning of the next line (the characters are called line separators or line terminators). The C and C++ programming languages originated on Unix systems. Unix uses a single line feed character, which it calls a newline, to separate the lines in a text file. The other POSIX systems (Linux and macOS) also adopt this convention. In contrast, Windows uses a two-character sequence, `\r\n`, to separate the lines in a text file. (Classic macOS, before incorporating a Unix kernel, used a single `\r` as the line separator.)
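The three conventions can be told apart by counting separators in the raw bytes of a file. The following is a minimal sketch; the `LineEndings` type and the `count_line_endings` function are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <string>

// Tallies of each line-separator convention found in a block of bytes.
struct LineEndings {
    std::size_t crlf = 0;  // Windows: "\r\n"
    std::size_t lf = 0;    // Unix, Linux, macOS: "\n"
    std::size_t cr = 0;    // classic macOS: "\r"
};

LineEndings count_line_endings(const std::string& bytes) {
    LineEndings n;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (bytes[i] == '\r' && i + 1 < bytes.size() && bytes[i + 1] == '\n') {
            ++n.crlf;
            ++i;  // skip the '\n' that belongs to this pair
        } else if (bytes[i] == '\n') {
            ++n.lf;
        } else if (bytes[i] == '\r') {
            ++n.cr;
        }
    }
    return n;
}

// Assumes an ASCII-compatible character set, as the text does.
static_assert('\n' == 10 && '\r' == 13, "ASCII values of the line separators");
```

The bytes must come from a file opened in binary mode; otherwise the conversions described next may hide the `\r` characters.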
It's difficult to port (move from one system to another) a program when the systems utilize different characters for the line separator. The C programming language added the distinction between text and binary files to help solve the problem created by having different line separator characters. When a program opens a file in text mode on a Windows system, the `\r\n` sequence is mapped or converted into a single `\n` character when read from a file, and the single `\n` character is converted into a `\r\n` sequence when written to a file. POSIX systems don't need any mapping, and text mode does not affect file I/O.
Having the system automatically convert between `\r\n` and `\n` greatly simplifies the task of porting programs (at the source code level) from one operating system to another. But it also introduces another problem. Some files contain numeric rather than textual data. You are undoubtedly familiar with some of these numeric files: JPG, GIF, MP3, EXE, etc. Remember that text files consist of characters that are encoded as short, 1-byte integers; specifically, `\n` and `\r` are encoded as the values 10 and 13, respectively. Altering these values in a truly binary file will corrupt the data, resulting in an unviewable image or unplayable audio or video file. We must open binary files in binary mode to prevent the character mapping that will lead to data corruption.
 | Code |
---|---|
(a) | ifstream in("filename", ios::binary); |
(b) | ifstream in; in.open("filename", ios::binary); |
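The difference between the two modes shows up in the bytes that actually reach the disk. The sketch below writes the same text in either mode and then reads the raw bytes back in binary mode; the function name and file names are hypothetical.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Write text in text or binary mode, then return the bytes stored on disk.
std::string stored_bytes(const std::string& path, const std::string& text, bool binary) {
    std::ios::openmode modes = std::ios::out | std::ios::trunc;
    if (binary) {
        modes |= std::ios::binary;  // switch off the line-separator mapping
    }
    {
        std::ofstream out(path, modes);
        out << text;
    }  // leaving the scope closes the file and flushes the data

    // Read back in binary mode so that no mapping hides what was stored.
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

In binary mode, `stored_bytes("mode_demo.bin", "a\nb\n", true)` returns the same four bytes on every system. In text mode, the result is expected to be those four bytes on POSIX systems and the six bytes `a\r\nb\r\n` on Windows.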
It was suggested at the end of the last section that there are three common ways of accessing a file's contents. It's now appropriate to revisit these three ways in the context of text versus binary files: