5.8.2 Unions And Bit-Fields

Unions and bit-fields are minor extensions to the structure syntax. Like structures, unions and bit-fields have named fields, but they treat them differently. Bit-fields allow programmers to specify each field's size measured in bits. Unions can only save one field at a time, allocating only enough space to save the largest and "wasting" space when saving the smaller ones. Programs can use bit-fields and unions independently or jointly to perform some data conversions. However, bit-fields and any data conversion based on them are system-dependent and not portable across operating systems, hardware, or both.

Bit-Fields

General-purpose processors or CPUs - the ones running the familiar desktop operating systems - only support byte-addressable memory. Consequently, a byte is the smallest memory unit they can directly access, and larger data types are always aligned on byte boundaries. Bit-fields are one way programmers can work around these limitations by specifying fields with arbitrary bit lengths. This mechanism still doesn't qualify as direct access because additional access operations occur outside the programmer's view. The ending bits remain unused if the last bit-field doesn't end on a byte boundary. Finally, the fields are typically unsigned because programs are generally concerned with only the individual bits.

// bit-field used to unpack the st_mode field in a _stat structure
struct modes
{
	unsigned others : 3;	// permissions for everyone else
	unsigned group  : 3;	// group permissions
	unsigned user   : 3;	// file owner permissions
	unsigned type   : 7;	// file type: directory, regular file, pipe, device, etc.
};

A bit-field representing Unix file permissions. A previous example demonstrated some of the bitwise operators (shifting and masking) with the file access permissions used by old versions of the Unix and Linux operating systems. While the scheme saved the access permissions as a 16-bit short int, it partitioned the bits into units smaller than a byte. We can represent those units as bit-fields. unsigned is a shorthand notation for unsigned int. Some systems may implement each field as an integer, ignoring the unused bits and speeding memory and disk access. Other systems may implement the fields as a single integer, saving memory and disk space.

More about bit-fields after a discussion of unions.

Unions

The syntax specifying a union is similar to the syntax specifying a structure, only replacing struct with union. However, that small change in syntax results in a large change in the behavior of the instantiated objects.

The picture illustrates a structure named demo with three fields: a char, an int, and a double. A demo object allocates enough memory to save all three fields simultaneously. — **Structure vs. union**. Structure and union objects in memory illustrate the difference in memory allocation between the two.

A structure allocates enough memory to contain all the fields simultaneously. It's like a basket that can hold many items and allows the program to handle them as a group.

In contrast, a union typically specifies two or more fields of different types but only allocates enough memory to contain the largest. All fields share this memory, implying that a union can only hold one field value at a time.

The picture illustrates a union named demo with three fields: a char, an int, and a double. A demo object allocates only enough memory to save the largest field - the double.

Although newer C++ standards have increased the number of automatic type promotions the language can perform, C++ is still described as a strongly typed language, meaning that a program often can't assign or pass a value of one data type to a variable of a different type. Occasionally, though rarely, we need a variable that can circumvent the strong typing rule by holding data of different types at different times. Unions solve this problem when typecasting can't convert between some of the types, when a cast might cause a truncation error, or when there isn't a conversion function between some of the types.

union number
{
	char c;
	int i;
	double d;
};

number my_number;
my_number.c = 'A';
...
my_number.i = 42;
...
my_number.d = 3.14259;
...
(a)	(b)

A union syntax example.

A program can save either a char, int, or double in a number object, but not simultaneously. So, it saves a value in one field, uses that value, and then repeats the process. Regardless of which field the "save" operation uses, it overwrites the previously saved data.
Like structures, unions form a new type specifier. A program can instantiate or create a number object - named my_number in the current example - and access a specific field with the dot operator. The ellipses represent statements, omitted for simplicity, using the accessed data.

Packing and Unpacking Data With Unions and Bit-Fields

By joining bit-fields and unions, programmers can efficiently pack and unpack encoded data. The combination circumvents C++'s strong typing rules and allows the program to access bit groups smaller than eight bits. Two concepts are crucial for understanding the packing and unpacking process. First, the compiler uses the data's type to determine how to interpret its bits. Second, although a union may have many fields, they all occupy the same location in memory.

A union with two fields. The first is a 16-bit integer, and the second is a bit-field. The bit-field has four fields, one 7 bits long and three 3 bits long. The integer and bit-field occupy the same memory location. A program can save data in the integer and extract it through the bit-field or vice versa. — **Data packing and unpacking example**. The bit-field `modes` and the union `map` are the key structures performing the packing and unpacking operations. The example demonstrates how programmers use these structures to unpack or decode compressed data.

The `modes` bit-field has a total of 16 bits of data, the same number as a `short` integer.

The `map` union has two fields: the first is a 16-bit `unsigned short`, and the second is an instance of the `modes` bit-field.

To unpack or decode data, the program:

i. Instantiates a `map` object named `mapper`.

ii. Saves the packed or encoded data in the object's `statmode` field.

iii. Extracts the unpacked or decoded data from the bit-field's fields. The program evaluates the dot operator from left to right. The first element, `mapper`, is the union object's name. The second or middle element, `convert`, names the union's second field, which is an instance of the `modes` bit-field. The last element names a specific bit-field in the bit-field structure.

A union with one 16-bit integer field and a bit-field. The bit-field partitions the 16 bits into four fields, one 7-bit field and three 3-bit fields. Although the illustration depicts them adjacent to one another, *they occupy the same memory space.*

To encode data, reverse the process: store the unpacked data into each field one at a time and then extract the packed or encoded data from the 16-bit integer. This technique is not intuitive and is non-portable, but it is fast and efficient.

Please see stat.cpp for the complete program.

struct modes
{
	unsigned others : 3;
	unsigned group  : 3;
	unsigned user   : 3;
	unsigned type   : 7;
};

union map
{
	unsigned short statmode;
	modes convert;
};
(a. bit-field)	(b. union)

map mapper;					// i
mapper.statmode = data;				// ii
cout << mapper.convert.user << endl;		// iii
cout << mapper.convert.group << endl;
cout << mapper.convert.others << endl;
(c)	(d)