7.9. Index Order


Previously, I stated that when defining a two-dimensional array, the first size is the number of rows, and the second is the number of columns. I'm deliberately using the labels "rows" and "columns" because their meanings are well-established and understood in the context of tables. But does choosing one array index order over the other make a difference? Are we forced to use rows × cols, or can we define it as cols × rows? Again, I'm deliberately using a multiplication notation to make a point: multiplication is commutative, so the product rows × cols is the same as cols × rows, suggesting there is some ambiguity, or, depending on your point of view, some flexibility with the index order.

**Array definitions and memory allocation**. The amount of memory allocated to store a two-dimensional array is independent of index order: the picture illustrates six array elements implemented as a linear sequence of squares, and `2×3 = 3×2 = 6`. Programs access computer memory with a single address; equivalently, computer memory has a linear address space. The compiler generates machine code that maps the two-dimensional array references in a program to the one-dimensional addresses needed to access the elements in computer memory.

Array Definitions	Memory Allocation
`int array[2][3];`	six elements in one linear sequence
`int array[3][2];`	six elements in one linear sequence

Choosing Rows × Columns

Although index order doesn't affect the amount of memory allocated for a two-dimensional array, there are numerous compelling reasons for insisting that the first dimension represents rows and the second columns. So, throughout the text, I maintain that the "correct" index order is rows × cols or array[rows][cols], and attempt to justify my stand in the following sections.

Tradition: Mathematics and Early Programming Languages

Numerous mathematical operations use vectors and matrices implemented in programs as one- and two-dimensional arrays. Many engineering and scientific disciplines rely on mathematics, including matrix operations.

Mathematical Matrices

$$ A = \left[ \begin{matrix} a_{0,0} & a_{0,1} \\ a_{1,0} & a_{1,1} \\ a_{2,0} & a_{2,1} \end{matrix} \right] $$

Early Programming Language Matrices

FORTRAN	`real A(3,2)`
ALGOL	`REAL A[0:2,0:1]`

Mathematical matrices. Mathematics defines the elements of matrix $A$ as $a_{i,j}$. The first subscript, $i$, remains constant along each row, and the second subscript, $j$, remains constant down each column. Therefore, $i$ and $j$ denote the element's row and column, respectively.

IBM created FORTRAN (FORmula TRANslation), the first widely used high-level programming language, in the 1950s for performing mathematical, scientific, and engineering calculations. The illustrated FORTRAN matrix or array is 3 rows by 2 columns. ALGOL, another early programming language and the predecessor of many modern languages like C, C++, C#, Objective-C, Pascal, Ada, etc., also adopted the rows-by-columns order. C++ continues this practice (see, for example, Multidimensional arrays).

Tradition may seem like a poor reason for adopting an index order, but I maintain that it is the best and primary reason. Many algorithms, especially for graphics, modeling, and parallel processes, are expressed in mathematical matrix notation. Programmers frequently translate those algorithms into program functions. In my experience, maintaining a consistent notation eases programming, testing, debugging, and documenting. Furthermore, it enhances readability and increases understanding.

Initializer List Order

Two array operations suggest that rows-by-columns is the most "natural" array index order. The following figure demonstrates the first, initialization lists, extended to two dimensions. The text covers the second operation, row-major ordering, extensively later in the chapter.

(a) The base program:

```cpp
#include <iostream>
#include <iomanip>
using namespace std;

int main()
{
    char array[3][2] = { 'A', 'B', 'C', 'D', 'E', 'F' };
    //char array[][2] = { 'A', 'B', 'C', 'D', 'E', 'F' };

    for (int i = 0; i < 3; i++)          // rows
    {
        for (int j = 0; j < 2; j++)      // columns
            cout << setw(2) << array[i][j];
        cout << endl;
    }

    return 0;
}
```

Output of (a):

```
 A B
 C D
 E F
```

(b) `char array[2][3] = { ... };` with `for (int i = 0; i < 3; i++)` and `for (int j = 0; j < 2; j++)`

Output of (b):

```
 A B
 D E
 ä
```

(c) `char array[3][2] = { ... };` with `for (int i = 0; i < 2; i++)` and `for (int j = 0; j < 3; j++)`

Output of (c):

```
 A B C
 C D E
```

(d) `char array[2][3] = { ... };` with `for (int i = 0; i < 2; i++)` and `for (int j = 0; j < 3; j++)`

Output of (d):

```
 A B C
 D E F
```

Two-dimensional initializer list order. Initializer lists are a form of static initialization - programmers establish the values at compile-time. The list values are saved in the array when the computer allocates its memory.

(a) This program serves as the base for all the examples. It demonstrates the "natural" initializer list storage order: the list fills the array beginning at the top left position, across the rows, from top to bottom. When an initializer list is present, C++ allows programmers to omit the size of the first array dimension (the commented-out definition), which doesn't affect the program's behavior. The nested for-loops demonstrate that the program prints two-dimensional arrays to the console by rows, from the top left to the bottom right, because moving the cursor backward or upwards is difficult.
(b) This version switches the base program's index sizes in the array's definition but leaves the for-loops unchanged. The loops therefore read `array[2][0]` and `array[2][1]`, which lie beyond the end of the 2×3 array. When run at different times, the program printed various (incorrect) results, the hallmark of a memory error. It may produce the correct output sometimes or on some platforms. However, correct programs must run correctly all the time and on all platforms.
(c) The third version switches the "2" and "3" in the base program's for-loops, printing one element ('C') twice and failing to print another ('F').
(d) The final version switches the "2" and "3" in the base program's array definition and its for-loops. This version runs correctly but with a different array: two rows of three elements.

Extracting Rows: One Index Access

C++ implicitly implements two-dimensional arrays as one-dimensional arrays of one-dimensional arrays. This organization allows programs to extract and use individual rows, each a one-dimensional array, from a two-dimensional array, but not individual columns.

(a)

```cpp
#include <iostream>
#include <iomanip>
using namespace std;

// Prints the elements of one row, passed as a one-dimensional array.
void print_row(char* row, int size)
{
    for (int i = 0; i < size; i++)
        cout << setw(2) << row[i];
}

int main()
{
    char array[][3] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L' };

    // A single index extracts row 2; sizeof counts the elements in that row.
    print_row(array[2], sizeof(array[2]) / sizeof(char));

    return 0;
}
```

(b)

```cpp
#include <iostream>
#include <iomanip>
using namespace std;

int main()
{
    // Each row ends with '\0', making it a string.
    char array[][4] = { 'A', 'B', 'C', '\0', 'D', 'E', 'F', '\0', 'G', 'H', 'I', '\0', 'J', 'K', 'L', '\0' };

    cout << array[2] << endl;

    return 0;
}
```

(c) Output of (a): ` G H I`
(d) Output of (b): `GHI`

Extracting individual rows from two-dimensional arrays. When a program indexes into a two-dimensional array with a single index, the operation extracts a one-dimensional array or row from the original array. The programs demonstrate the technique's syntax and behavior.

(a) The program defines a function with two parameters. The first parameter, `char* row`, is a character pointer, which is equivalent to a one-dimensional array of characters (see Passing arrays as function arguments). Although the function defines the parameter as a pointer, programs typically use the index operator, `[]`, to access the individual elements, as in `row[i]`. Indexing a two-dimensional array with a single index, `array[2]`, extracts a single row. The `sizeof` expression calculates the number of elements in one row (see Counting array elements).
(b) The second program modifies the first by removing the `print_row` function and adding a special character, `'\0'`, called the null termination character, at the end of each row. Adding the null termination character enlarges the array to 4×4 and makes each row a string (described in more detail in the next chapter). Indexing the array with a single index, `array[2]`, extracts one row (i.e., one string), which the `cout` statement prints on the console.
(c) Output from the first program: the elements of row 2.
(d) The second program's output demonstrates the appearance of row 2 printed as a string of characters.

Command-Line Arguments

**Command-line arguments implemented as an array of strings**. When the operating system (OS) runs a program, it passes information through an established protocol. The protocol consists of two arguments passed from the OS to two parameters added to the program's `main` function. The OS establishes the protocol, organizing the information as an *explicit* array of character pointers whose elements are strings (i.e., arrays of characters). Programs must follow this protocol. The picture illustrates the array of pointers, named `argv`, containing `argc` elements (i.e., with a length or size of `argc`). Each element of `argv` is a string implemented as an array of characters, and each string has a different length or size.

(a) The first integer parameter is the array's size (the number of array elements). The second parameter is an array of character pointers, indicated by the asterisk and square brackets in `char* argv[]` or by the two asterisks in `char** argv`. (The two versions are equivalent, but I think the first better signifies "an array of pointers.") Programmers traditionally name the parameters `argc` and `argv`, respectively. They can change the parameter names (but typically don't), but the protocol establishes the order and types.

(b) Sometimes programs need to access individual characters in the command-line arguments, which they can do with two indexes, as in `argv[row][col]`: `row` selects an element in `argv`, and `col` selects a character in the corresponding string.

(c) An abstract representation of command-line arguments.

The next chapter formally introduces strings and command-line arguments.

Consistency With Java

Java is an object-oriented language that represents fundamental and structured data differently. It represents fundamental data, int, double, etc., as simple bit patterns in memory. In contrast, it represents structured data, including arrays, as objects - instances of an unnamed class.

**Java arrays**. Java *automatically* implements a two-dimensional array as an array of arrays. The illustration shows the variable 'array' pointing to an array consisting of three elements. Each element points to an array with two elements. Therefore, switching array indexes in a Java program results in a substantially different data organization in memory. Programmers implementing systems utilizing both languages or converting programs between languages face less confusion and make fewer errors when using the same index order.

The illustration shows the variable 'array' pointing to an array consisting of two elements. Each element points to an array with three elements.

Index Order Summary

If used consistently, some programs function correctly with either index order. However, some features and the programs utilizing them require a rows × cols or [rows][cols] index order. Other systems, such as the command-line argument protocol, also require this order, making it customary for C++ programmers. So, throughout this textbook, rows first, followed by columns, is the "correct" order.

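Code shown in the Command-Line Arguments figure:
(a)	`int main(int argc, char* argv[])` or `int main(int argc, char** argv)`
(b)	`argv[row][col]`
(c)	the picture of `argv` described in that section

Code shown in the Java array figures:
`public int[][] array = new int[3][2];`	`public int[][] array = new int[2][3];`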