7.5. Arrays: C++ vs. Java


There are three behavioral differences between an array in C++ and a similar array in Java:

  1. A Java array is aware of its size while a C++ array is not
  2. Java detects and notifies a programmer when the program indexes an array out of bounds; C++ does not
  3. In C++, it is possible to create an array of fully constructed objects in a single operation; Java requires multiple operations to create an array of objects

Array Size

An array in Java is an object (i.e., an instance of an unnamed class), so, like any object, it may have instance fields. Every Java array has a field declared as public final int length that stores the size of (or the number of elements in) the array. Because the field is final, an array's size cannot change after the array is created.

Java

int[]	scores = new int[10];

for (int i = 0; i < scores.length; i++)
	scores[i] = i;	// any use of scores[i]

C++

const int size = 10;
int scores[size];

for (int i = 0; i < size; i++)
	scores[i] = i;	// any use of scores[i]
Array size in Java and C++. Representing the size of an array with a variable makes the code easier to maintain. If a programmer changes the "10" in either example, the array's size and every loop that uses it change together, with no additional effort. Java arrays are objects that store their size in the instance field length, but C++ requires a separate variable.
C++ arrays are not objects

An array in C++ is a primitive, unstructured data type: it holds only its elements, and in most expressions its name converts to the address of the first element. Because there is nowhere to store additional data, C++ programmers must maintain a separate, distinct variable to track the size of an array.

C++ arrays do NOT have a .length instance field.

Bounds Checking

Arrays, in both the C++ and Java programming languages, are zero-indexed, meaning that legal index values range from 0 to size - 1 (where size is the maximum number of elements in the array).

[Figure: an array with 10 elements, indexed from 0 to 9]
Array indexes. The example illustrates an array with a size or length of 10. C++ and Java arrays are zero-indexed, so the first element is at index 0, and the last is at index 9. Using an index value less than 0 or greater than 9 with this array is an indexing-out-of-bounds error. C++ does not detect such errors, but the Java virtual machine (JVM) does: it throws an ArrayIndexOutOfBoundsException (a subclass of IndexOutOfBoundsException), which terminates the program unless the program catches it.

It is easier to get a correct array-based program running in Java than in C++, but Java's bounds checking has some overhead. While employed in the computing industry, I had access to a FORTRAN compiler with an option to enable or disable bounds checking. Using an array-based benchmark called LINPACK, I compared the program's behavior with bounds checking enabled and disabled. Bounds checking doubled the executable file size and increased the run time by 50%. Most programs are less array-intensive than LINPACK and suffer a smaller performance penalty. Nevertheless, C++ programs never pay this cost, whereas Java programs pay it on every array access that the compiler cannot prove is within bounds.

#include <iostream>
using namespace std;

int main()
{
	int	array0[10] = {};	// initialize all to 0
	int	array1[10];
	int	array2[10] = {};	// initialize all to 0

	for (int i = 0; i <= 10; i++)	// i == 10 indexes out of bounds (undefined behavior)
		array1[i] = 25;

	// array0 or array2 may now be corrupted
	for (int i = 0; i < 10; i++)
		cout << "array0: " << array0[i] << endl;
	for (int i = 0; i < 10; i++)
		cout << "array2: " << array2[i] << endl;

	return 0;
}
array0: 0
array0: 0
array0: 0
array0: 0
array0: 0
array0: 0
array0: 0
array0: 0
array0: 0
array0: 0
array2: 25
array2: 0
array2: 0
array2: 0
array2: 0
array2: 0
array2: 0
array2: 0
array2: 0
array2: 0
Indexing C++ arrays out of bounds. The C++ standard classifies out-of-bounds indexing as undefined behavior, so the outcome depends on how the program uses the indexed value and where that location lies in memory. The program typically crashes if it accesses memory the operating system has not allocated to it. Otherwise, using the out-of-bounds location as an r-value returns "garbage," and using it as an l-value may silently corrupt other data.
  (a) The C++ program defines three adjacent arrays and deliberately indexes the middle array out of bounds: the loop condition i <= 10 lets the program write one element past the end of array1. Which adjacent array the error corrupts depends on the compiler and the system's architecture.
  (b) The program output: can you see the "proof" demonstrating the memory corruption?
I wrote this program in 1985 to settle a bet with a coworker who didn't believe a program would do this. I didn't collect on the bet and don't think I ever will.

Arrays of Objects

Both C++ and Java allow programmers to create arrays of objects, but a Java program requires several steps, resulting in an organization with numerous pointers. (Technically, Java has references rather than pointers, but a Java reference behaves much like a C++ pointer that does not permit pointer arithmetic.) The following picture illustrates how a Java program creates an array of Employee objects.

[Figure: a pointer variable, emp, points to an array of five pointers; each pointer in the array points to an Employee object.]
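Creating an array of objects in Java. Employee names a class that, in this example, has a default (no-argument) constructor. Java does not strictly require one: creating the array constructs no objects, and each new operation may call any constructor.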
  (a) Employee[] emp; defines the array variable, which only allocates enough memory to hold an address.
  (b) emp = new Employee[5]; creates the array with the new operator, which returns an address or pointer. Each element of the new array initially holds null. Programmers may replace the constant, 5, with a variable. We often program steps (a) and (b) as a single statement: Employee[] emp = new Employee[5];
  (c) Inside a loop, emp[i] = new Employee(); creates, or instantiates, each object one at a time with the new operator and stores its address in the array. This technique has the advantage that the program only creates and initializes the objects it needs, saving the memory and construction time otherwise wasted building unneeded objects.

Where Java provides only one way of creating and organizing an array of objects, C++ provides four. Having multiple options makes C++ very flexible, but the cost is additional, and sometimes complex, notation, as the following figure illustrates:

Employee emp[5];
An array, named emp, of five Employee objects. In most expressions, the name emp converts to the address of the first object in the array.
(a)
Employee* emp;
emp = new Employee[5];
A pointer variable named emp that points to an array of five Employee objects.
(b)
Employee* emp[5];
for (int i = 0; i < 5; i++)
	emp[i] = new Employee;
An array of pointers named emp. Each element of emp points to one Employee object.
(c)
Employee** emp;
emp = new Employee*[5];
for (int i = 0; i < 5; i++)
	emp[i] = new Employee;
A pointer variable named emp points to an array of pointers. Each pointer element in the array points to one Employee object.
(d)
Creating an array of objects in C++. Employee is a class name that must have a default constructor. Gold rectangles represent pointers, and blue rectangles represent Employee objects. Programmers may replace the constant 5 with a variable, except where it specifies the size of an array allocated on the stack (the arrays in (a) and (c)), which must be a compile-time constant.
  (a) The program creates the array of objects on the stack and initializes each object with the default constructor.
  (b) The program creates the pointer variable emp on the stack as an automatic or local variable; it creates the objects on the heap as a single dynamic array and initializes each with the default constructor. Unlike (a), programmers can vary the array's size with a variable.
  (c) The program creates an array of pointers on the stack but creates individual objects on the heap. Programmers may call any constructor, which only runs when the program instantiates an object.
  (d) The program creates the pointer variable emp on the stack and the array of pointers and the individual objects on the heap. Programmers may call any constructor, which only runs when the program instantiates an object. Unlike (c), programmers can vary the pointer array's size with a variable.