Large data-processing applications often include two related sub-tasks: sorting the elements of an array (e.g., putting an array of names in alphabetical order) and searching for an element in an array (e.g., searching for a specific name in the array). These tasks are common, but not so common that programming languages support them directly. However, most languages do provide library functions that perform these tasks. The selection sort and binary search algorithms presented in the last chapter demonstrated sorting and searching with character arrays, but library functions must be more general: they must work with all data types.
We often study searching and sorting at the same time because many searching algorithms only work when the data is in sorted order and because both operations generally require an ordering function. For example, how the selection sort moves the array elements around to arrange them in sorted order is independent of the elements' type. However, it chooses which elements to move by selecting the smallest remaining value and swapping it with the element at the top of the unsorted part of the array, and this step depends on the elements' type. When the array contains fundamental data (e.g., integers and doubles), programmers can use the C++ relational operators (<, <=, >, and >=) to select the smallest value. But how do programmers implement that step when the array contains complex objects, such as instances of structures or classes?
Ordering Functions
Library programmers create the searching and sorting functions before application programmers create the programs that use them. Applications can specify new data types as structures or classes, but C++ doesn't include native operators that can directly compare objects (instances of structures or classes). How can library programmers make the searching and sorting functions sufficiently general to work with any object? Library programmers solve the problem by requiring application programmers to provide an ordering function.
Ordering functions. Application programs can use the searching and sorting library functions to search and sort an array of data. To use the functions, the application must also define an ordering function, which compares two elements from the data array and determines their relative order. The library functions can do the rest of the work with that information.
Although the ordering function is part of the application program, the application does not call it directly. Instead, the application passes the ordering function's address to the library functions when it calls them. The library functions search or sort the data array by repeatedly calling the ordering function to determine the relative order of any two array elements.
The ordering function compares two elements from the data array and indicates their relative order. The data array elements may be fundamental types such as int or double, but they are often objects: instances of structures or classes. When comparing and ordering objects, the ordering function must extract and compare the important data from each object. For example, a Person object may have many fields, including a name and an ID number, and the ordering function may compare either or both.
The ordering function conveys the relative order of any two array elements through its return value: if the first argument is less than (i.e., comes before) the second argument, the ordering function returns a value less than 0; if the first argument is greater than (i.e., comes after) the second argument, it returns a value greater than 0; and if the two arguments are equal (i.e., they order the same), it returns 0.
Application programs pass ordering functions to the searching and sorting functions by pointer. Because the library functions call the ordering functions, the library functions control the ordering functions' signature and overall behavior: they "expect" to pass the two elements to the ordering function by pointer. The ordering function's name is unimportant, but its two parameters must be const void pointers (const void*), and it must return an int.
void pointers
A void pointer, void*, is C++'s most general pointer type. The compiler uses data types to allocate memory for data and to interpret its bits. Addresses stored in pointers are the same size on common platforms, but the compiler still uses a pointer's type to interpret the data it points to. The keyword void, however, makes the pointed-to data typeless: it has no specific data type. So, a void pointer is a bare address that can point to anything, but the program must cast it to a known data type before it can extract and use the data.
#include <iostream>
using namespace std;
struct student                      // (a)
{
    const char* name;               // const: initialized from a string literal
    int id;
};
void function1(void* arg)
{
    int data = *(int *)arg;         // (b)
    cout << data << endl;
}
void function2(void* arg)
{
    char* str = (char *)arg;        // (c)
    cout << str << endl;
}
void function3(void* arg)
{
    student* s = (student *)arg;    // (d)
    cout << s->name << " " << s->id << endl;
}
int main()
{
    int x = 5;
    char c[] = "Hello, World!";     // (e) a modifiable C-string
    student s = { "Alice", 123 };   // (f)
    function1(&x);
    function2(c);
    function3(&s);
    return 0;
}
Using void pointers. Before a program can access data through a void pointer, it must cast the pointer to an established data type. These functions are not ordering functions, and they cannot be used with qsort or bsearch (described below). The examples demonstrate passing and casting void pointers. They also demonstrate that a function cannot extract data through a void pointer without knowing the data's type, so programmers must tailor each function to a specific type.
(a) A structure used with the third example.
(b) arg is a void pointer pointing to an integer. Before function1 can access the integer, it must (1) cast the void pointer to an integer pointer, (int *), and then (2) dereference the pointer with *. You can edit this function to work with any fundamental data type.
(c) arg is still a void pointer, but this time it points to a C-string. Extracting the C-string requires casting the pointer to a character pointer: (char *). This function is a special case used only with C-strings.
(d) In this example, arg points to an object instantiated from the student structure. The function first casts it to a student pointer, (student *), and then accesses each field with the arrow operator. You can adapt this function to work with any object, whether it is an instance of a structure or a class.
(e) Defines and initializes a C-string for the function2 call.
(f) Defines and initializes a student object for the function3 call.
bsearch and qsort
The last chapter introduced the selection sort and binary search algorithms. While selection sort works, its run order is O(n²), which is not very efficient: a small increase in the data size results in a large increase in run time. Quicksort is an efficient and well-known algorithm. Its average run order is O(n log n) (its worst case is O(n²)), which means that it is usually quite efficient: a small increase in the data size results in a comparatively small increase in run time. Binary search is efficient, with a run order of O(log n). Both algorithms are ideal candidates for inclusion as library functions, and the C standard library provides them as qsort and bsearch (the C standard does not actually require qsort to use quicksort, although the name reflects that origin). Although the functions are not object-oriented, C++ inherits them through the <cstdlib> header, and C++ programmers may use both.
The complexity and arcane syntax of the library functions are due to a combination of two requirements. The first is the need to make the functions sufficiently general to operate with whatever kind of data the programmer needs to search or sort. Void pointers provide a general data type, but programmers must cast them to an established type before accessing data through them. Furthermore, the general nature of void pointers and the library functions requires programmers to provide an ordering function.
The second requirement impacting the complexity and syntax of the searching and sorting functions is the need to pass and return data by pointer. qsort changes the data array by moving the elements around in the array, so the program must pass the data with an INOUT mechanism. C++ inherits qsort and bsearch from C, which does not have pass-by-reference, making pass-by-pointer the only available INOUT mechanism. The library functions pass elements from the data array to the ordering functions, which must dereference their pointer arguments to access the data. bsearch uses the same data arrays and ordering functions as qsort, so it too uses pass-by-pointer. The following examples demonstrate how to write and use ordering functions for different kinds of data.
Each example follows the same basic pattern. It creates an array of example data. It sorts the data and prints it, demonstrating the sorting. Lastly, it creates a key and searches the data array for it. You can modify one of the examples to satisfy most of your searching and sorting needs.
Searching and Sorting Fundamental Data
Working with fundamental or primitive data is relatively straightforward. The following example is based on an integer array but will work with any fundamental data type (e.g., char, double, long, etc.).
Searching and Sorting C-Strings
The C-string examples are special cases because we can't easily generalize them to other data types. Two similar representations are possible; the pictures help us understand their differences. The pictures form bridges that help us span the gulf between a problem and the details of the final program.
C-string arrays. C-strings are character arrays that mark the string's end (i.e., the end of the character data) with a null-termination character: \0. The name of an array represents the array's address, so it's convenient to move C-strings around as character pointers.
The figure illustrates the most common way of creating an array of C-strings. The vertical array, data, is an array of character pointers (i.e., C-strings): char* data[]. The library functions pass the address of each element to the ordering function, and each element is itself a character pointer, so the ordering function must cast its void pointers to double character pointers (i.e., pointers to pointers) and dereference them once to reach the strings. demo2.cpp (Figure 7) demonstrates how to build and use this array. Although this is a special case, it demonstrates concepts applicable to objects that have C-string fields or members.
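data is a two-dimensional array of characters, char data[11][8], but the program initializes each row as a C-string. The array is a good example of a situation where it is not possible to reverse the rows and columns: [8][11] will NOT work. Each row holds one string, so the first dimension is the number of strings (11), and the second dimension is the row length (8), which leaves room for up to seven characters plus the null terminator. In this representation, each element the library functions pass to the ordering function is a whole row, so the ordering function can cast its void pointers directly to character pointers. demo3.cpp (Figure 8) demonstrates how to build this array.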
Searching and Sorting Objects
Searching an array of integers or strings for a given element only tells us whether the value is in the array. Searching an array of objects can be more helpful. For example, suppose the array contains objects like those specified in the following figure. If the application searches for an object with the id 123, the search function returns a pointer: nullptr if the array doesn't have an object with that id, or a pointer to the object if it does. Instances of the student structure have three fields, and if we find the object with the matching id, we can retrieve all the information saved in that object. Searching for part of the data in an object to find all the data is called an associative search. Furthermore, we can sort and search the array using any field, provided that the array is sorted by the same field we use to search it.
The following examples illustrate how to search for objects based on different fields or members. To simplify the discussion, demo4.cpp is presented as a series of figures. The complete program, with all parts in context, is available at the bottom of the page.
struct student
{
    char*  name;
    int    id;
    double gpa;
};