8.8. Searching and Sorting

Large data processing applications often include two related sub-tasks: sorting the elements of an array (e.g., putting an array of names in alphabetical order) and searching for an element in an array (e.g., searching for a specific name in the array). These tasks are common, but not so common that programming languages support them directly. However, most languages do provide library functions that perform them. The selection sort and binary search algorithms presented in the last chapter demonstrated sorting and searching with character arrays, but library functions must be more general: they must work with any data type.

We often study searching and sorting together because many searching algorithms only work when the data is in sorted order and because both operations generally require an ordering function. For example, the way selection sort moves array elements to arrange them in sorted order is independent of the elements' type. However, it chooses which elements to move by selecting the smallest value and swapping it with the element at the top, and this step depends on the elements' type. When the array contains fundamental data (e.g., integers and doubles), programmers can use the C++ relational operators (<, <=, >, and >=) to select the smallest value. But how do programmers implement that step when the array contains complex objects, that is, instances of structures or classes?

Ordering Functions

Library programmers create the searching and sorting functions before application programmers create the programs that use them. Applications can specify new data types as structures or classes, but C++ doesn't include native operators that can directly compare objects (instances of structures or classes). How can library programmers make the searching and sorting functions sufficiently general to work with any object? Library programmers solve the problem by requiring application programmers to provide an ordering function.

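Application programs pass ordering functions to the searching and sorting functions by pointer. As a consequence, the library functions control the signature and the overall behavior of the ordering functions. In particular, the library functions "expect" to pass arguments to the ordering functions by pointer. The ordering function's name is unimportant, but it must have two parameters of type const void* (pointers to the two elements being compared), and it must return an int that is negative, zero, or positive when the first element orders before, the same as, or after the second.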

void pointers

A void pointer, void*, is C++'s most general pointer type. The compiler uses data types to allocate memory for data and to interpret its bits. On typical systems, the addresses stored in pointers are all the same size, but the compiler still uses a pointer's type to interpret the data it points to. The keyword void, however, makes the pointed-to data typeless: it has no specific data type. So, a void pointer is a bare address that can point to anything, but the program must cast it to a known data type before it can extract and use the data.

bsearch And qsort

The last chapter introduced the selection sort and binary search algorithms. While selection sort works, it has a run order of O(n²), which is not very efficient: a small increase in the data size results in a large increase in run time (doubling n roughly quadruples the run time). Quicksort is an efficient, well-known sorting algorithm. It usually has a run order of O(n log n), which means that a small increase in the data size results in only a modest increase in run time. Binary search is efficient with a run order of O(log n). Both algorithms are ideal candidates for library functions, and the C standard library provides them as the qsort and bsearch functions, declared in <cstdlib> in C++. (The C standard does not actually require qsort to use quicksort, although the name reflects that origin.) Although the functions are not object-oriented, C++ inherits them, and C++ programmers may use both.
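A rough worked comparison makes these run orders concrete when n doubles from 1,000 to 2,000. It counts comparisons, uses n log₂ n only as a stand-in for quicksort's typical cost (constant factors ignored), and uses ⌊log₂ n⌋ + 1 as binary search's worst-case comparison count:

```latex
\begin{aligned}
\tfrac{n(n-1)}{2} &: \quad \tfrac{1000 \cdot 999}{2} = 499{,}500, \qquad \tfrac{2000 \cdot 1999}{2} = 1{,}999{,}000 \;(\approx 4\times) \\
n \log_2 n &: \quad 1000 \log_2 1000 \approx 9{,}966, \qquad 2000 \log_2 2000 \approx 21{,}932 \;(\approx 2.2\times) \\
\lfloor \log_2 n \rfloor + 1 &: \quad 10 \text{ for } n = 1000, \qquad 11 \text{ for } n = 2000
\end{aligned}
```

Doubling the data quadruples selection sort's comparisons, slightly more than doubles the n log₂ n estimate, and adds just one comparison to a binary search.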

The complexity and arcane syntax of the library functions are due to a combination of two requirements. The first is the need to make the functions sufficiently general to operate with whatever kind of data the programmer needs to search or sort. Void pointers provide a general data type, but programmers must cast them to an established type before accessing data through them. Furthermore, the general nature of void pointers and the library functions require programmers to provide an ordering function.

The second requirement impacting the complexity and syntax of the searching and sorting functions is the need to pass and return data by pointer. qsort changes the data array by moving the elements around in the array, so the program must pass the data with an INOUT mechanism. C++ inherits qsort and bsearch from C, which does not have pass-by-reference, making pass-by-pointer the only available INOUT mechanism. The library functions pass elements from the data array to the ordering functions, which must dereference their pointer arguments to access the data. bsearch uses the same data arrays and ordering functions as qsort, so it too uses pass-by-pointer. The following examples demonstrate how to write and use ordering functions for different kinds of data.

Each example follows the same basic pattern. It creates an array of example data. It sorts the data and prints it, demonstrating the sorting. Lastly, it creates a key and searches the data array for it. You can modify one of the examples to satisfy most of your searching and sorting needs.

Searching and Sorting Fundamental Data

Working with fundamental (or primitive) data is relatively straightforward. The following example is based on an integer array, but it works with any fundamental data type (e.g., char, double, or long) with only the type names changed.

Searching and Sorting C-Strings

The C-string examples are special cases because we can't easily generalize them to other data types. Two similar representations are possible; the pictures help us understand their differences. The pictures form bridges that help us span the gulf between a problem and the details of the final program.

Searching and Sorting Objects

Searching for a given element in an array of integers or strings only tells us whether the value is in the array. Searching for a specific object in an array is more helpful because the object can hold more data than the search key. For example, suppose the array contains objects like those specified in the following figure. If the application searches for an object with the id 123, the search function returns a pointer: nullptr if the array doesn't have an object with that id, or a pointer to the object if it does. Instances of the student structure have three fields, so if we find the object with the matching id, we can retrieve all the information saved in it. Using part of an object's data to find all of its data is called an associative search. Furthermore, we can sort and search the array using any field, provided that the search uses the same ordering function (and therefore the same field) that sorted the array.

The following examples illustrate how to search for objects based on different fields or members. The figure presents demo4.cpp in parts to simplify the discussion. The complete program, with all parts in context, is available at the bottom of the page.

(a) The student structure (name is a const char* because C++ string literals are constant):

    struct student
    {
        const char* name;
        int id;
        double gpa;
    };

(b) An array of student objects:

    student students[] =
    {
        { "Dilbert", 123, 3.5 },
        { "Wally", 456, 2.0 },
        { "Alice", 987, 3.9 },
        { "Asok", 730, 3.8 },
        { "Catbert", 501, 3.0 },
        { "Pointy Haired Boss", 666, 1.0 },
        { "Dogbert", 111, 4.0 }
    };
