8.8. Searching and Sorting: bsearch And qsort

Review

Large data processing applications often include two related sub-tasks: sorting the elements of an array (e.g., putting an array of names in alphabetical order) and searching for an element in an array (e.g., searching for a specific name in the array). These tasks are common but not so common that programming languages support them directly. However, most languages do provide library functions that perform these tasks. The selection sort and binary search algorithms presented in the last chapter demonstrated searching and sorting with character arrays. But the library functions must be more general - they must work with all data types.

We often study searching and sorting at the same time because many searching algorithms only work when the data is in sorted order and because both operations generally require an ordering function. For example, how the selection sort moves the array elements around to arrange them in sorted order is independent of the element's type. However, it chooses which elements to move by selecting the smallest value and swapping it with the element at the top, and this step is dependent on the element's type. When the array contains fundamental data (e.g., integers and doubles), programmers can use C++ relational operators (<, <=, >, and >=) to select the smallest value. But how do programmers implement that step when the array contains complex objects - instances of structures or classes?

Ordering Functions

Library programmers create the searching and sorting functions before application programmers create the programs that use them. Applications can specify new data types as structures or classes, but C++ doesn't include native operators that can directly compare objects (instances of structures or classes). How can library programmers make the searching and sorting functions sufficiently general to work with any object? Library programmers solve the problem by requiring application programmers to provide an ordering function.

An illustration depicting the organization of a program. The application code defines the ordering function but calls the searching and sorting functions. The searching and sorting functions call the ordering function. The searching and sorting functions operate on arrays. They call the ordering function to compare array elements two at a time.
Ordering functions. Application programs can use the searching and sorting library functions to search and sort in an array of data. For the application to use the functions, it must also define an ordering function. The ordering function compares two elements from the data array and determines their relative order. The library functions can do the rest of the work with that information.
  1. Although the ordering function is part of the application program, the application does not call it directly. Instead, the application passes the ordering function's address to the library functions when it calls them. The library functions search or sort the data array by repeatedly calling the ordering function to determine the relative order of any two array elements.
  2. The ordering function compares and indicates the relative order of two elements from the data array. The data array elements may be fundamental data, int, double, etc., but are often objects - instances of structures or classes. When comparing and ordering objects, the ordering function must extract and compare the important data from each object. For example, a Person object may have many fields, including a name and an ID number, and the ordering function may operate on either or both.
  3. The ordering function conveys the relative order of any two array elements by its return value: if the first argument is less than (i.e., comes before) the second argument, the ordering function returns a value less than 0; if the first argument is greater than (i.e., comes after) the second argument, the ordering function returns a value greater than 0; if the two arguments are equal (i.e., they order the same), the ordering function returns zero.

Application programs pass ordering functions to the searching and sorting functions by pointer. As a consequence, the library functions dictate the signature and the overall behavior of the ordering functions: they "expect" to pass arguments to the ordering functions by pointer. The ordering function's name is unimportant, but its two parameters must be const void pointers, and it must return an int.

void pointers

A void pointer, void*, is C++'s most general data type. The compiler uses data types to allocate memory for data and to interpret its bits. As stored in pointers, addresses are always the same size, but the compiler still uses their type to interpret the data they point to. But the keyword void makes the data typeless - it has no specific data type. So, a void pointer is a bare address that can point to anything, but it must be cast to a known data type before the data can be extracted and used.

#include <iostream>
using namespace std;

struct student					// (a)
{
    const char* name;			// string literals are const in C++
    int     id;
};

void function1(void* arg)
{
    int data = * (int *)arg;			// (b)
    cout << data << endl;
}

void function2(void* arg)
{
    char* str = (char *)arg;			// (c)
    cout << str << endl;
}

void function3(void* arg)
{
    student* s = (student *)arg;		// (d)
    cout << s->name << " " << s->id << endl;
}


int main()
{
    int		x = 5;
    char	c[] = "Hello, World!";		// (e)
    student	s = { "Alice", 123 };		// (f)

    function1(&x);
    function2(c);
    function3(&s);

    return 0;
}
 
Using void pointers. Before a program can access data through a void pointer, it must cast the pointer to an established data type. These functions are not ordering functions, and they cannot be used with qsort or bsearch (described below). The examples demonstrate passing and casting void pointers. They also demonstrate that programmers can't write a general void pointer function, so they must tailor each function to a specific type.
  1. A structure used with the third example.
  2. arg is a void pointer pointing to an integer. Before function1 can access the integer, it must (1) cast the void pointer to an integer pointer, (int *), and then (2) dereference the pointer, *. You can edit this function to work with any fundamental data type.
  3. arg is still a void pointer, but this time, it points to a C-string. Extracting the C-string requires casting the pointer to a character-pointer: (char *). This function is a special case used only with C-strings.
  4. In this example, arg points to an object instantiated from the student structure. The function first casts it to a student-pointer, (student *), and then each field is accessed with the arrow operator. Edit this function to work with any object, be it an instance of a structure or a class.
  5. Defines and initializes a C-string for the function2 call.
  6. Defines and initializes a student object for the function3 call.

bsearch And qsort

The last chapter introduced the selection sort and the binary search algorithms. While selection sort works, it has a run order of O(n^2), which is not very efficient: doubling the data size roughly quadruples the run time. Quicksort is an efficient and well-known algorithm. It usually has a run order of O(n log n), so doubling the data size only slightly more than doubles the run time. Binary search is efficient, with a run order of O(log n). Both algorithms are ideal candidates for inclusion as library functions, and the C library provides them as the qsort and bsearch functions (qsort takes its name from quicksort, although the C standard does not require any particular sorting algorithm). Although the functions are not object-oriented, C++ inherits them, and C++ programmers may use both.

void qsort(void* data, size_t num, size_t size, int (*order)(const void* e1, const void* e2));
void* bsearch(const void* key, const void* data, size_t num, size_t size, int (*order)(const void* e1, const void* e2));
Prototypes for the qsort and bsearch library functions. The standard C/C++ library includes functions implementing quicksort and binary search. The last parameter of each prototype, beginning with int (*order), is a single function argument: a pointer to an ordering function. The functions are prototyped in the cstdlib (C++) and stdlib.h (C) header files.
void* data
The data array that the library functions search and sort.
size_t num
The number of array elements.
size_t size
The size, measured in bytes, of each array element.
int (*order)(const void* e1, const void* e2)
Oddly, this entire sequence defines a single function parameter named order. The parentheses enclosing const void* e1, const void* e2 form the function-call operator and hold the ordering function's parameter list. The grouping parentheses around *order are necessary because the function-call operator has a higher precedence than the pointer operator, *; without them, the sequence would describe a function that returns a pointer to an integer. int and const void* e1, const void* e2 require the ordering function to return an integer and accept two void pointer arguments. Programmers may replace order, e1, and e2 with appropriate names.

bsearch only

const void* key
bsearch attempts to locate this data in the data array. Frequently, key is a partially filled object - just the field or fields needed for the search contain data. If the function locates the key, the application can access all data saved in the corresponding object.
return value
If key is found in data, bsearch returns a pointer to the matching array element; otherwise it returns nullptr. The function returns a void pointer, so the application must cast it to a known type before using the located data.

The complexity and arcane syntax of the library functions are due to a combination of two requirements. The first is the need to make the functions sufficiently general to operate with whatever kind of data the programmer needs to search or sort. Void pointers provide a general data type, but programmers must cast them to an established type before accessing data through them. Furthermore, the general nature of void pointers and the library functions require programmers to provide an ordering function.

The second requirement impacting the complexity and syntax of the searching and sorting functions is the need to pass and return data by pointer. qsort changes the data array by moving the elements around in the array, so the program must pass the data with an INOUT mechanism. C++ inherits qsort and bsearch from C, which does not have pass-by-reference, making pass-by-pointer the only available INOUT mechanism. The library functions pass elements from the data array to the ordering functions, which must dereference their pointer arguments to access the data. bsearch uses the same data arrays and ordering functions as qsort, so it too uses pass-by-pointer. The following examples demonstrate how to write and use ordering functions for different kinds of data.

Each example follows the same basic pattern. It creates an array of example data. It sorts the data and prints it, demonstrating the sorting. Lastly, it creates a key and searches the data array for it. You can modify one of the examples to satisfy most of your searching and sorting needs.

Searching and Sorting Fundamental Data

Working with fundamental or primitive data is relatively straightforward. The following example is based on an integer array but will work with any fundamental data type (e.g., char, double, long, etc.).

An array of unsorted integers.
An array of integers. An array of any simple or fundamental data looks similar.
/*
 * Sorts and then searches an array of integers.
 * Demonstrates how to write an ordering function 
 * that compares two integers.
 */

#include <iostream>
#include <cstdlib>
using namespace std;

int order_int(const void* e1, const void* e2);

int main()
{
	int data[10] = { 2, 5, 3, 9, 1, 4, 6, 8, 7, 0 };

	qsort(data, 10, sizeof(int), order_int);					// (a)

	for (int i = 0; i < 10; i++)
		cout << data[i] << endl;
		
	int key = 6;									// (b)
	int* found = (int *)bsearch(&key, data, 10, sizeof(int), order_int);		// (c)
	if (found != nullptr)
		cout << "key found " << *found << endl;
	else
		cout << "key not found\n";

	return 0;
}


int order_int(const void* e1, const void* e2)						// (d)
{
	return * (int *) e1 - * (int *) e2;
}
demo1.cpp. Using the qsort and bsearch library functions to sort and search an array of integers. Although it's rarely useful to search an array of numbers, searching an array of objects with numeric fields is common, and this example demonstrates an efficient way of comparing them.
  1. By itself, the name of a function is the function's address. So, the ordering function is passed-by-pointer to qsort and bsearch.
  2. A variable, whose address the function can find with the address-of operator, is needed for the key because it too is passed-by-pointer.
  3. The application must cast the void pointer bsearch returns to a known type before using it.
  4. The ordering function casts the void pointers e1 and e2 to integer pointers, dereferences them, and takes the difference between the two resulting integers. The difference indicates their relative order, e.g., 5 - 10 < 0, 5 - 5 == 0, and 10 - 5 > 0.

    While the difference between float and double values indicates their relative order, ordering functions must return an int value. Casting the difference to an int is insufficient because casting truncates and discards fractional values. For example, imagine that e1 = 0.8 and e2 = 0.4. So, e1 - e2 = 0.8 - 0.4 = 0.4. But (int)0.4 = 0, indicating that the elements are equal or at least that they have the same order. Please see Figure 13 for a working example.

Searching and Sorting C-Strings

The C-string examples are special cases because we can't easily generalize them to other data types. Two similar representations are possible; the pictures help us understand their differences. The pictures form bridges that help us span the gulf between a problem and the details of the final program.

(a) An array of character-pointers. Each array element is a character-pointer that points to a C-string.
(b) A two-dimensional character array. Each row is null-terminated, making it a C-string.
C-string arrays. C-strings are character arrays that mark the string's end (i.e., the end of the character data) with a null-termination character: \0. The name of an array represents the array's address, so it's convenient to move C-strings around as character pointers.
  1. The figure illustrates the most common way of creating an array of C-strings. The vertical array, data, is an array of character-pointers (i.e., C-strings): char* data[]. So, when the ordering function casts the void pointers, it must cast them to double character pointers (i.e., pointers to pointers). demo2.cpp (Figure 7) demonstrates how to build and use this array. Although this is a special case, it demonstrates concepts applicable to objects that have C-string fields or members.
  2. data is a two-dimensional array of characters, char data[11][8], but the program initializes each row as a C-string. The array is a good example of a situation where it is not possible to reverse the rows and columns: [8][11] will NOT work. demo3.cpp (Figure 8) demonstrates how to build this array.
/*
 * Sorts an array of C-strings.
 * Demonstrates how to write an ordering function 
 * that compares two C-strings.
 */

#include <iostream>
#include <cstdlib>
#include <cstring>
using namespace std;

int order_string(const void* e1, const void* e2);

int main()
{
	const char* data[] = { "see", "the", "quick", "red", "fox", "jump",
		    "over", "the", "lazy", "brown", "dog" };
			
	qsort(data, 11, sizeof(char*), order_string);					// (a)

	for (int i = 0; i < 11; i++)
		cout << data[i] << endl;

	const char* key = "jump";							// (b)
	const char** found = (const char **)bsearch(&key, data, 11, sizeof(char*), order_string);	// (c)
	if (found != nullptr)								// test the pointer bsearch returned
		cout << "key found: " << *found << endl;
	else
		cout << "key not found\n";

	return 0;
}


int order_string(const void* e1, const void* e2)					// (d)
{
	return strcmp(*(char **) e1, *(char **) e2);
}
demo2.cpp. Using the qsort and bsearch library functions to sort and search an array of C-strings. C-strings are character-pointers: char*, making the void pointers pointers to pointers. So the casts require two asterisks: char **.
  1. Passing the ordering function to qsort and bsearch is straightforward: use its name as an argument.
  2. Creates a key for bsearch.
  3. bsearch returns a void pointer that is nullptr if key is not found, but if it is found, it points to a C-string (i.e., a char*).
  4. The ordering function casts the void pointers e1 and e2 to pointers to C-strings. However, e1 and e2 point to elements in an array of pointers - each argument is a pointer to a pointer - so the cast is to a double-pointer: char **. The cast and the dereference are both unary operators with the same precedence, and they group from right to left, so the function casts each void pointer first and dereferences the result second. order_string returns the value calculated and returned by the C-string ordering function, strcmp.
/*
 * Sorts a two-dimensional array of characters.
 * Demonstrates how to write an ordering function 
 * that compares two C-strings formed from a 2D array.
 */

#include <iostream>
#include <cstdlib>
#include <cstring>
using namespace std;

int order_char(const void* e1, const void* e2);

int main()
{
	char data[11][8] = { "see", "the", "quick", "red", "fox", "jump",
		"over", "the", "lazy", "brown", "dog" };

	qsort(data, 11, 8, order_char);						// (a)

	for (int i = 0; i < 11; i++)
		cout << data[i] << endl;

	const char* key = "lazy";						// (b)
	char* found = (char *)bsearch(key, data, 11, 8, order_char);		// (c)
	if (found != nullptr)
		cout << "key found: " << found << endl;
	else
		cout << "key not found\n";

	return 0;
}


int order_char(const void* e1, const void* e2)					// (d)
{
	return strcmp((char *) e1, (char *) e2);
}
demo3.cpp. Using the qsort and bsearch library functions to sort and search C-strings formed by rows in a two-dimensional character array. Carefully compare demo2 and demo3 and see if you can spot the differences: when you can explain them, you will truly understand pointers and C-strings.
  1. Calls the qsort library function, passing the array to sort, the number of elements in the array, the size of each array element, and a pointer to the ordering function.
  2. The key or C-string that bsearch searches for in the array.
  3. bsearch returns a void pointer that is cast to a char* or C-string.
  4. The ordering function casts the void pointers e1 and e2 to C-string pointers, but unlike the previous example, they are just C-strings (not pointers to C-strings). The ordering function returns the value returned by strcmp.

Searching and Sorting Objects

Searching for a given element in an array of integers or strings only tells us that the value is or isn't in the array. However, searching for a specific object in an array is more helpful. For example, suppose the array contains objects like those specified in the following figure. If the application searches for an object with the id 123, the search function will return a pointer. It returns nullptr if the array doesn't have an object with that id or a pointer to the object if it does. Instances of the student class have three fields, and if we find the object with the matching id, we retrieve all the information saved in that object. Searching for part of the data in an object to find all the data is called an associative search. Furthermore, we can sort and search the array using any field.

The following examples illustrate how to search for objects based on different fields or members. To simplify the discussion, the text presents demo4.cpp as a series of figures. The complete program, with all parts in context, is available at the bottom of the page.

struct student
{	const char*	name;
	int	id;
	double	gpa;
};
struct student students[] = {
		{ "Dilbert", 123, 3.5 },
		{ "Wally", 456, 2.0 },
		{ "Alice", 987, 3.9 },
		{ "Asok", 730, 3.8 },
		{ "Catbert", 501, 3.0 },
		{ "Pointy Haired Boss", 666, 1.0 },
		{ "Dogbert", 111, 4.0 }
	};
An array of student objects. Each object has three fields: name, id, and gpa. An ordering function can order the objects based on any one of the fields or a combination of the fields.
An array of structure objects.
  1. The structure specification.
  2. Defining and initializing the array of structures.
  3. An abstract representation of the array in memory. The array indexes are on the bottom row.

 

void print(student* data);			// Prints a single student structure to the console.
void print(int number, student* data);		// Prints the whole array of structures to the console.
int order_name(const void* e1, const void* e2);	// Orders the objects by name.
int order_id(const void* e1, const void* e2);	// Orders the objects by id.
int order_gpa(const void* e1, const void* e2);	// Orders the objects by gpa.
demo4.cpp prototypes.

 

qsort(students, 7, sizeof(student), order_name);					// (a)
print(7, students);

student key1 = { "Catbert", 0, 0 };							// (b)
student* found1 = (student *)bsearch(&key1, students, 7, sizeof(student), order_name);	// (c)
if (found1 != nullptr)									// (d)
	print(found1);
else
	cout << "key not found\n";
int order_name(const void* e1, const void* e2)						// (e)
{
	return strcmp(((student *)e1)->name, ((student *)e2)->name);			// (f)
}
Searching and sorting objects by "name".
  1. Sorts the array alphabetically by name.
  2. Creates a search key as a partially filled structure, using only the name field.
  3. Searches the sorted array for key1.
  4. bsearch returns nullptr if it doesn't find the name saved in key1, or a pointer to the matching object in the array if it does.
  5. An ordering function based on the student's name.
  6. The ordering function must first access or extract the necessary fields; once the fields are available, it then compares them using the strcmp library function. Two operations are needed to extract the name from the structure: (1) casting the void pointer to a student pointer and (2) accessing the name using the arrow operator. Note that the arrow operator has a higher precedence than the casting operator, making the grouping parentheses appearing before the arrow operator necessary.

 

qsort(students, 7, sizeof(student), order_id);						// (a)
print(7, students);

student key2 = { "", 730, 0 };								// (b)
student* found2 = (student *)bsearch(&key2, students, 7, sizeof(student), order_id);	// (c)
if (found2 != nullptr)									// (d)
	print(found2);
else
	cout << "key not found\n";
int order_id(const void* e1, const void* e2)						// (e)
{
	return ((student *)e1)->id - ((student *)e2)->id;				// (f)
}
Searching and sorting objects by "id".
  1. Sorts the array numerically by id in ascending order (low to high).
  2. Creates a search key, which is a partially filled structure, only filling the id field.
  3. Searches the sorted array for key2.
  4. bsearch returns nullptr if the array doesn't have an object with id saved in key2, or a pointer to the matching object if it does.
  5. An ordering function based on the student's id.
  6. The ordering function must first access or extract the necessary fields; once the fields are available, it compares them by taking their difference. Two operations are needed to extract the id from the structure: (1) casting the void pointer to a student pointer and (2) accessing the id using the arrow operator. Note that the arrow operator has a higher precedence than the casting operator, so the statement achieves the correct order of operation with the grouping parentheses appearing before the arrow operator.

 

qsort(students, 7, sizeof(student), order_gpa);						// (a)
print(7, students);

student key3 = { "", 0, 3.9 };								// (b)
student* found3 = (student *)bsearch(&key3, students, 7, sizeof(student), order_gpa);	// (c)
if (found3 != nullptr)									// (d)
	print(found3);
else
	cout << "key not found\n";
int order_gpa(const void* e1, const void* e2)						// (e)
{
	double diff = ((student *)e1)->gpa - ((student *)e2)->gpa;			// (f)
	if (diff < 0) return -1;							// (g)
	if (diff > 0) return 1;
	return 0;
}
Searching and sorting objects by "gpa".
  1. Sorts the array numerically by gpa in ascending order.
  2. Creates a search key by partially filling a structure, using only the gpa field.
  3. Searches the sorted array for key3. gpa is likely not a unique value and a binary search may return any object with a value matching the key.
  4. bsearch returns nullptr if key3 is not found, or a pointer to the matching object if it is.
  5. An ordering function based on the student's gpa.
  6. The ordering function must first access or extract the necessary fields; once the fields are available, it compares them. Two operations are needed to extract the gpa from the structure: (1) the void pointer is cast to a student pointer, and (2) the gpa is accessed using the arrow operator. The arrow operator has higher precedence than the casting operator, requiring the grouping parentheses before the arrow operator.
  7. gpa is type double, so the difference between the two gpa fields is a double-valued expression - but the function returns an integer, so, to avoid truncation errors, the difference is mapped to an appropriate integer value.

Downloadable Code