13.4. Binary Trees: Template Examples


Review

Defining pointer variables (Understanding pointer operator behavior (a))
Stack And Heap
A binary tree
Overloaded Operators
Naturally orderable types (min template functions)

Binary trees are one example of a dynamic data structure, so called because they are constructed from blocks of memory dynamically allocated on the heap with the new operator and because they are organized and held together by pointers, making it possible to organize and reorganize them dynamically (i.e., while the program is running). C++ calls dynamic data structures containers because it implements them as objects that contain other objects (much like a zip file is a file containing other files). While we can implement data structures in various ways, they usually support a few standard operations and sometimes a few optional ones:

1. create the data structure; done with a constructor in C++.
2. destroy the structure when it is no longer needed; done with a destructor in C++.
3. insert a new data item in the structure. The operation's name varies (e.g., add, append, push, etc.), reflecting how the structure organizes the stored data. The operation may or may not allow duplicate data.
4. search for a given data item. This operation is also variously named: get, find, peek, etc. For many programs using a data structure, this operation is arguably the most important and the one the structure optimizes.
5. remove an existing data item from the structure.

Standard dynamic data structure operations. Although these operations are common across most data structures, their implementations vary substantially depending on how the structure organizes the stored data.

Binary trees (aka binary search trees) are one of many dynamic data structures. They are sufficiently complex to illustrate some advanced programming features while remaining simple enough for beginning computer scientists to follow their fundamental behavior. We use them here to demonstrate two versions of a data structure based on one and two template variables.

Associative Data Structures

Generally, data structures organize data independently of the data's type. So, the structure organizes integers the same way it organizes instances of a Person class. However, binary trees order the data they store, enabling fast searches but requiring the data to be relatively orderable. The following description and subsequent binary tree implementations assume that the data stored in the tree can be ordered and compared with the < and == operators, respectively. The fundamental types order naturally with the less-than operator - 5 < 10 - and compare with the equality operator - 10 == 10. However, programmers must overload these operators when storing objects in the tree, or create another ordering mechanism such as a comparator.
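As a minimal sketch of the overloading described above, the following class defines both operators so that a tree could order and match its instances. The `Person` members (`name`, `id`) and the choice to compare by name alone are assumptions made for illustration, not part of the text's implementation.

```cpp
#include <string>

// Hypothetical class for illustration; the member names are assumptions.
class Person
{
    private:
        std::string name;
        int         id;

    public:
        Person(std::string n, int i = 0) : name(n), id(i) {}

        // Orders two Person objects by name only.
        friend bool operator<(const Person& a, const Person& b)
        {
            return a.name < b.name;
        }

        // Two Person objects match when their names match; id is ignored.
        friend bool operator==(const Person& a, const Person& b)
        {
            return a.name == b.name;
        }
};
```

With these definitions, `Person("Alice", 1) < Person("Dilbert", 2)` is true, and `Person("Alice") == Person("Alice", 99)` is also true because only the names are compared.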

To lay a foundation for presenting the binary tree operations, we need to extend our computer science vocabulary, and revisiting some data structures we've studied previously is helpful. A stack typically only allows a program to access the data at a single position, the stack's top, making the insert (push), search (peek), and remove (pop) operations relatively fast, but also limiting the problems stacks can solve. Arrays are more flexible, allowing programs to access array elements with any index value in the range of 0 .. size-1, making the search operation fast. Inserting new data at the end of a partially filled array or removing data from the end are also fast operations but are slow and tedious operations at other positions (see Inserting data into an array). Linked lists improve the insert and remove operations but slow the search. Most linked lists access data by its position in the list. However, we saw with the CList that it is possible to access list data with a key, which is how binary trees operate.

Binary trees are self-organizing data structures, meaning that they manage their internal organization independent of the client program. For example, if the client enters the same data into two trees but in a different order, the trees may have distinctly different shapes that are beyond the client's control. Binary tree implementations typically make the pointers binding the tree private and don't provide any getters, completely isolating or hiding the data organization from the client and making data access by position impossible. Consequently, clients storing data in binary trees must access the data by a key value. Trees are an example of a sub-category of data structures called associative data structures - structures that associate one data value, the key, with another. When a "natural" association exists between the key and the other value, we can implement an associative data structure with one template variable; otherwise, we need at least two variables.

(a)

Name	Address	...
Dilbert	225 Elm	...
Alice	256 N 400 W	...
Wally	718 Washington	...
Asok	633 Adams	...

(b)

Word	Count
mirror	8
rabbit	27
cat	12
queen	22

Visualizing template variables with tables. Although binary trees organize data differently than arrays, making them look distinctly different, we can use tables to help understand how template variables work when performing tree operations.

(a) This example illustrates a structure storing Employee data. Each column represents one class member variable (the ellipses stand for any number of additional members), and each row represents one stored object. Name and address form part of the stored information describing an individual, making them "naturally" associated with that information. A client program could use name as a search key without storing additional information in the tree. If a search finds a key, it returns the entire row, making all the associated information available to the client. We can solve this problem with a binary tree based on a single template variable.
(b) Some solutions require mapping a key to a value where the only association between the two is a specific problem. For example, imagine the problem of counting all unique words in a book. The words form the keys, while the counts represent the data. Outside the problem, words and counts are not naturally associated. We can solve this problem by creating a class with two members, word and count, and storing instances of the new class in a binary tree. Alternatively, we can solve it with a binary tree using two template variables. We'll revisit this problem later in the chapter.
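The word-counting problem can be sketched with the standard library's `std::map`, used here only as a stand-in for a container with two template variables (a key type and a value type); it is not the binary tree developed in this chapter. The function name `count_words` is hypothetical.

```cpp
#include <map>
#include <sstream>
#include <string>

// Counts how often each whitespace-separated word occurs in text.
// The word is the key; the count is the associated value.
std::map<std::string, int> count_words(const std::string& text)
{
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;

    while (in >> word)
        counts[word]++;     // operator[] inserts a zero count for a new word

    return counts;
}
```

Calling `count_words("cat queen cat rabbit cat")` produces three entries, with `cat` mapped to 3 and the other two words mapped to 1.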

If we handcraft the search and remove functions for a specific problem, we can make the key any data type. However, a general, library-grade solution is more restrictive. Using a single template variable requires making the key the same type as the stored data. Typically, the client creates a partially filled object for the key, including only enough information to satisfy the equality operation. Imagine that a tree stores instances of an Employee class as illustrated in (a). An appropriate key object is an instance of Employee with only the Name variable filled. Searching for part of the stored data is called an associative search. We can relax the restriction on the key type by using two or more template variables.
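The partially filled key can be illustrated with `std::set`, again as a stand-in for a single-template-variable container rather than the tree built in this chapter. The `Employee` members and the helper `find_address` are assumptions for illustration. Note one difference from the tree described here: `std::set` locates elements with `<` alone and never calls `==`.

```cpp
#include <set>
#include <string>

// Hypothetical Employee with only two of its members shown.
struct Employee
{
    std::string name;
    std::string address;
};

// Orders employees by name only, so a key needs just the name filled in.
bool operator<(const Employee& a, const Employee& b)
{
    return a.name < b.name;
}

// Returns the stored address for name, or an empty string if name is absent.
std::string find_address(const std::set<Employee>& staff, const std::string& name)
{
    Employee key{ name, "" };           // partially filled key object
    auto it = staff.find(key);          // associative search by name
    return (it != staff.end()) ? it->address : std::string();
}
```

If `staff` holds the rows from table (a), `find_address(staff, "Alice")` returns the whole stored address even though the key object carried only the name.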

Binary Tree Outline

[Figure: A square divided into three parts represents an empty node. The data is at the top, and the two null pointers, 'left' and 'right,' are on the bottom.]

**A binary tree and the operational pointers**. A single binary tree element, represented by the squares in the illustrations, is called a *node*. Computer scientists typically draw trees upside down and call the node at the top *root*. The root node may contain data or only serve as the tree's "handle" (i.e., the variable a client program defines to use the tree). The following descriptions and examples implement the latter version, simplifying some operations at the expense of others and implying that a logically "empty" tree consists of a single root node without data. The Greek letter lambda (λ) is an abbreviation for `nullptr`.

A basic template binary tree class with three member variables: `data` holds the data stored in the node, while the `left` and `right` pointers link to the node's subtrees, building and organizing the tree structure.
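A minimal sketch of such a class might look like the following. The three member names come from the description above; the value-initialized `data`, the deleted copy operations, the recursive destructor, and the `empty` function are assumptions added so the sketch compiles and cleans up safely on its own.

```cpp
template <class T>
class Tree
{
    private:
        T        data{};            // the data stored in this node (unused in the root)
        Tree<T>* left  = nullptr;   // subtree whose data precedes this node's data
        Tree<T>* right = nullptr;   // subtree whose data succeeds this node's data

    public:
        Tree() {}

        // Copying is disabled in this sketch to avoid deleting a node twice.
        Tree(const Tree&) = delete;
        Tree& operator=(const Tree&) = delete;

        // Deleting a null pointer is harmless, so no tests are needed here.
        ~Tree() { delete left; delete right; }

        // Hypothetical helper: the root's right pointer is null in an empty tree.
        bool empty() const { return right == nullptr; }
};
```

Defining `Tree<int> t;` creates only the dataless root node, so `t.empty()` is true until the first insertion.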

This figure and those following illustrate each tree node as a square with the `data` member at the top and the `left` and `right` pointers on the bottom. Arrows and λ characters indicate valid and null pointers.

This binary tree arbitrarily inserts the first data item to the root's `right` subtree and never uses its `left` pointer. The `search` and `insert` operations require a "key" data value provided by the client program. The `search` operation seeks the "key," and the `insert` operation attempts to insert it into the tree. As they descend the tree, both operations choose between the left and right subtrees based on the "key:" *if key < the data in the current node, go left; otherwise, go right*. In the illustrations, "A" precedes (comes before) "B," and "C" succeeds (comes after) "B." The `search` operation uses a single pointer, `bottom`, to indicate the current node. The `insert` operation uses two pointers, `top` and `bottom`, that are moved down the tree so that they are always one level apart. Using two pointers makes it convenient for the operations to access the members of both nodes.

[Figure: A picture illustrating the relationship between the 'top' and 'bottom' pointers. The tree consists of four nodes: The tree's root node is at the top and doesn't store any data but serves as a handle for the tree. The 'top' pointer points to the single node at the next level down. This node is the first to contain data and is always the root's right subtree. The 'bottom' pointer points to one of the top's subtrees, depending on the relative order of the data stored in the nodes.]

Binary Tree Algorithms

A C++ binary tree implements operations 1 and 2, create and destroy, with a constructor and destructor, respectively. These operations are algorithmically simple and illustrated with working code in the next sections. Operations 3, 4, and 5, insert, search, and remove, are algorithmically more complex and outlined in the following figures. The working examples in the following sections demonstrate some optional operations, including a list function.

[Figure: The insertion operation begins with 'top' pointing to the highest or root node and 'bottom' pointing to the root's right subtree. Search is similar but only uses the 'bottom' pointer. The example assumes that the 'bottom' node has two subtrees - both the 'left' and 'right' pointers of the 'bottom' node are filled. The example further assumes that the 'left' and 'right' pointers of the nodes below 'bottom' are null. So, the tree has four nodes, three data nodes and the root, arranged on three levels.]

**Descending the tree: searching and inserting**. When a program searches for or inserts a node in a binary tree, it descends the tree from the top to the bottom. Searching requires one pointer, `bottom`, while insertion requires both `top` and `bottom`. As the operations descend the tree, they compare two data values, the `key` and the data saved in the `bottom` node, updating the pointers as they descend. If the `key` value is less than the `bottom` node value, the program follows the `left` subtree; otherwise, it follows the `right` subtree. Although `top` and `bottom` change as the functions descend the tree, `this` always points to the root node.

The `search` function begins by initializing the `bottom` pointer, while the `insert` function initializes both pointers as illustrated:
    Tree<T>* top = this;
    Tree<T>* bottom = right;

The operations continue descending the tree and updating the pointers. The operations assume that the stored data supports the `==` and `<` operators, which is valid for the fundamental types but requires programmers to overload them for class types. This version of the `insert` operation does not permit duplicate `key` values in the tree.
    while (bottom != nullptr)
    {
        if (bottom->data == key)        // return the matching data already in the tree
            return &bottom->data;

        // Use one of the two alternatives below, never both.

        // insert: remember the parent, then step down one level
        top = bottom;
        bottom = (top != this && key < bottom->data) ? bottom->left : bottom->right;

        // search: step down one level
        bottom = (key < bottom->data) ? bottom->left : bottom->right;
    }

If the `insert` operation reaches the bottom of the tree without finding a match, it creates a new node, stores the `key` data in it, and inserts it into the tree. The second statement assumes that the saved data supports the assignment operation, requiring programmers to overload the assignment operator for "complex" classes (see simple and complex classes). The last statement selects the left or right subtree for insertion. The conditional operator's first sub-expression compares the `key` to the data in the `top` node, the new node's parent, and skips the comparison when `top` is the dataless root. The operator's second and third sub-expressions name the parent's `left` and `right` pointers. Because both are l-values of the same type, the conditional operator produces a valid l-value for the left-hand side of the assignment operator, which stores `bottom` in the appropriate subtree pointer.

    bottom = new Tree;
    bottom->data = key;
    ((top != this && key < top->data) ? top->left : top->right) = bottom;
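The fragments above can be assembled into one compilable sketch. It repeats a minimal class outline so that it stands alone; the return types, the deleted copy operations, and the recursive destructor are assumptions rather than the chapter's final implementation, and `T` must support `==`, `<`, assignment, and default construction.

```cpp
template <class T>
class Tree
{
    private:
        T        data{};
        Tree<T>* left  = nullptr;
        Tree<T>* right = nullptr;

    public:
        Tree() {}
        Tree(const Tree&) = delete;
        Tree& operator=(const Tree&) = delete;
        ~Tree() { delete left; delete right; }

        // Returns a pointer to the stored data matching key, or nullptr.
        T* search(const T& key)
        {
            Tree<T>* bottom = right;    // first data node, if any

            while (bottom != nullptr)
            {
                if (bottom->data == key)
                    return &bottom->data;
                bottom = (key < bottom->data) ? bottom->left : bottom->right;
            }
            return nullptr;
        }

        // Inserts key unless a match already exists; either way, returns
        // a pointer to the data stored in the tree.
        T* insert(const T& key)
        {
            Tree<T>* top = this;
            Tree<T>* bottom = right;

            while (bottom != nullptr)
            {
                if (bottom->data == key)
                    return &bottom->data;
                top = bottom;           // top trails bottom by one level
                bottom = (key < bottom->data) ? bottom->left : bottom->right;
            }

            bottom = new Tree;
            bottom->data = key;
            // The root holds no data, so the first item always goes right.
            ((top != this && key < top->data) ? top->left : top->right) = bottom;
            return &bottom->data;
        }
};
```

After inserting 20, 10, and 30 into a `Tree<int>`, a search for 10 returns a pointer to the stored 10, a search for 40 returns `nullptr`, and inserting 20 a second time returns the address of the node already in the tree instead of adding a duplicate.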

[Figure: The program has descended one level in the tree, moving 'top' and 'bottom' down one level each. The program updates the pointers by setting 'top' to 'bottom' and 'bottom' to either the bottom's left or right subtree, depending on the relative values of the 'key' and bottom's data. In this picture, the program arbitrarily selects the right subtree for illustration.]

Although the remove operation doesn't significantly contribute to the template demonstration, it does illustrate a situation programmers frequently encounter while implementing data structures: some operations are efficient and relatively straightforward while others are not. Searching for and inserting nodes in a binary tree are efficient and accomplished with a modest amount of code. Removing a node from a binary tree is neither efficient nor straightforward. Computer scientists typically decompose the removal operation into three distinct cases. In the first case, the node selected for removal is a leaf without subtrees. In the second case, the selected node has one subtree. Finally, the node has two subtrees in the third or last case.
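The three cases can be told apart by inspecting the selected node's two pointers. The following sketch only classifies a node; it does not perform the removal, and the stand-alone `Node` structure, the `RemovalCase` enumeration, and the `classify` function are hypothetical names introduced for illustration.

```cpp
// Hypothetical stand-alone node, separate from the Tree class in the text.
struct Node
{
    char  data;
    Node* left  = nullptr;
    Node* right = nullptr;

    explicit Node(char d) : data(d) {}
};

enum class RemovalCase { Leaf, OneSubtree, TwoSubtrees };

// Reports which of the three removal cases applies to node.
RemovalCase classify(const Node& node)
{
    if (node.left == nullptr && node.right == nullptr)
        return RemovalCase::Leaf;           // case 1: no subtrees
    if (node.left != nullptr && node.right != nullptr)
        return RemovalCase::TwoSubtrees;    // case 3: both subtrees
    return RemovalCase::OneSubtree;         // case 2: exactly one subtree
}
```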

The following figures focus on developing algorithms, leaving the coding to the implementation sections. The figures consistently use a set of features: First, the top and bottom pointers begin at the top of the tree and descend it as described above. Second, the dashed line suggests that the part of the tree above the top pointer doesn't affect the removal algorithm. Third, arrows and λ's represent significant pointers; empty subtree boxes may be null or point to subtrees without affecting the algorithm. Fourth, the figures color the node selected for removal red. Finally, single alphabetic characters represent the stored data, demonstrating a valid insertion order and labeling the nodes.