9.15.4.1. Length-Prefixed String Example

Review

Non-object-oriented programming languages must represent strings without relying on classes. Structures or their equivalent are a logical replacement, and we'll explore such an implementation in the next section. But we can encode a string as a single character array if we're clever. We have seen that the C programming language accomplishes this by adding a sentinel or null-termination character at the end of the textual data. This section explores another single-array implementation called length-prefixed or size-prefixed strings. An early implementation of the Pascal programming language from the University of California San Diego, UCSD Pascal, based its string type on this scheme, so many programmers also call length-prefixed strings UCSD strings.

Programmers implement length-prefixed strings as fixed-length arrays 256 characters long and store the string's length in the first or 0-th character. For simplicity, we use an 8-bit or 1-byte unsigned character for our character array. An 8-bit character is an integer that can store 256 distinct values. The character type is frequently signed on modern computer hardware, giving it a range of [-128, 127]. Alternatively, an unsigned character has a range of [0, 255] and is how we'll implement the class. As we reserve the first array character for the string's length, 255 characters remain to store the string's textual data. This approach has some disadvantages: it wastes space when strings are short and doesn't allow strings longer than 255 characters, but it is simple enough that programming languages can support it as a fundamental or built-in data type.

A picture of an empty length-prefixed string implemented as an array of 256 characters. The string's length, 0, is stored in the first or 0-th character. The string's capacity is 255 - it can save 255 characters. — **Length-prefixed string**. The length-prefixed implementation of the general string API captures the three string elements:

The textual data is stored in the elements of an array, named `text`, allocated automatically on the stack

The string's length is stored in `text[0]`, coming before or prefixing the textual data - hence the name *length-prefixed string*

The string's capacity is implicit in the implementation as a fixed-length array

An empty string has a length of 0, which the string saves in the first array element: `text[0]`. The remaining array elements, `text[1]` through `text[255]` are *logically* empty. *Physically*, every memory location always has some content - a random value leftover from the computer startup or the last program - but the string functions ignore the values in the elements beyond the string's length. Creating the string establishes its capacity of 255, and it never changes. Notice that the *string's* capacity is one less than the *array's* capacity of 256.

In this example, element 0 is 5, the string's length. The string's content, saved in elements 1 through 5, is "Hello". Elements 6 through 255 are empty.

These are not C-strings, so they are not zero-indexed or null-terminated. Furthermore, they are not dynamic - their length can vary, but their capacity cannot.

A picture of a length-prefixed string implemented as a 256-character array. In the example, the string's length is 5 and is stored in text[0]. The string stores the word 'Hello': text[1] = H, text[2] = e, text[3] = l, text[4] = l, and text[5] = o. Characters 6 through 255 are empty.

(a) An empty string. (b) A string with content.
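To make the layout concrete, here is a minimal sketch that manipulates a raw 256-element array directly, without a class. The names `ARRAY_SIZE`, `CAPACITY`, `lp_clear`, and `lp_push` are hypothetical and chosen only for this illustration; they are not part of the example that follows.

```cpp
const int ARRAY_SIZE = 256;              // size of the underlying array
const int CAPACITY   = ARRAY_SIZE - 1;   // the string's capacity: 255 characters

// Makes the string logically empty; elements 1 through 255 keep their old values.
void lp_clear(unsigned char text[])
{
    text[0] = 0;
}

// Adds one character at the end of the string; returns false if the string is full.
bool lp_push(unsigned char text[], char c)
{
    if (text[0] == CAPACITY)
        return false;
    text[0]++;                                       // the string grows by one...
    text[text[0]] = static_cast<unsigned char>(c);   // ...and c becomes its last character
    return true;
}
```

Starting from an empty array, five calls to `lp_push` with the characters of "Hello" leave 5 in `text[0]` and the letters in `text[1]` through `text[5]`, matching picture (b). The capacity never changes, so the 256th call fails.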

Length-prefixed strings are simple enough to implement without making them a class. Nevertheless, they are a good class example, demonstrating how pictures can help us see and program problem details. The following figures detail the class and each function. The complete source code for the example is available for download at the bottom of the page.

```cpp
#pragma once

class LPString
{
    public:
        const static int LENGTH = 256;

    private:
        unsigned char text[LENGTH];

    public:
        // constructors
        LPString() { text[0] = 0; }         // default constructor
        LPString(const char* s);            // conversion constructor: C-string to LPString
        LPString(char c);                   // conversion constructor: char to LPString
        LPString(const LPString& s);        // copy constructor

        // access
        int length() const { return text[0]; }
        unsigned char& at(int index);

        // i/o
        static void print(const char*);
        void print() const;
        void println() const;
        void readln();

        // modify "this"
        void append(const LPString& s);
        void insert(const LPString& s, int index);
        void clear() { text[0] = 0; }

        // new LPString
        LPString copy() const;
        LPString concat(const LPString& s) const;
        LPString substring(int index, int length) const;

        // ordering
        bool equals(const LPString& s) const;
        int order(const LPString& s) const;
};
```

The LPString class. Following the program organization introduced earlier, we write the LPString class specification in the LPString.h header file. The functions included in the specification reflect the abstract operations identified in the previous section. However, the class specification adds C++ implementation details such as the "const" keyword and the ampersand denoting pass-by-reference. And it replaces the generic name "string" with the more specific "LPString" class name.

A previous chapter demonstrated that arrays are always passed to and returned from functions by pointer. However, wrapping the array in a class, even though the array is the class's only data member, changes the basic passing mechanism. Programs can pass objects by value, reference, or pointer regardless of their contents.
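The difference can be sketched with a hypothetical wrapper structure, `Wrapped`, and three illustrative functions. None of these names appear in the LPString example; they only contrast passing a bare array with passing an object that contains one.

```cpp
// Hypothetical wrapper: a structure whose only member is an array.
struct Wrapped
{
    unsigned char text[256];
};

// A bare array argument arrives as a pointer, so the caller's array changes.
void change_array(unsigned char a[]) { a[0] = 9; }

// An object passed by value is copied, array and all; the caller's object is untouched.
void change_copy(Wrapped w) { w.text[0] = 9; }

// An object passed by reference is not copied; the caller's object changes.
void change_ref(Wrapped& w) { w.text[0] = 9; }
```

Calling `change_copy` copies all 256 bytes, which is why the LPString functions that only read their argument take it as `const LPString&` instead.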

Even a brief examination of a modern programming language's string API will reveal essential string operations that LPString omits. Its inability to print numerical values or directly print characters will hamper our efforts to test and validate the member functions. Furthermore, an authentic class typically uses low-level operating system services to complete the I/O operations. For simplicity, LPString will instead use the <iostream> functions. Nevertheless, the class is sufficient for our instructional needs.

I recommend a stepwise or cyclic approach to class implementation. After programming each function, or at most each small group of related functions, pause to test and verify the additions. Verifying the code this way makes finding and correcting syntax errors easier, making the task less frustrating. It will also help make the overall debugging and validation process more manageable. And finally, some member functions often depend on other members; if the independent functions are validated, any errors are more likely in the dependent functions. Testing and validation typically require a certain "critical mass" of functions. Specifically, we need a constructor and a display or print function, so we begin with these.

Constructors

Illustrates an empty LPString as having a 0 in the first or 0-th element of its 'text' array, and the remaining elements are blank. — **The `LPString` default constructor**. The default constructor creates a *logically* "empty" LPString object.

The constructor creates the array automatically on the stack with a predetermined capacity. The picture clearly shows the constructor's only task: initializing the string's length to 0. The random values in elements 1 through 255 are irrelevant.

An initializer list can initialize a member variable but not *part* of a variable (i.e., not one array element). So, the constructor cannot use a list, and we write the function with a "regular" body. The complete function consists of one statement, making it an ideal candidate for implementation as an inline function in the class specification.

Following the cyclic approach described above, the initial test creates an object with the most basic constructor, the default. The test also uses the `length`, `println`, and `print(char*)` functions, which are detailed below.

The picture represents an LPString object and a C-string as rectangles denoting arrays. The LPString rectangle is the 'text' member variable. The picture shows that i, the loop control variable, ranges from 1 to 5 while copying 'Hello' from the C-string to the LPString. — **The `LPString(char*)` conversion constructor**. The constructor converts a C-string, `s`, to an `LPString` object by copying the characters one at a time. Correctly indexing the arrays and controlling the for-loop are challenging sub-problems, and this example demonstrates how a picture can help us solve them.

The `LPString` is initially empty as illustrated in Figure 1(a). The function copies the characters from `s` to the `LPString's` `text` array with a for-loop. But where does the loop begin and end, and how do we index into the arrays? (Equivalently, what values does the loop-control variable take, and how do we use the variable to index the arrays?)

Try mapping parts of the picture to corresponding parts of the C++ code:

The function uses the `LPString's` length, `text[0]`, as an accumulator to count the characters as it copies them. The function must initialize the length to 0 before looping and must increment the count during each iteration.

C-strings are zero-indexed and the copy operation begins at `s[0]`, so we initialize the loop control variable to 0. However, `text[0]` is the string's length, and the characters begin at `text[1]`. This organization makes the indexes off by one throughout the copy operation. The assignment operation accounts for the offset by adding 1 to the loop control variable when indexing `text`.

Two situations can end the loop. The null termination character is a character 0, which C++ treats as `false`. If `s` is short, < 255 characters, the sub-expression `s[i]` ends the loop (when the loop reaches `s[5]` in this example). If `s` is long, >= 255 characters, the sub-expression `i < LENGTH - 1` ends the loop. The -1 is necessary to prevent indexing `text` out of bounds.

Pictures don't need to be elaborate to be helpful - simple characters are often sufficient. This picture shows how `text` begins and changes with each loop iteration.

It's necessary to test strings with a length greater than 127. The test-and-validation code uses a "trick" inherited from C to create a long string: the compiler automatically concatenates adjacent C-strings to form a single string. The test code prints the newly created `LPString's` length and content, verifying that the class works with long strings.
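The loop described above can be sketched as follows. This is a trimmed-down version of the class containing only what the constructor needs, and the downloadable source may differ in detail. The `str` function is a hypothetical helper, not part of the original class, added only so the result is easy to inspect.

```cpp
#include <string>

class LPString
{
    public:
        static const int LENGTH = 256;

    private:
        unsigned char text[LENGTH];

    public:
        LPString(const char* s);
        int length() const { return text[0]; }

        // Hypothetical helper (not in the original class): the content as a std::string.
        std::string str() const { return std::string(text + 1, text + 1 + text[0]); }
};

LPString::LPString(const char* s)
{
    text[0] = 0;                                    // text[0] is the accumulator
    for (int i = 0; s[i] && i < LENGTH - 1; i++)    // stop at '\0' or when text is full
    {
        text[i + 1] = s[i];                         // the characters begin at text[1]
        text[0]++;                                  // count the copied character
    }
}
```

A C-string longer than 255 characters is truncated rather than rejected: the second loop test stops the copy once `text[255]` is filled.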

An LPString object represented as a rectangle denoting an array named 'text.' A single character, c, is represented as a square containing the character 'X.' The constructor copies 'X' from the character to text[1] and initializes the LPString's length, text[0], to 1. — **The `LPString(char)` conversion constructor**. The constructor converts a single character, `c`, to an `LPString` object.

The picture illustrates making an `LPString` string by copying a character to it and setting its length to 1.

The constructor converts a character to an `LPString` by copying the character, `c`, to `text[1]` and initializing the string's length to 1: `text[0] = 1`.

The test validates the construction by printing the string's length and content.

Two LPString objects represented as rectangles denoting the LPString member variable 'text.' The picture shows that i, the loop control variable, ranges from 0 to 5 while copying the length, 5, and the content, 'Hello' from the existing LPString, s, to the new LPString. — **The `LPString` copy constructor**. The copy constructor creates a new `LPString` object by copying an existing one. The picture of the problem and the function code are similar to the `char*` conversion constructor (Figure 4).

The picture helps us identify details leading to a compact and efficient solution. The function must copy length+1 characters from the original or parameter `LPString` to the new one. The loop must begin at 0 and iterate the original string's length plus one.

A single for-loop copies the used elements of the existing `LPString` (the length in element 0 and the characters in elements 1 through 5) to the new string. The contents of the unused elements are irrelevant, so the function does not copy them. So, the for-loop begins at 0 and uses `<=` for control.

The test and validation code uses `lps2` created in Figure 4.
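The `char` conversion constructor and the copy constructor are short enough to sketch together. As before, this is a reduced class written for illustration, and `str` is a hypothetical helper that the original class does not have.

```cpp
#include <string>

class LPString
{
    public:
        static const int LENGTH = 256;

    private:
        unsigned char text[LENGTH];

    public:
        LPString(char c);
        LPString(const LPString& s);
        int length() const { return text[0]; }

        // Hypothetical helper (not in the original class): the content as a std::string.
        std::string str() const { return std::string(text + 1, text + 1 + text[0]); }
};

LPString::LPString(char c)
{
    text[0] = 1;        // the new string holds exactly one character
    text[1] = c;
}

LPString::LPString(const LPString& s)
{
    // Copy element 0 (the length) and elements 1 through length (the characters).
    for (int i = 0; i <= s.text[0]; i++)
        text[i] = s.text[i];
}
```

The copy loop runs length + 1 times, so copying a one-character string touches only `text[0]` and `text[1]`.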

LPString Access Functions

We could choose to name the LPString access functions with the "get" and "set" prefixes like other access functions. However, looking at the C++ and Java string libraries or APIs, these functions don't typically follow that naming convention. So, we choose instead to follow the conventions of the other languages.

A fully populated LPString containing the characters 'Hello world' in elements 1 through 11. The string's length, 11, is maintained in element 0 and indicated by an arrow. — **The `LPString length` function**.

The picture illustrates the relationship between the saved textual data, "Hello world," and the string's length, 11. It also emphasizes that the string's length - the number of characters currently stored in the string - is *always* saved in the first array element: `text[0]`.

The `length` function is a "getter," but most string classes name it either `length` or `size`, and many provide both functions. The function is short, so we inline it in the class specification. We validated it above in conjunction with the constructors.

A fully populated LPString containing the characters 'Hello world' in elements 1 through 11 and the length, 11, in element 0. The variable 'index,' corresponding to the parameter in the at function, points to index location 7 in the text array. — **The `LPString at` function**. Surprisingly, the `at` function implements both "getter" and "setter" operations - it can get or set the character at the `index` location. Returning a reference (the red ampersand) allows programs to use the function as an l- and an r-value, performing both operations. (The compiler treats it as a value or address, depending on where the program uses it.)

A picture helps us see the relationship between the text saved in the string and each character's index location. We need to clarify the character indexing because we sometimes use a zero-indexed organization, and sometimes we don't. The arrow points to the 'w' at index location 7, which we use in the test and validation code.

The `at` function returns a reference to one element, a variable, in `text`. The if-statement verifies that the index is valid or in-bounds (i.e., within the string) and throws an exception if it is not.

The test and validation code for the `at` function demonstrates some vital syntax and one obscure conversion.

The function call `lps1.at(7)` gets one character element or variable from `lps1`. As used in the two illustrated statements, the compiler treats the element as an r-value or the character stored in the variable. The second example, with the obscure conversion, might be confusing. None of the overloaded print functions can print a single character, but we get around the limitation by calling a conversion constructor. `lps1.at(7)` returns a character, which is passed to the `LPString(char)` constructor. The constructor call creates a new, anonymous object, and the object calls `println`, which *can* print an `LPString`.

In the statement `lps1.at(7) = 'X'` the `at` function call again gets the element or variable from `lps1` at index location 7. But in this statement, the call is on the left side of the assignment operator, so the compiler treats it as an address and saves the character 'X' in that memory location.

The statement `lps1.at(12)` indexes the string out of bounds - that is, one position beyond the last character - and causes the function to throw an exception.

Together, the `try` and `catch` blocks detect and handle the index-out-of-bounds exception.
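A minimal sketch of the `at` function follows. The text does not say which exception the function throws, so `std::out_of_range` is an assumption, as is the choice to treat index 0 (the length) as out of bounds.

```cpp
#include <stdexcept>

class LPString
{
    public:
        static const int LENGTH = 256;

    private:
        unsigned char text[LENGTH];

    public:
        LPString(const char* s);
        int length() const { return text[0]; }
        unsigned char& at(int index);
};

LPString::LPString(const char* s)
{
    text[0] = 0;
    for (int i = 0; s[i] && i < LENGTH - 1; i++)
    {
        text[i + 1] = s[i];
        text[0]++;
    }
}

unsigned char& LPString::at(int index)
{
    // Valid character positions are 1 through the string's length.
    if (index < 1 || index > text[0])
        throw std::out_of_range("LPString::at: index out of bounds");   // assumed exception type
    return text[index];         // a reference, so callers can read or assign
}
```

With an `LPString` holding "Hello world", `at(7)` yields 'w', `at(7) = 'X'` overwrites it, and `at(12)` throws, which a `try` block with a matching `catch` can handle.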

I/O Functions

A rectangle, divided into boxes, representing a C-string. The boxes, left to right, save the characters 'Testing\0' where 'T' is in line[0] and '\0' is in line[7]. — **The `LPString` static `print` function**. The `static` version of the `print` function is a special case: it allows us to print C-strings with the `LPString` class. We *could* continue using the `<iostream>` functions to complete this task, but including it in `LPString` provides us with another opportunity to demonstrate `static` or class functions.

The picture reminds us that `line` is a C-string, so it is zero-indexed and null-terminated.

To demonstrate the placement of the "static" keyword, we prototype the `print` function in the class specification, which is in the `LPString.h` header file.

Continuing the demonstration, we place the function definition in the `LPString.cpp` source code file. Notice that we don't need the "static" keyword here.

The `static print` function "belongs" to the class rather than to an object or instance of the class. So, when a program calls the function, it must use the class name and the scope resolution operator, `::`.

An LPString with the text 'Hello world' in text[1] through text[11] and the string's length in text[0]. — **The `LPString print` and `println` member functions**. The `print` and `println` functions are named the same as the corresponding Pascal and Java functions. The `print` function prints a string to the console without a trailing new-line character, while `println` prints the string followed by a new-line character. To prevent duplicating code, `println` calls `print` and then adds the new-line character.

Admittedly, the picture would be more useful if we followed a more authentic implementation based on lower-level operations or system calls. Still, the function uses a for-loop to print the characters one at a time, and the picture helps us configure the loop.

The `text` array is not null-terminated, so we can't print it with a single C-string operation. Using the information organized in the picture, we configure the for-loop controls: the loop starts at 1, uses less than or equals for the test, and compares the loop-control variable with the value saved in `text[0]`.

It is generally good practice to avoid duplicating code whenever feasible, so the `println` calls `print` and then adds the new-line character.

Function validation is straightforward.

Two pictures of an LPString. The first string is logically empty because text[0] is 0. The LPString functions ignore the characters in text[1] through text[5]. The second picture illustrates characters read from cin, saved in variable c, and then copied into the string's text array. The function discards the new-line character, i.e., it doesn't save the character in the text array. — **The `LPString readln` function**. String-input functions typically allow users to backspace and reenter characters before signaling the program to read the string by pressing the `Enter` key. Pressing the enter key also inserts a new-line character into the input stream. The Java and Pascal `readln` functions read the new-line character but discard it (i.e., they do not include it in the string). We'll read the string one character at a time, allowing us to locate the new-line character. We'll use the `get` function (see the wc.cpp example) in place of lower-level operations, to read the characters.

String input functions typically discard or overwrite a string's contents. Accordingly, the `LPString` must be empty before the reading operation begins. If the string is new, as in Figure 1(a), it's ready for the operation. However, if the string contains text, as in Figure 1(b), the function must discard the character data before reading. The top string (1) shows what Figure 1(b) looks like after the function empties it - the length is 0, and the function ignores the remaining characters: "Hello." The second string (2) illustrates characters as the `get` function reads them from `cin` and saves them in `c`. The while-loop copies the characters to `text`.

A simple picture illustrates how the string changes during each loop iteration. The brown box represents the space character.

Although the `readln` function is short, it involves several intricate steps:

The statement `text[0] = 0` empties the string - array (1). Now, the function can use `text[0]` as an accumulator to count the characters as the loop adds them to the string.

The `get` function reads characters one at a time from `cin` and temporarily saves them in `c` (the pair of red parentheses force the `get` function call and the assignment operation to take place first).

The loop runs while the input is not the new-line character, and there is space in `text` for additional characters. The "-1" is necessary because although the array has 256 elements, the string only uses 255 to store characters.

The expression `++text[0]` first increments the string's length and then uses it as an index into the string.

Algorithmic Functions

Algorithmic functions manipulate, modify, and otherwise use LPStrings to solve client program problems. For organizational convenience, we'll group these functions into three sub-categories:

Functions that modify `this` object. The functions in this category follow the general pattern `void a.function(b)`, where `a` is an LPString object and `b` represents 0 or more parameters of various types. The functions change `a`, reflecting the function's results.
Functions that create a new LPString object. These functions have the general pattern `LPString a.function(b)`, where `a` is an LPString object and `b` is 0 or more parameters. The functions in this group do not alter `a` or `b` but return a new LPString object representing the function's operation.
Functions that compare two LPStrings. The final group follows two similar patterns: `bool a.function(b)` or `int a.function(b)`, where `a` and `b` are LPStrings.

Functions Modifying `this` Object

A picture of the LPString 'Hello' saved as text[0] = 5, and the characters in text[1] through text[5]. — **The `LPString clear` function**. The clear function is trivial, and the text explains the concepts justifying its operation above.

An `LPString` before the `clear` function operation.

The string after the `clear` function operation.

The `clear` function inlined in the class specification.

A simple test statement. See the `append` function below for the full context of the test.

A picture of the same LPString after being cleared: text[0] = 0, but text[1] through text[5] still contain 'Hello', which the functions ignore.

A picture showing two LPStrings, this and s as follows:
this[0] = 5
this[1] = 'H'
this[2] = 'e'
this[3] = 'l'
this[4] = 'l'
this[5] = 'o'
and
s.text[0] = 6
s.text[1] = ' '
s.text[2] = 'w'
s.text[3] = 'o'
s.text[4] = 'r'
s.text[5] = 'l'
s.text[6] = 'd'
The function must copy s.text[1] to this[6], s.text[2] to this[7], and so on to s.text[6] to this[11]. — **The `LPString append` function**. The append function adds or appends characters at the end of `this` `LPString`. The function uses a for-loop to copy each character, and correctly indexing each string with the loop-control variable is the most challenging part of the function. An ancillary problem is distinguishing the strings' lengths, which is necessary to control the for-loop and index the strings. The picture helps us see how the function must index the strings and drive the loop.

Appends the parameter `s` to the end of `this` `LPString` by copying the parameter characters one at a time. We can use the same variable to index both strings if we use a constant offset when indexing `this` string. The offset is the length of `this` string. The loop copies the characters from `s` to `this` string. The for-loop runs from 1, the index location of the first character in `s`, to the length of `s`, saved in `s.text[0]`.

A dynamic, step-by-step picture of the copy operation. The final picture details the `this` string after the function finishes. The brown boxes represent the space character.

The function begins by verifying that there is enough space in `this` string to complete the append operation and throws an exception if there isn't. The for-loop carries out the copy operation outlined in the picture. When the loop finishes, the function updates the length of `this` string. Notice that the function does not increment the length of `this` string because doing so would "break" the constant offset needed for offsetting the `this` string index.

We divide the test and validation code into two groups. The first group is straightforward: it appends the function argument to `this` string. However, the second group relies on an unexpected C++ operation. The `at` function test-and-validation code (Figure 8) employed an obscure - in the sense that it's hard to see - conversion operation. This example goes a step further and uses two "hidden" conversions. While the `LPString` class *does not* have overloaded append functions that accept a character or a C-string, it *does* have constructors that do. So, the C++ compiler automatically converts 'o' and " world" into anonymous `LPString` objects and then uses them to complete the append operations. The compiler will only perform one level of conversion: it won't automatically convert `x` to `y` and then convert `y` to `z`.
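The `append` function and the conversions it relies on can be sketched as below, with `clear` included for completeness. The exception type, `std::length_error`, is an assumption because the text says only that the function throws, and `str` is a hypothetical helper for checking results.

```cpp
#include <stdexcept>
#include <string>

class LPString
{
    public:
        static const int LENGTH = 256;

    private:
        unsigned char text[LENGTH];

    public:
        LPString() { text[0] = 0; }
        LPString(const char* s);
        LPString(char c) { text[0] = 1; text[1] = c; }
        int length() const { return text[0]; }
        void clear() { text[0] = 0; }
        void append(const LPString& s);

        // Hypothetical helper (not in the original class): the content as a std::string.
        std::string str() const { return std::string(text + 1, text + 1 + text[0]); }
};

LPString::LPString(const char* s)
{
    text[0] = 0;
    for (int i = 0; s[i] && i < LENGTH - 1; i++)
    {
        text[i + 1] = s[i];
        text[0]++;
    }
}

void LPString::append(const LPString& s)
{
    if (text[0] + s.text[0] > LENGTH - 1)                       // combined length must fit in 255
        throw std::length_error("LPString::append: no space");  // assumed exception type

    for (int i = 1; i <= s.text[0]; i++)
        text[text[0] + i] = s.text[i];      // text[0] is the constant offset during the loop

    text[0] += s.text[0];                   // update the length once, after the copy
}
```

With these constructors in place, `a.append('o')` and `a.append(" world")` both compile: the compiler builds an anonymous `LPString` from the argument and passes it to the single `append` function.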

A picture showing the completed append operation: text[0] = 11, and text[1] through text[11] = 'Hello world.'

Functions Creating A New `LPString` Object

Two LPString objects, 'this' and 'local,' are represented as rectangles denoting their 'text' member variable arrays. The 'this' text array saves the string 'Hello' in elements 1 through 5 and the string's length, 5, in element 0. The picture shows that i, the loop control variable, ranges from 0 to 5 while copying 'this' object to the local variable named 'local.' The for-loop copies the elements of 'this' to 'local' one at a time. — **The `LPString copy` function**. The `copy` function is *very* similar to the copy constructor, and you could argue that the copy constructor makes the `copy` function redundant. Nevertheless, the class includes it as a simple example of a function that returns an object.

The picture suggests that the function must copy the elements of `this` object to another object. Unlike the previous functions, the function's signature or prototype doesn't provide another object. So, the function creates a temporary, local object and copies `this` object to it. Following the copy operation, the function returns the local object.

The function creates a local, and initially empty, `LPString` object named `local` with the default constructor. A single for-loop copies the elements of `this` string to the `local` string. The `return` operator returns `local` by value (i.e., by copy).

Calling the `copy` function and validating the returned value is straightforward.
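As a sketch of the steps just described, the function below builds `local`, fills it, and returns it by value. The class is reduced to the members the function needs, and `str` is a hypothetical helper that the original class does not have.

```cpp
#include <string>

class LPString
{
    public:
        static const int LENGTH = 256;

    private:
        unsigned char text[LENGTH];

    public:
        LPString() { text[0] = 0; }
        LPString(const char* s);
        int length() const { return text[0]; }
        LPString copy() const;

        // Hypothetical helper (not in the original class): the content as a std::string.
        std::string str() const { return std::string(text + 1, text + 1 + text[0]); }
};

LPString::LPString(const char* s)
{
    text[0] = 0;
    for (int i = 0; s[i] && i < LENGTH - 1; i++)
    {
        text[i + 1] = s[i];
        text[0]++;
    }
}

LPString LPString::copy() const
{
    LPString local;                         // empty string made by the default constructor
    for (int i = 0; i <= text[0]; i++)      // element 0 (length) through the last character
        local.text[i] = text[i];
    return local;                           // returned by value
}
```

The function is `const`, so calling it leaves the original string unchanged.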

The picture shows 'this' string. The string stores its length, 11, in [0]. The characters 'Hello world' occupy [1] through [11]. — **The `LPString substring` function**. The `substring` function extracts and copies part of an `LPString` object, creating a new `LPString` that stores the extracted substring. The function has two arguments: `index` is the starting location of the copy, and `length` is the substring's length. The example assumes that the substring function is called with `index` = 7 and `length` = 5. The function creates a local temporary variable, `local`, to hold the copy until the function returns it.

The relationships between `this` string and the function parameters `index` and `length`.

The string the `substring` function returns.

The `substring` function verifies that the starting location, `index`, is valid (i.e., in bounds). It also verifies that the sum of the substring starting location and length doesn't index the `this` string out of bounds. If either test fails, the function throws an exception using the `throw` keyword. The function creates an empty `LPString` named `local`, initializes its length to the substring's length, and copies the substring characters from `this` string to `local` one at a time. When the for-loop finishes copying the characters, the function returns `local`, containing the extracted characters.

The tables can help us understand how the program uses the loop control variable to index into the string arrays. Unlike many for-loops in the previous problems, we begin this loop at 0 and use a strict less-than test to drive it (highlighted in yellow). We adjust the range of the loop control variable by adding 1 to it when we index into the local string's `text` array (highlighted in light blue). We use the sum of the loop control variable and `index`, the substring starting location, to index into `this` string (highlighted in coral). We *could* start the for-loop at 1, use `<=`, simplify the indexing into `local`, and compensate by changing the `this` string indexing: `text[index + i - 1]`.
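The validation and copy loop described above might be sketched as follows. This is a minimal sketch rather than the chapter's downloadable code: the cut-down `LPString` class, the exact form of the bounds tests, the exception messages, and the `str` helper are all assumptions made so that the example compiles on its own.

```cpp
#include <cassert>
#include <string>

// Cut-down stand-in for the chapter's LPString class (assumption: only the
// members this sketch needs are included).
class LPString
{
    private:
        static constexpr int LENGTH = 256;  // 1 length byte + 255 characters
        unsigned char text[LENGTH];         // text[0] stores the string's length

    public:
        LPString() { text[0] = 0; }

        LPString(const char* s)
        {
            assert(s != nullptr);
            text[0] = 0;
            for (int i = 0; s[i] && i < LENGTH - 1; i++)
            {
                text[0]++;
                text[i + 1] = s[i];
            }
        }

        int length() const { return text[0]; }

        LPString substring(int index, int length) const
        {
            // The starting location must fall inside this string
            if (index < 1 || index > text[0])
                throw "index out of bounds";

            // The last character copied is text[index + length - 1]
            if (length < 0 || index + length - 1 > text[0])
                throw "substring out of bounds";

            LPString local;
            local.text[0] = static_cast<unsigned char>(length);

            // i runs from 0 to length - 1: add 1 to skip local's length byte,
            // add index to reach the substring's start in this string
            for (int i = 0; i < length; i++)
                local.text[i + 1] = text[index + i];

            return local;
        }

        // Hypothetical helper for inspecting results; not part of the chapter's class
        std::string str() const
        {
            return std::string(reinterpret_cast<const char*>(text + 1), text[0]);
        }
};
```

With `LPString s("Hello world")`, the call `s.substring(7, 5)` copies `text[7]` through `text[11]`, so the returned string holds `world`. A call such as `s.substring(8, 5)` would need `text[12]`, which lies past the end of the string, so the sketch throws instead.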

The picture shows the `text` array of the `local` string. The string's length, 5, is saved in [0] and the substring 'world' in [1] through [5].

String Comparison Functions

The picture shows two LPStrings, this and s, and suggests that the function must compare pairs of characters. — **The `LPString equals` function**. The `equals` function compares the characters of two `LPStrings`, left to right, one pair of characters at a time - including the "characters" storing the strings' lengths. The function returns `false` when it detects the first unequal pair; it returns `true` only after comparing all pairs and verifying that they are equal. The comparison is case-sensitive, meaning that `A` is not equal to `a`.

The picture suggests that the `equals` function compares the elements of two `LPStrings` by pairs, including the elements storing the strings' lengths. The function returns `true` after comparing the characters in locations 0 through 11 without detecting a mismatch.

Characters at index locations 0 through 6 are equal, but the characters at index location 7 are not, causing the function to return `false` without comparing additional characters.

The function determines, with a single comparison, that the strings have different lengths and returns immediately.

The `equals` function is small and straightforward. Beginning the loop at 0 includes the strings' lengths, so strings of unequal lengths are rejected quickly. This logic allows us to drive the loop with one string's length without the risk of (logically) indexing the other string out of bounds.

A set of tests validating the `equals` function and demonstrating how to call it.

The picture shows two LPStrings, this and s. The function compares the characters in pairs and ends when it finds the first unequal pair.

The picture shows two LPStrings, both containing the characters 'apple.' The order function compares the strings one character at a time. All the corresponding characters match, and the function returns 0. — **The `LPString order` function**. Ordering functions compare two strings and determine their relative order, that is, which one comes first. Determining two strings' relative order is an important step in, among other operations, sorting strings - for example, listing them in alphabetical order. Given two strings, `X` and `Y`, and the operation `order(X,Y)` or `X.order(Y)`, ordering functions typically return a negative value if `X` comes before `Y`, a positive value if `X` comes after `Y`, and 0 if `X` and `Y` have the same order. The magnitude of the positive and negative values is unimportant, and modern functions typically return -1, 0, and 1. Like `equals`, the `order` comparisons are case-sensitive. Furthermore, upper-case letters come before lower-case.

The strings are the same length, and their characters are all the same, so the strings have the same order, indicated when the function returns 0.

The strings are the same length, but their first characters differ. The nested if-statement ends the for-loop early. As pictured, the function returns -1, but the validation code tests both orders.

The loop runs four times before the mismatched string lengths end it. The if-else ladder determines the order by applying the rule "nothing comes before something." As pictured, the function returns 1, but the validation code, (f), tests both orders.

The strings are the same length but differ at the last character. The `return` inside the for-loop ends the function call.

The for-loop stops when it reaches the end of the shorter string. The nested if-statement determines the strings' order if the function finds mismatched characters before reaching the end of the shorter string; otherwise, the if-else ladder makes the determination.
If execution reaches the ladder, the loop didn't find mismatched characters, and the strings' lengths determine the order based on the "nothing comes before something" rule.

A *minimal* set of validating tests. This function is "tricky," and we must test it thoroughly.
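Since sorting is the motivating use for an ordering function, the following sketch shows one way `order` might drive a sort. It is an illustration under stated assumptions: the cut-down `LPString` class, the `const` qualifier on `order`, the `sortStrings` function, and the `str` helper are hypothetical and are not part of the chapter's code. Only the sign of the value `order` returns matters to the comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Cut-down stand-in for the chapter's LPString class (assumption: only the
// members this sketch needs are included).
class LPString
{
    private:
        static constexpr int LENGTH = 256;  // 1 length byte + 255 characters
        unsigned char text[LENGTH];         // text[0] stores the string's length

    public:
        LPString(const char* s)
        {
            assert(s != nullptr);
            text[0] = 0;
            for (int i = 0; s[i] && i < LENGTH - 1; i++)
            {
                text[0]++;
                text[i + 1] = s[i];
            }
        }

        // Same logic as the chapter's order function; const added here so the
        // function can be called on const references
        int order(const LPString& s) const
        {
            for (int i = 1; i <= text[0] && i <= s.text[0]; i++)
                if (text[i] < s.text[i])
                    return -1;
                else if (text[i] > s.text[i])
                    return 1;

            // No mismatch found: "nothing comes before something"
            if (text[0] == s.text[0])
                return 0;
            else if (text[0] < s.text[0])
                return -1;
            else
                return 1;
        }

        // Hypothetical helper for inspecting results
        std::string str() const
        {
            return std::string(reinterpret_cast<const char*>(text + 1), text[0]);
        }
};

// Hypothetical function: sorts the strings into ascending order.
// a comes before b exactly when a.order(b) is negative.
void sortStrings(std::vector<LPString>& strings)
{
    std::sort(strings.begin(), strings.end(),
        [](const LPString& a, const LPString& b) { return a.order(b) < 0; });
}
```

Sorting `zebra`, `apple`, `Zebra`, and `appl` this way places `Zebra` first (upper-case letters come before lower-case), then `appl`, `apple`, and `zebra`.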

The picture has two strings. As pictured, the first string contains 'apple' and the second 'zebra.' The function must only compare the first character from each string to detect the mismatch. 'a' comes before 'z,' so the function returns -1.

Try It Yourself

Learning to draw and use pictures to help solve problems takes practice. Two LPString functions, concat and insert, remain unimplemented. Writing these functions will give us some practice using pictures, more experience solving basic programming problems, and help us review array and member function syntax. Once you have implemented the functions, design and write an appropriate set of validation tests.

The picture illustrates three LPStrings represented as rectangles: 'this,' 's,' and 'local.' The function must copy this[1] to local.text[1] through this[5] to local.text[5]. Then, it copies s.text[1] to local.text[6] through s.text[6] to local.text[11]. — **The `LPString concat` function**. We can write this function in two fundamentally different ways. First, we can write it using existing functions. This approach is relatively easy. Second, we can write it using fundamental operations like loops, if-statements, and arrays - just as we have done in the previous examples. See if you can write the function both ways, as each approach can teach us a valuable lesson.
LPString concat(const LPString& s) const;
The `concat` function concatenates two strings, `this` and `s`, to form a new string, named `local` in the illustration, that it returns. The picture suggests the function has five main parts:

Validate that the concatenated strings, `this` and `s`, fit (i.e., do not overflow) an `LPString`'s capacity.

Create a new `LPString` in local or function scope.

Copy the characters from `this` to `local`.

Append the characters from `s` to the end of `local`.

Set `local`'s length.

There are several ways of writing this function - see the `copy` and `append` functions for ideas. Two possible solutions are presented here.
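For readers who want to check their work, the two approaches might look something like the sketch below. It is one possible version under stated assumptions, not the chapter's published solution: the cut-down class, the name `concatWithFunctions`, the exception message, and the `str` helper are hypothetical. The `copy` and `append` functions follow the versions described earlier in the section.

```cpp
#include <cassert>
#include <string>

// Cut-down stand-in for the chapter's LPString class (assumption: only the
// members this sketch needs are included).
class LPString
{
    private:
        static constexpr int LENGTH = 256;  // 1 length byte + 255 characters
        unsigned char text[LENGTH];         // text[0] stores the string's length

    public:
        LPString() { text[0] = 0; }

        LPString(const char* s)
        {
            assert(s != nullptr);
            text[0] = 0;
            for (int i = 0; s[i] && i < LENGTH - 1; i++)
            {
                text[0]++;
                text[i + 1] = s[i];
            }
        }

        LPString copy() const
        {
            LPString local;
            for (int i = 0; i <= text[0]; i++)
                local.text[i] = text[i];
            return local;
        }

        void append(const LPString& s)
        {
            if (text[0] + s.text[0] >= LENGTH)
                throw "strings too long to append";
            for (int i = 1; i <= s.text[0]; i++)
                text[i + text[0]] = s.text[i];
            text[0] += s.text[0];
        }

        // First approach (hypothetical name): reuse existing functions.
        // append performs the overflow check.
        LPString concatWithFunctions(const LPString& s) const
        {
            LPString local = copy();
            local.append(s);
            return local;
        }

        // Second approach: the five parts written with basic operations
        LPString concat(const LPString& s) const
        {
            // 1. Validate that the combined text fits in 255 characters
            if (text[0] + s.text[0] >= LENGTH)
                throw "strings too long to concatenate";

            // 2. Create a new, empty string in function scope
            LPString local;

            // 3. Copy the characters of this string
            for (int i = 1; i <= text[0]; i++)
                local.text[i] = text[i];

            // 4. Append the characters of s after them
            for (int i = 1; i <= s.text[0]; i++)
                local.text[text[0] + i] = s.text[i];

            // 5. Set the new string's length
            local.text[0] = static_cast<unsigned char>(text[0] + s.text[0]);
            return local;
        }

        // Hypothetical helper for inspecting results
        std::string str() const
        {
            return std::string(reinterpret_cast<const char*>(text + 1), text[0]);
        }
};
```

Both versions leave `this` and `s` unchanged. The first is shorter because `copy` and `append` already contain the loops and the overflow test; the second makes each of the five parts visible.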

Before and after pictures showing the parameter string's insertion into the target or this string. The target contains 'Hello world!' while the parameter contains 'new ' (note the space at the end, making the parameter four characters long). The insertion occurs at this[7], the location of the 'w' in the target. The picture illustrates shifting the characters in the target to the right four spaces. — **The `LPString insert` function**. The initial picture gives us an overview of how the `insert` function operates. But this is one of the most complex functions in the class, so it is helpful to draw some intermediate pictures showing more detail before programming the function.
void LPString::insert(const LPString& s, int index);
The `LPString` `insert` function requires two parameters: a string that the function will insert into the target or `this` string and the location in the target where the insertion will take place. Some parts of the `insert` function are similar to parts of the `copy` and `append` functions (Figures 6 and 13), and you may wish to review them before continuing. It's often easier to write complex functions like `insert` by breaking them down into logical steps:

Verify that the total length of the two strings will not overflow the target; throw an exception if the total is too long.

Verify that the `index` parameter is valid (i.e., it's inside the target string).

The picture suggests we must make room in the target string before inserting additional characters. We make the space by shifting some target characters to the right, beginning at the location indicated by `index`. We shift the target characters to the right by the length of the parameter string. *It's vital to recognize that the shift operation must begin with the rightmost character and proceed right-to-left*.

The next step, copying the characters from the parameter string to the target, is similar to the `append` function.

Finally, update the length of the target or `this` string; this step is also like the `append` function.

When finished, please study the solution presented here.
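The steps above might be sketched as follows. This is one possible version under stated assumptions, not the chapter's published solution: the cut-down class, the exception messages, and the `str` helper are hypothetical, and the sketch assumes that `s` is a different object from the target.

```cpp
#include <cassert>
#include <string>

// Cut-down stand-in for the chapter's LPString class (assumption: only the
// members this sketch needs are included).
class LPString
{
    private:
        static constexpr int LENGTH = 256;  // 1 length byte + 255 characters
        unsigned char text[LENGTH];         // text[0] stores the string's length

    public:
        LPString() { text[0] = 0; }

        LPString(const char* s)
        {
            assert(s != nullptr);
            text[0] = 0;
            for (int i = 0; s[i] && i < LENGTH - 1; i++)
            {
                text[0]++;
                text[i + 1] = s[i];
            }
        }

        int length() const { return text[0]; }

        // Assumes s is not the same object as this string
        void insert(const LPString& s, int index)
        {
            // 1. The combined text must fit in 255 characters
            if (text[0] + s.text[0] >= LENGTH)
                throw "strings too long to insert";

            // 2. The insertion point must be inside the target string
            if (index < 1 || index > text[0])
                throw "index out of bounds";

            // 3. Shift text[index]..text[length] right by s's length, starting
            //    with the rightmost character so nothing is overwritten
            for (int i = text[0]; i >= index; i--)
                text[i + s.text[0]] = text[i];

            // 4. Copy s's characters into the gap
            for (int i = 1; i <= s.text[0]; i++)
                text[index + i - 1] = s.text[i];

            // 5. Update the target's length
            text[0] += s.text[0];
        }

        // Hypothetical helper for inspecting results
        std::string str() const
        {
            return std::string(reinterpret_cast<const char*>(text + 1), text[0]);
        }
};
```

Using the pictured example, inserting `new ` at index 7 of `Hello world!` moves `text[7]` through `text[12]` to `text[11]` through `text[16]` and leaves the target holding `Hello new world!` with a length of 16. Shifting left-to-right instead would overwrite characters before the loop had moved them.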

Downloadable Code

The example programs are formatted with tab stops set at 8 spaces.

View¹	Download
LPString.h	LPString.h
LPString.cpp	LPString.cpp
client.cpp	client.cpp

¹ The behavior of these links depends on your browser and desktop configuration.


(b)

    LPString::LPString(const char* s)
    {
        text[0] = 0;
        for (int i = 0; s[i] && i < LENGTH - 1; i++)
        {
            text[0]++;
            text[i + 1] = s[i];
        }
    }

(c)

    0
    1   H
    2   He
    3   Hel
    4   Hell
    5   Hello

(d)

    LPString::print("\n*** Testing LPString(char), length, and println function: **\n");
    LPString lps2("See the quick red fox jump over the lazy brown dog. "
                  "See the quick red fox jump over the lazy brown dog. "
                  "See the quick red fox jump over the lazy brown dog.");
    cout << lps2.length() << endl;
    LPString::print("lp2 = ");
    lps2.println();

(b)

    LPString::LPString(char c)
    {
        text[0] = 1;
        text[1] = c;
    }

(c)

    LPString lps3('X');
    cout << lps3.length() << endl;
    LPString::print("lps3 = ");
    lps3.println();

(b)

    LPString::LPString(const LPString& s)
    {
        for (int i = 0; i <= s.text[0]; i++)
            text[i] = s.text[i];
    }

(c)

    LPString lps4(lps2);
    LPString::print("lps4 = ");
    lps4.println();

(b)

    unsigned char& LPString::at(int index)
    {
        if (index < 1 || index > text[0])
            throw "index out of bounds";
        return text[index];
    }

(c)

    LPString::print("\n* Testing the at function: \n");
    LPString lps1("Hello world");
    try
    {
        // 2 ways of printing a character - "at" as an r-value or getter
        char c = lps1.at(7);
        cout << c << endl;
        LPString(lps1.at(7)).println();     // obscure conversion constructor call
        lps1.at(7) = 'X';                   // "at" as an l-value or setter
        LPString::print("lp1 after changing the first character: \n");
        lps1.println();
        lps1.at(12);                        // out of bounds - throws an exception
    }
    catch (const char* error)               // the function throws a C-string
    {
        cerr << "Error: " << error << endl;
    }

(a) A rectangle, divided into boxes, representing a C-string. The boxes, left to right, save the characters 'Testing\0' where 'T' is in line[0] and '\0' is in line[7].

(b)

    static void print(const char* line);

(c)

    void LPString::print(const char* line)
    {
        cout << line;
    }

(d)

    LPString::print("\nTesting the copy constructor:\n");

(b)

    LPString() { text[0] = 0; }

(c)

    LPString lps1;
    cout << lps1.length() << endl;
    LPString::print("lps1 = ");
    lps1.println();

(b)

    void LPString::print() const
    {
        for (int i = 1; i <= text[0]; i++)
            cout << text[i];
    }

(c)

    void LPString::println() const
    {
        print();
        cout << '\n';
    }

(d)

    LPString lps1("Hello world");
    lps1.print();
    lps1.println();

(b)

    0
    1   H
    2   He
    3   Hel
    4   Hell
    5   Hello
    6   Hello
    7   Hello w
    8   Hello wo
    9   Hello wor
    10  Hello worl
    11  Hello world

(c)

    void LPString::readln()
    {
        int c;
        text[0] = 0;
        while ((c = cin.get()) != '\n' && text[0] < LENGTH - 1)
            text[++text[0]] = c;
    }

(d)

    LPString::print("\n* Testing the readln function: *\n");
    LPString lps1;
    cout << "Please enter a string: ";
    lps1.readln();
    lps1.println();

(b)

    5   Hello
    6   Hello
    7   Hello w
    8   Hello wo
    9   Hello wor
    10  Hello worl
    11  Hello world

(c)

    void LPString::append(const LPString& s)
    {
        if (text[0] + s.text[0] >= LENGTH)
            throw "strings too long to append";
        for (int i = 1; i <= s.text[0]; i++)
            text[i + text[0]] = s.text[i];
        text[0] += s.text[0];
    }

(d)

    LPString::print("\n* Testing the append function: *\n");
    LPString lps1("Hello");
    LPString lps2(" world");
    lps1.append(lps2);
    lps1.println();

    LPString lps3("Hell");
    lps3.append('o');           // append a single character
    lps3.append(" world");      // append a C-string
    lps3.println();

(b)

    LPString LPString::copy() const
    {
        LPString local;
        for (int i = 0; i <= text[0]; i++)  // this string's length drives the loop
            local.text[i] = text[i];
        return local;
    }

(c)

    LPString lps1("Hello");
    LPString lps2 = lps1.copy();
    lps2.println();


(d)

    bool LPString::equals(const LPString& s)
    {
        for (int i = 0; i <= text[0]; i++)
            if (text[i] != s.text[i])
                return false;
        return true;
    }

(e)

    LPString::print("\n* Testing the equals function: *\n");
    LPString lps1("hello world");
    LPString lps2("hello world");
    if (lps1.equals(lps2))
        cout << "equals" << endl;
    else
        cout << "not equals" << endl;

    LPString lps3("hello world");
    LPString lps4("hello Alice");
    if (lps3.equals(lps4))
        cout << "equals" << endl;
    else
        cout << "not equals" << endl;

    LPString lps5("hello world");
    LPString lps6;
    if (lps5.equals(lps6))
        cout << "equals" << endl;
    else
        cout << "not equals" << endl;

    LPString lps7("apple");
    LPString lps8("zebra");
    if (lps7.equals(lps8))
        cout << "equals" << endl;
    else
        cout << "not equals" << endl;


(e)

    int LPString::order(const LPString& s)
    {
        for (int i = 1; i <= text[0] && i <= s.text[0]; i++)
            if (text[i] < s.text[i])
                return -1;
            else if (text[i] > s.text[i])
                return 1;

        if (text[0] == s.text[0])
            return 0;
        else if (text[0] < s.text[0])
            return -1;
        else
            return 1;
    }

(f)

    LPString::print("\n* Testing the order function: *\n");
    LPString lps1 = "apple";
    LPString lps2 = "apple";
    cout << lps1.order(lps2) << endl;   // 0 (a)

    LPString lps3 = "apple";
    LPString lps4 = "zebra";
    cout << lps3.order(lps4) << endl;   // -1 (b)
    cout << lps4.order(lps3) << endl;   // 1

    LPString lps5 = "apple";
    LPString lps6 = "appl";
    cout << lps5.order(lps6) << endl;   // 1 (c)
    cout << lps6.order(lps5) << endl;   // -1

    LPString lps7 = "apple";
    LPString lps8 = "applx";
    cout << lps7.order(lps8) << endl;   // -1 (d)
    cout << lps8.order(lps7) << endl;   // 1

The Length-Prefixed String Class Specification