Non-object-oriented programming languages must represent strings without relying on classes. Structures or their equivalent are a logical replacement, and we'll explore such an implementation in the next section. But if we're clever, we can encode a string as a single character array. We have seen that the C programming language accomplishes this by adding a sentinel, the null-termination character, at the end of the textual data. This section explores another single-array implementation called length-prefixed or size-prefixed strings. An early implementation of the Pascal programming language from the University of California San Diego, UCSD Pascal, based its string type on this scheme, so many programmers also call length-prefixed strings UCSD strings.
Programmers implement length-prefixed strings as fixed-length arrays 256 characters long and store the string's length in the first or 0-th character. For simplicity, we use an 8-bit or 1-byte unsigned character for our character array. An 8-bit character is an integer that can store 256 distinct values. The plain char type is frequently signed on modern computer hardware, giving it a range of [-128, 127]. An unsigned character, by contrast, has a range of [0, 255], and that is how we'll implement the class. Because we reserve the first array character for the string's length, 255 characters remain to store the string's textual data. This approach has some disadvantages: it wastes space when strings are short and doesn't allow strings longer than 255 characters. But it is simple enough that programming languages can support it as a fundamental or built-in data type.
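To make the range problem concrete, here is a small sketch that is not part of the LPString class. It stores a length of 200 in element 0 of a signed and an unsigned character array, then reads it back. The helper names are hypothetical. The negative result assumes the usual 8-bit, two's-complement characters.

```cpp
// Hypothetical helpers: store a string length in element 0 and read it back.
// Assumes 8-bit, two's-complement characters, as on mainstream hardware.
int storeLengthUnsigned(int length)
{
    unsigned char text[256];
    text[0] = static_cast<unsigned char>(length);  // 0..255 round-trips exactly
    return text[0];
}

int storeLengthSigned(int length)
{
    signed char text[256];
    text[0] = static_cast<signed char>(length);    // values above 127 wrap negative
    return text[0];
}
```

With these assumptions, storeLengthUnsigned(200) returns 200 and storeLengthSigned(200) returns -56. A signed length byte would therefore misreport every string longer than 127 characters.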
Figure 1. Length-prefixed string: (a) an empty string; (b) a string with content. The length-prefixed implementation of the general string API captures the three string elements:
The textual data is stored in the elements of an array, named text, allocated automatically on the stack
The string's length is stored in text[0], coming before or prefixing the textual data - hence the name length-prefixed string
The string's capacity is implicit in the implementation as a fixed-length array
An empty string has a length of 0, which the string saves in the first array element: text[0]. The remaining array elements, text[1] through text[255], are logically empty. Physically, every memory location always has some content (a random value left over from the computer startup or the last program), but the string functions ignore the values in the elements beyond the string's length. Creating the string establishes its capacity of 255, and it never changes. Notice that the string's capacity is one less than the array's size of 256.
In this example, element 0 is 5, the string's length. The string's content, saved in elements 1 through 5, is "Hello." Elements 6 through 255 are empty.
These are not C-strings, so they are not zero-indexed or null-terminated. Furthermore, they are not dynamic: their length can vary, but their capacity cannot.
Length-prefixed strings are simple enough to implement without making them a class. Nevertheless, they are a good class example, demonstrating how pictures can help us see and program problem details. The following figures detail the class and each function. The complete source code for the example is available for download at the bottom of the page.
The Length-Prefixed String Class Specification
I recommend a stepwise or cyclic approach to class implementation. After programming each function, or at most each small group of related functions, pause to test and verify the additions. Verifying the code this way makes finding and correcting syntax errors easier, making the task less frustrating. It will also help make the overall debugging and validation process more manageable. And finally, some member functions often depend on other members; if the independent functions are validated, any errors are more likely in the dependent functions. Testing and validation typically require a certain "critical mass" of functions. Specifically, we need a constructor and a display or print function, so we begin with these.
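The figure with the class specification is not reproduced here. The following sketch reconstructs a plausible LPString.h declaration from the functions this section describes, with the three short functions inlined as the text recommends. The exact member order, access specifiers, and the declaration of LENGTH are assumptions, and the original header may differ.

```cpp
const int LENGTH = 256;    // assumption: 1 length byte + 255 characters

class LPString
{
    private:
        unsigned char text[LENGTH];   // text[0] = length; text[1..255] = characters

    public:
        // Constructors
        LPString() { text[0] = 0; }   // default: an empty string (inline)
        LPString(const char* s);      // C-string conversion
        LPString(char c);             // single-character conversion
        LPString(const LPString& s);  // copy constructor

        // Access functions
        int length() const { return text[0]; }
        unsigned char& at(int index);

        // Input and output
        static void print(const char* line);   // prints a C-string
        void print() const;
        void println() const;
        void readln();

        // Algorithmic functions
        void clear() { text[0] = 0; }
        void append(const LPString& s);
        LPString copy() const;
        LPString substring(int index, int length) const;
        bool equals(const LPString& s);
        int order(const LPString& s);
};
```

Only the inline members have bodies here. The remaining functions are defined in LPString.cpp as the following figures describe.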
The LPString default constructor. The default constructor creates a logically "empty" LPString object.
The constructor creates the array automatically on the stack with a predetermined capacity. The picture clearly shows the constructor's only task: initializing the string's length, text[0], to 0. The random values in elements 1 through 255 are irrelevant.
An initializer list can initialize a member variable but not part of a variable (i.e., not one array element). So, the constructor cannot use a list, and we write the function with a "regular" body. The complete function consists of one statement, making it an ideal candidate for implementation as an inline function in the class specification.
Following the cyclic approach described above, the initial test creates an object with the most basic constructor, the default. The test also uses the length, println, and print(char*) functions, which are detailed below.
LPString::LPString(const char* s)
{
text[0] = 0;
for (int i = 0; s[i] && i < LENGTH - 1; i++)
{
text[0]++;
text[i + 1] = s[i];
}
}
Contents of text after each loop iteration (text[0], then the characters copied so far):
0
1 H
2 He
3 Hel
4 Hell
5 Hello
LPString::print("\n*** Testing LPString(char*), length, and println functions: ***\n");
LPString lps2("See the quick red fox jump over the lazy brown dog. "
"See the quick red fox jump over the lazy brown dog. "
"See the quick red fox jump over the lazy brown dog.");
cout << lps2.length() << endl;
LPString::print("lps2 = ");
lps2.println();
The LPString(char*) conversion constructor. The constructor converts the C-string s to an LPString object by copying the characters one at a time. Correctly indexing the arrays and controlling the for-loop are challenging sub-problems, and this example demonstrates how a picture can help us solve them.
The LPString is initially empty, as illustrated in Figure 1(a). The function copies the characters from s to the LPString's text array with a for-loop. But where does the loop begin and end, and how do we index into the arrays?
(Equivalently, what values does the loop-control variable take, and how do we use the variable to index the arrays?)
Try mapping parts of the picture to corresponding parts of the C++ code:
The function uses the LPString's length, text[0], as an accumulator to count the characters as it copies them. The function must initialize the length to 0 before looping and must increment the count during each iteration.
C-strings are zero-indexed, and the copy operation begins at s[0], so we initialize the loop control variable to 0. However, text[0] is the string's length, and the characters begin at text[1]. This organization makes the indexes off by one throughout the copy operation. The assignment operation accounts for the offset by adding 1 to the loop control variable when indexing text.
Two situations can end the loop. The null-termination character is the character 0, which C++ treats as false. If s contains 255 or fewer characters, the sub-expression s[i] ends the loop when it reaches the null-termination character (s[5] in this example). If s is longer, the sub-expression i < LENGTH - 1 ends the loop, truncating the copy to 255 characters. The -1 is necessary to prevent indexing text out of bounds: the largest index the loop writes is i + 1 = 255.
Pictures don't need to be elaborate to be helpful - simple characters are often sufficient. This picture shows how text begins and changes with each loop iteration.
It's necessary to test strings longer than 127 characters, the largest length a signed char could store, to verify that the unsigned length byte works. The test-and-validation code uses a "trick" inherited from C to create a long string: the compiler automatically concatenates adjacent string literals to form a single string. The test code prints the newly created LPString's length (155) and content, verifying that the class works with long strings.
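Here is a self-contained check of both loop-ending conditions. It uses a trimmed-down class with only the conversion constructor shown above and length. The helper lengthOfRepeated and the LENGTH = 256 declaration are assumptions made for illustration.

```cpp
const int LENGTH = 256;    // assumption: 1 length byte + 255 characters

// Trimmed-down LPString with just the members this check needs.
class LPString
{
    private:
        unsigned char text[LENGTH];   // text[0] holds the length
    public:
        LPString(const char* s);
        int length() const { return text[0]; }
};

// The conversion constructor shown above.
LPString::LPString(const char* s)
{
    text[0] = 0;
    for (int i = 0; s[i] && i < LENGTH - 1; i++)
    {
        text[0]++;
        text[i + 1] = s[i];
    }
}

// Hypothetical helper: fills buffer (at least n + 1 chars) with n copies of
// 'x' plus a null terminator, converts it, and returns the LPString's length.
int lengthOfRepeated(char* buffer, int n)
{
    for (int i = 0; i < n; i++)
        buffer[i] = 'x';
    buffer[n] = '\0';
    return LPString(buffer).length();
}
```

A 200-character C-string keeps all 200 characters. A 300-character one is cut off at 255 by the i < LENGTH - 1 test.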
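The LPString(char) conversion constructor creates a one-character string: it sets the length to 1 and stores c in text[1].
LPString::LPString(char c)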
{
text[0] = 1;
text[1] = c;
}
The LPString copy constructor. The copy constructor creates a new LPString object by copying an existing one. The picture of the problem and the function code are similar to the char* conversion constructor (Figure 4).
The picture helps us identify details leading to a compact and efficient solution. The function must copy length + 1 elements from the original or parameter LPString to the new one: the length itself plus the characters. The loop must begin at 0 and run through the original string's length, iterating length + 1 times.
A single for-loop copies the used elements of the existing LPString (the length in element 0 and the characters in elements 1 through 5) to the new string. The contents of the unused elements are irrelevant, so the function does not copy them. Consequently, the for-loop begins at 0 and uses <= for control.
The test and validation code uses lps2 created in Figure 4.
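The copy constructor's code is not reproduced above, so here is a minimal sketch that follows the description. A single loop runs from 0 through the original string's length. To keep the example self-contained, the class is trimmed to the conversion constructor, the copy constructor, length, and at. LENGTH = 256 is an assumption.

```cpp
const int LENGTH = 256;    // assumption: 1 length byte + 255 characters

class LPString
{
    private:
        unsigned char text[LENGTH];   // text[0] holds the length
    public:
        LPString(const char* s);
        LPString(const LPString& s);
        int length() const { return text[0]; }
        unsigned char& at(int index);
};

// Conversion constructor, as shown earlier.
LPString::LPString(const char* s)
{
    text[0] = 0;
    for (int i = 0; s[i] && i < LENGTH - 1; i++)
    {
        text[0]++;
        text[i + 1] = s[i];
    }
}

// Copy constructor: i runs from 0 (the length) through s.text[0] (the last
// character), copying length + 1 elements and skipping the unused ones.
LPString::LPString(const LPString& s)
{
    for (int i = 0; i <= s.text[0]; i++)
        text[i] = s.text[i];
}

// at, as shown in the access-function section below.
unsigned char& LPString::at(int index)
{
    if (index < 1 || index > text[0])
        throw "index out of bounds";
    return text[index];
}
```

Because the new object has its own array, changing a character in the copy leaves the original untouched.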
LPString Access Functions
We could choose to name the LPString access functions with the "get" and "set" prefixes like other access functions. However, the C++ and Java string libraries or APIs don't typically follow that naming convention for these functions, so we follow their conventions instead.
int length() const { return text[0]; }
The LPString length function.
The picture illustrates the relationship between the saved textual data, "Hello world," and the string's length, 11. It also emphasizes that the string's length - the number of characters currently stored in the string - is always saved in the first array element: text[0].
The length function is a "getter," but most string classes name it either length or size, and many provide both functions. The function is short, so we inline it in the class specification. We validated it above in conjunction with the constructors.
unsigned char& LPString::at(int index)
{
if (index < 1 || index > text[0])
throw "index out of bounds";
return text[index];
}
LPString::print("\n*** Testing the at function: ***\n");
LPString lps1("Hello world");
try
{
// 2 ways of printing a character - "at" as an r-value or getter
char c = lps1.at(7);
cout << c << endl;
LPString(lps1.at(7)).println(); // obscure conversion constructor call
lps1.at(7) = 'X'; // "at" as an l-value or setter
LPString::print("lps1 after changing the character at index 7: \n");
lps1.println();
lps1.at(12); // out of bounds - throws an exception
}
catch (const char* error)
{
cerr << "Error: " << error << endl;
}
The LPString at function. Surprisingly, the at function implements both "getter" and "setter" operations - it can get or set the character at the index location. Returning a reference (the red ampersand) allows programs to use the function as an l- and an r-value, performing both operations. (The compiler treats it as a value or address, depending on where the program uses it.)
A picture helps us see the relationship between the text saved in the string and each character's index location. We need to clarify the character indexing because we sometimes use a zero-indexed organization, and sometimes we don't. The arrow points to the 'w' at index location 7, which we use in the test and validation code.
The at function returns a reference to one element, a variable, in text. The if-statement verifies that the index is valid or in-bounds (i.e., within the string) and throws an exception if it is not.
The test and validation code for the at function demonstrates some vital syntax and one obscure conversion.
The function call lps1.at(7) gets one character element or variable from lps1. As used in the two illustrated statements, the compiler treats the element as an r-value or the character stored in the variable. The second example, with the obscure conversion, might be confusing. None of the overloaded print functions can print a single character, but we get around the limitation by calling a conversion constructor. lps1.at(7) returns a character, which is passed to the LPString(char) constructor. The constructor call creates a new, anonymous object, and the object calls println, which can print an LPString.
In the statement lps1.at(7) = 'X', the at function call again gets the element or variable from lps1 at index location 7. But in this statement, the call is on the left side of the assignment operator, so the compiler treats it as an address and saves the character 'X' in that memory location.
The statement lps1.at(12) indexes the string out of bounds - that is, one position beyond the last character - and causes the function to throw an exception.
Together, the try and catch blocks detect and handle the index-out-of-bounds exception.
LPString::print("\nTesting the copy constructor:\n");
The LPString static print function. The static version of the print function is a special case: it allows us to print C-strings with the LPString class. We could continue using the <iostream> functions to complete this task, but including it in LPString provides us with another opportunity to demonstrate static or class functions.
The picture reminds us that line is a C-string, so it is zero-indexed and null-terminated.
To demonstrate the placement of the "static" keyword, we prototype the print function in the class specification, which is in the LPString.h header file.
Continuing the demonstration, we place the function definition in the LPString.cpp source code file. Notice that we don't need the "static" keyword here.
The static print function "belongs" to the class rather than to an object or instance of the class. So, when a program calls the function, it must use the class name and the scope resolution operator, ::.
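The prototype and definition figures are not reproduced here, so the following is one plausible version based on the description. It loops over the C-string's characters until the null terminator. The loop body is an assumption: the original may differ, for example by writing the whole C-string with a single cout << line. The class is trimmed to this one function.

```cpp
#include <iostream>
#include <sstream>   // std::ostringstream, used only to capture output when testing

class LPString
{
    public:
        // In LPString.h: "static" appears only in the prototype.
        static void print(const char* line);
};

// In LPString.cpp: no "static" keyword. line is a C-string, so the loop
// starts at 0 and stops at the null terminator.
void LPString::print(const char* line)
{
    for (int i = 0; line[i] != '\0'; i++)
        std::cout << line[i];
}
```

A client calls it through the class name, for example LPString::print("\n*** Testing ***\n"), because no object is involved.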
void LPString::print() const
{
for (int i = 1; i <= text[0]; i++)
cout << text[i];
}
The LPString print and println member functions. The print and println functions are named the same as the corresponding Pascal and Java functions. The print function prints a string to the console without a trailing new-line character, while println prints the string followed by a new-line character. To prevent duplicating code, println calls print and then adds the new-line character.
Admittedly, the picture would be more useful if we followed a more authentic implementation based on lower-level operations or system calls. Still, the function uses a for-loop to print the characters one at a time, and the picture helps us configure the loop.
The text array is not null-terminated, so we can't print it with a single C-string operation. Using the information organized in the picture, we configure the for-loop controls: the loop starts at 1, uses less than or equals for the test, and compares the loop-control variable with the value saved in text[0].
It is generally good practice to avoid duplicating code whenever feasible, so println calls print and then adds the new-line character.
Function validation is straightforward.
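The println code is not shown above. A minimal sketch, assuming it does exactly what the text says (call print, then end the line), pairs it with the print function from the figure. The class is trimmed to a single-character constructor so the example stays self-contained.

```cpp
#include <iostream>
#include <sstream>   // std::ostringstream, used only to capture output when testing

const int LENGTH = 256;    // assumption: 1 length byte + 255 characters

class LPString
{
    private:
        unsigned char text[LENGTH];   // text[0] holds the length
    public:
        LPString(char c) { text[0] = 1; text[1] = c; }
        void print() const;
        void println() const;
};

// print, as shown above: text is not null-terminated, so loop over 1..text[0].
void LPString::print() const
{
    for (int i = 1; i <= text[0]; i++)
        std::cout << text[i];
}

// println reuses print and then ends the line, so the loop exists only once.
void LPString::println() const
{
    print();
    std::cout << std::endl;
}
```

If the print loop ever changes, println picks up the change automatically. That is the practical benefit of not duplicating the loop.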
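Contents of text as readln reads "Hello world" (text[0], then the characters read so far; at length 6, the last character is a space):
0
1 H
2 He
3 Hel
4 Hell
5 Hello
6 Hello
7 Hello w
8 Hello wo
9 Hello wor
10 Hello worl
11 Hello world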
LPString::print("\n*** Testing the readln function: ***\n");
LPString lps1;
cout << "Please enter a string: ";
lps1.readln();
lps1.println();
The LPString readln function. String-input functions typically allow users to backspace and reenter characters before signaling the program to read the string by pressing the Enter key. Pressing the Enter key also inserts a new-line character into the input stream. The Java and Pascal readln functions read the new-line character but discard it (i.e., they do not include it in the string). We'll read the string one character at a time, allowing us to locate the new-line character, and we'll use the get function (see the wc.cpp example) in place of lower-level operations to read the characters.
String input functions typically discard or overwrite a string's contents. Accordingly, the LPString must be empty before the reading operation begins. If the string is new, as in Figure 1(a), it's ready for the operation. However, if the string contains text, as in Figure 1(b), the function must discard the character data before reading. The top string (1) shows what Figure 1(b) looks like after the function empties it - the length is 0, and the function ignores the remaining characters: "Hello." The second string (2) illustrates characters as the get function reads them from cin and saves them in c. The while-loop copies the characters to text.
A simple picture illustrates how the string changes during each loop iteration. The brown box represents the space character.
Although the readln function is short, it involves several intricate steps:
The statement text[0] = 0 empties the string, as shown in array (1). Now, the function can use text[0] as an accumulator to count the characters as the loop adds them to the string.
The get function reads characters one at a time from cin and temporarily saves them in c (the pair of red parentheses forces the get function call and the assignment operation to take place first).
The loop runs while the input is not the new-line character, and there is space in text for additional characters. The "-1" is necessary because although the array has 256 elements, the string only uses 255 to store characters.
The expression ++text[0] first increments the string's length and then uses it as an index into the string.
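The readln code is not reproduced above. Here is a sketch assembled from the four steps just described. It adds two things that are not in the text. First, a std::istream parameter defaulting to std::cin, so the function can be tested without a keyboard. Second, an end-of-file check, so the loop cannot spin on a closed stream. Both are labeled assumptions in the comments.

```cpp
#include <iostream>
#include <sstream>   // std::istringstream, used to feed readln when testing

const int LENGTH = 256;    // assumption: 1 length byte + 255 characters

class LPString
{
    private:
        unsigned char text[LENGTH];   // text[0] holds the length
    public:
        LPString() { text[0] = 0; }
        int length() const { return text[0]; }
        unsigned char& at(int index);
        // Assumption: the stream parameter (defaulting to std::cin) is an
        // addition that makes the function testable without a keyboard.
        void readln(std::istream& in = std::cin);
};

unsigned char& LPString::at(int index)
{
    if (index < 1 || index > text[0])
        throw "index out of bounds";
    return text[index];
}

void LPString::readln(std::istream& in)
{
    text[0] = 0;                 // (1) empty the string; text[0] now counts characters
    int c;                       // int, so end-of-file is distinguishable from a character
    while ((c = in.get()) != '\n'                         // stop at the new-line (discarded)
           && c != std::istream::traits_type::eof()       // added guard: stop at end of input
           && text[0] < LENGTH - 1)                       // at most 255 characters
        text[++text[0]] = static_cast<unsigned char>(c);  // bump the length, then store
}
```

A call such as lps1.readln() still reads from cin. A test can pass a std::istringstream instead.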
Algorithmic Functions
Algorithmic functions manipulate, modify, and otherwise use LPStrings to solve client program problems. For organizational convenience, we'll group these functions into three sub-categories:
Functions that modify this object. The functions in this category follow the general pattern: void a.function(b), where a is an LPString object and b represents 0 or more parameters of various types. The functions change a, reflecting the function's results.
Functions that create a new LPString object. These functions have the general pattern: LPString a.function(b), where a is an LPString object and b represents 0 or more parameters. The functions in this group do not alter a or b but return a new LPString object representing the function's operation.
Functions that compare two LPStrings. The final group follows two similar patterns: bool a.function(b) or int a.function(b), where a and b are LPStrings.
Functions Modifying this Object
void clear() { text[0] = 0; }
lps2.clear();
The LPString clear function. The clear function is trivial, and the readln discussion above explains the concept behind it: setting text[0] to 0 makes the string logically empty.
An LPString before the clear function operation.
The string after the clear function operation.
The clear function inlined in the class specification.
A simple test statement. See the append function below for the full context of the test.
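Characters in this string as append copies " world" one at a time (the left column counts the characters; the function updates text[0] only after the loop; at 6, the last character is a space):
5 Hello
6 Hello
7 Hello w
8 Hello wo
9 Hello wor
10 Hello worl
11 Hello world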
void LPString::append(const LPString& s)
{
if (text[0] + s.text[0] >= LENGTH)
throw "strings too long to append";
for (int i = 1; i <= s.text[0]; i++)
text[i + text[0]] = s.text[i];
text[0] += s.text[0];
}
LPString::print("\n*** Testing the append function: ***\n");
LPString lps1("Hello");
LPString lps2(" world");
lps1.append(lps2);
lps1.println();
LPString lps3("Hell");
lps3.append('o'); // append a single character
lps3.append(" world"); // append a C-string
lps3.println();
The LPString append function. The append function adds or appends characters at the end of this LPString. The function uses a for-loop to copy each character, and correctly indexing each string with the loop-control variable is the most challenging part of the function. An ancillary problem is distinguishing the strings' lengths, which is necessary to control the for-loop and index the strings. The picture helps us see how the function must index the strings and drive the loop.
The function appends the parameter s to the end of this LPString by copying the parameter's characters one at a time. We can use the same variable to index both strings if we add a constant offset when indexing this string; the offset is the length of this string. The for-loop runs from 1, the index location of the first character in s, to the length of s, saved in s.text[0].
A dynamic, step-by-step picture of the copy operation. The final picture details this string after the function finishes. The brown boxes represent the space character.
The function begins by verifying that there is enough space in this string to complete the append operation and throws an exception if there isn't. The for-loop carries out the copy operation outlined in the picture. When the loop finishes, the function updates the length of this string. Notice that the function does not increment the length inside the loop because doing so would "break" the constant offset used to index this string.
We divide the test and validation code into two groups. The first group is straightforward: it appends the function argument to this string. However, the second group relies on an unexpected C++ operation. The at function test-and-validation code (Figure 8) employed an obscure (in the sense that it's hard to see) conversion operation. This example goes a step further and uses two "hidden" conversions. While the LPString class does not have overloaded append functions that accept a character or a C-string, it does have constructors that do. So, the C++ compiler automatically converts 'o' and " world" into anonymous LPString objects and then uses them to complete the append operations. The compiler applies at most one user-defined conversion implicitly: it won't automatically convert x to y and then convert y to z.
Functions Creating A New LPString Object
LPString LPString::copy() const
{
LPString local;
for (int i = 0; i <= text[0]; i++)
local.text[i] = text[i];
return local;
}
The LPString copy function. The copy function is very similar to the copy constructor, and you could argue that the copy constructor makes the copy function redundant. Nevertheless, the class includes it as a simple example of a function that returns an object.
The picture suggests that the function must copy the elements of this object to another object. Unlike the previous functions, the function's signature or prototype doesn't provide another object. So, the function creates a temporary, local object and copies this object to it. Following the copy operation, the function returns the local object.
The function creates a local, initially empty LPString object named local with the default constructor. A single for-loop copies the elements of this string to the local string. The return statement returns local by value (i.e., by copy).
Calling the copy function and validating the returned value is straightforward.
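The copy validation code is not shown, so here is a minimal sketch of what it might check. The copy must have the same length and characters as the original and must be independent of it. The routine name testCopy is hypothetical. The class is trimmed to the members the check needs, and the copy loop is bounded by this string's own length, text[0].

```cpp
const int LENGTH = 256;    // assumption: 1 length byte + 255 characters

class LPString
{
    private:
        unsigned char text[LENGTH];   // text[0] holds the length
    public:
        LPString() { text[0] = 0; }
        LPString(const char* s);
        int length() const { return text[0]; }
        unsigned char& at(int index);
        LPString copy() const;
};

LPString::LPString(const char* s)
{
    text[0] = 0;
    for (int i = 0; s[i] && i < LENGTH - 1; i++)
    {
        text[0]++;
        text[i + 1] = s[i];
    }
}

unsigned char& LPString::at(int index)
{
    if (index < 1 || index > text[0])
        throw "index out of bounds";
    return text[index];
}

// copy: fill a local object from this one, then return it by value.
LPString LPString::copy() const
{
    LPString local;
    for (int i = 0; i <= text[0]; i++)
        local.text[i] = text[i];
    return local;
}

// Hypothetical validation routine: returns true only if every check passes.
bool testCopy()
{
    LPString original("Hello world");
    LPString duplicate = original.copy();
    if (duplicate.length() != original.length())
        return false;
    for (int i = 1; i <= original.length(); i++)
        if (duplicate.at(i) != original.at(i))
            return false;
    duplicate.at(1) = 'J';          // changing the copy ...
    return original.at(1) == 'H';   // ... must not change the original
}
```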
LPString LPString::substring(int index, int length) const
{
if (index < 1 || index > text[0])
throw "index is out of bounds";
if (length < 0 || index + length - 1 > text[0])	// the substring must end within this string
throw "\"length\" is too long";
LPString local;
local.text[0] = length;
for (int i = 0; i < length; i++)
local.text[i + 1] = text[index + i];
return local;
}
Loop trace for substring(7, 5) applied to "Hello world":
i    index+i    text[index+i]    i+1    local.text[i+1]
0    7          w                1      w
1    8          o                2      o
2    9          r                3      r
3    10         l                4      l
4    11         d                5      d
The LPString substring function. The substring function extracts and copies part of an LPString object, creating a new LPString that stores the extracted substring. The function has two parameters: index is the starting location of the copy, and length is the substring's length. The example assumes that the substring function is called with index = 7 and length = 5. The function creates a temporary local object, local, to hold the copy until the function returns it.
The relationships between this string and the function parameters index and length.
The string the substring function returns.
The substring function verifies that the starting location, index, is valid (i.e., in bounds). It also verifies that the substring's starting location and length together don't index this string out of bounds. If either test fails, the function throws an exception using the throw keyword. The function creates an empty LPString named local, sets its length to the substring's length, and copies the substring characters from this string to local one at a time. When the for-loop finishes copying the characters, the function returns local, containing the extracted characters.
The table can help us understand how the program uses the loop control variable to index into the string arrays. Unlike many for-loops in the previous problems, this loop begins at 0 and uses a strict less-than test (highlighted in yellow). We adjust the range of the loop control variable by adding 1 to it when we index into the local string's text array (highlighted in light blue). We use the sum of the loop control variable and index, the substring starting location, to index into this string (highlighted in coral). Alternatively, we could start the for-loop at 1, use <=, simplify the indexing into local, and compensate by changing this string's indexing to text[index + i - 1].
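The alternative loop mentioned in the last sentence can be sketched as follows, with i running from 1 through length. The local index then needs no offset, and this string's index becomes index + i - 1. The sketch assumes the second bounds test compares the substring's end with the string's length, text[0], rather than with the array size. The class is trimmed to the members the example needs.

```cpp
const int LENGTH = 256;    // assumption: 1 length byte + 255 characters

class LPString
{
    private:
        unsigned char text[LENGTH];   // text[0] holds the length
    public:
        LPString() { text[0] = 0; }
        LPString(const char* s);
        int length() const { return text[0]; }
        unsigned char& at(int index);
        LPString substring(int index, int length) const;
};

LPString::LPString(const char* s)
{
    text[0] = 0;
    for (int i = 0; s[i] && i < LENGTH - 1; i++)
    {
        text[0]++;
        text[i + 1] = s[i];
    }
}

unsigned char& LPString::at(int index)
{
    if (index < 1 || index > text[0])
        throw "index out of bounds";
    return text[index];
}

// Alternative loop: i runs 1..length, so local.text[i] needs no offset and
// this string's index becomes index + i - 1.
LPString LPString::substring(int index, int length) const
{
    if (index < 1 || index > text[0])
        throw "index is out of bounds";
    if (length < 0 || index + length - 1 > text[0])   // must end within this string
        throw "\"length\" is too long";
    LPString local;
    local.text[0] = length;
    for (int i = 1; i <= length; i++)
        local.text[i] = text[index + i - 1];
    return local;
}
```

Both versions copy the same characters. The choice comes down to which index expression you find easier to check against the table.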
String Comparison Functions
bool LPString::equals(const LPString& s)
{
for (int i = 0; i <= text[0]; i++)
if (text[i] != s.text[i])
return false;
return true;
}
The LPString equals function. The equals function compares the characters of two LPStrings, left to right, one pair of characters at a time - including the "characters" storing the strings' lengths. The function returns false when it detects the first unequal pair; it returns true only after comparing all pairs and verifying that they are equal. The comparison is case-sensitive, meaning that A is not equal to a.
The picture suggests that the equals function compares the elements of two LPStrings by pairs, including the elements storing the strings' lengths. The function returns true after comparing the characters in locations 0 through 11 without detecting a mismatch.
Characters at index locations 0 through 6 are equal, but the characters at index location 7 are not, causing the function to return false without comparing additional characters.
The function determines, with a single comparison, that the strings have different lengths and returns immediately.
The equals function is small and straightforward. Beginning the loop at 0 includes the strings' lengths, so strings of unequal lengths are rejected quickly. This logic allows us to drive the loop with one string's length without the risk of (logically) indexing the other string out of bounds.
A set of tests validating the equals function and demonstrating how to call it.
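The test code is not reproduced here, so the following sketch suggests one set of checks. It covers equal strings, strings that differ at index 7, strings of different lengths, and a case-only difference, each in both directions where order matters. The routine name testEquals is hypothetical, and the class is trimmed to the conversion constructor and equals.

```cpp
const int LENGTH = 256;    // assumption: 1 length byte + 255 characters

class LPString
{
    private:
        unsigned char text[LENGTH];   // text[0] holds the length
    public:
        LPString(const char* s);
        bool equals(const LPString& s);
};

LPString::LPString(const char* s)
{
    text[0] = 0;
    for (int i = 0; s[i] && i < LENGTH - 1; i++)
    {
        text[0]++;
        text[i + 1] = s[i];
    }
}

// equals, as shown above: i = 0 compares the lengths first.
bool LPString::equals(const LPString& s)
{
    for (int i = 0; i <= text[0]; i++)
        if (text[i] != s.text[i])
            return false;
    return true;
}

// Hypothetical validation routine: returns true only if every check passes.
bool testEquals()
{
    LPString a("Hello world");
    LPString b("Hello world");
    LPString c("Hello World");   // differs at index 7
    LPString d("Hello");         // different length
    LPString e("hello world");   // differs only in case

    return a.equals(b) && b.equals(a)
        && !a.equals(c) && !c.equals(a)
        && !a.equals(d) && !d.equals(a)
        && !a.equals(e);
}
```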
int LPString::order(const LPString& s)
{
for (int i = 1; i <= text[0] && i <= s.text[0]; i++)
if (text[i] < s.text[i])
return -1;
else if (text[i] > s.text[i])
return 1;
if (text[0] == s.text[0])
return 0;
else if (text[0] < s.text[0])
return -1;
else
return 1;
}
The LPString order function. Ordering functions compare two strings and determine their relative order, that is, which one comes first. Determining two strings' relative order is an important step in, among other operations, sorting strings, for example, listing them in alphabetical order. Given two strings, X and Y, and the operation order(X,Y) or X.order(Y), ordering functions typically return a negative value if X comes before Y, a positive value if X comes after Y, and 0 if X and Y have the same order. The magnitude of the positive and negative values is unimportant, and modern functions typically return -1, 0, and 1. Like equals, the order comparisons are case-sensitive. Furthermore, because the function compares the characters' numeric (ASCII) codes, upper-case letters come before lower-case letters.
The strings are the same length, and their characters are all the same, so the strings have the same order, indicated when the function returns 0.
The strings are the same length, but their first characters differ. The nested if-statement ends the for-loop early. As pictured, the function returns -1, but the validation code tests both orders.
The loop runs four times before the mismatched string lengths end it. The if-else ladder determines the order by applying the rule "nothing comes before something." As pictured, the function returns 1, but the validation code, (f), tests both orders.
The strings are the same length but differ at the last character. The nested if-statement ends the function call during the loop's final iteration.
The for-loop stops when it reaches the end of the shorter string. The nested if-statement determines the strings' order if the function finds mismatched characters before reaching the end of the shorter string; otherwise, the if-else ladder makes the determination.
If execution reaches the ladder, the loop didn't find mismatched characters, and the strings' lengths determine the order based on the "nothing comes before something" rule.
A minimal set of validating tests. This function is "tricky," and we must test it thoroughly.
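The validation tests themselves are not reproduced, so here is one possible minimal set that exercises each pictured case in both directions. The cases are: equal strings, a first-character mismatch, a prefix ("nothing comes before something"), a last-character mismatch, and upper- versus lower-case. The routine name testOrder is hypothetical, and the upper/lower-case expectation assumes ASCII character codes.

```cpp
const int LENGTH = 256;    // assumption: 1 length byte + 255 characters

class LPString
{
    private:
        unsigned char text[LENGTH];   // text[0] holds the length
    public:
        LPString(const char* s);
        int order(const LPString& s);
};

LPString::LPString(const char* s)
{
    text[0] = 0;
    for (int i = 0; s[i] && i < LENGTH - 1; i++)
    {
        text[0]++;
        text[i + 1] = s[i];
    }
}

// order, as shown above.
int LPString::order(const LPString& s)
{
    for (int i = 1; i <= text[0] && i <= s.text[0]; i++)
        if (text[i] < s.text[i])
            return -1;
        else if (text[i] > s.text[i])
            return 1;
    if (text[0] == s.text[0])
        return 0;
    else if (text[0] < s.text[0])
        return -1;
    else
        return 1;
}

// Hypothetical validation routine: returns true only if every check passes.
bool testOrder()
{
    LPString hello("Hello");
    LPString same("Hello");
    LPString jello("Jello");   // differs at the first character
    LPString hell("Hell");     // a prefix of "Hello"
    LPString hellp("Hellp");   // differs at the last character
    LPString upper("Zebra");   // 'Z' (90) comes before 'a' (97) in ASCII
    LPString lower("apple");

    return hello.order(same) == 0
        && hello.order(jello) == -1 && jello.order(hello) == 1
        && hello.order(hell) == 1 && hell.order(hello) == -1
        && hello.order(hellp) == -1 && hellp.order(hello) == 1
        && upper.order(lower) == -1 && lower.order(upper) == 1;
}
```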
Try It Yourself
Learning to draw and use pictures to help solve problems takes practice. Two LPString functions, concat and insert, remain unimplemented. Writing these functions will give us some practice using pictures, more experience solving basic programming problems, and help us review array and member function syntax. Once you have implemented the functions, design and write an appropriate set of validation tests.
Downloadable Code
The example programs are formatted with tab stops set at 8 spaces.
1 The behavior of these links depends on your browser and desktop configuration.