This section describes and demonstrates five C-string functions:
    const char* target = "HELLO WORLD";
    const char* s1 = strchr(target, 'L');
    const char* s2 = strrchr(target, 'L');
    if (s1 != nullptr)
        cout << s1 << endl;
    if (s2 != nullptr)
        cout << s2 << endl;

Output:

    LLO WORLD
    LD

    const char* target = "HELLO WORLD";
    const char* s3 = strstr(target, "WORLD");
    if (s3 != nullptr)
        cout << s3 << endl;

Output:

    WORLD
Parsing (also known as tokenizing) a string is a common data processing task. Parsing assumes that some groups of characters within a string, called tokens, have meaning in some context and that other characters, called delimiters, separate the groups. Programmers parse or tokenize a string by scanning it, looking for the delimiters, and separating the individual tokens. Programmers can use the C-string library function strtok (for string tokenizer) to help them complete simple parsing tasks.
[Figure, panels (a) and (b): strtok scans target from left to right. Whenever strtok finds a delimiter, it replaces it with a null-termination character and returns the extracted token. The function returns nullptr after extracting all the tokens.]

Programs call strtok in two different ways. They begin parsing target with call i, strtok(target, delims), and continue parsing (extracting the remaining tokens) with call ii, strtok(nullptr, delims), in a loop. We can see the basic logic with pseudocode fragments:
    token = strtok(target, delims);
    while (token != nullptr)
    {
        // process token
        token = strtok(nullptr, delims);
    }

    // This form assumes target contains at least one token
    token = strtok(target, delims);
    do
    {
        // process token
        token = strtok(nullptr, delims);
    } while (token != nullptr);

    for (token = strtok(target, delims); token != nullptr; token = strtok(nullptr, delims))
        // process token
The standard strtok function relies on a local static variable (a variable defined in the strtok function) to save the current parsing location within the target. So, if the program begins parsing a second string before finishing the first, the function loses the parse position in the first string. That is, programmers can't begin parsing string-A, switch to string-B, then return to string-A. The Microsoft and Linux versions, strtok_s and strtok_r respectively, eliminate this restriction by replacing the local static variable with the context argument.
Microsoft:

    #include <iostream>
    #include <cstring>
    using namespace std;

    int main()
    {
        const char* delims = " :";
        char target1[100] = "See the quick red fox";
        char target2[100] = "jump:over:the:lazy:brown:dog";
        char* context1 = nullptr;
        char* context2 = nullptr;

        char* token1 = strtok_s(target1, delims, &context1);
        char* token2 = strtok_s(target2, delims, &context2);

        while (token1 != nullptr || token2 != nullptr)
        {
            if (token1 != nullptr)
                cout << token1 << endl;
            token1 = strtok_s(nullptr, delims, &context1);

            if (token2 != nullptr)
                cout << token2 << endl;
            token2 = strtok_s(nullptr, delims, &context2);
        }

        return 0;
    }

Linux:

    #include <iostream>
    #include <cstring>
    using namespace std;

    int main()
    {
        const char* delims = " :";
        char target1[100] = "See the quick red fox";
        char target2[100] = "jump:over:the:lazy:brown:dog";
        char* context1 = nullptr;
        char* context2 = nullptr;

        char* token1 = strtok_r(target1, delims, &context1);
        char* token2 = strtok_r(target2, delims, &context2);

        while (token1 != nullptr || token2 != nullptr)
        {
            if (token1 != nullptr)
                cout << token1 << endl;
            token1 = strtok_r(nullptr, delims, &context1);

            if (token2 != nullptr)
                cout << token2 << endl;
            token2 = strtok_r(nullptr, delims, &context2);
        }

        return 0;
    }
The context argument takes the place of the local static variable by saving the current parsing location. However, defining context in the application program's scope allows the application to parse multiple strings simultaneously without conflict.
Aside from the slightly different function names, both programs are identical. context is a C-string, that is, a character pointer: char*. However, strtok_s and strtok_r modify it, so the program must pass it with an INOUT mechanism, pass by pointer. So, the strtok_s and strtok_r prototypes presented in Figure 3 show the type as a pointer-to-a-pointer, char**, and the program finds the variable's address with the address-of operator, as in &context1 and &context2.