14.7.1. HTMLfix.cpp: Character I/O Example

Review

The first part of the example uses functions and operators from the string class (rfind, substr, and operator+). Please review them as needed.

The textbook, which is written in HTML, includes many C++ programming examples. However, source code written in C++ and other languages often contains three characters that have special meaning in HTML: <, >, and &. For these characters to display correctly in a web browser, I need to replace them with the appropriate HTML encodings (&lt;, &gt;, and &amp;). Manually replacing each character is a tedious and error-prone task. I need a simple program that can find and replace all of the characters at once and save the modified code in a new file. And, while we're at it, it would be nice to have the program add some simple boilerplate formatting as well. The following C++ program, HTMLfix, performs these tasks for C++, Java, C#, and similar programming languages.

#include <iostream>
#include <fstream>
#include <string>
#include <cstdlib>                             // declares exit
using namespace std;

int main()
{
    string input;                              // (a)
    cout << "Input file: ";
    getline(cin, input);

    ifstream in(input);                        // (b)

    if (!in.good())                            // (c)
    {
        cerr << "Unable to open " << input << endl;
        exit(1);
    }

    size_t ext = input.rfind('.');             // (d)
    string file = input.substr(0, ext);
    //cout << file << endl;

    string output = file + ".html";            // (e)
    ofstream out(output);                      // (f)

    if (!out.good())                           // (g)
    {
        cerr << "Unable to open " << output << endl;
        exit(1);
    }

    out << "<h1>" << input << "</h1>" <<       // (h)
        endl << endl << "<pre>";

    while (!in.eof())                          // (i)
    {
        char c;                                // (j)

        if (!in.get(c))                        // (k)
            break;                             // the read failed (end of file): stop

        switch(c)                              // (l)
        {
            case '<':
                out << "&lt;";
                break;
            case '>':
                out << "&gt;";
                break;
            case '&':
                out << "&amp;";
                break;
            default:                           // (m)
                out.put(c);
                //out << c;
                break;
        }
    }

    out << "</pre>" << endl;                   // (n)

    return 0;
}
  (a) Get the input file name (including the extension). The file name may be either absolute or relative.
  (b) The ifstream constructor creates a stream object and attempts to open the file in text mode for reading.
  (c) If the stream isn't good (i.e., the file didn't open correctly), the program prints an error message and terminates.
  (d) rfind locates the last '.' character in input, and substr creates the name of the input file without the extension. The commented-out cout statement was used to verify that file is correct.
  (e) operator+ concatenates the name in file with the extension ".html" to form the output file name.
  (f) The ofstream constructor creates a stream object and attempts to open the file in text mode for writing.
  (g) Tests the stream object to see if the file opened; if the stream isn't good, the program prints an error message and terminates.
  (h) Inserts some simple HTML code into the output file. The << operator can write to a file just as it does to the console.
  (i) Loops while the stream has not reached the end of the file, i.e., while there is still data in the file to process. The stream reports end of file only after a read attempt has failed, not when the last character has been read.
  (j) Defines a variable to hold characters as they are read.
  (k) The get function reads one character from the file and stores it in the variable c; each read operation automatically advances the position pointer by one byte. (How is c passed to get: by value, pointer, or reference?)*
  (l) If the input character, c, is one of the special characters, it is replaced with the appropriate HTML encoding.
  (m) Characters in the source code file that do not conflict with HTML are copied to the output file unmodified. Either the put function or the << operator can be used.
  (n) Inserts the closing HTML tag into the output file.
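The lines marked (d) and (e) in the listing can be tried in isolation. The following is a minimal sketch, not part of HTMLfix.cpp; the function name htmlName is hypothetical and exists only for illustration. It also shows what happens when the input name has no extension: rfind returns string::npos, and substr with a count of npos keeps the whole string.

```cpp
#include <string>

// Hypothetical helper (not in HTMLfix.cpp): builds the output file name the
// same way the listing does, replacing the last extension with ".html".
std::string htmlName(const std::string& input)
{
    std::size_t ext = input.rfind('.');        // string::npos if there is no '.'
    std::string file = input.substr(0, ext);   // a count of npos keeps the whole string
    return file + ".html";
}
```

Under these assumptions, htmlName("HTMLfix.cpp") yields "HTMLfix.html" and htmlName("Makefile") yields "Makefile.html". Because rfind searches the whole string, a '.' that appears only in a directory part of the path would be mistaken for the start of the extension; the sketch, like the listing, does not guard against that case.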
HTMLfix.cpp. Reading a file one character at a time lets a program examine and replace individual characters. HTMLfix converts C++ source code into HTML by reading and processing the source code file one character at a time. It converts each '<', '>', or '&' character into the corresponding HTML encoding but does not modify other characters. Finally, HTMLfix writes the processed source code to an HTML-formatted file.
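The character-by-character conversion can also be sketched as a function that works on any input and output stream, which makes it easy to try on a string instead of a file. This is one possible arrangement, not the textbook's program; the names escapeStream and escapeString are hypothetical. The sketch uses the result of get as the loop condition, so the loop body runs only for characters that were actually read.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies in to out, replacing <, >, and & with
// their HTML encodings and passing every other character through.
void escapeStream(std::istream& in, std::ostream& out)
{
    char c;
    while (in.get(c))                 // get returns the stream; it tests false once a read fails
    {
        switch (c)
        {
            case '<': out << "&lt;";  break;
            case '>': out << "&gt;";  break;
            case '&': out << "&amp;"; break;
            default:  out.put(c);     break;
        }
    }
}

// Hypothetical convenience wrapper: runs escapeStream on a string.
std::string escapeString(const std::string& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    escapeStream(in, out);
    return out.str();
}
```

For example, escapeString("a < b && c > d") produces "a &lt; b &amp;&amp; c &gt; d". Since ifstream and ofstream are also istream and ostream objects, the same escapeStream function could be called with the in and out file streams from the listing.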

* The function call stores data in the parameter c; that is, it returns data through the parameter, which requires an INOUT passing technique, eliminating pass by value. The variable c is not an array, and the function call does not include the address-of operator, &, which eliminates pass by pointer. So, the get function must use pass by reference.
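The three passing techniques can be compared side by side. The functions below are hypothetical and written only for this comparison; each one tries to store 'x' in the caller's variable, much as get stores the character it reads.

```cpp
// Hypothetical functions (not library code) contrasting the passing techniques.
void byValue(char c)      { c = 'x'; }    // changes only a local copy
void byPointer(char* c)   { *c = 'x'; }   // caller must pass an address: byPointer(&ch)
void byReference(char& c) { c = 'x'; }    // caller passes the variable itself: byReference(ch)
```

Given char ch = 'a';, the call byValue(ch) leaves ch unchanged, while byPointer(&ch) and byReference(ch) both change it to 'x'. Only the by-reference call looks like in.get(c), which matches the library's declaration of get as taking a char& parameter.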

Downloadable File

HTMLfix.cpp