The code is fairly long, so I will not explain it in detail. To compile it, Boost C++ and its regular expression library must be installed on the system. Boost can either be built from source or installed through a package manager; the choice is yours. When compiling a program that uses Boost.Regex (like this one) with GCC, pass the -lboost_regex flag to the linker.
OK, here's the code:
- #include <iostream>
- #include <fstream>
- #include <string>
- using namespace std;
- #include <boost/regex.hpp>
- const string help =
- "Usage:"
- "\n\tccode2html [input file] [output file]"
- "\n\tIf output filename is omitted it will be saved as [input file].html\n";
- const string pre_expression = "(>)|(<)|(&)";
- const string pre_format = "(?1&gt;)(?2&lt;)(?3&amp;)";
- const string line_expression = "^.*?$";
- const string line_format = "<li>$&</li>";
- const string whole_code_expression = "^.*$";
- const string whole_code_format = "<pre><ol>$&</ol></pre>";
- const string expressions =
- // single line comments
- "(//.*?(?=</li>))|"
- // multi-line comments
- "(/\\*.*?\\*/)|"
- // string literals
- "(\"(?:[^\\\\\"]|\\\\.)*\"|'(?:[^\\\\']|\\\\.)*')|"
- // precompile directives
- "(#.*?(?=</li>))|"
- // floating point numbers
- "(\\<[[:digit:]]+\\.[[:digit:]]+)|"
- // integer numbers
- "(\\<[[:digit:]]+\\>)|"
- // boolean literals
- "((?:\\<true\\>)|(?:\\<false\\>))|"
- // keywords
- "(\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
- "|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
- "|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
- "|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
- "|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
- "|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
- "|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
- "|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
- "|using|virtual|void|volatile|wchar_t|while|NULL)\\>)";
- const string formats =
- "(?1<font color = \"#999999\"><i>$&</i></font>)"
- "(?2<font color = \"#D3D3D3\">$&</font>)"
- "(?3<font color = \"#009900\">$&</font>)"
- "(?4<font color = \"#006699\">$&</font>)"
- "(?5<font color = \"#996600\">$&</font>)"
- "(?6<font color = \"#993366\">$&</font>)"
- "(?7<font color = \"#990000\"><b>$&</b></font>)"
- "(?8<font color = \"#003399\"><b>$&</b></font>)";
- int main (int argc, char* argv[]) {
- string input_filenm;
- string output_filenm;
- if ( argc > 3 || argc == 1 ) {
- cout << help;
- return 1;
- }
- else if ( argc == 2) {
- input_filenm = argv[1];
- output_filenm = input_filenm + ".html";
- }
- else {
- // assign to the variables declared above instead of shadowing them
- input_filenm = argv[1];
- output_filenm = argv[2];
- }
- ifstream in ( input_filenm.c_str() );
- if (!in.is_open()) {
- cout << "Failed to open: " << input_filenm << '\n';
- return 1;
- }
- else {
- cout << input_filenm << " opened successfully\nProcessing...\n";
- }
- ofstream out ( output_filenm.c_str() );
- string in_string;
- char c;
- while (in.get(c)) {
- in_string.append(1,c);
- }
- boost::regex reg;
- // replace <, > and & signs with appropriate html escape characters
- reg.assign(pre_expression);
- in_string = boost::regex_replace(in_string, reg, pre_format, boost::match_default | boost::format_all);
- // add <li> ... </li> tags on each line
- reg.assign(line_expression);
- in_string = boost::regex_replace(in_string, reg, line_format, boost::match_default |
- boost::format_all | boost::match_not_dot_newline);
- // format and color code syntax
- reg.assign(expressions);
- in_string = boost::regex_replace(in_string, reg, formats, boost::match_default |
- boost::format_all);
- // add <pre><ol> on start and </ol></pre> at the end of the file
- reg.assign(whole_code_expression);
- in_string = boost::regex_replace(in_string, reg, whole_code_format);
- in.close();
- out << in_string;
- out.close();
- return 0;
- }
The first three lines include the standard headers the program needs. The second of them, fstream, is required for file I/O, that is, for the ifstream and ofstream objects. After the using directive, the file boost/regex.hpp is included. That file provides the regular expression functions and objects; of those, I use only the regex class and the regex_replace function in this code. The string help explains the usage of the program and is printed when no command-line arguments, or more than two, are passed when the program is run.
The heart of the program is the set of expression and format strings declared before main, together with the regex_replace calls near the end of main. See the Boost Regex documentation for more information about regular expressions in C++. Many regular expression tutorials and explanations are also available online. There is even a book dedicated to the topic, Mastering Regular Expressions, which I strongly recommend.
I'll just briefly explain how I used regexes in this program. The first two strings declared for this purpose are pre_expression and pre_format. The first one, pre_expression, is passed to the assign function of the regex class. It describes the search pattern: in this case the document is searched for the '<', '>' and '&' signs, which need to be replaced with HTML escape sequences. Those escape sequences are given in the pre_format string. The ?1, ?2 and ?3 refer to sub-expression indexes, so each replacement is emitted only when the sub-expression with that index took part in the match. After pre_expression is assigned to the regex reg, the function regex_replace is called with four arguments.
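The first argument, in_string, is the string holding the whole code read from the file. The second, reg, is the regex holding the pre_expression pattern that the string is searched with. When a match is found, it is replaced according to the third argument, the pre_format string. The last argument takes Boost-specific flags; format_all is the one that enables the conditional (?N...) syntax in the format string.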
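To make the effect of this first step concrete, here is a minimal sketch of the same escaping written with the standard library only, so it compiles without Boost. It is not the code the program uses; escape_html is a hypothetical helper name, and the loop simply stands in for the regex_replace call with pre_expression and pre_format.

```cpp
#include <string>

// Hypothetical helper (not part of the original program): escapes the three
// characters that pre_expression searches for.
std::string escape_html(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '>': out += "&gt;";  break;  // sub-expression 1 in pre_expression
            case '<': out += "&lt;";  break;  // sub-expression 2
            case '&': out += "&amp;"; break;  // sub-expression 3
            default:  out += c;       break;  // everything else is copied as-is
        }
    }
    return out;
}
```

Because every character is visited exactly once, an '&' that belongs to a freshly written escape sequence is never escaped a second time, which is also what the single-pass regex replacement achieves.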
To summarize the steps: first, the '<', '>' and '&' signs are replaced with their HTML escape sequence equivalents. Next, <li> and </li> tags are added at the start and end of each line for numbering. Then the code syntax is highlighted with the appropriate HTML <font> tags declared in the string formats. Finally, <pre> and <ol> and their closing tags are put at the start and the end of the file.
The only problem in this code is the regex for multi-line comments. The expression '/\\*.*?\\*/' matches all text between '/*' and '*/', including the <li> and </li> tags inside. I never figured out how to exclude those tags from the formatting. If some regex professional is reading this, please help out!
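The line-numbering and wrapping steps can be sketched without regexes as well. The block below is only an illustration under my own assumptions: wrap_lines is a hypothetical name, it expects text that has already been escaped, and it emits a newline after every list item, which may differ slightly from what the regex version produces for the last line.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: wraps every line in <li>...</li> and the whole text
// in <pre><ol>...</ol></pre>. Expects already-escaped code as input.
std::string wrap_lines(const std::string& escaped_code) {
    std::istringstream lines(escaped_code);
    std::string line;
    std::string body;
    while (std::getline(lines, line)) {
        body += "<li>" + line + "</li>\n";  // one list item per source line
    }
    return "<pre><ol>" + body + "</ol></pre>";
}
```

The ordered list is what gives the line numbers in the browser, so no numbers have to be written into the output by hand.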
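One possible way around this, offered as a sketch rather than a tested fix: instead of wrapping the whole comment in a single <font> element, close the element before every </li> and reopen it after every <li>, so that each list item stays well-formed. The helper below uses only the standard library; color_comment is a hypothetical name, and it assumes its argument is the text of one multi-line comment match. How to hook it into the regex_replace call is left open.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: `comment` is the text of one multi-line comment match,
// possibly containing "</li>" and "<li>" tags at line boundaries.
std::string color_comment(const std::string& comment) {
    const std::string open     = "<font color = \"#D3D3D3\">";
    const std::string close    = "</font>";
    const std::string end_li   = "</li>";
    const std::string start_li = "<li>";

    std::string out = open;
    std::size_t pos = 0;
    while (pos < comment.size()) {
        if (comment.compare(pos, end_li.size(), end_li) == 0) {
            out += close + end_li;      // close the font before the line ends
            pos += end_li.size();
        } else if (comment.compare(pos, start_li.size(), start_li) == 0) {
            out += start_li + open;     // reopen it at the start of the next line
            pos += start_li.size();
        } else {
            out += comment[pos++];      // ordinary comment text
        }
    }
    return out + close;
}
```

Since every '<' in the source code was escaped in the first step, any <li> or </li> found inside a comment at this point should be one of the tags the program added itself.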