Monday, 18 August 2008

C/C++: Get the CD/DVD volume name

ISO 9660 is an international standard that defines a filesystem for CD/DVD media. To get the volume name of such a filesystem on Linux, we will once again work with its device path. On my system the device path of the CD/DVD drive is /dev/scd0.
It is also important to know where this information is stored on the disc. Searching around, I found a site that explains ISO 9660 in detail: types, modes, sectors, etc.
The field of interest, called the Volume ID, is part of the Primary Volume Descriptor in sector 16. With 2048-byte sectors the descriptor starts at byte 16 × 2048 = 32768, and the Volume ID sits 40 bytes into it, at byte offset 32808. It is 32 bytes long.
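The offset arithmetic can be written down as code, together with a sanity check that sector 16 really holds a Primary Volume Descriptor before its bytes are trusted. This is a minimal sketch based on the ECMA-119 layout (descriptor type code 1 at byte 0, followed by the identifier "CD001"); the macro and function names are hypothetical and not part of the program in this post.

```c
#include <string.h>

/* ISO 9660 (ECMA-119) layout: 2048-byte logical sectors, the Primary
 * Volume Descriptor in sector 16, and a 32-byte Volume ID starting at
 * offset 40 inside that descriptor. Names are illustrative only. */
#define ISO_SECTOR_SIZE    2048L
#define ISO_PVD_SECTOR     16L
#define ISO_VOLUME_ID_OFF  40L
#define ISO_VOLUME_ID_LEN  32   /* length of the Volume ID field */

/* Absolute byte offset of the Volume ID: 16 * 2048 + 40 = 32808. */
long iso_volume_id_offset(void)
{
    return ISO_PVD_SECTOR * ISO_SECTOR_SIZE + ISO_VOLUME_ID_OFF;
}

/* Returns 1 if the buffer (the first bytes of sector 16) looks like a
 * Primary Volume Descriptor: type code 1 followed by "CD001". */
int is_primary_volume_descriptor(const unsigned char *sector)
{
    return sector[0] == 1 && memcmp(sector + 1, "CD001", 5) == 0;
}
```

Reading the first 6 bytes at offset 32768 and passing them to is_primary_volume_descriptor would tell whether the 32 bytes at offset 32808 are worth interpreting as a volume name.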

The following include directives are needed for compilation:

  1. #include <stdio.h>
  2. #include <stdlib.h>
  3. #include <fcntl.h>
  4. #include <string.h>
  5. #include <unistd.h>

Next, we need a function that reads a chunk of data from the filesystem, starting at a specific byte:

  1. int fs_read_data ( const char * fs, off_t seek, size_t len, char * data)
  2. {
  3.     int fd;   /* open() returns an int; -1 signals an error */
  4.     int result = -1;
  5.     if ( fs != NULL)
  6.     {
  7.         if ( ( fd = open (fs, O_RDONLY)) != -1)
  8.         {
  9.             if ( lseek (fd, seek, SEEK_SET) != -1)
  10.             {
  11.                 if ( read ( fd, data, len) == (ssize_t) len)   /* all len bytes read */
  12.                 {
  13.                     data[len] = '\0';   /* data must hold len + 1 bytes */
  14.                     result = 0;
  15.                 }
  16.             }
  17.             close ( fd);   /* close only a descriptor that was opened */
  18.         }
  19.     }
  20.     return result;
  21. }

The function fs_read_data takes four arguments: fs is the device path of an ISO 9660 filesystem, seek is the byte offset at which reading starts, len is the number of bytes to read, and data is the character buffer that receives the data.
If fs is NULL, or if any of the calls below fails, the function returns -1; otherwise the device file is opened and the data is read from it.

The open function tries to open the device file representing our filesystem (/dev/scd0). On success it returns a non-negative integer, the lowest-numbered unused file descriptor; otherwise it returns -1. The second argument, O_RDONLY, opens the file for reading only.

The lseek call at line 9 moves the file offset of the descriptor to seek bytes from the start of the file (SEEK_SET). In this case 32808 is passed.

Finally, the data is read with the read function at line 11. It reads up to len bytes from the file associated with the file descriptor fd into the data buffer. In our case len is 32.

If all the conditions of the if statements are true, fs_read_data returns 0 and data is filled. The '\0' character written after the last byte terminates the string, so data must have room for len + 1 bytes.
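The lseek and read pair can also be replaced by a single pread call, which reads at a given offset without moving the file offset. The sketch below is an alternative, not the code above: read_exact_at and fs_read_exact are hypothetical names, and it assumes a POSIX system. It also loops, because read and pread may legally return fewer bytes than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads exactly len bytes starting at byte offset from an open descriptor
 * and NUL-terminates data, which must hold len + 1 bytes.
 * Returns 0 on success, -1 on error or if the file ends too early. */
int read_exact_at(int fd, off_t offset, size_t len, char *data)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, data + done, len - done, offset + (off_t)done);
        if (n == -1 && errno == EINTR)
            continue;            /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;           /* error, or end of file before len bytes */
        done += (size_t)n;
    }
    data[len] = '\0';
    return 0;
}

/* Same arguments as fs_read_data: opens the device, reads, closes. */
int fs_read_exact(const char *fs, off_t seek, size_t len, char *data)
{
    int fd = open(fs, O_RDONLY);
    if (fd == -1) {
        perror(fs);              /* report why the device could not be opened */
        return -1;
    }
    int result = read_exact_at(fd, seek, len, data);
    close(fd);
    return result;
}
```

With a 33-byte buffer, fs_read_exact("/dev/scd0", 32808, 32, buff) would fill buff with the Volume ID, just like fs_read_data.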

The main function can now be written like this:

  1. int main() {
  2.     const char * fs = "/dev/scd0";
  3.     off_t seek = 32808;
  4.     size_t len = 32;
  5.     char * buff = NULL;
  6.     char * volume_name = NULL;
  7.     buff = malloc (len + 1);   /* 32 bytes of Volume ID + '\0' */
  8.     if ( buff != NULL && fs_read_data ( fs, seek, len, buff) != -1)
  9.     {
  10.         if ( (strncmp ( buff, "NO NAME", 7) == 0) || ( strncmp (buff, " ", 1) == 0))
  11.         {
  12.             free ( buff);
  13.             buff = NULL;
  14.             volume_name = strdup ("None");   /* heap copy, so it can be freed below */
  15.         }
  16.         else
  17.         {
  18.             volume_name = strdup ( buff);
  19.         }
  20.     }
  21.     printf ("%s\n", volume_name != NULL ? volume_name : "Could not read the volume name");
  22.     free (volume_name);
  23.     free (buff);   /* free(NULL) is a no-op */
  24.     return 0;
  25. }

The variables are declared on lines 2 to 6. Line 7 uses malloc to allocate buff, which needs room for len + 1 bytes: 32 for the Volume ID plus one for the terminating '\0'.
The if statement at line 8 tries to read the Volume ID field from the /dev/scd0 device file representing my CD/DVD drive. Next, strncmp is used to check whether the field starts with a space (an unset Volume ID consists of space padding) or reads "NO NAME". If either is true, the volume name of the ISO 9660 filesystem isn't set: buff is freed and "None" becomes the volume name. Otherwise, the content of buff is duplicated into volume_name with strdup. Finally, the volume name is printed on the screen.
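ISO 9660 normally pads the Volume ID with spaces, so a name shorter than 32 characters comes back with trailing blanks, and an unset name is all blanks. A small helper can strip that padding in place; trim_volume_id is a hypothetical name and is not part of the program above.

```c
#include <string.h>

/* Strips the trailing space padding of a Volume ID in place and returns
 * the remaining length; 0 means the Volume ID was blank (not set). */
size_t trim_volume_id(char *volume_id)
{
    size_t end = strlen(volume_id);
    while (end > 0 && volume_id[end - 1] == ' ')
        volume_id[--end] = '\0';
    return end;
}
```

Calling it on buff right after fs_read_data succeeds would print the name without trailing blanks, and a return value of 0 could take over the role of the strncmp(buff, " ", 1) test.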

Friday, 15 August 2008

Convert C++ to syntax colored HTML code with line numbers

Today I wrote a small C++ program that converts C/C++ code into syntax-highlighted HTML with line numbers. The code snippets in the previous post have already been converted with it, and you can also see the program in action in this post. Like it? :)
The code is not that small, so I will not explain it in detail. To compile it, the Boost C++ libraries, including the Boost.Regex library, must be installed on the system. Boost can be either built from source or installed via a package manager; your choice. When compiling a program that uses Boost.Regex (like this one), pass the -lboost_regex flag to the GCC C++ linker (g++).
OK, here's the code:

  1. #include <iostream>
  2. #include <fstream>
  3. #include <string>
  4. using namespace std;
  5. #include <boost/regex.hpp>
  6. const string help =
  7.     "Usage:"
  8.     "\n\tccode2html [input file] [output file]"
  9.     "\n\tIf output filename is omitted it will be saved as [input file].html\n";
  10. const string pre_expression = "(>)|(<)|(&)";
  11. const string pre_format = "(?1\\&gt;)(?2\\&lt;)(?3\\&amp;)";
  12. const string line_expression = "^.*?$";
  13. const string line_format = "<li>$&</li>";
  14. const string whole_code_expression = "^.*$";
  15. const string whole_code_format = "<pre><ol>$&</ol></pre>";
  16. const string expressions =
  17.     // single line comments
  18.     "(//.*?(?=</li>))|"
  19.     // multi-line comments
  20.     "(/\\*.*?\\*/)|"
  21.     // string literals
  22.     "(\"(?:[^\\\\\"]|\\\\.)*\"|'(?:[^\\\\']|\\\\.)*')|"
  23.     // preprocessor directives
  24.     "(#.*?(?=</li>))|"
  25.     // floating point numbers
  26.     "(\\<[[:digit:]]+\\.[[:digit:]]+)|"
  27.     // integer numbers
  28.     "(\\<[[:digit:]]+\\>)|"
  29.     // boolean literals
  30.     "((?:\\<true\\>)|(?:\\<false\\>))|"
  31.     // keywords
  32.     "(\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
  33.     "|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
  34.     "|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
  35.     "|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
  36.     "|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
  37.     "|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
  38.     "|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
  39.     "|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
  40.     "|using|virtual|void|volatile|wchar_t|while|NULL)\\>)";
  41. const string formats =
  42.     "(?1<font color = \"#999999\"><i>$&</i></font>)"
  43.     "(?2<font color = \"#D3D3D3\">$&</font>)"
  44.     "(?3<font color = \"#009900\">$&</font>)"
  45.     "(?4<font color = \"#006699\">$&</font>)"
  46.     "(?5<font color = \"#996600\">$&</font>)"
  47.     "(?6<font color = \"#993366\">$&</font>)"
  48.     "(?7<font color = \"#990000\"><b>$&</b></font>)"
  49.     "(?8<font color = \"#003399\"><b>$&</b></font>)";
  50. int main (int argc, char* argv[]) {
  51.     string input_filenm;
  52.     string output_filenm;
  53.     if ( argc > 3 || argc == 1 ) {
  54.         cout << help;
  55.         return 1;   // return from main; exit() would need <cstdlib>
  56.     }
  57.     else if ( argc == 2) {
  58.         input_filenm = argv[1];
  59.         output_filenm = input_filenm + ".html";
  60.     }
  61.     else {
  62.         input_filenm = argv[1];    // assign; "string input_filenm" would shadow the outer variable
  63.         output_filenm = argv[2];
  64.     }
  65.     ifstream in ( input_filenm.c_str() );
  66.     if (!in.is_open()) {
  67.         cerr << "Failed to open: " << input_filenm << '\n';
  68.         return 1;   // nothing to convert
  69.     }
  70.     cout << input_filenm << " opened successfully\nProcessing...\n";
  71.     ofstream out ( output_filenm.c_str() );
  72.     string in_string;
  73.     char c;
  74.     // read the whole input file, character by character
  75.     while (in.get(c)) {
  76.         in_string.append(1,c);
  77.     }
  78.     boost::regex reg;
  79.     // replace <, > and & signs with appropriate html escape characters
  80.     reg.assign(pre_expression);
  81.     in_string = boost::regex_replace(in_string, reg, pre_format, boost::match_default | boost::format_all);
  82.     // add <li> ... </li> tags on each line
  83.     reg.assign(line_expression);
  84.     in_string = boost::regex_replace(in_string, reg, line_format, boost::match_default |
  85.         boost::format_all | boost::match_not_dot_newline);
  86.     // format and color code syntax
  87.     reg.assign(expressions);
  88.     in_string = boost::regex_replace(in_string, reg, formats, boost::match_default |
  89.         boost::format_all);
  90.     // add <pre><ol> on start and </ol></pre> at the end of the file
  91.     reg.assign(whole_code_expression);
  92.     in_string = boost::regex_replace(in_string, reg, whole_code_format);
  93.     in.close();
  94.     out << in_string;
  95.     out.close();
  96.     return 0;
  97. }

The first three lines include the standard headers the program needs; <fstream> (line 2) is required for the file I/O done with the ifstream and ofstream objects in main. Line 5 includes boost/regex.hpp, which provides the regular expression classes and functions; I used only the regex class and the regex_replace function in this code. The string help, declared on lines 6 to 9, explains the usage of the program and is printed if zero or more than two command-line arguments are passed when the program is run.

The heart of the program is on lines 10-49 and 78-92. See the Boost.Regex documentation for more information about regular expressions in C++. Many regular expression tutorials and explanations can also be found online. There is even a book dedicated to the topic, Mastering Regular Expressions, which I strongly recommend.

I'll just briefly explain how I used regexes in this program. Two strings are declared on lines 10 and 11. The first one, pre_expression, is passed to the assign function of the regex class (see line 80). It describes the search pattern; in this case the document is searched for the '<', '>' and '&' signs, which need to be replaced with HTML escape sequences. Those escape sequences are given in the second string, pre_format. The ?1, ?2 and ?3 refer to the sub-expression indexes: whichever sub-expression matched selects the replacement. After pre_expression is assigned to the regex reg on line 80, the function regex_replace, which takes four arguments, is called.
The first argument, in_string, is the string holding the whole code read from the file. It is searched for the pre_expression pattern, and each match is replaced according to the pre_format string. The second argument is the regex itself, and the last parameter holds Boost-specific flags; format_all enables the extended format syntax, which includes the (?N...) conditionals.
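For comparison, the first pass (the one driven by pre_expression and pre_format) can be sketched without regular expressions as a plain C loop. This is a swapped-in technique, not what ccode2html does, and html_escape is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of src in which '<', '>' and '&' are
 * replaced by &lt;, &gt; and &amp;, or NULL if malloc fails.
 * The caller must free() the result. */
char *html_escape(const char *src)
{
    size_t size = 1;                       /* room for the final '\0' */
    for (const char *s = src; *s; s++)
        size += (*s == '&') ? 5 : (*s == '<' || *s == '>') ? 4 : 1;

    char *out = malloc(size);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (const char *s = src; *s; s++) {
        const char *esc = NULL;
        if (*s == '<')
            esc = "&lt;";
        else if (*s == '>')
            esc = "&gt;";
        else if (*s == '&')
            esc = "&amp;";

        if (esc != NULL) {                 /* copy the escape sequence */
            size_t n = strlen(esc);
            memcpy(p, esc, n);
            p += n;
        } else {                           /* copy the character as-is */
            *p++ = *s;
        }
    }
    *p = '\0';
    return out;
}
```

The first loop sizes the output exactly, so the second loop never has to grow the buffer.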

First, the '<', '>' and '&' signs are replaced with their HTML escape sequences. Then the <li> and </li> tags are added at the start and end of each line, so the ordered list numbers the lines. Next, the code syntax is highlighted with the HTML <font> tags declared in the string formats on lines 41-49. Finally, <pre><ol> and the matching closing tags are put at the start and end of the file.
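The <li> step (what line_expression and line_format do) can likewise be sketched without regexes. This is an illustrative C version under one stated assumption: lines end in '\n', and a trailing newline does not produce an extra empty item. It may not reproduce Boost's exact handling of every edge case, and wrap_lines_in_li is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Wraps every line of text in <li>...</li>, keeping the '\n' separators.
 * Returns a newly allocated string (free() it) or NULL on failure. */
char *wrap_lines_in_li(const char *text)
{
    size_t len = strlen(text);
    size_t lines = 1;
    for (size_t i = 0; i < len; i++)
        if (text[i] == '\n')
            lines++;

    /* each item adds "<li>" (4 bytes) and "</li>" (5 bytes) */
    char *out = malloc(len + lines * 9 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    int in_item = 0;                       /* inside an open <li>? */
    for (size_t i = 0; i < len; i++) {
        if (!in_item) {                    /* first character of a line */
            memcpy(p, "<li>", 4);
            p += 4;
            in_item = 1;
        }
        if (text[i] == '\n') {             /* end of line: close the item */
            memcpy(p, "</li>\n", 6);
            p += 6;
            in_item = 0;
        } else {
            *p++ = text[i];
        }
    }
    if (in_item) {                         /* last line without '\n' */
        memcpy(p, "</li>", 5);
        p += 5;
    }
    *p = '\0';
    return out;
}
```

Empty lines still become <li></li>, so the list numbering stays aligned with the source lines.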

The only problem in this code is the regex for multi-line comments. The expression '/\\*.*?\\*/' matches all text between '/*' and '*/', including the </li> and <li> tags inside it, so a single <font> tag ends up spanning several list items. I never figured out how to exclude those tags from the formatting. If some regex professional is reading this, please help out!
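One possible workaround, sketched in plain C rather than with Boost.Regex, is to post-process each multi-line comment match: close the formatting tag before every line boundary and reopen it after, so no tag crosses a </li><li> pair. The sketch assumes the boundaries look exactly like "</li>\n<li>", as produced by line_format on a file with Unix line endings; wrap_match_per_line is a hypothetical name, and hooking it into ccode2html would need a way to run custom code per match, which is not shown here.

```c
#include <stdlib.h>
#include <string.h>

/* Wraps match in open_tag ... close_tag, but closes the tag before every
 * "</li>\n<li>" boundary inside the match and reopens it afterwards.
 * Returns a newly allocated string (free() it) or NULL on failure. */
char *wrap_match_per_line(const char *match, const char *open_tag,
                          const char *close_tag)
{
    static const char sep[] = "</li>\n<li>";   /* boundary made by line_format */
    const size_t sep_len = sizeof sep - 1;
    const size_t open_len = strlen(open_tag);
    const size_t close_len = strlen(close_tag);

    /* count the line boundaries to size the output buffer */
    size_t count = 0;
    for (const char *s = strstr(match, sep); s != NULL; s = strstr(s + sep_len, sep))
        count++;

    char *out = malloc(strlen(match) + (count + 1) * (open_len + close_len) + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    const char *s = match;
    memcpy(p, open_tag, open_len);
    p += open_len;
    for (const char *hit = strstr(s, sep); hit != NULL; hit = strstr(s, sep)) {
        size_t chunk = (size_t)(hit - s);
        memcpy(p, s, chunk);               /* text of this line */
        p += chunk;
        memcpy(p, close_tag, close_len);   /* close before </li> */
        p += close_len;
        memcpy(p, sep, sep_len);           /* keep the boundary itself */
        p += sep_len;
        memcpy(p, open_tag, open_len);     /* reopen after <li> */
        p += open_len;
        s = hit + sep_len;
    }
    size_t rest = strlen(s);               /* text after the last boundary */
    memcpy(p, s, rest);
    p += rest;
    memcpy(p, close_tag, close_len);
    p += close_len;
    *p = '\0';
    return out;
}
```

For a match such as "/* a</li>\n<li> b */", every list item then contains its own complete <font>...</font> element.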

Wednesday, 13 August 2008

C/C++: Check whether a filesystem is mounted

I'll write and explain a function that checks the /etc/mtab file to find out whether a given device is mounted. For this, a standard C FILE pointer and the mntent structure will be used. For more information about the mntent struct and the fields it holds, see the getmntent(3) manual page. Three includes are needed before the coding starts:

  1. #include <stdio.h>
  2. #include <mntent.h>
  3. #include <string.h>

The first include doesn't need much explanation; just note that it is needed for file I/O. The second includes mntent.h, where the mntent structure and its functions are declared, and the third includes string.h for string functions such as strcmp.
Here is the code of the is_mounted function:

  1. int is_mounted (const char * dev_path) {
  2.     FILE * mtab = NULL;
  3.     struct mntent * part = NULL;
  4.     int is_mounted = 0;
  5.     if ( ( mtab = setmntent ("/etc/mtab", "r") ) != NULL) {
  6.         while ( ( part = getmntent ( mtab) ) != NULL) {
  7.             if ( ( part->mnt_fsname != NULL )
  8.                     && ( strcmp ( part->mnt_fsname, dev_path ) == 0 ) ) {
  9.                 is_mounted = 1;
  10.             }
  11.         }
  12.         endmntent ( mtab);
  13.     }
  14.     return is_mounted;
  15. }

In the first three lines of the is_mounted function body, the file pointer mtab, the mntent structure pointer part and the integer is_mounted are declared; the pointers are initialized to NULL and is_mounted to zero (the assumption is that the device isn't mounted). The program would work without initializing the pointers to NULL, but it is a good defensive habit.

Line 5 introduces the setmntent function. It takes two arguments: the first is the path of the mtab file (/etc/mtab on most Linux systems) and the second is a short string that determines how the file is opened, in this case for reading only. setmntent is similar to the fopen function declared in stdio.h.
Next, the getmntent function takes this file pointer as its argument, reads the next entry from the mtab file and returns a pointer to a filled mntent structure, or NULL when there are no more entries.
The while loop reads /etc/mtab entry by entry and compares the given device path dev_path with each device path in the file (part->mnt_fsname) using strcmp; the mnt_fsname field must also not be NULL. If a match is found, the filesystem (device) is mounted and we assign 1 to the is_mounted variable.
After the while loop, the endmntent function closes the mtab file.
Finally, the is_mounted variable is returned.
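The loop above keeps reading after a match is found, and the mtab path is fixed. A variant sketch (is_mounted_in is a hypothetical name) takes the file path as a parameter, which makes it easy to test against a hand-written file, and stops at the first match. On many current distributions /etc/mtab is a symbolic link to /proc/self/mounts, so passing "/proc/mounts" should give the same answer there.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if dev_path appears as mnt_fsname in the mount table file
 * mtab_path, 0 if it does not, and -1 if the file cannot be opened. */
int is_mounted_in(const char *mtab_path, const char *dev_path)
{
    FILE *mtab = setmntent(mtab_path, "r");
    if (mtab == NULL)
        return -1;

    int found = 0;
    struct mntent *part;
    /* stop reading as soon as the device has been found */
    while (!found && (part = getmntent(mtab)) != NULL) {
        if (part->mnt_fsname != NULL && strcmp(part->mnt_fsname, dev_path) == 0)
            found = 1;
    }
    endmntent(mtab);
    return found;
}
```

Returning -1 for an unreadable file lets the caller tell "not mounted" apart from "could not check".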

Now the famous main function can be constructed like this:

  1. int main() {
  2.     if (is_mounted("/dev/scd0")) {
  3.         printf("CDROM mounted!\n");
  4.     }
  5.     else {
  6.         printf("CDROM not mounted!\n");
  7.     }
  8.     return 0;
  9. }

/dev/scd0 is the device path of my CD/DVD drive, and it should be the same on other Debian-based systems of this vintage; newer kernels name the drive /dev/sr0. If the device path of the filesystem is unknown, this check can still be performed by comparing the mount path with the mnt_dir field of the mntent structure.
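The mount-path check mentioned above can be sketched the same way, comparing mnt_dir instead of mnt_fsname. This is an illustration, not code from the post: is_dir_mounted is a hypothetical name, and the mount table path is a parameter so the function can be tried on a hand-written file.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if some entry in mtab_path has mnt_dir equal to mount_dir,
 * 0 if none does, and -1 if the file cannot be opened. */
int is_dir_mounted(const char *mtab_path, const char *mount_dir)
{
    FILE *mtab = setmntent(mtab_path, "r");
    if (mtab == NULL)
        return -1;

    int found = 0;
    struct mntent *part;
    while (!found && (part = getmntent(mtab)) != NULL) {
        /* compare the mount point rather than the device path */
        if (part->mnt_dir != NULL && strcmp(part->mnt_dir, mount_dir) == 0)
            found = 1;
    }
    endmntent(mtab);
    return found;
}
```

For example, is_dir_mounted("/etc/mtab", "/media/cdrom") would report whether anything is mounted on /media/cdrom; that mount point is only an example and depends on the system.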