UTF-8 Dump — od for UTF-8 files

This program prints the characters in a UTF-8 encoded text file. Its options range from a plain numerical dump to looking up code point names in the official Unicode name table. Care has been taken to report erroneous UTF-8 encodings in detail.
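As a rough illustration of the kind of dump described above, here is a minimal Python sketch. It is not the program's own implementation and does not reproduce its output format or options: the function name `dump_utf8` and the line layout are assumptions made for this example, and the names come from Python's bundled `unicodedata` table rather than the program's lookup.

```python
import unicodedata


def dump_utf8(data):
    """Return one text line per decoded character or invalid byte.

    Hypothetical helper for illustration only; the layout of each
    line is an assumption, not the real tool's format.
    """
    lines = []
    offset = 0
    while offset < len(data):
        # A UTF-8 sequence is 1 to 4 bytes long; try the shortest first.
        for length in range(1, 5):
            chunk = data[offset:offset + length]
            try:
                char = chunk.decode("utf-8")  # strict decoding
            except UnicodeDecodeError:
                continue
            # Some code points (e.g. control characters) have no name.
            name = unicodedata.name(char, "<unnamed>")
            lines.append(
                f"{offset:08x}  {chunk.hex()}  U+{ord(char):04X}  {name}"
            )
            offset += length
            break
        else:
            # No prefix decoded: report the byte and resynchronise.
            lines.append(f"{offset:08x}  {data[offset]:02x}  <invalid UTF-8 byte>")
            offset += 1
    return lines
```

Calling `dump_utf8` on the bytes of a file read in binary mode yields one line per character, with any byte that cannot start a valid sequence reported separately, so a single bad byte does not hide the valid text that follows it.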

A manual page is included in the source, which you can preview online (plain man2html output, so the cross-reference links will probably not work for you).

Download Source

Version 0.2.2: This version significantly improves performance when there are lots of Unicode name lookups (the %n format). The command-line interface has also changed so that extra arguments are taken to be file names (by default). I got tired of forgetting to redirect input.

Remarks

I wrote this because I needed it and was surprised not to find anything that does this sort of job already written. Please let me know if I am wrong and there is already a better tool for this job.

Also, let me know if you find it useful as this will encourage me to fix and improve it.