UTF-8 Dump — od for UTF-8 files
This program prints the characters of a UTF-8 encoded text file. Its options range from a plain numerical dump to looking up code point names in the official Unicode name table. Care has been taken to report erroneous UTF-8 encodings in detail.
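To give a rough idea of what such a dump involves, here is a minimal sketch in Python. It is not the program's own code and does not reproduce its options or output format; the function name dump_utf8 and the line layout are made up for illustration. It walks the input byte by byte, prints the offset, raw bytes, code point and Unicode name of each well-formed sequence, and flags any byte that cannot start a valid sequence.

```python
import unicodedata


def dump_utf8(data: bytes) -> list[str]:
    """Return one dump line per code point or per invalid byte.

    Hypothetical helper for illustration only; not the tool's real interface.
    """
    lines = []
    i = 0
    while i < len(data):
        # A UTF-8 sequence is 1 to 4 bytes long; try the shortest first.
        for length in (1, 2, 3, 4):
            chunk = data[i:i + length]
            try:
                ch = chunk.decode("utf-8")  # strict: rejects overlong forms etc.
            except UnicodeDecodeError:
                continue
            # Control characters have no name in the Unicode table.
            name = unicodedata.name(ch, "<unnamed>")
            lines.append(f"{i:06x}  {chunk.hex(' ')}  U+{ord(ch):04X}  {name}")
            i += length
            break
        else:
            # No prefix starting here decodes: report one bad byte and resync.
            lines.append(f"{i:06x}  {data[i]:02x}  invalid UTF-8 byte")
            i += 1
    return lines


if __name__ == "__main__":
    for line in dump_utf8("né".encode("utf-8") + b"\xff"):
        print(line)
```

Reporting one bad byte at a time and then resynchronising is the simplest choice; a real tool could say more about why a sequence is wrong (truncated, overlong, surrogate and so on).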
A manual page is included in the source, which you can preview online (plain man2html output, so the cross-reference links will probably not work for you).
Download Source
Version 0.2.2: This version significantly improves performance when there are lots of Unicode name lookups (the %n format). The command-line interface was also changed so that extra arguments are taken to be file names (by default). I got tired of forgetting to redirect input.
Remarks
I wrote this because I needed it and was surprised not to find an existing tool that does this sort of job. Please let me know if I am wrong and there is already a better tool for it.
Also, let me know if you find it useful, as this will encourage me to fix and improve it.