I like to read eBooks while travelling. One of the best sources of free eBooks is Project Gutenberg.
I greatly respect what Project Gutenberg does, but at the same time I think its formatting standards are completely
outdated (which is no surprise, given that it started in 1971) and, by now, a stupid design decision.
In particular: hard-wrapping the lines at a fixed length and restricting the encoding to ASCII (which rules out all non-English texts or cripples them severely).
Using UTF-8 instead would have practically no drawback: ASCII characters take up exactly the same space (one byte each), so existing texts stay fully compatible,
while non-English texts become possible and the format remains extensible.
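To make the encoding argument concrete, here is a small Python sketch. The sample strings and the helper name encoded_sizes are made up for illustration; they are not taken from any Project Gutenberg text or tool.

```python
def encoded_sizes(text):
    """Return (character count, UTF-8 byte count) for the given text."""
    return len(text), len(text.encode("utf-8"))


# Pure ASCII text is byte-for-byte identical in ASCII and in UTF-8.
plain = "Call me Ishmael."
assert plain.encode("ascii") == plain.encode("utf-8")

# Text with an umlaut cannot be encoded as ASCII at all ...
accented = "Gefühl"
try:
    accented.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# ... but in UTF-8 it only costs one extra byte for the non-ASCII character.
print(encoded_sizes(plain))
print(encoded_sizes(accented))
print("ASCII can represent the accented word:", ascii_ok)
```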
They also refuse markup, which is equally limiting, because it is very easy to remove markup to get back to plain text (just drop everything between < and >), whether it's XML or HTML,
but very difficult to add it afterwards.
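The "drop everything between < and >" idea can be sketched in a few lines of Python. This is a deliberately naive illustration (the function name strip_markup is my own): it ignores character entities such as &amp;, comments, and a literal > inside attribute values, so a real converter would need more care.

```python
import re

# Matches one tag: a "<", then anything that is not ">", then ">".
TAG_RE = re.compile(r"<[^>]*>")


def strip_markup(text):
    """Naively remove XML/HTML tags, keeping only the text between them."""
    return TAG_RE.sub("", text)


print(strip_markup("<p>Call me <i>Ishmael</i>.</p>"))
```

Going the other way, from plain text back to markup, has no such one-liner, because the information about emphasis, headings and so on is simply gone.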
So I have written a Python script that tries to improve the formatting and adds some bookmarks.
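As an illustration of what "improving the formatting" can mean, here is a minimal sketch of one such step: undoing the fixed-length line wrapping. It is not the script itself, and the function name unwrap_paragraphs is made up. It assumes that paragraphs are separated by blank lines, and it would also flatten verse or tables, which need special handling.

```python
import re


def unwrap_paragraphs(text):
    """Join hard-wrapped lines so that each paragraph becomes a single line.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    unwrapped = []
    for paragraph in paragraphs:
        # Drop the line breaks inside a paragraph and rejoin with single spaces.
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        unwrapped.append(" ".join(lines))
    return "\n\n".join(unwrapped)


sample = (
    "It was the best of times,\n"
    "it was the worst of times.\n"
    "\n"
    "Chapter two\n"
    "begins here.\n"
)
print(unwrap_paragraphs(sample))
```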
Useful stuff. If you need to know the pdb or prc file format for development purposes, look at the code that comes with the tools below.