Some updates to my Calibre helper scripts (which I described previously here): the ISBN guessing script now covers a wider range of file formats, and there are two new scripts: one converts all .doc files to .rtf, the other cross-checks the disk directory against the Calibre database and reports documents not registered by Calibre.
ISBN guessing improvements
The guess_and_add_isbn.py script now also handles .chm files. arCHMage is used to extract the text, so it must be installed. On Debian and Ubuntu it is packaged, so just:
$ sudo apt-get install archmage
The regular expressions used to locate ISBNs were slightly improved; some variants which were previously missed are now handled.
Finally, while the script is running, it now properly reports the number of parsed files.
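The actual regular expressions are not shown here, so the following is only a minimal sketch of the general approach, not the script's code: a deliberately loose pattern picks up ISBN-10 and ISBN-13 candidates (digits optionally separated by hyphens or spaces), and the check digit then filters out random numbers. The names `ISBN_PATTERN` and `find_isbns` are hypothetical.

```python
import re

# Optional 978/979 prefix (ISBN-13), then nine digits and a check character
# (X is allowed for ISBN-10), with optional hyphens or spaces between them.
# The lookarounds keep a match from starting or ending inside a longer number.
ISBN_PATTERN = re.compile(r"(?<![0-9])(?:97[89][-\s]?)?(?:[0-9][-\s]?){9}[0-9Xx](?![0-9])")


def is_valid_isbn10(isbn: str) -> bool:
    # Weights 10..1 over the ten characters; X stands for 10; sum must be divisible by 11.
    if len(isbn) != 10 or not isbn[:9].isdigit() or not (isbn[9].isdigit() or isbn[9] == "X"):
        return False
    total = sum((10 - i) * int(c) for i, c in enumerate(isbn[:9]))
    total += 10 if isbn[9] == "X" else int(isbn[9])
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    # Alternating weights 1 and 3; the weighted sum must be divisible by 10.
    if len(isbn) != 13 or not isbn.isdigit():
        return False
    total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(isbn))
    return total % 10 == 0


def find_isbns(text: str) -> list[str]:
    """Return checksum-valid ISBNs found in text, without separators, in order of appearance."""
    found = []
    for match in ISBN_PATTERN.finditer(text):
        candidate = re.sub(r"[-\s]", "", match.group()).upper()
        if (is_valid_isbn10(candidate) or is_valid_isbn13(candidate)) and candidate not in found:
            found.append(candidate)
    return found


print(find_isbns("ISBN 0-306-40615-2, also as ISBN-13 978-0-306-40615-7."))
# prints ['0306406152', '9780306406157']
```

Validating the checksum is what lets the pattern stay permissive: most 10- or 13-digit numbers that are not ISBNs (phone numbers, page ranges) fail it.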
Convert docs to RTF
The convert_docs_to_rtf.py script locates all books which have a .doc format but no .rtf one, and creates .rtf versions for them.
I wrote it because Calibre itself cannot convert .doc files, but it can convert RTF to ebook formats.
OpenOffice is used for the conversion, so it must be installed. The ootools library must also be installed; the simplest way to do it is:
$ easy_install ootools
Note: the script sometimes reports obscure errors (Segmentation fault) while shutting down. I haven't tracked the cause down (it is either a bug in one of the libraries or my misuse of them), but it is harmless: it happens after all conversions are finished, while the helper objects are being destroyed.
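For readers who would rather avoid ootools, here is a rough sketch of the same idea with a swapped-in technique: it walks the library folder directly and calls LibreOffice's headless `soffice --convert-to rtf` converter instead of going through ootools. It assumes Calibre's usual layout (one folder per book, one file per format); the function names are hypothetical, and this is not how convert_docs_to_rtf.py itself works.

```python
import subprocess
import sys
from pathlib import Path


def books_needing_rtf(library_root: Path) -> list[Path]:
    """Return .doc files whose book folder does not contain an .rtf file yet.

    Assumes one folder per book holding one file per format,
    e.g. "Author/Title (42)/Title - Author.doc".
    """
    result = []
    for doc in sorted(library_root.rglob("*")):
        if doc.is_file() and doc.suffix.lower() == ".doc":
            # Skip books that already have an RTF version in the same folder.
            if not any(p.suffix.lower() == ".rtf" for p in doc.parent.iterdir()):
                result.append(doc)
    return result


def convert_to_rtf(doc: Path) -> None:
    """Write <name>.rtf next to doc using LibreOffice in headless mode."""
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "rtf",
         "--outdir", str(doc.parent), str(doc)],
        check=True,
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sketch.py "/path/to/Calibre Library"
    for doc in books_needing_rtf(Path(sys.argv[1])):
        print("Converting", doc)
        convert_to_rtf(doc)
```

This sketch only creates the .rtf files on disk; registering them with Calibre (for example with `calibredb add_format`) is a separate step that is not shown, and until it happens those files are exactly the kind of unregistered documents the next script reports.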
Find hanging books
The find_books_missing_in_database.py script cross-checks the Calibre database against the contents of the Calibre disk folder, and reports any books which are present there but not registered in Calibre (so they are not found by searches and are in general invisible in the interface).
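The core of such a cross-check can be sketched in a few lines, assuming the usual metadata.db schema: `books.path` holds each book's folder relative to the library root, and the `data` table has one row per format (`book`, `format`, `name`), with the file stored as `<name>.<format in lower case>`. This is a hypothetical illustration, not the script's actual code; the ignore list is an assumption and may need extending (newer Calibre versions keep extra folders in the library).

```python
import sqlite3
from pathlib import Path

# Non-format files Calibre keeps inside the library folder (assumed, possibly incomplete).
IGNORED_NAMES = {"metadata.db", "metadata.opf", "cover.jpg", "metadata_db_prefs_backup.json"}


def registered_files(db_path: Path) -> set[str]:
    """Relative paths of all format files recorded in Calibre's metadata.db."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT books.path, data.name, data.format "
            "FROM data JOIN books ON books.id = data.book"
        ).fetchall()
    finally:
        conn.close()
    # Assumed naming: <books.path>/<data.name>.<lowercased format>
    return {f"{path}/{name}.{fmt.lower()}" for path, name, fmt in rows}


def files_on_disk(library_root: Path) -> set[str]:
    """Relative (POSIX-style) paths of all files under the library, minus ignored names."""
    return {
        p.relative_to(library_root).as_posix()
        for p in library_root.rglob("*")
        if p.is_file() and p.name not in IGNORED_NAMES
    }


def unregistered_files(library_root: Path, ignore_case: bool = False) -> list[str]:
    """Files present on disk but not registered in the Calibre database."""
    def norm(s: str) -> str:
        # Optional case-insensitive comparison, for libraries synced across systems
        # that disagree about letter case.
        return s.casefold() if ignore_case else s

    known = {norm(f) for f in registered_files(library_root / "metadata.db")}
    return sorted(f for f in files_on_disk(library_root) if norm(f) not in known)
```

Typical use would be `for f in unregistered_files(Path("/path/to/Calibre Library")): print(f)`; the database is only read, never modified.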
I wrote the script after I corrupted my repository a little by putting it on Dropbox and using it from two machines. For some reason my Calibre installations disagreed about upper/lower case, Dropbox made things worse by disallowing files whose names differed only in letter case, and I ended up with a database containing some books without any format, plus unregistered books on the disk. But the script can also be used as a general health-checking tool.
Dropbox is very cool: I sync many files with it and wholeheartedly recommend it. This is just one of those corner cases where it does not fit.