
Today I was wondering about converting a PDF made from a scan of a
book into DjVu, hopefully to reduce the size without too much loss of
quality. My initial experiments with pdf2djvu were a bit discouraging,
so I invested some time building gsdjvu in order to be able to run
djvudigital.
Watching the messages from djvudigital, I realized that the reason it
was achieving so much better compression was that it was using black
and white for the foreground layer by default. I also figured out that
the default 300dpi looks crappy since my source document is apparently
600dpi.
I then went back and compared djvudigital to pdf2djvu a bit more
carefully. My not-very-scientific conclusions:
- monochrome at higher resolution is better than coloured foreground
- higher resolution and (a little) lossy beats lower resolution
- at the same resolution, djvudigital gives nicer output, but at the
same bit rate, comparable results are achievable with pdf2djvu.
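For reference, the four conversions compared in the table at the end of this post can be written down as command lines. The sketch below only builds and prints them; the input name book.pdf and the output file names are placeholders invented for the example, and the options are the ones named in the table plus pdf2djvu's -o for the output file.

```python
import shlex
import shutil
import subprocess


def conversion_commands(pdf, stem):
    """Return argv lists for the four conversions listed in the table.

    `pdf` and `stem` are placeholder names chosen for this sketch.
    """
    return {
        "pdf2djvu colour 450dpi": [
            "pdf2djvu", "--dpi=450", "-o", f"{stem}-colour-450.djvu", pdf],
        "pdf2djvu monochrome 450dpi": [
            "pdf2djvu", "--monochrome", "--dpi=450",
            "-o", f"{stem}-mono-450.djvu", pdf],
        "pdf2djvu monochrome 600dpi lossy": [
            "pdf2djvu", "--monochrome", "--dpi=600", "--loss-level=50",
            "-o", f"{stem}-mono-600-lossy.djvu", pdf],
        # djvudigital takes the output file as a second positional argument.
        "djvudigital 450dpi": [
            "djvudigital", "--dpi=450", pdf, f"{stem}-djvudigital-450.djvu"],
    }


def run_available(commands):
    """Run each command whose program is on PATH; return the names that ran."""
    ran = []
    for name, argv in commands.items():
        if shutil.which(argv[0]) is None:
            print(f"skipping {name}: {argv[0]} not found")
            continue
        subprocess.run(argv, check=True)
        ran.append(name)
    return ran


if __name__ == "__main__":
    # Only print the command lines; call run_available() to execute them.
    for name, argv in conversion_commands("book.pdf", "book").items():
        print(f"{name}: {shlex.join(argv)}")
```

Keeping the commands as data makes it easy to add another resolution or loss level and rerun the whole comparison.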
Perhaps most compellingly, the output from pdf2djvu has sensible
metadata and is searchable in evince. Even with the --words option,
the output from djvudigital is not. This is possibly related to the
error messages like
Can't build /Identity.Unicode /CIDDecoding resource. See gs_ciddc.ps .
It could well be my fault, because building gsdjvu involved guessing
at corrections for several errors.
- comparing GS_VERSION to 900 doesn't work well when GS_VERSION is a
5-digit number. GS_REVISION seems to be what's wanted there.
- extra declaration of struct timeval deleted
- -lz added to command to build mkromfs
Some of these issues have to do with building software from 2009 (the
instructions suggest building with ghostscript 8.64) in a modern
toolchain; others I'm not sure about. There was an upload of gsdjvu in
February of 2015, somewhat to my surprise. AT&T has more or less
crippled the project by licensing it under the CPL, which means
binaries are not distributable, hence motivation to fix all the rough
edges is minimal.
| Version | Kilobytes per page | Position in figure |
|---|---|---|
| Original PDF | 80.9 | top |
| pdf2djvu --dpi=450 | 92.0 | not shown |
| pdf2djvu --monochrome --dpi=450 | 27.5 | second from top |
| pdf2djvu --monochrome --dpi=600 --loss-level=50 | 21.3 | second from bottom |
| djvudigital --dpi=450 | 29.4 | bottom |
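
To make the sizes easier to compare, here is a small sketch that expresses each row as a percentage of the original PDF. The numbers are copied from the table; the function name and the rounding to one decimal place are my own choices for the example.

```python
# Kilobytes per page, copied from the table above.
KB_PER_PAGE = {
    "Original PDF": 80.9,
    "pdf2djvu --dpi=450": 92.0,
    "pdf2djvu --monochrome --dpi=450": 27.5,
    "pdf2djvu --monochrome --dpi=600 --loss-level=50": 21.3,
    "djvudigital --dpi=450": 29.4,
}


def percent_of_baseline(sizes, baseline="Original PDF"):
    """Express every size as a percentage of the baseline entry."""
    base = sizes[baseline]
    return {name: round(100.0 * kb / base, 1) for name, kb in sizes.items()}


if __name__ == "__main__":
    for name, pct in percent_of_baseline(KB_PER_PAGE).items():
        print(f"{pct:6.1f}%  {name}")
```

On these numbers the colour conversion at 450dpi is larger than the PDF it came from, while the monochrome pdf2djvu runs and djvudigital land at roughly a quarter to a third of the original size.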