Jonathan Dowland: Code formatting in documents
I've been exploring typesetting and formatting code within
text documents such as papers, or my thesis. Up until now,
I've been using the listings package without thinking
much about it. By default, some sample Haskell code
processed by listings looks like this:
It's formatted with a monospaced font, with some keywords highlighted,
but not syntactic symbols.
There are several other options for typesetting and formatting code in LaTeX
documents. For Haskell in particular, there is the preprocessor lhs2tex,
the default output of which looks like this:
A proportional font, but it's taken pains to preserve vertical alignment, which
is syntactically significant for Haskell. It looks a little cluttered to me,
and I'm not a fan of nearly everything being italic. Again, symbols aren't
differentiated, but it has replaced some of them with more typographically
pleasing alternatives: -> has become →, and \ is now λ.
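For reference, a minimal lhs2tex input along these lines might look like the sketch below. The file name example.lhs and the squares function are hypothetical stand-ins, not the sample shown in the images. The file would first be run through the preprocessor (something like lhs2TeX -o example.tex example.lhs) and the result compiled with LaTeX as usual.

```latex
% example.lhs (hypothetical file name): input for the lhs2tex preprocessor
\documentclass{article}

% lhs2tex directive, not a LaTeX comment: pulls in the default
% formatting rules for the aligned, proportional ("poly") style
%include polycode.fmt

\begin{document}

\begin{code}
-- hypothetical sample, not the code from the images
squares :: [Int] -> [Int]
squares xs = map (\x -> x * x) xs
\end{code}

\end{document}
```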
Another option is perhaps the newest, the LaTeX package minted, which
leverages the Python Pygments program. Here's the same code again. It
defaults to monospace (the choice of font seems a lot clearer to me than the
default for listings), no symbolic substitution, and liberal use of colour:
An informal survey of the samples so far showed that the minted output was
the most popular.
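As a rough sketch of what a minimal minted document might look like (the squares function is again a hypothetical stand-in for the sample in the images): minted needs Pygments to be installed, and depending on the minted version the document may need to be compiled with shell escape enabled, for example pdflatex -shell-escape.

```latex
% Depending on the minted version, compile with:
%   pdflatex -shell-escape example.tex
\documentclass{article}
\usepackage{minted}

\begin{document}

\begin{minted}{haskell}
-- hypothetical sample, not the code from the images
squares :: [Int] -> [Int]
squares xs = map (\x -> x * x) xs
\end{minted}

\end{document}
```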
All of these packages can be configured to varying degrees. Here are some
examples of what I've achieved with a bit of tweaking:
All of this has got me wondering whether there are straightforward empirical
answers to some of these questions of style.
Firstly, I'm pretty convinced that symbolic substitution is valuable. When
writing Haskell, we write ->, \, /= etc. not because they're the most legible,
but because they're the most practical symbols to type on the most widely
available keyboards and popular keyboard layouts.[1] Of the three
options listed here, symbolic substitution is possible with listings and
lhs2tex, but I haven't figured out whether minted can do it (which is really
the question of whether Pygments can do it).
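With listings, one way to get symbolic substitution is the literate option, which maps a sequence of input characters to replacement output. The sketch below is an illustration under assumptions rather than the configuration used for the images: the choice of symbols, the monospaced basicstyle and the sample function are all mine. One caveat is that these replacements apply wherever the characters occur, so a backslash elsewhere in the code would also be rendered as λ.

```latex
\documentclass{article}
\usepackage{listings}

% Each literate entry is {input}{{output}}width, where width is the
% number of character cells the output should occupy.
\lstset{
  language=Haskell,
  basicstyle=\ttfamily,
  literate=
    {->}{{$\rightarrow$}}2
    {/=}{{$\neq$}}2
    {\\}{{$\lambda$}}1
}

\begin{document}

\begin{lstlisting}
-- hypothetical sample, not the code from the images
squares :: [Int] -> [Int]
squares xs = map (\x -> x * x) xs
\end{lstlisting}

\end{document}
```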
I'm unsure about proportional versus monospaced fonts. We typically use
monospaced fonts for editing computer code, but that's at least partly for
historical reasons. Vertical alignment is often very important in source code,
and it can be easily achieved with monospaced text; it's also sometimes
important that individual characters (such as .) are not de-emphasised by being
smaller than any other character.
lhs2tex, at least, addresses vertical alignment whilst using proportional
fonts. I guess the importance of identifying individual significant characters
is just as true in a code sample within a larger document as it is within
plain source code.
From a (brief) scan of research on this topic, it seems that proportional
fonts result in marginally quicker reading times for regular prose. It's
not clear whether those results carry over into reading computer code in
particular, and the margin is slim in any case. The drawbacks of monospaced
text mostly apply when the volume of text is large, which is not the case
for the short code snippets I am working with.
I still have a few open questions:
- Is colour useful for formatting code in a PDF document?
  - Does this open up a can of accessibility worms?
- What should be emphasised (or de-emphasised)?
- Why is the minted output most popular? Could the choice of font be key, including aspects of the font other than proportionality (serifs, size of serifs, etc.)?
[1] The Haskell module Data.List.Unicode (from the base-unicode-symbols package)
lets the programmer use a range of Unicode symbols in place of ASCII
approximations, such as ∈ in place of elem and ≠ in place of /=. Sadly, it's
not possible to replace the denotation for an anonymous function, \, with λ.