Arcane Sentiment: Good and bad syntax-coloring

David Nolen pretty-printed some Clojure with SLaTeX. The result, as usual for SLaTeX, is typographically impressive but rather painful to read. The problem is that SLaTeX has horrible defaults for syntax coloring. It shows most code, including this Clojure, in only two faces:

Special form names are in bold.
Other names are in italics.

Italics are harder to read than plain (“Roman”) text, because they're less familiar, so using them for most symbols hurts readability. (SLaTeX also uses proportional fonts; I'm not sure whether that helps or hurts.) But there's a much bigger problem: misplaced emphasis.

Bold, like bright colors, draws attention. This is very useful: it can help the reader quickly find important bits of code they might otherwise overlook. But special forms, however semantically special, are not such important bits. When reading a program, occurrences of lambda or begin are hardly ever the parts that deserve notice. They do deserve some subtle syntax-coloring, but printing them in bold makes code considerably less readable, because it directs attention to the irrelevant.

It's much more useful to emphasize binding occurrences, especially top-level ones. One of the most common operations when reading code is to look for a definition, and showing the names prominently makes it easier to find the right one. Emacs, as usual, gets this right: for most languages it shows top-level binding occurrences in bright colors, which makes them easy to find.

Clojure-mode also colors names in clojure.core. I've found this quite helpful for learning the language, because it provides an edit-time hint of whether I've guessed the right name. I think it's also a small help when reading code, because the color used is low-contrast, so it makes calls to standard-library functions less conspicuous — which is good, because they're usually less deserving of attention than calls to my own functions. It may also be helpful when reading an unfamiliar language, because it tells the reader when to guess what a name means or look it up with doc or clojuredocs rather than look for its definition elsewhere in the program.

It's easy to change SLaTeX's fonts to something less annoying:

\def\keywordfont#1{{\it#1\/}}
\def\constantfont#1{{\rm#1}}
\def\variablefont#1{{\rm#1}}
\let\datafont\constantfont

If you don't mind abusing \setkeyword, you can also make it recognize names from the standard library, by pretending they're special forms:

\setkeyword{sorted-map read-line re-pattern keyword? ClassVisitor
  asm-type VecSeq print-defrecord val IVecImpl ProcessBuilder chunked-seq?
  Enum find-protocol-impl SuppressWarnings
  %and so on for the rest of (keys (ns-map (find-ns 'clojure.core)))
}
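The name list does not have to be typed by hand. As a minimal sketch, assuming the names have already been dumped from a Clojure REPL into a Python list, the declaration can be generated and wrapped automatically. The function name `setkeyword_block` and the wrapping width are hypothetical choices, not part of SLaTeX.

```python
import textwrap


def setkeyword_block(names, width=72):
    """Format names as a \\setkeyword{...} declaration, wrapped to width.

    Assumption: names is an iterable of symbol names exported from Clojure.
    """
    unique = sorted(set(names))  # drop duplicates, keep a stable order
    text = "\\setkeyword{" + " ".join(unique)
    body = textwrap.fill(
        text,
        width=width,
        subsequent_indent="  ",
        break_on_hyphens=False,  # keep names like sorted-map in one piece
        break_long_words=False,  # never split an over-long name
    )
    return body + "\n}"
```

Turning off `break_on_hyphens` matters here: by default `textwrap` may split a line at a hyphen, which would cut a name like `sorted-map` in two.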

Unfortunately SLaTeX doesn't support this directly, nor does it support recognizing top-level binding occurrences, but neither would be hard to add; it already recognizes similar things like macro-defining forms and quoted literals. If you're writing a pretty-printer (or a syntax-coloring editor), I encourage you to add these two features, because they're much more useful than highlighting special forms.

2 comments:

That Bassett Disaster, 28 March 2011 at 21:07
This post made me notice that my editor highlights keywords in many languages in bold, and that made me think about why that choice was made, and spurred me into doing a bit of research.

Specifically, I thought of how often I've seen keywords typeset in bold in pseudocode. And how much of that typeset pseudocode resembled Algol. And that I don't ever recall seeing keywords not in bold in typeset Algol. And wondering why I thought there was some Algol-y reason for that.

And I looked into it a bit and it turns out that, long before we ever had syntax colouring, there was something called stropping.

And now, I have a theory for why keywords are so often rendered in boldface, even though it doesn't make much sense from a usability POV: bold was the standard way to strop Algol in typesetting, and for a while, Algol was likely the most commonly typeset language on the planet.

Old habits die hard...
Arcane Sentiment, 28 March 2011 at 23:48
Pascal also traditionally used boldface, and it kept the tradition alive for a long time (and Delphi still does today). The implementation I learned it with (Macintosh Pascal) didn't even compile to native code, but its editor automatically displayed reserved words in bold. I liked that, and when I later encountered tools that didn't do it, I thought them obviously inferior. Maybe identifying reserved words is more useful, or at least comforting, to beginners.
