how-much-space-do-those-hyphens-take?

In a rant about naming conventions, Yossi Kreinin praises the way Lisp does multi-word names:

I think that the best naming convention out there is the Lisp one: case-insensitive-dash-separated. It just doesn’t get any better:

  • You never have to hit Shift or Caps Lock in the middle of a name, which makes typing easier. This is especially important for names because you use auto-completion with names. Auto-completion requires you to press things like Ctrl-Space or Alt-Slash or Ctrl-P. Together with another Shift needed for the name to be completed correctly, auto-completion is much more likely to cause repetitive strain injury.
  • You never have to think about the case. Figuring out the case of a letter in a case-sensitive naming convention can be non-trivial; more on that later.
  • The dash-as-a-separator-convention is used in English, so your names look natural.

I'm not so fond of this convention. Well, it is nice to read; I can't think of anything more natural. And case-irrelevance is nice. (If your favourite Lisp stopped case-folding one day and became case-sensitive in practice as well as theory, would you even notice?) There's just one problem: those hyphens take space.

How much?

(let ((syms 0) (chars 0) (hyphens 0))
  (do-symbols (s (find-package :cl))
    (incf syms)
    (incf chars (length (symbol-name s)))
    (incf hyphens (count #\- (symbol-name s))))
  (format nil "~S symbols, ~S chars (avg. ~2,2f)~%~
             ~S hyphens, ~2,2f% hyphens (avg. ~2,2f)~%"
    syms chars (/ chars syms 1.0) hyphens
    (/ hyphens chars 0.01) (/ hyphens syms 1.0)))

"978 symbols, 11271 chars (avg. 11.52)
875 hyphens, 7.76% hyphens (avg. .89)"

Hmm. Not as bad as I thought. 7.8% sounds like a lot, but most of those hyphens are in rarely-used names; the common ones are short. The big expression above has only four hyphens (plus the #\-), so it's not a huge problem. On the other hand, standard library names (even in CL) tend to be short; user code is often full of rather long names. Let's see...

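(defun slurp (filename)
  "Read the contents of a file as a string."
  ;; Assumes one character per unit of FILE-LENGTH (true for single-byte
  ;; encodings); otherwise READ-CHAR can hit end-of-file early.
  (with-open-file (s filename)
    (let* ((len (file-length s))
           (str (make-string len)))
      (do ((i 0 (1+ i)))
          ((>= i len) str)
        (setf (schar str i) (read-char s t))))))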

(defun count-hyphens (filename)
  (let ((s (slurp filename)))
    (format t "~2,2f% of ~S chars~%"
            (/ (count #\- s) (length s) 0.01) (length s))))

(count-hyphens "asdf-install/installer.lisp")
3.13% of 12210 chars
(count-hyphens "edit-modes/python.el")
3.13% of 90348 chars   ;coincidence
(count-hyphens "keywiz.el")
2.88% of 10230 chars
(count-hyphens "tetris.el")
4.54% of 22458 chars
(count-hyphens "edit-modes/slime/swank.lisp")
2.86% of 191438 chars

There's a lot of variation (partly because some files are diluted with lots of comments, which I didn't try to strip out), but evidently hyphens are still significant. Tetris.el has a lot because it has a lot of names with short words, like tetris-top-left-x. Elisp in general suffers (in total length as well as hyphens) because it has no module system, so every name needs a module prefix. (If you want to make a language's code shorter, this is the first thing to worry about: make sure users don't have to defend against name collisions.) But even in CL, hyphens take around 3% of the code. That's about half as much as parentheses.
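A rough way to take the comments out of the picture, and to compare hyphens against parentheses in the same pass, might look like the sketch below. The names strip-line-comments and char-percent are made up for illustration and are not part of the measurements above. The comment stripping is deliberately naive: it cuts at every semicolon, including ones inside strings, and ignores block comments, so treat its numbers as approximate.

```lisp
;; Sketch only. STRIP-LINE-COMMENTS and CHAR-PERCENT are hypothetical helpers
;; introduced for illustration; they are not from the original measurements.

(defun strip-line-comments (text)
  "Return TEXT with everything from a semicolon to the end of its line removed.
Naive on purpose: it also cuts at semicolons inside strings and character
literals, and it does not touch block comments."
  (with-output-to-string (out)
    (let ((in-comment nil))
      (loop for ch across text
            do (cond ((char= ch #\Newline)   ; a newline always ends a comment
                      (setf in-comment nil)
                      (write-char ch out))
                     (in-comment)            ; inside a comment: drop the char
                     ((char= ch #\;)         ; a comment starts here
                      (setf in-comment t))
                     (t (write-char ch out)))))))

(defun char-percent (chars text)
  "Percentage of the characters in TEXT that occur in the string CHARS."
  (if (zerop (length text))
      0.0
      (/ (count-if (lambda (ch) (find ch chars)) text)
         (length text)
         0.01)))

;; Example on an in-memory string; with SLURP from above one could pass
;; (slurp "tetris.el") instead.
(let ((code (strip-line-comments
             (format nil "(tetris-top-left-x 1) ; top-left corner~%(baz)"))))
  (format t "~,2f% hyphens, ~,2f% parentheses~%"
          (char-percent "-" code)
          (char-percent "()" code)))
```

Passing the set of characters as a string keeps the hyphen and parenthesis counts on the same footing, so the "half as much as parentheses" comparison can be rechecked on any file.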

Maybe this isn't directly a problem. People probably don't look at the hyphens even as much as they look at parentheses, so what does it matter how many there are? But they still lengthen lines, making the eyes move farther, and increasing the chance of linebreaks. When one character accounts for 3% of your code, it's hard to argue that it doesn't matter. But as with parentheses, it's even harder to do anything about it, since all the alternatives are harder to read. So we use hyphens. But I still don't feel altogether good about them.

Update: It seems some people think I'm bashing Lisp on the grounds that hyphenated names make its code 3% longer. Relax; of course this is not a big problem, and I like Lisp, or I wouldn't be worrying about its naming conventions. And I prefer hyphens to every alternative, because they're so much more readable. But this readability isn't free, and at 3% the cost (in screen space) is surprisingly high. I don't have a solution, but it's nice to be aware of the problem. I will take CamelCase a little more seriously now.
