Why use keywords as symbols?

From the beginning, Lisp has been associated with symbolic programming, for which symbols have always been the canonical type. So it's odd that many Lisps also have a separate :keyword type, which they use instead of symbols for all symbolic purposes except representing their own programs. Why bother? Why have two separate types that mean the same thing?

The motivation, in Common Lisp and (AFAIK) Clojure, is that these languages' symbols aren't pure symbols: in addition to identity and name, they include some other features, in particular a package or namespace. That's useful for representing programs, but most other applications of symbols don't want namespaces, so the languages also have keywords to serve as pure symbols. Lisps whose symbols don't have packages don't usually bother with keywords, but Racket and elisp have them anyway for use with keyword arguments. (In Racket, they need to be distinct from symbols because the function call form handles them specially rather than just passing them as part of the arglist. In elisp, they're only cosmetic; plain symbols could have been used instead.)
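To make the package point concrete, here is a minimal Common Lisp sketch. The package names KW-DEMO-A and KW-DEMO-B and the variables *sym-a* and *sym-b* are made up for illustration; nothing here comes from a real library.

```lisp
;; Two throwaway packages (hypothetical names) that use no other packages.
(defpackage "KW-DEMO-A" (:use))
(defpackage "KW-DEMO-B" (:use))

;; The same name interned in two packages yields two distinct symbols.
(defparameter *sym-a* (intern "RED" "KW-DEMO-A"))
(defparameter *sym-b* (intern "RED" "KW-DEMO-B"))

(eq *sym-a* *sym-b*)                                   ; => NIL
(string= (symbol-name *sym-a*) (symbol-name *sym-b*))  ; => T

;; A keyword always lives in the KEYWORD package, so there is only one :RED
;; no matter which package happens to be current when it is read.
(eq :red (intern "RED" "KEYWORD"))                     ; => T
(package-name (symbol-package :red))                   ; => "KEYWORD"
(keywordp :red)                                        ; => T
(keywordp *sym-a*)                                     ; => NIL
```

So in CL a keyword is technically still a symbol with a package; it behaves like a pure symbol only because that package is always the same one.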

Avoiding namespaces is a good reason to use keywords, but I don't think it's the only reason, because users don't see keywords as a workaround; they seldom have to be told to avoid symbols on the grounds that symbols are more complex than they appear. Instead, they overwhelmingly prefer keywords, not just for keyword arguments but as symbolic values. Even in elisp, where there's no reason not to use ordinary symbols, keywords are sometimes used instead. Why?

For one thing, they help distinguish programs from data. Humans, even Lispers who know better, tend to see these as two different things, and get confused when they mistake one for the other. When programs are written with symbols and data is written with keywords, it's easy to tell them apart.

Keywords are also more regular in appearance, because they're self-evaluating. It's tempting (for beginners, and as an unconscious heuristic for non-beginners) to think of symbols and keywords as differing only in their prefix character: 'x is the practical equivalent of :x, right? But they're not equivalent, because symbols have a quote only when they appear as forms. When they appear in a larger piece of quoted data, or as output from the REPL, or in non-evaluated contexts, they don't have the quote. This is only a superficial inconsistency, but it's annoying:

CL-USER> (list 'a 'b 'c)
(A B C)
CL-USER> (second '(a b c))
B
CL-USER> (defclass foo () ((a type integer initarg a accessor foo-a)))

(That last is not valid CL, obviously; it's what defclass would look like without keywords. I suspect users would perennially forget that most of the slot options aren't evaluated, even more than they already do, and put in quotes anyway, because they're not accustomed to writing unquoted symbols.)

Keywords, because they're self-evaluating, don't have this variation. They look the same whether they're in quoted data, or in the output of the REPL, or standing alone as forms, or as non-forms:

CL-USER> (list :a :b :c)
(:A :B :C)
CL-USER> (second '(:a :b :c))
:B
CL-USER> (defclass foo () ((a :type integer :initarg :a :accessor foo-a)))

More comfortable, isn't it? I wonder if this explains some of the popularity of keywords. (And also of Clojure's vectors, which while not strictly self-evaluating are similarly more regular in appearance than lists, because they don't usually require quoting.) If so, this is a disappointment to me, because it means this seemingly pointless distinction, like that between symbols and strings, is actually doing something important, and can't necessarily be simplified away.
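The self-evaluation behind the transcripts above can be checked directly. This is only a sketch in standard Common Lisp; the variable *shade* is a hypothetical name introduced for the example.

```lisp
;; A keyword is a constant whose value is itself, so quoting it changes nothing.
(eq :a ':a)                   ; => T
(eq (eval :a) :a)             ; => T
(eq (symbol-value :a) :a)     ; => T
(constantp :a)                ; true

;; An ordinary symbol has to be quoted wherever it appears as a form,
;; because unquoted it would be treated as a variable reference.
(defparameter *shade* 'red)   ; *shade* is a made-up variable
(eq *shade* 'red)             ; => T

;; With default printer settings the keyword keeps its prefix when printed,
;; so the written and printed forms match (up to case).
(prin1-to-string :a)          ; => ":A"
```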

7 comments:

  1. "For one thing, they help distinguish programs from data. Humans, even Lispers who know better, tend to see these as two different things, and get confused when they mistake one for the other."

    I keep getting the feeling that code isn't data; there's just a bijection between the two. Perhaps a Lisp with a strong type system should enforce this separation.

    (Well, not a bijection; not every piece of data is a valid AST. And I'm definitely talking in terms of inspecting ASTs, rather than the fact that evaluating a piece of code results in a piece of data.)

  2. I notice a third variety of symbol in Common Lisp code... #:new-uninterned-symbols

    As a relatively new initiate to CL, I find the use of this syntax instead of unquoted symbols/keywords, e.g. in defpackage, to be slightly baffling. Is it simply to prevent the symbol from being interned? However, the syntax appears to be necessary for other reasons... namely that there must be a way to textually distinguish a given package's interned symbols from other symbols with the same name.

  3. ehird: Of course programs are (a subset of) data! But my thinking was sloppy; what I should have said is that keywords help distinguish data that is Lisp code from other sorts of symbolic data. The issue is not confusion over whether code is data, but confusion between different types of data which require different cognitive approaches — in particular, when one looks at code, one thinks about its evaluation semantics and the resulting values, but for most other symbolic data, one thinks more directly about its structure.

    Uroš: Yes, it's to prevent them being interned, since when defpackage and in-package are read, the current package might be anything, and we don't want to pollute arbitrary packages with new symbols. Things like (cl:in-package #:foo) are a workaround for the reader's unwanted dependence on the value of *package*. (The problem isn't that symbols have packages per se, but that the package depends on the read-time environment. Clojure avoids this by resolving namespaces at compile time, not read time.)

  4. Well, yes, code is data, but is an application a list? I suppose what I mean is that code is a separate set of types, so this error is actually simply down to the popularity of representing data in a relatively untyped form with lists and symbols in Lisp-alikes.

  5. The (momentary) confusion I'm thinking of is for humans reading code, not programs operating on it, so the issue isn't whether forms are really lists of symbols, but whether their textual representation looks the same. In Racket, for instance, forms aren't lists of symbols, but they look exactly alike.

  6. In Scheme, of course, symbols don't have package baggage, but most Schemes don't have keyword arguments either. What is nice is that they are easy to get with R5RS plus records (= CL structs): a keyword is an identifier bound to an object containing its name, and keyword objects are equal iff their names are equal. See KeywordArgumentsArcfide.

  7. This is an old post, I know, but I was just blogging about self-evaluating symbolic data here: http://jonathanwarden.com/2016/03/31/self-evaluating-symbolic-data-literals/

