Arcane Sentiment: January 2011

If you asked me what a symbol is, I'd probably say something like “a canonicalized string”. I might explain why this is useful, or mention that some languages (e.g. Common Lisp) include top-level bindings and plists and packages as part of their symbols, even though those are really a different concept. But I'd take for granted that canonical identity is the central idea. You can't have symbols without intern, can you?

Clojure does. It creates a new symbol for each occurrence it reads, with no attempt to canonicalize them. Despite its name, clojure.lang.Symbol/intern always makes a new symbol. It does intern the strings that are their names, to make comparisons faster, but not the symbols themselves. (There's also clojure.core/intern, but that creates Vars — top-level definitions — not Symbols.) Observe:

user=> (identical? 'a 'a)
false
user=> (= 'a 'a)
true
user=> (identical? (clojure.lang.Symbol/intern "a") (clojure.lang.Symbol/intern "a"))
false

(identical? is Java ==, and = on symbols amounts to Java .equals, which compares namespace and name. The lengths of their names hint at their relative frequency in Clojure.)

Because symbols aren't interned, any code doing symbolic processing must compare them by name (with =) rather than by identity. This looks at first glance like sloppiness, or a missing feature. Indeed, it does have one harmful effect: gensym can't quite guarantee that the symbols it creates won't collide with anything, since any other symbol with the same name compares equal to it (although this risk is largely theoretical, and could be essentially eliminated by putting gensyms in their own namespace). But for the most part, it's harmless. Interning symbols sounds like a basic semantic feature, but it's not. It's only an optimization.
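To make that concrete, here is a minimal sketch of symbolic processing that relies only on =, followed by the gensym collision in miniature. count-occurrences is a name I made up for illustration, not anything in clojure.core, and the "tmp" prefix is arbitrary.

```clojure
;; Hypothetical helper (not part of Clojure): counts how many times sym
;; appears anywhere inside form, comparing symbols with = (by name).
(defn count-occurrences
  [sym form]
  (count (filter #(= sym %) (tree-seq coll? seq form))))

(count-occurrences 'x '(+ x (* x y)))   ; => 2

;; The collision: any symbol built from a gensym's name is = to it,
;; even though the two are distinct objects.
(let [g    (gensym "tmp")
      twin (symbol (name g))]
  [(= g twin) (identical? g twin)])     ; => [true false]
```

The second result is the whole problem: a gensym is unique only by identity, and nothing that compares by name can see that.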

Digression: namespaces, and references vs. names

Symbol comparisons are actually more complicated than =, because Clojure symbols also optionally include a namespace. user/a and a are thus two different symbols — one with a namespace, one without — but they name the same variable:

user=> (= 'user/a 'a)
false
user=> (def a 1)
#'user/a
user=> user/a
1
user=> (def user/a 2)  ;redefinition!
#'user/a
user=> a
2

This appears to make symbol comparison impossible, since determining what definition a name refers to requires looking at the environment, not just at the symbol. But this is true in any language with lexical scope; including a namespace in the symbol doesn't make it harder. It's only a potential problem for plain old symbol processing, and you probably wouldn't use qualified symbols for that anyway.

What's really going on here is that symbols in Clojure have two roles. In addition to being names, they're also used as references to names, relative to the current namespace. Clojure uses the same class for both concepts: references may or may not specify a namespace, but names always(?) do. References are resolved to names (by clojure.lang.Compiler/resolveSymbol), so the first def above defines a variable called user/a, not a. This means names can be compared by = without worrying further about namespaces or collisions between them (although local bindings can still collide). In a language that resolved references directly to their definitions rather than via names, it would not be necessary to include the namespace in names, and a reference type (call it SymbolRef) could be distinct from Symbol.
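The reference-to-name step can be sketched in a few lines. This is only an approximation of the idea, under the assumption that resolution just fills in a default namespace. qualify is a hypothetical helper of mine, and the real resolver also has to deal with aliases, imports and class names.

```clojure
;; A symbol carries an optional namespace plus a name.
(namespace 'user/a)   ; => "user"
(namespace 'a)        ; => nil
(name 'user/a)        ; => "a"

;; Hypothetical helper (not Clojure's resolver): turns a reference into a
;; name by filling in default-ns when the reference has no namespace.
(defn qualify
  [default-ns ref]
  (if (namespace ref)
    ref
    (symbol (name default-ns) (name ref))))

(= (qualify 'user 'a) 'user/a)         ; => true
(= (qualify 'user 'other/a) 'user/a)   ; => false
```

Once both sides have been qualified, = is all the comparison that names need. Syntax-quote does a similar qualification at read time, which is one place the reference/name split shows up in everyday code.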

Clojure's unconventional symbols
