Weak symbol tables in Common Lisp?

Christophe Rhodes asks, about Common Lisp:

something that has been suggested a few times is that packages should intern their symbols in a hash table which is weak: dear cleverweb, is there any way in which this can be detected? Think like pfdietz

Disappearing symbols can be detected by find-symbol...

CL-USER> 'foo
FOO
CL-USER> (setf * nil) ;get rid of the REPL's reference
NIL
CL-USER> (gc :full t)
NIL
CL-USER> (find-symbol "FOO")
NIL

...or by counting the symbols in a package (which might plausibly happen as part of a show-all-packages feature)...

CL-USER> 'foo
FOO
CL-USER> (loop for s being the symbols of "CL-USER" count t)
1265
CL-USER> (progn (setf ** nil) (gc :full t))
NIL
CL-USER> (loop for s being the symbols of "CL-USER" count t)
1264

...or by the apparent change in the home package of a symbol when a shadowing symbol disappears:

CL-USER> (defpackage "EXPORTER" (:export "FOO"))
#<PACKAGE "EXPORTER">
CL-USER> (defpackage "IMPORTER")
#<PACKAGE "IMPORTER">
CL-USER> (intern "FOO" "IMPORTER")
IMPORTER::FOO
NIL
CL-USER> (use-package "EXPORTER" "IMPORTER")
T
CL-USER> (intern "FOO" "IMPORTER")
IMPORTER::FOO
:INTERNAL
CL-USER> (progn (setf * nil) (gc :full t))
NIL
CL-USER> (intern "FOO" "IMPORTER")
EXPORTER:FOO
:INHERITED

I don't think any of these are really problems. It's already dangerous to attach any meaning to whether a symbol exists, since it might be unpredictably created by read. Unpredictable disappearances aren't any more confusing.

But instead of “thinking like pfdietz”, what about thinking like a user? There's a much more important problem here: symbols have definitions!

CL-USER> (defun foo ())
FOO
CL-USER> (progn (setf * nil) (gc :full t))
NIL
CL-USER> (foo)
Error: FOO has no fdefinition

CL symbols combine two functions: that of interned strings and that of top-level bindings. Top-level bindings are exceedingly important things, and the symbol table is often the only reference to them. So CL can't GC unreferenced symbols. :( And indeed, SBCL, Clisp and ABCL all hold their symbols strongly.

In languages with pure symbols, however, this is not a problem, so weak symbol tables are common (so much so that it never occurred to me that they might be impossible in CL!). They're present in Racket, OpenDylan, Clojure (for keywords) and even Java (for String.intern).

It might be safe to hold symbols weakly, even in CL, when they have no interesting properties: when they're not boundp or fboundp, have nothing on their plists, and aren't external. These are the symbols that, if GCed and interned again, would be recreated exactly as they were. (Keywords could be held weakly even if boundp, since intern would recreate that too.)

This would allow garbage collection of symbols which have only been read, but protect those that have been used for anything else. This could be implemented by keeping a separate weak hashtable for “plain” symbols, and moving them to the regular table when they become “interesting” (on set, (setf fdefinition), (setf get), etc.).

Is this a good idea? Is it worth the complexity? Is this all obvious and the real question is whether there's anything in the spec that forbids it?

3 comments:

  1. Some older Lisps had the function GCTWA, which removed atoms that that had no dynamic or function binding as well as no references to them. It stood for Garbage-Collect Truly Worthless Atoms.

    ReplyDelete
  2. Another fine wheel turns out to have been invented by the ancients!

    ReplyDelete
  3. In standard Common Lisp, weak symbols could be detected either by detecting the loss of information attached to them (symbol-function, symbol-value, and the symbol's property list, for example) or by the change in behavior of the constructs that iterate over the symbols of a package (or over all symbols). The first would still allow weakness of symbols that have only default values for those things, but the second rules out any weakness.

    ReplyDelete

It's OK to comment on old posts.