Doing without modify macros
Speaking of setf-like macros, there's something that bothers me about Common Lisp's define-modify-macro: most of its uses are very predictable. Alexandria, for example, contains at least five of them: minf, maxf, appendf, removef, and deletef. Except for argument order, all five are identical macroifications of standard functions. From their names, you can predict not only their meanings but their whole implementations.
This is not a good sign. Rather than trying to define a modify macro for each function, wouldn't it be better to just have one modify macro which takes a function as a parameter?
(defmacro modify (fn place &rest args &enviroment e)
(expand-setf (v r w) place e
`(let ((,v (,fn ,r ,@args)))
,w)))
So instead of (define-modify-macro foof (args ...) foo) and (foof place args ...), you'd write (modify foo place args ...). This could make most modify macros unnecessary, although it would need a shorter name, perhaps even one as short as !. However, its usefulness depends heavily on having functions available with the right argument order. If you have to fiddle with the function to fix its argument order, modify loses its brevity, and you might as well write out the setf.
Labels: common-lisp, macros, setf
Simplifying setf-like macros
Attention conservation notice: I sometimes post things that are of general interest to language designers, but this is not one of those times. If you don't speak Common Lisp, don't like macrology, or don't care about generalized assignment, this post is not for you.
As the rplacd&pop macro illustrates, defining new setf-like macros is quite awkward in Common Lisp. I'm not talking about defining new setf-able places, but about defining macros that operate on places, like push and swapf. They're so awkward to write that the Hyperspec doesn't directly define most of them, but instead resorts to simplified versions with disclaimers:
The effect of
(push item place)is equivalent to(setf place (cons item place))except that the subforms of place are evaluated only once, and item is evaluated before place.
There is actually a sample implementation of pop at get-setf-expansion, but it's not linked from pop, probably because it's too complex to be of much help in explaining what pop does. Handling places properly requires calling get-setf-expansion and doing nonobvious things with its five return values, which is a lot of trouble that has little to do with the desired macro. Consider the definition of rplacd&pop:
(defmacro rplacd&pop (x &optional y &environment e)
(multiple-value-bind (temps forms svars writeform readform)
(get-setf-expansion x e)
`(let* (,@(mapcar #'list temps forms)
(,(car svars) (cdr ,readform)))
(prog1 ,readform
(setf (cdr ,readform) ,y)
,writeform))))
There is a lot of predictable code here. This macro differs from pop in only two sections (italicized above), and from push in those two plus argument order. Most of the code is the same. Can we abstract it out with a macro?
(defmacro expand-setf ((valuevar readvar writevar) place env &body body)
"Generate code for a SETF-like operation: evaluate BODY with the three
vars bound to the corresponding parts of PLACE's SETF expansion, then
wrap the result in a LET to bind the SETF temporaries.
BODY should normally evaluate to a form that binds VALUEVAR to the
desired new value of PLACE and then does WRITEVAR."
(with-gensyms (temps forms storevars)
`(multiple-value-bind (,temps ,forms ,storevars ,writevar ,readvar)
(get-setf-expansion ,place ,env)
(let ((,valuevar (car ,storevars)))
`(let (,@(mapcar #'list ,temps ,forms))
,,@body)))))
I confess that I had some trouble writing this: I got confused about nested splicing. I'd feel bad about this, but I know everyone else has trouble with it too. Is there any other part of Lisp that so reliably earns its difficult reputation?
Anyway, having spent an hour writing expand-setf, I tried it out and was disappointed: it doesn't save as much code as I hoped. I forgot one of the basic rules of making difficult abstractions: write an example call before you write the macro.
(defmacro rplacd&pop (place &optional newcdr &environment e)
(expand-setf (val readform writeform) place e
`(let ((,val (cdr ,readform)))
(prog1 ,readform
(setf (cdr ,readform) ,newcdr)
,writeform)))
This is, unfortunately, only two lines shorter than the get-setf-expansion version, because expand-setf still requires the hassle of binding the store variable and passing the environment around. It scales nicely to more complicated macros, though:
(defmacro swapf (p1 p2 &environment e)
"Two-argument ROTATEF."
(expand-setf (v1 r1 w1) p1 e
(expand-setf (v2 r2 w2) p2 e
`(let ((,v1 ,r2) (,v2 ,r1))
,w1
,w2))))
We could get rid of the magical storevar by packaging writeform as a function, so you do (,writefn exp) instead of (let ((,storeform exp)) ,writeform). Unfortunately this doesn't necessarily make the callers any shorter, because they often have to store the value in a variable anyway:
(defmacro rplacd&pop (place &optional y &environment e)
(expand-setf (r w) place e
(with-gensyms (newval)
`(let ((,newval ,r))
(prog1 ,r
(rplacd (cdr ,r) ,y)
(,w ,newval)))))
It's also annoying that expand-setf wraps a form around the result of its body. That's a surprising thing for a macro to do, and I can imagine it confusing people and causing bugs. We can get rid of that - and of the whole macro - in a clean way, by packaging the expansion as a function:
(defun place-accessor (place env)
"Return a form evaluating to an accessor for PLACE: a function
that returns the value of PLACE when called with no arguments, and
sets PLACE when called with one argument."
(expand-setf (v r w env)
(with-gensyms (set?)
`(lambda (&optional (,v nil ,set?))
(if ,set? ;CASE-LAMBDA would be ideal here
,r
(progn ,w ,v))))))
This turns places into first-class values, which, not surprisingly, makes them easier to work with. (I think this has been done before to emulate pass-by-reference, but I've never seen it done to simplify writing other macros.) Distinguishing reads from writes by arity this way isn't entirely clean, but it beats packaging them as two separate functions, which would be prohibitively awkward for the caller. Here's how it works in practice:
(defmacro swapf (place1 place2 &environment e)
(with-gensyms (a1 a2 v)
`(let ((,a1 ,(place-accessor place1 e))
(,a2 ,(place-accessor place2 e)))
(let ((,v (funcall ,a1)))
(funcall ,a1 (funcall ,a2))
(funcall ,a2 ,v)))))
This is longer than the expand-setf version, because it doesn't handle the gensyms and variables itself. (Ignore the funcalls; they wouldn't be there in a Lisp-1.) It's less magical, but I'm not sure it's any easier to read. place-accessor would be best used as the foundation of a setf system, since it's conceptually simpler than get-setf-expansion, but it still needs a with-place-accessor wrapper macro. And I've lost sight of the original problem. None of these solutions make defining rplacd&pop and push as simple as it ought to be. There must be a better way.
Maybe I'm coming at this from the wrong direction. Rather than simplify the general case, what about generalizing the simple case? define-modify-macro already covers the most common kind of one-place macros: those that simply apply some function to the value in the place, and store (and return) the result:
(setf place (some-function place args ...))
For example:
CL-USER> (define-modify-macro pop* () cdr
"Like POP, but doesn't return the popped value.")
POP*
CL-USER> (macroexpand-1 '(pop* place))
(LET* ((#:G4467 (CDR PLACE)))
(SETQ PLACE #:G4467))
The only reason define-modify-macro can't express pop and rplacd&pop is that the macros it defines must return the same value they store. We could fix this by adding another argument: in addition to a function to determine the stored value, we can supply one to determine the returned value:
(defmacro define-modify-macro* (name arglist transformfn resultfn
&optional docstring)
"Like DEFINE-MODIFY-MACRO, but takes an additional function, to
define a macro that returns a different value than it stores.
Subforms of the place are evaluated first, then any additional arguments,
then TRANSFORMFM is called, then RESULTFN, then the value is stored, and
finally the result of RESULTFN is returned.
ARGLIST may contain &optional arguments but not &rest or &key."
(let* ((args (arg-names arglist))
(gensyms (mapcar (lambda (a) (gensym (symbol-name a))) args)))
;GENSYMS is a list of names (in the intermediate expansion) of
; names (in the final expansion) of the values of the ARGS. They
; have to be gensymmed to protect TRANSFORMFM and RESULTFN.
(with-gensyms (place env)
`(defmacro ,name (,place ,@arglist &environment ,env)
,@(when docstring (list docstring))
(expand-setf (v r w) ,place ,env
(with-gensyms ,gensyms
`(let (,,@(mapcar (lambda (gen arg) ``(,,gen ,,arg))
gensyms args))
(let ((,v (,',transformfn ,r ,,@gensyms)))
(prog1 (,',resultfn ,r ,,@gensyms)
,w)))))))))
(defun arg-names (arglist)
"Return the names of the arguments in a lambda-list."
(loop for arg in arglist
unless (member arg lambda-list-keywords)
collect (if (consp arg) (car arg) arg)))
(I had trouble writing this macro too - not just the multiple levels of splicing and gensyms, but also the awkwardness of extracting the argument names. Common Lisp has complex argument-lists, but no tools to parse them, so it took me a few tries before I realized I needed to do it myself, and then a few more before I gave up trying to handle &rest and &key. And I almost forgot to prevent multiple evaluation of argument forms!)
With this macro, pop-like macros can be defined more conveniently:
(define-modify-macro* rplacd&pop (&optional newcdr)
(lambda (x y) (declare (ignore y)) (cdr x))
rplacd
"I am not explaining this one again.")
Except for the hassle of lambda and ignore to give cdr the right argument, this is not too painful. push requires a wrapper to fix its argument order, but pop is easy:
(define-modify-macro %push (val)
(lambda (list val) (cons val list))
(lambda (list val) (declare (ignore list)) val)
"Helper for PUSH")
(defmacro push (val place)
`(%push ,place ,val))
(define-modify-macro* pop () cdr car)
Correction: push returns the same value it stores, so it doesn't require define-modify-macro*.
I started this post intending to learn something about how to design a better setf-like generalized assignment system, and succeeded only in writing a few macros to make dealing with Common Lisp's setf less painful. This is, I suppose, still useful, but it's disappointing.
Labels: common-lisp, macros, setf
When the functional way is much worse
One of the strengths of functional (i.e. applicative) programming style is that it makes dataflow obvious. Usually this is a good thing, but there are many cases where parts of a program's dataflow are best kept hidden. For instance, some functions return a value (or several) not because their caller is ever interested in it, but so the caller can pass it to somewhere else that actually is interested. This often happens when you're keeping some sort of profile information. For example, suppose you're playing with the very recursive Sudan function:
(defun sudan (n x y) "The first known function that is recursive but not primitive recursive." (cond ((zerop n) (+ x y)) ((zerop y) x) (t (let ((s (sudan n x (1- y)))) (sudan (1- n) s (+ s y))))))CL-USER>(sudan 1 2 3)27 CL-USER>(sudan 3 2 1)Control stack exhausted (no more space for function call frames). [Condition of type SB-KERNEL::CONTROL-STACK-EXHAUSTED] CL-USER>(sudan 2 2 1)27 CL-USER>(sudan 2 2 2)15569256417 CL-USER>(sudan 2 2 3)Control stack exhausted (no more space for function call frames). [Condition of type SB-KERNEL::CONTROL-STACK-EXHAUSTED] CL-USER>(sudan 2 3 2)5742397643169488579854258
It returns instantly for very small arguments, and overflows for slightly larger ones. What's going on?
The obvious way to find out is to instrument it to count how many times it was called. (Yes, I know there are tools for this, but this is an example of a general problem, of which most cases don't have tool support.) The proper, functional way is to return the number of calls as an additional value:
(defun sudan/count (n x y) "The first known function that is recursive but not primitive recursive. Returns two values: S_n(x,y), and the number of calls required to compute it." (cond ((zerop n) (values (+ x y) 1)) ((zerop y) (values x 1)) (t (multiple-value-bind (s calls) (sudan/count n x (1- y)) (multiple-value-bind (s2 calls2) (sudan/count (1- n) s (+ s y)) (values s2 (+ calls calls2 1)))))))CL-USER>(sudan3 2 2 2)15569256417 69 CL-USER>(sudan/count 2 3 2)5742397643169488579854258 165
Hmm. That's not so bad. The number of calls is not so bad, I mean. The code, though, is a mess. The relationship of the recursive calls is obscured with multiple-value-binds, and every return site has to be wrapped with values, and if you miss one the function won't work. This is not a good way to instrument a function. More precisely, this is not a good way to pass profiling data up the call tree. But is there a better way?
If I were in a stateful mood, I would have done it this way:
(defun sudan/count2 (n x y)
"The first known function that is recursive but not primitive recursive.
Returns two values: S_n(x,y), and the number of calls required to compute it."
(let ((calls 0))
(labels ((sudan (n x y)
(incf calls)
(cond ((zerop n) (+ x y))
((zerop y) x)
(t (let ((s (sudan n x (1- y))))
(sudan (1- n) s (+ s y)))))))
(values (sudan n x y) calls))))
That's one line longer than the multivalue version, but shorter in characters or nodes - and much easier to maintain, because the profiling is not tangled through the rest of the function. It also scales much better, because it doesn't require such pervasive changes. The trouble with state - well, one of the troubles with state - is that it hides dataflow, but in this case hiding dataflow is exactly what we want.
By the way, I had a predictable bug in the first version of sudan/count: when I renamed the function, I forgot to change the recursive calls, so it called the old version. Recursing by name is redundant in a bad way - it depends on the function's name, so merely renaming the function changes its behaviour! This wouldn't have happened in Forth, which has a recurse operation that doesn't depend on the function's name. Anonymous recursion is inflexible, of course, but there are many simple cases like this, where it more closely expresses what you mean, and avoids a possible bug.
Speaking of possible bugs: If you noticed the evaluation order dependency in (values (sudan n x y) calls), good for you. I didn't.
How toy problems are different
A lot of simple text-processing problems involve operating on the words of the input. Norvig's spellchecker, for instance, begins by extracting the words:
def words(text): return re.findall('[a-z]+', text.lower())
So do some of the old Shootout problems (though none of the currently active ones, and they don't keep old ones around, so it you want to see an example you'll have to use the WayBackMachine). This sets my library-senses tingling. If so many programs need to split text into words, shouldn't words be in the standard library?
There are two problems here. The lesser is that the definition of "word" varies a bit - sometimes it's defined by alphanumeric characters, sometimes by whitespace. This isn't a showstopper, because some reasonable definition will probably work for a large fraction of the potential uses. The greater problem is that splitting things into words is much less common in real programs than in toy examples. Words are like Fibonacci numbers: they're familiar and easy to ask for, so they occur unusually often in made-up requirements. Neither is common in real programs - probably not enough so to warrant including them in a language's library.
This is one of the pitfalls of testing a language against toy problems. Unlike real problems, they're strongly biased toward simplicity of requirements, which affects what they demand from a language. Real programs tend to spend a lot of code on I/O and error handling, neither of which turns up much in toy problems. Is it a coincidence that these two areas are awkward to express in most languages?
Where have all the pointers gone?
Dynamically typed languages represent almost everything as tagged pointers, which sounds very inefficient: doesn't that waste a lot of space? And doesn't 64-bit addressing make this waste much, much worse?
I used to take this seriously. But one day a few years ago I was playing with Squeak, and wondering why its images (savable worlds) were so large - 10 MB, when the original Smalltalk ran in 64 KB. I stumbled across allObjectsDo:, which iterates across all objects in the heap, so I wrote a heap profiler to find out where the space was going. Once I finished it, I discovered Squeak came with a much better one, so I used that instead. Here are all the classes that take more than 1% of the space:
| Class | Purpose | # Instances | Space | % |
|---|---|---|---|---|
CompiledMethod | bytecode | 33951 | 2.0 MB | 20% |
Bitmap | graphics | 1208 | 1.8 MB | 17% |
String | text | 28743 | 1.7 MB | 16% |
Array | various | 30263 | 1.3 MB | 12% |
ByteArray | various | 379 | 800 KB | 8% |
Symbol | selectors mostly | 27786 | 550 KB | 5% |
MethodDictionary | method dispatch | 3092 | 320 KB | 3% |
WeakArray | symbol table mostly | 15 | 230 KB | 2.2% |
Point | GUI | 17193 | 200 KB | 1.9% |
MorphExtension | GUI | 3344 | 170 KB | 1.6% |
Color | GUI | 8323 | 163 KB | 1.6% |
Association | hashtables | 10761 | 126 KB | 1.2% |
Float | boxed float | 9732 | 114 KB | 1.1% |
| All others | 1.1 MB | 10% | ||
| Classes without pointers | >6.6 MB | >65% | ||
| Total | 10.2 MB | 100% | ||
This shows why tagged pointers are not a big space problem: there just aren't that many of them. Five of the top six classes, accounting for two thirds of the space, are binary data. (The exception is Array.) Dynamically typed languages, it turns out, don't encode most data as tagged pointers.
Well, of course. What do you do with a large data structure? You find a more compact representation. Often that means replacing low-entropy pointers with high-entropy bytes. That's what happened to all of those five non-pointer classes. Dynamic typing makes no difference: as long as you have some sort of byte arrays (in Smalltalk, Class»variableByteSubclass:), you can do this in dynamically typed languages the same way you do it in statically typed ones. If tagged pointers are wasteful, the waste is in boxing small objects like floats, not in the presence of pointers.
Of course this is only one image, and an odd one at that - instead of application-specific data, it's dominated by library code. The reason Squeak is so much bigger than the old Smalltalks is that it has a much bigger standard library - and no autoloading, so it's all in the image all the time. Code accounts for at least 30% of the space: CompiledMethod, Symbol, MethodDictionary, and WeakArray (most of which is the symbol table). This 3 MB would be insignificant in a bigger image, but otherwise I expect larger applications to have space profiles much like Squeak's. Most classes are made of pointers, but most of the space goes on the few that aren't.
A hazard of precedence
I encountered a bug recently from a precedence mistake. Someone had written if (a = b == c), intending if ((a = b) == c) - but in C == has higher precedence than =, so that wasn't how the parser saw it. The typechecker didn't object, because the result type of == in C is int, so all was apparently well. None of the humans noticed either, perhaps because b was a large expression, so the two infix operators were far apart.
Operator precedence saves lots of parentheses, but it does occasionally lead to bugs. It's one of the class of language features that work by resolving ambiguity, and these have a common hazard: when you don't notice the ambiguity, it will still be resolved, and not necessarily as you intend.
Labels: syntax
Short names are addictive
This morning I tried to do this in Common Lisp:
(defun neighbors (x y)
(mapcar (fn (dx dy) (vector (+ x dx) (+ y dy)))
'(-1 0 1 0)
'(0 1 0 -1)))
Naturally, SBCL spit out half a page of warnings, starting with this:
; (+ X DX)
;
; caught WARNING:
; undefined variable: DX
; (DX DY)
;
; caught STYLE-WARNING:
; undefined function: DX
;
; caught WARNING:
; undefined variable: DY
It took me an embarassingly long time to figure out why. I have become so used to fn being lambda - in Arc, Clojure, and any number of other new Lisps, including my own - that I didn't notice anything out of the ordinary.
I did at least remember that CL can't add vectors, which was how I wanted to write it. This is a hazard of learning new languages, and of designing them: improvements, even such minor ones as shorter names, are addictive. They're fun at first, but soon you become dependent on them, and the old ways seem unbearably clunky, if you can even remember them.
Labels: names