Macro of the day: build-list

Some operations are rare in “real” programs but common in throwaways and toys. One of these is creating lists of random data. Toy programs frequently use random input, simply because it's easier than getting real data. Usually this takes the form of evaluating some (random-thing) expression over and over and returning a list of the results:

(map (λ (x) (random-thing)) (range n))

The unused x here offends me. The randomly generated things usually don't depend on where they end up in the list, so why should the function that generates them receive the index as an argument?

I only find the unused argument annoying if it's in an explicit λ. Doing the same thing with Clojure's for doesn't feel nearly as sinful, since it doesn't explicitly say the index is being passed to a function:

(for [x (range n)] (random-thing))

That's acceptably simple. But what I really want to write is something like this:

(build-list n (random-thing))

...which evaluates (random-thing) n times and returns the results as a list. This is a straightforward macro:

(defmacro build-list
  "Return a seq of the results of evaluating EXPR N times."
  [n expr]
  `(for [i# (range ~n)] ~expr))

Or, in CL (which, even using loop, doesn't look good in comparison):

(defmacro build-list (n body)
  (with-gensyms (i)
    `(loop for ,i from 1 to ,n collect ,body)))

Clojure's built-in repeatedly is the functional equivalent of build-list: it's map over range, but without passing the argument. With # as a one-character λ, this approaches the ideal:

(repeatedly n #(random-thing))

These tools are simple and frequently useful, especially in the throwaway programs that most benefit from library support, but few standard libraries offer them. Some come close, but can't resist including the index as an argument. Racket, for instance, has a build-list function which is equivalent to map over a range, but it's barely an improvement for this use:

(build-list n (λ (x) (random-thing)))

I suspect this is because of cultural dissonance. Calling something repeatedly without passing it an argument is obviously useful only if it's not purely functional, and functional programmers tend to be suspicious of that. We (well, except for the Haskellers) accept impurities like randomness, but not so enthusiastically that we're comfortable including additional library functions to help with them. We expect our libraries to do the Right Thing, and some of our trusted heuristics tell us that making lists of random data is not the Right Thing. So, useful or not, we don't really try to make it easy.

6 comments:

  1. CL with loop would be: (loop repeat n collect ...)

    ReplyDelete
  2. Oh, another useful LOOP clause I didn't know. :/ (Scandal: knee-jerk anti-LOOPist doesn't know LOOP!)

    I originally used DO instead, but that was just too verbose.

    ReplyDelete
  3. Maybe this is cultural too (I'm not a CL or Clojure coder -- I tend more towards Scheme and Haskell), but on what basis do you call this sort of thing "rare" in production programs? I'm thinking of fuzz testing, here. My first inclination was to say "This should take a seed so its results can be repeated", which leads to it being a fold.

    ReplyDelete
  4. I think it's rare because I've never wanted such a list in "real" code, and I do regularly for toys. But I forgot completely about random tests (despite having written one recently) — I suppose random lists would be used for that.

    A seed is potentially useful, but (at least in this sort of application) will probably never actually be needed, so why bother? Also, the CL-native way of threading RNG state is by side effect rather than explicit fold, since this doesn't require reorganizing the code that uses it.

    ReplyDelete
  5. (repeatedly n random-thing) is sufficient - no need for #().

    ReplyDelete
  6. Unknown: (random-thing) is supposed to be a standin for some larger expression, not necessarily a function.

    ReplyDelete

It's OK to comment on old posts.