Flattening list construction

Most beginning programmers expect lists (and other sequences) to behave something like this:

a = [1, 2, 3]
b = [4]
[a, b] => [1, 2, 3, 4]
[a, 5, b, 6] => [1, 2, 3, 5, 4, 6]

They expect the list construction operator to catenate lists, but not non-lists. As long as lists can't contain other lists (which rarely occurs to beginners), this is a well-behaved operator. Languages which don't allow lists of lists often have it as their default list construction operator: Perl, APL, and R, for example.

Languages which do allow lists of lists generally reject flattening construction on the grounds that it's unpredictable. Which is true — it's easy to write list functions using it that will work perfectly until they encounter nested lists, whereupon they quietly, mysteriously flatten them.

The trouble is, it's a very convenient operator. It's common (especially at the REPL) to construct a list ad hoc, out of some preexisting pieces, and when you do, you generally don't care which elements came prepackaged in lists and which were solitary. That's irrelevant; you just want to assemble them all into a list without worrying about where they came from.

For example, recently I was trying to control the undead menace with Clojure, and wrote this:

(def units (concat (for [y (range 1 20 3)]
                     (spawn zombie 1 y))
                   (list (spawn pixie 10 10)
                         (spawn pixie 15 10))))

I nearly forgot to include that list, because it was irrelevant detail. What I really wanted to write was something like this:

(def units (enlist (for [y (range 1 20 3)]
                     (spawn zombie 1 y))
                   (spawn pixie 10 10)
                   (spawn pixie 15 10)))

In a dynamically typed language, it's easy to define enlist (so I won't bother), but most don't, because it can't be made to work consistently for all inputs. I think this is a loss, because it's so very convenient when it does work.

1 comment:

  1. I actually find this behavior more mysterious in languages that flatten tuples of length 1 into their contents, as many do. I think this is partly a matter of notation: in Python, for example, the expressions of an n-tuple is are terminated by commas, with the rule that if there is more than one comma, the last can be dropped. Thus () is the 0-tuple, (1,) is a 1-tuple, (1, 2,) and (1, 2) are equivalent 2-tuples, and so on.

    In statically typed languages, where 0-tuples, 1-tuples, 2-tuples and so on are completely distinct type families, this isn't such a big problem. But in a dynamically typed language, a function that returns a tuple of arbitrary tuples gives strange results if some of the embedded tuples are canonicalized to simple expressions.


It's OK to comment on old posts.