Flattening list construction

Most beginning programmers expect lists (and other sequences) to behave something like this:

a = [1, 2, 3]
b = [4]
[a, b] => [1, 2, 3, 4]
[a, 5, b, 6] => [1, 2, 3, 5, 4, 6]

They expect the list construction operator to catenate lists, but not non-lists. As long as lists can't contain other lists (which rarely occurs to beginners), this is a well-behaved operator. Languages which don't allow lists of lists often have it as their default list construction operator: Perl, APL, and R, for example.

Languages which do allow lists of lists generally reject flattening construction on the grounds that it's unpredictable. Which is true — it's easy to write list functions using it that will work perfectly until they encounter nested lists, whereupon they quietly, mysteriously flatten them.

The trouble is, it's a very convenient operator. It's common (especially at the REPL) to construct a list ad hoc, out of some preexisting pieces, and when you do, you generally don't care which elements came prepackaged in lists and which were solitary. That's irrelevant; you just want to assemble them all into a list without worrying about where they came from.

For example, recently I was trying to control the undead menace with Clojure, and wrote this:

(def units (concat (for [y (range 1 20 3)]
                     (spawn zombie 1 y))
                   (list (spawn pixie 10 10)
                         (spawn pixie 15 10))))

I nearly forgot to include that list, because it was irrelevant detail. What I really wanted to write was something like this:

(def units (enlist (for [y (range 1 20 3)]
                     (spawn zombie 1 y))
                   (spawn pixie 10 10)
                   (spawn pixie 15 10)))

In a dynamically typed language, it's easy to define enlist (so I won't bother), but most don't, because it can't be made to work consistently for all inputs. I think this is a loss, because it's so very convenient when it does work.

2 comments:

  1. I actually find this behavior more mysterious in languages that flatten tuples of length 1 into their contents, as many do. I think this is partly a matter of notation: in Python, for example, the expressions of an n-tuple is are terminated by commas, with the rule that if there is more than one comma, the last can be dropped. Thus () is the 0-tuple, (1,) is a 1-tuple, (1, 2,) and (1, 2) are equivalent 2-tuples, and so on.

    In statically typed languages, where 0-tuples, 1-tuples, 2-tuples and so on are completely distinct type families, this isn't such a big problem. But in a dynamically typed language, a function that returns a tuple of arbitrary tuples gives strange results if some of the embedded tuples are canonicalized to simple expressions.

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete

It's OK to comment on old posts.