Arcane Sentiment: Does infix notation have better constituent order?

Everyone knows the usual argument for infix notation: everyone's already used to it. That's a reasonable point, but it's not very satisfying. It's arbitrary, and it's an advantage of conventional notation, not of infix per se. But infix does have some real advantages: it gets arguments closer to the operator than {pre,post}fix does, and it can save tokens by using lower-precedence operators as punctuation, so fewer explicit parentheses are needed. There are also some advantages related to argument order, which deserve more attention than they usually get.
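To make the token-saving point concrete, here is a minimal Python sketch of precedence climbing. It is my illustration, not code from any particular language implementation. It reads a flat infix token list such as `a + b * c` and returns the fully parenthesized prefix form. The precedence table `PREC` and the function name `to_prefix` are hypothetical. Only four left-associative binary operators are handled, and there is no error checking.

```python
# Hypothetical precedence table: a higher number binds more tightly.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_prefix(tokens):
    """Parse a flat list of infix tokens by precedence climbing and return
    the equivalent fully parenthesized prefix (S-expression) string."""
    pos = 0

    def parse(min_prec):
        nonlocal pos
        lhs = tokens[pos]  # an operand
        pos += 1
        # Keep absorbing operators that bind at least as tightly as min_prec.
        while pos < len(tokens) and PREC[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            # PREC[op] + 1 makes equal-precedence operators left-associative.
            rhs = parse(PREC[op] + 1)
            lhs = f"({op} {lhs} {rhs})"
        return lhs

    return parse(1)


if __name__ == "__main__":
    print(to_prefix("a + b * c".split()))  # → (+ a (* b c))
    print(to_prefix("a * b + c".split()))  # → (+ (* a b) c)
```

Each infix input is five tokens. Each prefix output is nine, because the four parentheses that precedence left implicit have to be written out.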

Consider natural languages. They frequently work in topic-comment order: you mention some topic (a noun phrase), then make some comment about it. (I just did that, by saying “consider natural languages” and then saying something about them.) Some languages, such as Japanese, even have special grammatical features for this. Topic-comment order is useful because (among other reasons) the topic provides context for understanding the comment. This applies in programming languages too. I suspect it's one of the attractions of writing message sends as receiver.m(a, b): the receiver is often the topic, so it's convenient to write it first and separately. Unfortunately, topic-comment order is impractical in prefix notation, because the first position, where the topic wants to be, already belongs to the operator. But it works fine in infix or postfix.
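Python happens to expose both orders for the same operation, which makes the contrast easy to see. In this small sketch, the standard `str.replace` method is called once in prefix style, with the operation first, and once as a method send, with the receiver (the topic) first. The variable names are illustrative only.

```python
greeting = "hello, world"

# Prefix style: the operation comes first, and the topic is just its first argument.
prefix_form = str.replace(greeting, "world", "reader")

# Message-send style: the topic is named first, then the comment about it.
send_form = greeting.replace("world", "reader")

print(prefix_form)  # → hello, reader
print(send_form)    # → hello, reader
```

Both calls do the same work; only the position of the topic changes.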

Natural languages also prefer to put large constituents last (or, failing that, first). This is for a simple practical reason: it means you don't have to hold as much grammatical state in mind across a large subtree. It works in code too: it's easier to read (map args (lambda (x) (do lots of stuff with x))) than Scheme's actual argument order, (map (lambda (x) (do lots of stuff with x)) args). This is a common complaint about map; David Rush, for example, says:

I also find it awkward to have the function argument be the first argument to map. It is nearly always the most complex value to express in the call and the length of the expression tends to obscure the connection with the collection arguments.
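Python's built-in `map` uses the same function-first order as Scheme's. The minimal sketch below puts it next to a collection-first alternative. The helper `map_over` is hypothetical, not part of Python's standard library. It exists only to show the bulky function moving to the end.

```python
def map_over(items, fn):
    """Hypothetical helper: like the built-in map(), but collection first, function last."""
    return [fn(x) for x in items]


readings = [3, -1, 4]

# Built-in order: the bulky function comes first, so `readings` is only
# reached after the reader has parsed the whole lambda.
function_first = list(map(lambda x: (x * x if x > 0
                                     else -x),
                          readings))

# Collection-first order: the topic `readings` is named up front,
# and the bulky function trails at the end.
collection_first = map_over(readings, lambda x: (x * x if x > 0
                                                 else -x))

print(function_first)    # → [9, 1, 16]
print(collection_first)  # → [9, 1, 16]
```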

It's also easier to format a form when the large constituent is last — think of the awkwardness of indenting a function call that has several short arguments after a big lambda. Infix doesn't make indenting easy, but it does avoid large arguments in the wrong place — it fills the difficult middle position with the small, easy-to-parse operator.
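Python can't define new infix operators, but operator overloading can fake one. This sketch is purely illustrative. The `Coll` wrapper is hypothetical, and it reuses `|` to mean "apply the right-hand function to every element of the left-hand collection." The small collection sits on the left, the one-character operator takes the middle, and the bulky function goes last, where it is easiest to read and indent.

```python
class Coll:
    """Hypothetical wrapper: `Coll(xs) | fn` applies fn to every element of xs."""

    def __init__(self, items):
        self.items = list(items)

    def __or__(self, fn):
        # The operator is the small middle constituent; the function is the large right one.
        return Coll(fn(x) for x in self.items)

    def __repr__(self):
        return f"Coll({self.items!r})"


# The lambda needs parentheses because Python's lambda binds more loosely than `|`.
result = Coll([3, -1, 4]) | (lambda x: (x * x if x > 0
                                        else -x))
print(result)  # → Coll([9, 1, 16])
```

Because `|` is left-associative, `Coll(xs) | f | g` applies `f` and then `g`, so a chain also reads in execution order.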

I know other people have discussed both of these argument-order effects in general before, but my search-fu is failing me. Larry Wall's article on natural-language features in Perl mentions topicalization but not right-heaviness, surprisingly.

The common thread here is that the first and last positions of a form are special: certain arguments often want to be there. The middle position is the only place the operator can go without displacing an argument from its preferred place. This is a nice story, but it has a little flaw: operators also want to be in the end positions. They want to be first to provide context for the arguments (especially for macros), and to make the tree structure more transparent; they want to be last to reflect execution order (in eager languages). So putting the operator in the middle is not an unmitigated win. But it may be enough of an advantage to explain why, even to people used to prefix or postfix, infix sometimes feels more comfortably like natural language.
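The execution-order point can be seen in a tiny postfix evaluator. This Python sketch illustrates the general idea and is not tied to any particular language. It evaluates a token list left to right with a stack. Each operand is pushed as soon as it is read, and each operator is applied as soon as it appears, so the written order matches the order in which an eager evaluator does the work. The operator table `OPS` and the name `eval_postfix` are assumptions. Only four binary arithmetic operators and integer literals are supported.

```python
import operator

# Hypothetical operator table: binary arithmetic only.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_postfix(tokens):
    """Evaluate a postfix (RPN) token list whose operands are integer literals."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()  # both arguments are already evaluated...
            left = stack.pop()
            stack.append(OPS[tok](left, right))  # ...before the operator runs
        else:
            stack.append(int(tok))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# 2 + 3 * 4 in postfix: the multiplication is written, and performed, before the addition.
print(eval_postfix("2 3 4 * +".split()))  # → 14
```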
