Is infix lambda hard to read?

I've been using infix lambda in pseudocode for a couple of years, with mixed results. I find it very easy to write, because it's so terse, and I tend to use it for any one-argument function I can't get by partial application and simple combinators. But I find it surprisingly hard to read.

The symptom is that I mistake the parameter list for an expression. For example, in a definition like this...

gravitate bodies = map (b → accelerate b (field b bodies)) bodies

...I sometimes read the first b as a free variable, and spend a confused moment trying to figure out where it's bound, before I notice the operator and realize that I'm looking at the binding occurrence. This mistake is especially easy to make in larger definitions, where it's not obvious that the b doesn't refer to some local variable I've forgotten about — and where it takes longer to see that no such variable exists.

I think the underlying problem is that the parameter list is the left argument of →, so when I read an expression left-to-right, I encounter it before the operator. Since I haven't yet seen any hint that it's not an expression, I try to read it as one, and fail. Ordinary lambda, in contrast, announces its specialness first, so I'm not surprised by the parameter list.
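
For comparison, here's the same definition in Haskell's ordinary prefix lambda, padded with hypothetical stub types and stub bodies so the sketch actually compiles; the \ arrives before the parameter, so b can only be read as a binding occurrence:

type Body = (Double, Double)   -- hypothetical stand-in; the physics doesn't matter
type Field = Double            -- hypothetical

field :: Body -> [Body] -> Field
field _ _ = 0                  -- stub

accelerate :: Body -> Field -> Body
accelerate b _ = b             -- stub

-- The \ announces the lambda before its parameter list, so the reader
-- is never tempted to parse b as a free variable:
gravitate :: [Body] -> [Body]
gravitate bodies = map (\b -> accelerate b (field b bodies)) bodies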

If this really is the problem, then it should also affect other operators whose left arguments aren't expressions. But there aren't many. The only one I can think of is the ubiquitous (in my pseudocode, at least) infix define, =, which may be less confusing because it's used in distinctive contexts (at top level or in a progn). There are a few common operators whose right arguments are special (e.g. . for field access), but it seems few infix operators have left arguments that aren't expressions.

What do C♯ users (at least those few who use lambda much) think? Do you have trouble reading =>?

Function of the day: sum

Most folds are sums — either (fold + 0 ...) or (fold + 0 (map f ...)). In math and pseudocode, they're written as Σ or sum, not as folds. So programming languages should support writing them the same way. In Clojure:

(defn sum "Add the elements of a collection, or some function of them."
  ([xs] (reduce + xs))
  ([f xs] (reduce (fn [a x] (+ a (f x))) 0 xs)))

This is one of those trivial conveniences that I use with surprising frequency — more than many more familiar operations. Counts from a 286-line Clojure program:

Count  Operation
   11  sum
   11  #(...) (op or abbreviated λ)
    5  format (I/O is everywhere)
    4  fn (λ)
    4  count
    3  map
    3  reduce (counting the two used to implement sum)
    2  filter

sum may be trivial, but at least in this program, it's more common than map, reduce, and filter combined. Isn't that enough to deserve a place in the library?
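
For what it's worth, Haskell's Prelude does include sum, and the two-argument variant is a one-liner. A sketch (sumBy is a hypothetical name of mine, not a standard function):

import Data.List (foldl')

-- The analogue of the two-argument sum above: add f of each element.
sumBy :: Num a => (b -> a) -> [b] -> a
sumBy f = foldl' (\acc x -> acc + f x) 0

-- e.g. sumBy length ["ab", "c", "def"] is 6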

(For performance reasons, Clojure's + isn't an ordinary generic function and thus can't be taught to work on vectors, so sum doesn't work on them either. This program does vector arithmetic, so I had to duplicate various arithmetic operations for vectors; three of these uses of sum were actually of a separate vsum. But this distinction would not be necessary in an ideal language.)

Pipe for functions

One of the minor annoyances of prefix notation for function call is that it gets the order of operations backwards. When you compose several functions into an expression, you generally have to write the last step first:

(handle-message (decrypt (receive socket) key))

If you explained this to a computer the same way you explain it to a human, you'd probably write the steps in the order they're performed. If you're used to Unix pipelines, you might write it like this:

receive socket | decrypt _ key | handle-message

In a language with user-defined infix operators, it's easy to support exactly this. You can define a | operator which simply applies its right argument to its left — the same as Haskell's $, but with the arguments reversed. It looks and feels very like the Unix |: it connects the output of one expression to the input of the next. The channels used are return value and arguments rather than stdout and stdin, but the effect is the same.

I can't remember where (edit 16 Feb 2011: here, at least, and also |> in F#), but I think I've heard this operator suggested before for Haskell — presumably with a different name, since | is taken. Ignoring that inconvenient detail, its Haskell definition is simple:

infixl 0 |
(|) :: a → (a → b) → b
x | f = f x
-- Or, to make the similarity to $ clearer:
(|) = flip ($)
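
To get something that actually compiles, pick an unreserved name; |>, borrowed from F#, is the obvious choice. A self-contained sketch, in which receive, decrypt, and handleMessage are hypothetical stand-ins for the pipeline stages above:

infixl 0 |>
(|>) :: a -> (a -> b) -> b
x |> f = f x

-- Dummy stages, just to make the pipeline runnable:
receive :: String -> String
receive sock = "ciphertext from " ++ sock

decrypt :: String -> String -> String
decrypt msg key = "decrypt(" ++ msg ++ " with " ++ key ++ ")"

handleMessage :: String -> IO ()
handleMessage = putStrLn

main :: IO ()
main = receive "socket0" |> (\m -> decrypt m key) |> handleMessage
  where key = "hunter2"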

Like $, this is a contemptibly trivial operator. All it does is apply one argument to the other, which doesn't sound like it could possibly be worth spending an infix operator on. But I find myself using it constantly in pseudocode, because it lets me write operations in the right order. It doesn't make code shorter, but it significantly reduces the distance between the code I write and the representation in my head. That's important enough to be worth a one-character infix operator.

Like any higher-order operator, | is much more useful when you have a terse way to write simple functions. Usually this means partial application, either in the form of currying, or an explicit partial application operator, or op, or implicit op (as in the example above). Combinators are nice by themselves, but they need partial application to be really useful.
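
With flip playing the role of partial application, for instance, the explicit lambda in the middle stage of the sketch above disappears:

-- Reuses the |> sketch above; flip decrypt "hunter2" is the partially
-- applied middle stage, no lambda required:
example :: IO ()
example = receive "socket0" |> flip decrypt "hunter2" |> handleMessage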

Shorter words for “expression”

Some common terms for programming languages vary widely in meaning, but at least one is understood across languages: virtually everyone agrees on “expression”.

Well, except in those communities that use the concept most. They have their own words for it. In Lisp, an expression is called a “form”; in the ML and statically-typed-functional-languages community, it's called a “term”. And let's not forget graph reduction, where it's called a “redex” (supposedly for “reducible expression”, although the “reducible” part is no longer relevant to the meaning). These terms all mean exactly the same thing, and all of the communities accept “expression” as a synonym, but they also have their own equivalents.

Private terms for common concepts usually exist to show group affiliation. That may be contributing here, but I think the main reason for replacing “expression” is brevity. It's difficult to write about programming languages (especially expression languages) without mentioning expressions exceedingly often, so there's a strong incentive to use a shorter word. “Expression” has three syllables and a difficult four-consonant cluster; “form” and “term” are easy one-syllable words. “Redex” is two syllables, which may explain why it's less popular than the other two.

None of the replacements are particularly transparent, but that doesn't matter much for such common words. (“Expression” doesn't make much sense either.) Apparently terms for such basic concepts needn't be obvious, as long as they're short.

Brevity may also be one of the reasons “thread” has largely replaced “process”. (The other reason is, of course, that popular OSes unify processes with other things like address spaces and security domains, and the word “process” has come to refer to the whole bundle rather than the original concept.)

Bursts of productivity are fun

I spent most of today slacking off at work, reading about astrophysics and paying only casual attention to actual work. But an hour or two past noon, a bug caught my eye — a simple mathematical matter of correctness, which I could feel good about fixing. It proved harder than I expected, and I soon filled a whiteboard with data structure diagrams and equations. After some frustration, I noticed other people leaving, and looked at a clock, which was obviously broken, since it said 5:00.

I turned back to the whiteboard, and saw that my previous approach was unnecessarily complicated. So I fixed that bug, and found another, and fixed it, and after fixing six bugs in a little over an hour, I lost interest and went back to reading.

Sound familiar? Most programmers (and most creative workers of any kind) are accustomed to seeing their productivity vary wildly over time. On one day you get nothing done and feel guilty for not trying very hard; the next day is the same except for a brief burst in which you seemingly do a week's work in two hours. The Internet is full of programmers lamenting this situation, and wishing they could have these bursts every day.

I suspect this unwelcome variation accounts for part of the fun of programming. The periods of apathy and frustration lower your expectations, so when the bursts of activity arrive, they seem far beyond your normal ability — a contrast that can make even the most boring problems exciting. I noticed this effect today: that one hour when everything went right, and everything worked on the first try, would not have been so impressive if it hadn't been preceded by a few hours of frustration. If you were modestly productive all the time, you might get the same total work done, but it would be more predictable, and you would have fewer moments of unexpected triumph. Mightn't that be less fun?