That's not what lexical scope is for

In one of Joe Marshall's posts on ways to accidentally break an abstraction, there are several top-level definitions which close over a lexical variable, like this:

(define lib-mapcar 
  (let ((calls-to-f 0))
    (add-counter! 'calls-to-f (lambda () calls-to-f))
    (lambda (f list)
      ...)))

I know this isn't supposed to be an example of good style, but I wince to see functions defined this way. lib-mapcar's argument list is several lines away from its name, hidden behind some unrelated profiling code and a lambda. This sort of thing is distressingly common in Scheme, and is even presented as good style sometimes. The usual justification is to keep the variable private, so you don't have to worry about name collisions or unexpected dependencies. But the cost in clarity far exceeds the tiny benefit in privacy.
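For comparison, here is a rough sketch of the same definition with the counter moved out to top level, so the name and the argument list sit together. Two things here are assumptions rather than anything from the original post: this add-counter! is a hypothetical stand-in for the profiling hook, and the body of lib-mapcar is a plausible guess, since the original elides it.

```scheme
;; Hypothetical stand-in for the profiling hook. The real add-counter!
;; is not shown in the post, so this one just records name/thunk pairs.
(define counters '())
(define (add-counter! name thunk)
  (set! counters (cons (cons name thunk) counters)))

;; The counter is an ordinary top-level variable, not a let-bound one.
(define calls-to-f 0)
(add-counter! 'calls-to-f (lambda () calls-to-f))

;; Name and argument list are now on the same line.
;; Assumed body: the original is elided, so this simply maps f over
;; the list and counts each call to f.
(define (lib-mapcar f list)
  (map (lambda (x)
         (set! calls-to-f (+ calls-to-f 1))
         (f x))
       list))

(lib-mapcar (lambda (x) (* x x)) '(1 2 3))   ; => (1 4 9)
calls-to-f                                   ; => 3
```

The price is that calls-to-f is now visible to the rest of the program, which is exactly the privacy the closure version was buying.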

I guess this is what happens when you don't have (or aren't accustomed to having) a module system. You start to think of top-level defines as "globals", to be avoided whenever possible, and look for some alternative. Some languages provide convenient alternatives, such as C's static (in all three of its senses), but in Scheme the only obvious alternative is to abuse lexical scope. And since everyone knows lexical scope is an unequivocal Good Thing, it's easy to overlook how awkward it is as an access-control tool.

I'm not the only one with this problem

Arnold Zwicky, of the famous Language Log, describes his blogging habits:

Meanwhile, proto-postings pile up. I'd estimate that I have about 800 partially completed postings, but though I used to keep some inventories of these, there are now so many different inventories in so many different places that I can't keep track of them.

That's exactly my situation, except I only have about 350.

Overloading in math

A few months ago Joe Marshall did a series of posts about how computer science has affected his view of math, beginning with its surprising lack of rigor: unlike programming languages, math notation is often ambiguous.

Check out these two equations:


By the first equation, Q is obviously some sort of function (the quantile function). H therefore operates on a function. By the second equation, the argument to H, which is called x, is used as the argument to f, which is the probability distribution function. f maps numbers to numbers, so x is a number. The types don't match.

This type error is precisely why this notation works. We can write Q to mean both a function and its result (or, to put it differently, both a quantity and a function relating that quantity to something else) because the types differentiate them. It's just like overloading in programming languages, but it's more pervasive, and it's not declared in advance, and one of the overloaded variables isn't even a function.

Indeed, if both Qs were functions, or even if Q returned another function instead of a number, this overloading wouldn't work. It would be too easy to confuse the two. But since their types are so different, we can safely overload arbitrary variables with the functions that compute them, and only programmers will be confused.

Programmers are paranoid about ambiguous notation, because computers can't be trusted to resolve it right. Mathematicians are much less so, because humans handle it better.

I think we can learn something from the mathematicians here. Language designers sometimes reject new overloadings for fear they'd be confusing, even if they wouldn't pose a problem for compilers. For example, overloading arithmetic operations to work on collections or functions sounds scary — wouldn't it be hard to tell what was being added and what was just being mapped or composed? But similarly wide-ranging overloadings in mathematical notation don't cause much trouble, and indeed these specific overloadings exist and work fine in APL and J. When a new construct looks confusing to humans but not to machines, that may be a sign that it's just unfamiliar.
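To make the collections-and-functions case concrete, here is a minimal sketch of an addition that dispatches on the types of its operands. The names add and apply-if-proc are hypothetical, and this is not a description of how APL or J implement it. It only illustrates that the types are enough to tell adding from mapping from composing.

```scheme
;; Hypothetical helper: call v on x if v is a procedure,
;; otherwise treat v as a constant.
(define (apply-if-proc v x)
  (if (procedure? v) (v x) v))

;; Hypothetical overloaded addition:
;;   numbers are added,
;;   lists are added elementwise (a scalar is added to every element),
;;   procedures are added pointwise, giving a new procedure.
(define (add a b)
  (cond ((and (number? a) (number? b)) (+ a b))
        ((or (procedure? a) (procedure? b))
         (lambda (x) (add (apply-if-proc a x) (apply-if-proc b x))))
        ((and (list? a) (list? b)) (map add a b))
        ((list? a) (map (lambda (x) (add x b)) a))
        ((list? b) (map (lambda (x) (add a x)) b))
        (else (error "add: unsupported operands" a b))))

(add 1 2)                                          ; => 3
(add '(1 2 3) '(10 20 30))                         ; => (11 22 33)
(add '(1 2 3) 10)                                  ; => (11 12 13)
((add (lambda (x) (* 2 x)) (lambda (x) (* x x))) 3) ; => 15
```

One choice is baked in: adding a procedure to a list yields a procedure that returns a list, not a list of procedures. That is the sort of decision that sounds confusing in the abstract and stops being confusing once it is a familiar convention.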