Eliminate unsightly warnings!

Compiler warnings getting you down? People complaining about all the “weird garbage” printed when they type make? Fear not — there's a solution!

A while ago I had occasion to build a particular (over)complicated project that had recently been ported from Windows to Unix. The Windows version spewed pages of warnings when building, but on Unix there was none of that — just an empty terminal with a helpful success message.

Wait a minute...

At the end of the makefile's default rule, I found this:

        clear
        echo Done building $(TARGET)!

I hope you wouldn't consider doing anything like that. Now, if you'll excuse me, I have so many warnings to look at...

Errors

About signaling conditions in Common Lisp, Nikodemus Siivola writes:

Use #'ERROR to signal conditions inheriting from SERIOUS-CONDITION. This includes all subclasses of ERROR. SERIOUS-CONDITION is a convention that means "this will land you in the debugger if not handled", which is what #'ERROR ensures.

  • Inherit from ERROR if unwinding from the error leaves the system in a consistent state. Subclasses of ERROR mean "oops, we have a problem, but it's OK to unwind."
  • Inherit from SERIOUS-CONDITION, not ERROR, if the system is messed up somehow, and/or requires either human intervention or special knowledge about the situation to resolve: SERIOUS-CONDITIONs that are not ERRORs mean that "something is deeply wrong and you should know what you are doing if you are going to resolve this."

There are two problems with this interpretation.

First, it gives meaning to a difference type: (and serious-condition (not error)) means something beyond what serious-condition alone means. You could interpret this as “serious-condition means we can't continue; error means we can exit safely” - but non-serious-condition signals also mean we can exit safely. So the conditions that can be handled safely are (and condition (or (not serious-condition) error)). This complex type is at least a warning sign. If you want to distinguish recoverable from unrecoverable conditions, shouldn't one or the other have its own class?
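
To make the type algebra concrete, here's a sketch. The wedged condition and the handleable type are my own names, purely for illustration; only the class structure comes from the interpretation above:

(define-condition wedged (serious-condition) ()) ; serious, but not an ERROR

;; "The conditions that can be handled safely" under this interpretation:
(deftype handleable ()
  '(and condition (or (not serious-condition) error)))

(typep (make-condition 'simple-error) 'handleable)   ; => T   (an ERROR: OK to unwind)
(typep (make-condition 'simple-warning) 'handleable) ; => T   (not serious: may ignore)
(typep (make-condition 'wedged) 'handleable)         ; => NIL (serious, not an ERROR)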

Second, it's not what the Hyperspec says. It defines an error as

a situation in which the semantics of a program are not specified, and in which the consequences are undefined

and a serious condition as

a situation that is generally sufficiently severe that entry into the debugger should be expected if the condition is signaled but not handled.

These aren't the most lucid definitions, but the idea is that serious-condition means the signal can't be ignored, and error means this is because the semantics are undefined. (and serious-condition (not error)), therefore, is a situation where the semantics are defined, but where execution nonetheless cannot continue for some other reason - running out of memory, for instance. The distinction isn't between well-behaved states and messed-up ones, but between semantic nonsense and other failures. C++ has an analogous distinction between logic_error, which is the same as CL's error, and runtime_error, which covers the rest of CL's serious-condition. (C++ has only terminating exceptions, so it doesn't have non-serious conditions.)
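
CL itself ships an instance of this difference type: storage-condition, which covers exactly this kind of failure (running out of memory), inherits from serious-condition but not from error:

(subtypep 'storage-condition 'serious-condition) ; => T, T
(subtypep 'storage-condition 'error)             ; => NIL, T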

I'm tempted to make more such distinctions, giving something like this:

  • Nonsense: the program has asked for something undefined, like (+ '(hello) "world!"). (CL error, C++ logic_error) You can define a reasonable meaning for most nonsense errors; they are errors only because their meaning hasn't been defined yet.
  • Limitation: the implementation knows what you mean, but can't deliver, e.g. arithmetic overflow or running out of memory. (RNRS calls this a “violation of an implementation restriction”.) You could consider “not yet implemented” to be a limitation as well.
  • Environmental failure: the operation failed because the outside world did not cooperate, e.g. a file was not found or a connection was refused.
  • Forbidden: operations which do have a well-defined meaning, but you're not allowed to do them, usually for security or safety reasons, e.g. you don't have permission to unlink("/").

In other words, “I don't understand”, “I can't”, “It's not working”, and “I'm sorry, Dave. I'm afraid I can't do that.”
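
In CL terms, and following the Hyperspec reading above, the taxonomy might look like this sketch. The condition names are mine, not standard CL:

(define-condition nonsense (error) ())                          ; "I don't understand"
(define-condition limitation (serious-condition) ())            ; "I can't"
(define-condition environmental-failure (serious-condition) ()) ; "It's not working"
(define-condition forbidden (serious-condition) ())             ; "I'm sorry, Dave."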

This is a nice taxonomy, but I'm not sure it's actually useful. The problem is that one error often leads to another, and the error that's detected isn't necessarily the original problem. For instance, failure to open a nonexistent file (an environmental failure) may manifest as a nonsensical attempt to read from an invalid stream. (CL actually signals it as an error, even though it's not a semantic error, perhaps for this reason.) Or a fixed-size buffer (a limitation) causes a buffer overflow (nonsense) which is eventually detected as a segfault (a forbidden operation). Why try to distinguish between classes of errors when they're so fluid?

Fortunately, programs don't usually care why they failed, unless it was a certain specific error they were expecting. IME most exception handlers are either quite broad (e.g. catching serious-condition near the root of a large computation), recovering in generic ways such as aborting, or quite specific (e.g. catching file-not-found-error on one particular open), attempting to deal with the problem. There aren't many handlers that try to generalize at intermediate levels, even in languages that support them.
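
Both patterns, as a sketch — run-big-computation is a hypothetical entry point, but serious-condition and file-error are standard CL:

;; Broad: abort a whole computation on anything serious, recover generically.
(handler-case (run-big-computation)
  (serious-condition (c)
    (format *error-output* "~&Giving up: ~a~%" c)
    nil))

;; Specific: one particular OPEN, expecting one particular failure.
(handler-case (open #p"config.lisp")
  (file-error (c)
    (declare (ignore c))
    (warn "No config file; using defaults.")
    nil))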

So maybe it's not so useful to classify exceptions by their cause. Maybe we should instead classify them by the possible responses: things like whether they can be ignored (CL's serious-condition) and whether they can be handled at all. In that case Nikodemus' interpretation of (and serious-condition (not error)) as “unrecoverable error” is right — not for CL, but for some future language.

Ambivalent about anaphors

I like anaphoric macros like aif and aand, but that a- is rather opaque. If you didn't know what it stood for, would you ever guess "anaphoric", or have a clue what the macros did?
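
For reference, the classic definitions (after Graham's On Lisp), both of which deliberately capture the variable it:

(defmacro aif (test then &optional else)
  `(let ((it ,test))
     (if it ,then ,else)))

(defmacro aand (&rest args)
  (cond ((null args) t)
        ((null (cdr args)) (car args))
        (t `(aif ,(car args) (aand ,@(cdr args))))))

;; Usage sketch (PROCESS and MAKE-DEFAULT are stand-ins):
;; (aif (gethash key table) (process it) (make-default))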

Abstruse naming conventions are nothing new — how long have Lispers been using -p for predicates? — but that's no reason to inflict more of them. Is there a less exotic affix for anaphoric macros? Probably not, because it has to be short to be useful, and anything that short will be pretty opaque.

In any case, I'm not sure separate anaphoric macros are the right idea. It's nice to be able to bind a variable by adding a single character to a name, but it's not general. It won't work with a newly defined macro until someone defines an anaphoric variant, which realistically won't happen very often. Maybe that doesn't matter, since you can always fall back on let.

I wonder if it wouldn't be better to be like Perl, and make everything anaphoric. If the normal versions of if and progn bound a variable (perhaps with a more distinctive name than it?), then new macros based on them would too. This sounds nightmarishly unpredictable, since a newly defined macro might use several anaphoric constructs, and the innermost is not necessarily the one you want. It works for and but not for (the naive definition of) case, for instance.
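
To see the case failure concretely, here's a sketch of a naive case built on aif: every test rebinds it, so inside a clause body it is the boolean returned by eql, not the key. (An and built the same way works, because the last test's value is exactly the result you want.)

(defmacro naive-case (key &body clauses)
  (let ((k (gensym)))
    `(let ((,k ,key))
       ,(reduce (lambda (clause else)
                  `(aif (eql ,k ',(first clause))
                        (progn ,@(rest clause))
                        ,else))
                clauses :from-end t :initial-value nil))))

;; (naive-case x (1 it)) => T when x is 1: the innermost anaphor is the
;; EQL result, not X.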

Rampant anaphora might also pose a maintenance hazard, since simple transformations of the code can change what an anaphor refers to. However, I haven't heard that this is a big problem in Perl. I suspect this is because people normally base their changes on a mental model of local dataflow, and in Perl, this model includes $_. Unfortunately I don't have enough Perl experience to check this myself.

I would like to dismiss anaphora on the grounds that they don't compose well: while one aif can make an expression much simpler, a second will often conflict with the first. But this isn't a very distinctive fault. Many useful constructs are hard to compose — even function composition, ironically — and anaphoric macros are too handy to ignore.

Identity of numbers in Common Lisp and Maclisp

Nick Levine asks:

If anyone knows of a Common Lisp implementation for which (eq 3 3) need not be true, could they get back to me? I know what the Spec says about this but have difficulty believing it.

Commenters duly reply that ABCL and some versions of GCL do sometimes have multiple fixnums with the same value. Basically, this can happen in any implementation that uses boxed fixnums and doesn't intern them.

But this is not the only thing that can happen to fixnum identity. Common Lisp allows much worse:

An implementation is permitted to make "copies" of characters and numbers at any time.

This is an ugly thing to have in a language with otherwise straightforward object semantics. CLtL explains:

In general, numbers in Common Lisp are not true objects; eq cannot be counted upon to operate on them reliably. In particular, it is possible that the expression

(let ((x z) (y z)) (eq x y))

may be false rather than true if the value of z is a number.

Rationale: This odd breakdown of eq in the case of numbers allows the implementor enough design freedom to produce exceptionally efficient numerical code on conventional architectures. MacLisp requires this freedom, for example, in order to produce compiled numerical code equal in speed to Fortran. Common Lisp makes this same restriction, if not for this freedom, then at least for the sake of compatibility.

If I understand correctly, the rationale refers to how PDP-10 Maclisp handled arithmetic. It used boxed numbers, and to reduce allocation it often kept them on temporary stacks. This created a problem when a stack-allocated number had to escape and become a permanent value:

Some operations, such as storing a number in a special variable, or RPLAC'ing it into a cons, clearly require permanent allocation of the number. However, in order to allow allocation-free numeric computation, PDP-10 Lisp allows the passing of "intermediate" numeric results between functions without allocating them. Because of this, a function receiving any argument which cannot be positively verified at compile time as not being a pointer to a "temporary" number ("PDL number") must be extremely circumspect about what it does with that argument, in fact, calling a runtime routine named PDLNMK whenever a result of "unknown parentage" is stored into a permanent place. The routine checks to see if the result being stored is in fact a pointer to a PDL number, and allocates it permanently if so.

Steele's Fast Arithmetic in MacLISP provides more detail: NCOMPLR reallocated numbers whenever they escaped and had to become permanent, so it sometimes allocated a single value more than once, producing multiple non-eq copies.

The same problem arises with deboxing optimizations in other compilers. Most deboxing compilers, however, are careful to avoid multiple reboxing, for performance reasons: they don't want to embarrass themselves by accidentally allocating even more numbers than they would with naive boxed arithmetic. This has the convenient side effect of preserving identity of numbers.

So while Common Lisp allows implementations to copy numbers at will, I suspect this is purely a historical relic, “rooted in NCOMPLR” rather than in any practical need. Has there ever been an implementation that exploited it?
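
Whatever the answer, the portable rule is the familiar one: compare numbers with eql or =, never eq. Under the copying license, even self-comparison is suspect:

(let ((x 3)) (eq x x)) ; may be NIL: X may be copied at any time
(eql 3 3)              ; => T, always
(= 3 3.0)              ; => T: numeric equality, ignoring representation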

Function of the day: lerp

In response to clamp, a commenter unerringly pointed out another function whose common name I didn't know: lerp, for linear interpolation. You know the operation - you have two points and you want a line:

lerp :: (Num a, Num b) ⇒ (a, b) → (a, b) → a → b
lerp (x1,y1) (x2,y2) x = y1 + (x - x1) * (y2 - y1) / (x2 - x1)

? foo = lerp (1,1) (10,4)
? foo 4
2
? foo (-5)
-1

What if you have more than two points? The obvious generalization is to interpolate between successive pairs of points:

polylerp :: (Num a, Num b) ⇒ [(a, b)] → a → b
polylerp [] _                              = error "domain"
polylerp ((x1, _):_) x           | x1 > x  = error "domain"
polylerp ((x1, y1):_) x          | x1 == x = y1
polylerp ((x1, y1):(x2, y2):_) x | x2 > x  = lerp (x1, y1) (x2, y2) x
polylerp (_:ps) x                          = polylerp ps x

(Apologies for the explicit recursion. I wanted to write this with combinators to more closely resemble the natural-language description, but even with made-to-order combinators, I can't find anything clearer. This sort of iteration - a fold that operates on more than one element at once - is often awkward to express.)

The commenter mentioned that lerp, like clamp, is used a lot in graphics. This use is more specific than interpolating arbitrary functions between points: it's used to mix two pixels, for applications like antialiasing or transparency. This doesn't require arbitrary X-coordinates, so a simpler variant of lerp is common:

lerp :: (Num a, Num b) ⇒ b → b → a → b
lerp y1 y2 s = y1 + s * (y2 - y1)

As with clamp, I've used this form of lerp before, but didn't know its standard name - I called it mix. Can you tell I'm not a graphics person?

Now for a question. The examples above are in (untested) Haskell, but the type signatures are not the real ones: Haskell would require a and b to be the same type in all of them (and, where there's division, Fractional rather than Num). I wrote them this way to make the distinction between X and Y coordinates clear, since the types would be unreadable otherwise. Is this a good idea?

Function of the day: clamp

It's trivial and useful: clamp restricts a number to a range.

clamp val low high = min (max val low) high

Argument order varies, of course. This particular order has the convenient property that if you forget and write clamp min val max, it still works.

I wanted this function years ago, in an application where I'd written a great many calls to min and max, with a number of mistakes because of the counterintuitive usage (min to impose a maximum, max a minimum). I almost replaced them with clamp, but I couldn't think of a good name for it. Then recently I stumbled across it in some language's standard library (I have already forgotten which) and was delighted to finally be able to put a name to the concept; I must have been (so I ashamedly imagine) the only mathematically inclined programmer in the world who didn't know what it was called.