Abstract algebra with imperfect models

One more post before the end of the year:

Haskell users tend to be mathematically inclined, so it's no surprise that they should notice a mathematical application of typeclasses: they can represent abstract algebraic structures. You only get one model of the structure per type (potentially annoying for common structures like groups, which a type can often model in more than one way), but where applicable this makes a great (for some values of "great") way to show off your mathematical prowess. Indeed, some of the standard typeclasses are intended as algebraic structures: Monad and MonadPlus, at least.
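Here's roughly what that looks like (a sketch; this Group class is mine for illustration, not anything in the standard library):

class Group a where
  gzero :: a           -- identity element
  gplus :: a -> a -> a -- the group operation
  gneg  :: a -> a      -- inverse

-- Integer under addition is one model...
instance Group Integer where
  gzero = 0
  gplus = (+)
  gneg  = negate

...and since there can be only one instance per type, Integer can't also be declared a group in any other way.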

There are probably quite a few Haskellers who have not only noticed this, but wished it were taken further. For instance, why isn't / defined like this?

(/) :: (Field a) => a -> a -> a

(Really mathematical Haskellers may also wish the return type was Maybe a.)
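Sketched with the existing Fractional class, since Field doesn't exist (safeDiv is my name, not a library function), that might be:

safeDiv :: (Eq a, Fractional a) => a -> a -> Maybe a
safeDiv _ 0 = Nothing      -- division by zero has no answer in a field
safeDiv x y = Just (x / y)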

Isn't Num actually Ring, and Fractional Field?

Not quite. Integer is indeed a ring, but Int isn't, because of overflow. Rational is a field, but Float isn't, because of rounding. Algebraic structures are useful only because it's possible to reason from their axioms and apply the conclusions to the models, and that doesn't work when the models don't actually satisfy those axioms. The reason Monad works is that its standard instances do correctly model it. The others are stuck with names less proudly abstract but less misleading (not to mention less opaque to non-mathematicians).
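Counterexamples are easy to come by. (The Int result is GHC's wraparound behavior on a 64-bit machine; the Haskell report leaves overflow up to the implementation.)

(maxBound :: Int) + 1 == minBound    -- True: addition wraps around

(16777216 + 1) - 16777216 :: Float   -- 0.0: 2^24 + 1 rounds back to 2^24
16777216 + (1 - 16777216) :: Float   -- 1.0: so + isn't even associative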

Of course, none of these models are really correct, because they all have the limitation of finite memory. But that's a less intrusive limitation than the one fixed-precision numbers have. "Less intrusive" is a disappointingly vague description; I suspect the real difference is that finite memory afflicts virtually every operation in the language, while overflow and rounding are specific to certain types. So while assuming the monad laws hold for lists, for example, will cause problems in some circumstances, it won't cause any new ones, because your program would have failed anyway in those cases. In contrast, it's pretty easy to write code that will work for any ring, but not for Int.
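For example (pow2 is a toy function of mine; the Int result is from 64-bit GHC):

-- Reasoning valid in any ring says pow2 n is 2^n:
pow2 :: Num a => Int -> a
pow2 n = product (replicate n 2)

-- pow2 64 :: Integer  is  18446744073709551616
-- pow2 64 :: Int      is  0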

Defending against nonexistent problems

When do good programmers write bad code? There are a great many ways to do that, of course, and most of them aren't very illuminating. But there are some common syndromes that account for a lot of poor code. One is when the programmers are writing in a language they don't know very well. They can't use its less guessable features, and since they aren't sure what its pitfalls are, they guess, and end up defending against ones it doesn't actually have. For example, I often see C++ code like this:

if (somepointer)
    delete somepointer;

This is unnecessarily explicit - delete is defined to do nothing if its argument is null, precisely so you don't have to check. (And every implementation I've ever used has done this correctly.) It should be just:

delete somepointer;

But someone new to C++ won't know that unless they've learned it explicitly. Until then, a reasonable programmer may continue to defend against the obvious mistake of deleting null.

Programmers who switch between related languages are prone to infelicities like this:

FILE *f = NULL;
f = fopen(filename, "r");

This is quite reasonable in C, because there might be (or might previously have been) something between these two lines - and in most dialects of C, declarations have to precede statements, so you can't always combine the two. C++, like recent C dialects, lacks this annoying restriction, but a C programmer won't necessarily know that.
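In C++ (or C99), the two lines collapse into one:

FILE *f = fopen(filename, "r");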

They should know better than to read lines from a file like this, though:

char line[MAXLINELENGTH];
while (!feof(somefile)) {
  memset(line, 0, MAXLINELENGTH);
  if (fgets(line, MAXLINELENGTH, somefile)) {
    /* do something with line... */
  }
}

instead of the shorter equivalent:

char line[MAXLINELENGTH];
while (fgets(line, MAXLINELENGTH, somefile)) {
  /* do something with line... */
}

It's easy to understand why someone might write the longer version. The C standard library is full of functions which are difficult or impossible to use safely (gets, for instance), or which require unnecessarily tedious checks when calling them (malloc, free). Someone familiar with C, but not with the details of fgets, might reasonably doubt it would work correctly without the explicit feof and memset. (And once you've learned to fear uninitialized buffers, it may be easier to memset everything than to think about whether it's necessary.) But fgets is actually well designed. It always null-terminates the buffer, and its return value is well-behaved at EOF. Neither of these defensive additions is necessary.

In fact, the shorter version of that loop actually has slightly better error behavior: if some error (loss of network connectivity?) prevents fgets from reading the file, but doesn't set the stream's end-of-file indicator, the longer code might loop forever, while the shorter one will stop, since fgets returns NULL on a read error as well as at EOF. So in this case the defenses have made things worse, although it's unlikely to matter. I suppose the truly paranoid answer would be to call ferror too...
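If you wanted that, the check belongs after the loop, something like this (a sketch, reusing the loop above):

char line[MAXLINELENGTH];
while (fgets(line, MAXLINELENGTH, somefile)) {
  /* do something with line... */
}
if (ferror(somefile)) {
  /* a read error stopped the loop, not end-of-file */
}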

I'm not complaining that the authors don't know the language they're writing in. It would be nice if they did, but in practice it's often necessary for people to work in languages they don't know very well, and in that situation they'll inevitably work around limitations in their knowledge the same way they work around limitations in the language. This will usually be more efficient than scouring the documentation for shortcuts that might exist. Most of the time their code will still work, and if they keep using the language, they'll learn the details soon.

But sometimes they're unwilling to learn. I've seen all three of these bugs in C++ code written by experienced C++ programmers - people who should really know the commonly used parts of the language by now. Most programmers will simply learn these features when they're pointed out, but some don't want to. They find excuses (sometimes rather contrived ones) to discount the new information and keep programming the way they were. I don't understand what's going on here. Are they accustomed to feeling confident in their skills, and don't want to disrupt the situation by venturing into unfamiliar territory? But how does someone become a programmer without being generally curious about the subject?