Effects vs. side effects

Commonly used terms get abbreviated. Thus functional programmers often say “effect” instead of “side effect”. I approve of this usage – not only because it's shorter, but because it frees up “side effect” for another concept. This is something assembly language programmers know, and have known for decades, that other programmers seldom speak of.

Most machines have no notion of a return value; the only way for parts of a program to communicate is by mutating registers. So assembly language programs must do all their communication by effect. This means they distinguish between different kinds of effect. In particular, they distinguish effects that are part of a routine's contract from those that, however consistent, are not intentional: side effects.

Consider this implementation of factorial on a typical register machine:

;The factorial function, iteratively
;args: r1 = n
;results: r2 = n!
;All other registers are preserved.
  li r2, 1
  cmpi r1, 1
  ble done
  mul r2, r2, r1
  sub r1, r1, 1
  b loop

This function leaves its result in r2, but also happens to set r1 to 1. This is a side effect: an effect not in the routine's contract. It is, of course, a bad idea to rely on these, but by accident or desperation, assembly programmers occasionally do, which is why they have a name for them.

(Recursive factorial is more complex than iterative on most machines – often absurdly so, if you strictly follow an ABI that wants you to save registers and construct stack frames. This is one of the reasons programmers accustomed to low-level languages don't take readily to recursion. To them, it looks unnecessarily complex, because it is complex in implementation. High-level languages hide this complexity, but low-level programmers know it's still there.)

It's not normal for programs in higher-level languages to have side effects in this sense, because they have fewer ways to accidentally have effects. Supposedly unobservable effects like preloading caches are common (and are occasionally relied on), but typically any observable effect that isn't part of the interface is a bug. So this concept is less useful in higher-level languages. The more general concept of relying on unspecified behaviour remains useful, though, and it's quite familiar from discussions of language specs.

Functional programming advocacy suffers from a focus on purity, where state is considered a sin to be avoided absolutely. One way the movement might make progress is to distinguish between different kinds of effects, so they could say which ones are deadly and which are venial, rather than treating all effects as indistinguishable evil. Vocabulary analogous to the assembly language programmers' “side effect” might help with this.

Customary semantics

What is the real, definitive semantics of a language? There are three standard answers:

  1. The natural-language specification, because it's the one the designers understand.
  2. The reference implementation, because it's unambiguous and well-tested.
  3. The formal semantics (of whichever flavor), because it avoids implementation concerns, so it's simpler than a real implementation. (Or because it's difficult and therefore “rigorous”.)

There's a controversial fourth option: the definitive semantics of a language is the behavior that is consistent across all conventional implementations.

This approach has some virtues:

  • It identifies the behavior you can rely on. Implementations have bugs and deliberate deviations from the spec, where you can't rely on the specified behaviour. They also have widely supported extensions which you can rely on, even though they're not in the spec.
  • Unlike any other means of defining semantics, implementations are heavily tested. Formal semantics can be tested by turning them into implementations, but seldom are; natural-language specifications aren't mechanically tested at all.
  • It's reconstructable. Users can always find out what their implementations do, even when the spec is not publicly available, or is difficult to read. (Most specs are.) Sometimes this shows them implementation-dependendent behavior, but by comparing implementations they can discover the customary semantics.

Deferring to custom is unpopular among language designers and theorists. We see it as an ill-defined, unstable foundation about which nothing can be known with confidence, and on which nothing can be built reliably. We remember the chaos that engulfed HTML and CSS and Javascript when their users treated buggy implementations as specs, and we don't want it to happen again. We want our semantic questions to have authoritative answers, and mere custom does not provide that.

But it's the de facto standard among users of languages. Most programmers are not language lawyers, and can't readily figure out whether the spec says their code will work. But they can easily try it and see what happens.

We can tell users not to do this. We can tell them to avoid empiricism, to seek authority rather than evidence, to shut their lying eyes and trust in doctrine. This is not good advice in most areas, not even in other areas of programming, nor for semantics of other languages natural or artificial. Is it really good advice for programming languages?

Whether it's good advice or bad, users don't listen. Their models are based on the behaviour they observe. As a result, many popular “myths” about languages — that is, widely held beliefs that are officially supposed to be false — are true in the customary semantics. For example, here are some parts of C's customary semantics that are not part of the formal specification. Some of them are violated on unusual architectures, but most C users have never written for such an architecture, so custom doesn't care.

  • Signed integers are represented in two's complement. (Rumor has it this is not quite always true.)
  • Signed integer overflow is modulo word size, like unsigned.
  • All pointer types have the same representation: an integer.
  • NULL is represented as 0.
  • Memory is flat: it's all accessible by pointer arithmetic from any pointer.
  • Pointer arithmetic is always defined, even outside array bounds. Overflow is modulo word size, just like integers.
  • Dereferencing an invalid pointer, such as NULL or an out-of-bounds pointer, blindly tries to use the address.
  • Compilers generate native code. The built-in operators compile to machine instructions.
  • char is exactly eight bits wide.
  • Characters are represented in a superset of ASCII.

(I thought sizeof(char) == 1 was only in the customary semantics, but it's actually in the spec.)

Much of the furor over optimizations that exploit undefined behaviour is because they're invalid in the customary semantics. Some C compiler maintainers have come to believe that the spec is the whole of the contract between compilers and users, and thus that users don't care about semantics not defined therein. It's a convenient belief, since it permits optimizations that would otherwise be impossible, but it's wildly at odds with what their users want. This isn't the only problem with these optimizations — they make for perverse error behaviour under any semantics — but this is why users tend to see them as not merely bad but incorrect.

Language lawyers, especially those who write specs, should take customary semantics more seriously, so they don't contradict the semantics in actual use.

Why is breadth-first numbering hard?

John Launchbury gave Chris Okasaki an annoying puzzle:

Given a tree T, create a new tree of the same shape, but with the values at the nodes replaced by the numbers 1 .. |T| in breadth-first order.

Go ahead and solve it. I'll wait.

If you want to solve it functionally, I'll wait longer.

Chris posed this puzzle to many functional programmers, and found that they had a surprisingly hard time with it. They took a long time to solve it, and their solutions were seldom elegant. He came up with various hypotheses as to why: did the programmers not know breadth-first traversal or queues? Did they prematurely commit to lists or pattern matching? He didn't seem to find any of them convincing. Neither do I.

One hypothesis he didn't mention is that most functional programmers see a recursive data structure and immediately try to process it by straighforward structural recursion, with a call tree isomorphic to the data structure. When you have many tools, and you encounter a nail, you reach for your hammer, right? But in this case structural recursion is the wrong tool, and it takes a while for programmers to backtrack far enough to notice.

It may take even longer for them to identify the right tool. Queues, like hashtables, are a little awkward for functional programmers, because their most natural implementations are stateful, as are many of their applications. They're almost always used linearly (i.e. there's only one version of the queue at a time), so eschewing state buys no useful flexibility, and incurs the extra hassle of explicitly passing the updated queue around. It also prevents using the efficient circular-buffer representation, just as it usually prevents using hashtables.

They're also a little awkward to use in functional languages, because none of the most familiar and widely implemented functional data structures (lists, tree dictionaries, tree sets, tries) is easily used as a queue, so would-be queue users must look up a queue library, or build one, or use pairs of lists (if they know this trick), or use some inappropriate data structure, or give up and use some other algorithm. Which is what most of Chris's subjects did.

Meanwhile, Java users use its ordinary LinkedList class (which is a doubly-linked list, and thus a reasonably efficient deque) to win contests without having to worry about any of this. Can your functional language do as well?

“Persistent” is older than I thought

“Persistent” usually refers to data that persists across multiple executions of a program. But in the last few years I've occasionally heard it used, especially in the Clojure community, for data structures that are updated nondestructively. (The old version persists after the update — get it?)

I saw this as a newfangled sloppy usage, but it ain't so. Like many seemingly new usages, it's surprisingly old. It dates back at least to 1985, when it appears prominently in Sarnak & Tarjan's Planar Point Location Using Persistent Search Trees. It was popularized a few years later by the same authors plus two others in Making Data Structures Persistent, and Tarjan has used it many times since then.

The more common sense — data that persists across program executions — is not much older. The earliest uses I've found are from several papers in 1976. Earlier ones are about either persistence of phosphors (an important issue for CRTs) or fault-tolerance by persistently retrying. It apparently caught on quickly, at least in the database community, because Jim Gray's 1977 Notes on Data Base Operating Systems takes it as standard enough to use without bothering to define it.

So it's reasonable to object to “persistent”=“nondestructive” because it conflicts with a more important concept, but not because it's new.

Maybe now someone will tell me it's in some very standard source like Knuth and I never noticed...

A sound bug finder is an unsound correctness prover

This account of Coverity, a commercial bug-finding tool for C and C++, illustrates a peculiar attitude common in the program-checking field:

we were also unsound. Our product did not verify the absence of errors but rather tried to find as many of them as possible. Unsoundness let us focus on handling the easiest cases first, scaling up as it proved useful. We could ignore code constructs that led to high rates of false-error messages (false positives) or analysis complexity, in the extreme skipping problematic code entirely (such as assembly statements, functions, or even entire files). Circa 2000, unsoundness was controversial in the research community, though it has since become almost a de facto tool bias for commercial products and many research projects.

Most program checkers prove theorems about programs. In particular, most aim to prove programs correct in some respect (e.g. type safety). A theorem prover is sound iff all the theorems it proves are true. So a correctness-prover that claims a buggy program is correct is unsound, but one that rejects a correct program is not. People in the program-checking field are accustomed to this, so they habitually think soundness = proving the absence of bugs.

But a bug-finder doesn't aim to prove correctness. Instead, it aims to prove incorrectness: to prove the presence of bugs. It's sound iff all the bugs it reports are real bugs — that is, if it has no false positives. False negatives (overlooking bugs) are OK, because they don't make its claims incorrect.

Unfortunately, most interesting properties are undecidable, so a checker can't be sound at both bug-finding and correctness-proving, unless its claims are very weak.

So Coverity did the right thing, in theory as well as practice, when they focused on suppressing false positives. Their bug finder was unsound, but it was unsound because it reported spurious errors, not because it missed some real bugs.

Addendum: bug finders in languages

The most visible bug finders (especially in academia) are those, like the ML typechecker, that try to prove something about the program, and report a bug if they fail. These are unsound as bug finders, since they sometimes report nonexistent bugs. Unfortunately, bug finding is their main use, so their standard of soundness does not fit.

This is particularly problematic for checkers that are built in to a compiler, and don't just complain but prevent programs from running. (This is part of why they're so visible — if the checker makes mistakes you can't ignore, you have to be aware of it.) It's hard (especially in theory) to justify a compiler that rejects correct programs. Sound bugfinders don't have this problem.

Why Lambda the Ultimate doesn't make me feel stupid

This search term appeared in my referer logs a few years ago: “lambda the ultimate makes me feel stupid”.

I used to feel that way — at least, I felt ignorant and despised. The denizens of LtU know an intimidating amount of theory, and some are quick to scorn anyone who doesn't, and demand that they read heaps of literature (often of dubious quality or relevance) before being permitted to talk. Not having read most of that literature, I accepted their evaluation, and felt ignorant.

But then battles of the War of Types erupted there, and λ the Ultimate became λT the Ultimate, or even System F the Ultimate. Anyone who dared suggest that dynamic typing was a reasonable basis for a language, or even a meaningful concept, faced voluminous condescension and condemnation. The seemingly knowledgeable scholars appeared to neither know nor care about the fundamental concepts of their field, and treated its greatest successes with learnèd disdain.

I do respect very much the elephant, and if your work dismisses him as an ill-formed dinosaur, it is not zoology.

(I don't think the dynamic typists gave a good account of themselves either; there was lots of handwaving about flexibility and little mention of the importance of simple semantics. But I found them less offensive, not only because I agree with them, but because they didn't demand the anathematization of the other side, nor of the object of study.)

The War of Types subsided, and LtU became once more a place of academic quiet, disturbed only by announcements of new PL papers. It still makes me feel ignorant at times, but it no longer makes me feel stupid. Sometimes it even makes me feel wise. Which is a much more dangerous emotion. When I feel stupid or ignorant, I study to become less so, but when I feel wise, I do nothing.

Exceptions are (mostly) for humans

Most languages have a way to signal a generic error with a string as the error message. This makes it easy to include relevant information in the description of the error: something as simple as (error "serve-ice-cream: quantity=~S must be positive" q) provides a human-readable description with whatever information you think is relevant. It's not machine-readable, but most errors don't need to be handled mechanically, so this is not usually a problem.

Languages with exception systems also allow signalling errors in a machine-recognizable way, typically by defining a new exception class. This is often considered the “proper” way to signal errors, but it's more work than a generic error, so it's typically done only for polished public interfaces. Errors that aren't exposed in such an interface (or aren't intended to be exposed — errors are a kind of implementation detail that's hard to hide) generally make do with strings.

When you do create an exception class, it's also more work to include relevant information in the exception. Typically you have to define slots, arrange for them to get the appropriate values, and then embed them in the error message. This requires changing several parts of the definition as well as the call site, so it's enough trouble that you often won't do it. Error reporting code is seldom high on the priority list until the error happens.

I ran into this problem a while ago, in a utility function which reported a rare error by throwing a C++ exception class like this:

class negative_error : public domain_error {
    not_integer_exn() : domain_error("value must not be negative") {}

This was fine until the error finally happened. A high-level catch-almost-anything handler caught the exception and displayed the error message, which told me almost nothing about the problem. Since this was C++ and not running under a debugger, there was no automatic stack trace, and no hint of what value was negative, or who cared, or why. If I had been lazy and signaled the error with throw domain_error("serve_ice_cream: quantity=" + to_string(q) + " must not be negative"), the relevant information would have been in the string, but because I had done it the “right” way, it was not.

(The designers of C++ are aware of this problem. That's why all the standard exceptions take strings. negative_error should have too.)

In an ideal exception system, convenience and machine-readability would not conflict. It should be easy to signal an an-hoc error with a human-readable message and machine-recognizable fields. It might help to allow throwing exceptions without declaring them first, e.g. (throw '(negative-error domain-error) :quantity q "value must not be negative"). (Wasn't this allowed in some early exception systems?) But if it's only easy to have one of the two, choose the convenient human-readable form. That's the one you'll use.

What happened to “manifest” and “latent”?

Chris Strachey has remarkably influential lecture notes. His 1967 Fundamental Concepts in Programming Languages introduced or popularized a lot of now-standard terminology: r-value and l-value, first-class, polymorphism (ad-hoc and parametric), and maybe parametric type.

It also introduced some terms which didn't catch on, among them manifest and latent:

We call attributes which can be determined at compile time in this way manifest; attributes that can only be determined by running the program are known as latent.

These are the concepts now called “static” and “dynamic”. I'm not sure why Strachey bothered to introduce his own words for them, since the standard ones already existed, and he was evidently more comfortable with them — when he discusses types on the same page, he consistently uses “dynamic”, not “latent”. (Was “dynamic typing” already a standard term by 1967?) Maybe he reserved “static” and “dynamic” for behaviour, and wanted different words for the time when a property could be determined.

He acknowledges that the boundary between static and dynamic is fuzzy, and explains why it's useful anyway:

The distinction between manifest and latent properties is not very clear cut and depends to a certain extent on questions of taste. Do we, for example, take the value of 2 + 3 to be manifest or latent? There may well be a useful and precise definition—on the other hand there may not. In either case at present we are less interested in the demarkation problem than in properties which are clearly on one side or other of the boundary.

I wish more academics dared to do that.

Neither “manifest” nor “latent” caught on, and they might have been forgotten like most new coinages — but decades later, both have been resurrected with new meanings in connection with type. “Manifest typing” now refers to languages that require type declarations — an important concept that lacked a short name. “Manifest” is readily reinterpretable as “appearing in source”, and while it might confuse people who remember the old sense, we are few. Less usefully, “latent typing” serves as a euphemism for “dynamic typing” among type-theory partisans (bizarrely, as the word they object to is “type”, not “dynamic”, but at least it avoids using the terminology of the savages). In neither case does Strachey's original meaning survive; if you speak of some property other than type as “manifest” or “latent”, most proglang researchers will not understand.