A controlled experiment in static typing

There should be more language research like Stefan Hanenberg's experiment on how static typing affects productivity. It uses a rigorous methodology which is standard everywhere except in software engineering research: a controlled experiment. It compares a substantial number of programmers solving the same problem in two purpose-built languages which differ only in a single feature: the presence of a static type system.

Unfortunately the languages tested are not state of the art – the dynamically typed language is approximately Smalltalk, and the static types are those of early Java, with required type annotations and casts and no parametric types. Not surprisingly, it found that this sort of static typing is useless or worse. (Curiously, it also found that debugging type errors took longer when they were found statically than when they were found dynamically, but this is weak evidence, because the debugging times may include work other than fixing the type errors.)

This needs to be replicated with parametric types and optional type annotations!

The obvious languages to compare are a minimal pair differing only in the presence of the static checker — perhaps ML and ML-minus-the-typechecker, since that's easy to implement. We can do better, though: we can distinguish between different effects of static typing. There are many hypotheses about how it affects productivity; some of the most plausible are:

  • It finds bugs faster.
  • It directs the user's attention to types. (It's not clear whether (when?) this helps or hurts.)
  • It restricts programs by preventing constructs like arbitrary union types and variable arity.

ML-minus-the-typechecker suffers from the third effect: even though it doesn't have a typechecker, its library is limited to things the typechecker can handle – so no read or eval or type predicates or even printf (without exotic tricks). So if that effect is important, comparing ML to ML-minus-the-typechecker will not detect it.

We can distinguish between these three effects by devising a way to inflict each effect independently. We can start with a dynamically typed language with an appropriate standard library, and vary three features:

  • To find bugs, use a nonrestrictive (“soft”) typechecker, which detects some type errors but doesn't interfere with running programs.
  • To selectively divert the user's attention to types, vary the prominence of type declarations: whether they're used in the standard library documentation, whether they're printed in the output of the REPL, whether they're emphasized in the language documentation, and maybe even whether they're enabled at all.
  • To restrict programs, use a version of the standard library with everything that would confuse the Hindley-Milner typechecker removed or altered. (This is not the typechecker we're testing, but it represents a typical degree of restriction.) This doesn't restrict user code, but restricting the library should have a similar effect, especially on small programs. (We could also restrict user code by running the H-M typechecker on it and delaying error reporting until runtime, to make it less helpful for finding bugs.)

The eight combinations of these three features allow us to test all three hypotheses, and detect interactions between them.

(Note that we don't test the Hindley-Milner typechecker, because it both finds bugs and restricts programs, and we want to distinguish between these effects. It would be interesting to compare it to the nonrestrictive typechecker, though, to compare the two checkers' efficacy at finding bugs. Has this been done before?)

One experiment like this can do more to advance our understanding of type systems than ten type theory papers, because it addresses questions we care about, not just ones we know how to answer. I wish I could do this one myself, but as that's unlikely to happen, I encourage someone else to do it.

(Via It will never work in theory — the world's finest, and probably only, empirical software engineering research blog.)

(By the way, I was surprised to see the Mann-Whitney U-test used for all the significance tests, instead of Student's t-test. I thought the U-test was a special-purpose test for ordinal (not cardinal) data, but apparently some statisticians recommend it as the default significance test, on the grounds that it's more robust and only a little less powerful.)

Literate programming tools are mostly obsolete

Literate programming tools are utopian technology, like Plan 9 and Lisp machines: they appear often in descriptions of how the world ought to be, but rarely in actual use.

There's a reason for this. Literate programming tools traditionally have four features:

  1. Comments by default: text is interpreted as comments unless explicitly marked as code.
  2. Reordering: there is a macro system that allows writing code in a different order than that required by the language.
  3. Markup: comments can contain more than just text — they may contain markup in some language like Markdown or HTML, perhaps with additional language-specific features as in Javadoc.
  4. Typesetting: source and comments can be cross-referenced and printed neatly in HTML (or occasionally in print formats like TeX or PDF, but at the moment HTML is the one that matters).

Only one of these features is of much use.

In the literate ideal, programs have more comments than actual code, so making them the default saves typing. This level of commenting might be plausible for APL or hairy assembly code or mathematical wizardry, but none of these are common. Real code, even very good real code, usually has much more code than comments, so while making comments the default might be justified as a device to encourage comments, it doesn't make programs shorter.

The original literate programming system, Knuth's WEB, was for Pascal, which is strict about the order of code; in particular it expects all definitions to appear before they're used. It also has no macros and no lambda, so it's limited in its ability to factor out code even when order is not a problem. WEB's macros address both problems, making Pascal much more flexible. Most modern languages, however, are much less rigid and more expressive, so they don't need the help.

Markup sounds like it would help with readability of comments. Unfortunately, most of what we have to say in comments is simple prose, which markup can't improve much. When it is used, markup (unless it's very lightweight, like Markdown) adds enough clutter to make comments less readable in source form — and almost all reading of comments is in that form. The architects of Utopia may wish we saw our code in typeset form, but we mostly see it in our editors.

Typesetting, at least, has been a great success — so much so that editors have absorbed it. Today, most programmers consider an editor primitive if it doesn't have language-aware autoindent and syntax-coloring, and many also expect name completion, type hints and the ability to jump to definitions. Though originally intended for fancy printouts, typesetting and cross-referencing are now considered too important to postpone to compile time; we expect them while editing. Only documentation is still generated by a separate pass.

I don't see literate programming as a failure; it explored the important question of how tools can make programs more readable. But the best answers to that question turned out to be embodied in languages, not literate programming tools. It wasn't obvious a priori that readability matters more when editing code than when publishing it, nor that it was simpler to make languages more expressive than to add flexibility in another layer of tools. These things are obvious in retrospect, but it took experience with tools to discover them. Rest in peace, WEB: you taught us not to need you any more.

“There need be no real danger of it ever becoming a drudge”

Jonathan Edwards quotes a beautiful piece of naive optimism from Turing:

The process of constructing instruction tables should be very fascinating. There need be no real danger of it ever becoming a drudge, for any processes that are quite mechanical may be turned over to the machine itself.

- Turing, A. M., 1946, Proposed electronic calculator, report for National Physical Laboratory, Teddington

Admittedly Turing was speculating, and he's sort of right; programming is more amenable to automation than other things humans do. But he must have been disappointed to discover how difficult and slow the process of turning processes over to the machine is.

It's tempting to believe there's a sort of incompleteness here: as the ability of formal systems to express propositions always grows faster than their ability to prove them (more optimistically, their ability to ask questions always grows faster than their ability to answer them), does the ability of languages to generate repetitive tasks always grow faster than their ability to automate the repetition?

Wrt expressiveness, no: programming languages don't necessarily face an incompleteness problem because they don't need to be consistent: they need only express programs, not completely avoid expressing errors. (And even if they did, mere incompleteness does not imply drudgery; if programming generated endless novel problems, it would hardly be a drudge.) But most of the repetition in programming is about understanding and transforming code. This does need to be consistent, so increased expressive power doesn't necessarily help. Any increase in the space of programs that can be expressed is an increase in the space of programs that must be excluded to understand a program — and it's typically undecidable which features affect which code, so this can't always be done by machine. QED?

I'm not convinced by this argument. Incompleteness and undecidability are familiar reasons for things to be impossible, so they make good excuses, but I doubt they have anything to do with the existence of drudgery in programming. Most of the repetitive analyses we do are amenable to automation — if not in the general case, then at least in most common cases. But we don't try very hard. We write tools for many purposes, but program understanding has not traditionally been a popular one; we hardly use tools for it unless they're imposed on us by being built into a language. Is the real problem here that we don't mind the challenging drudgery of understanding?

Making the right thing easy

Standard output is often slightly abused. Its proper use is for programs that produce an output stream, but its most popular use is for debug prints. These really ought to go to standard error (or perhaps to a more specific stream like CL's *trace-output*). But in most languages that's more trouble, requiring something like fprintf(stderr, ...) instead of just printf(...), and it usually makes no practical difference (especially in programs that don't use standard output for anything else), so most programmers don't bother. We'd prefer to do the Right Thing, but not if it's any extra work.

Caml removes this barrier to doing the right thing: its prerr_* functions are just like print_*, but print to standard error instead of standard output. This means you can send your debug messages to the right stream as easily as to the wrong one. (In retrospect, this seems obvious — if a language provides convenience functions for one stream, shouldn't it be standard error?)
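
A two-line illustration (these are ordinary OCaml standard-library functions): the call that does the right thing is exactly as short as the one that doesn't.

let () =
  print_endline "42 items processed";  (* the program's real output, on standard output *)
  prerr_endline "debug: got here"      (* debug chatter, on standard error *)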

This particular feature is not important, but the general principle is. Programmers tend to take the path of least effort, even if that leads to bugs (as with C's string functions or unescaped Unix shell arguments) or awkwardness. Any case where they need to be told to do things differently may be a case where the language could prevent a problem by making the right thing easy.

Standard trajectories

History repeats itself — in programming languages too.

Andreas Rossberg pointed out a hysterical raisin in Java:

The reason Java generics turned out sub-par is mainly because they were added too late! They had to work around all kinds of language design and infrastructure legacy at that point. Compare with C#, where generics were put into the language early on and integrate much nicer. That is, generics should be designed into a language from the very beginning, and not as an afterthought.

John Nagle replied:

The other classic language design mistake is not having a Boolean type. One is usually retrofitted later, and the semantics are usually slightly off for reasons of backwards compatibility. This happened to Python, C, LISP, and FORTRAN.

These are two of the standard trajectories languages take through design space. Some things happen over and over:

  • Languages make more type distinctions over time. Booleans are a common case of this: if you have integers or nil or a defined/undefined distinction, separate Booleans are obviously unnecessary, so they're often left out of early versions of languages. Eventually the designers tire of wondering which ints are really integers and which are Booleans, and add a separate type.
  • Restrictions chafe and get relaxed. In particular, languages with restrictive static type systems usually add extensions to make them less restrictive: witness Haskell's numerous extensions for high-order polymorphism, dependent types, deferred type errors, and so on. Java's generics are a case of this — it got along without them for a while because it was possible to escape from the static type system by downcasting.
  • General cases supplant special cases. In particular, many languages start with a small closed set of datatypes, and add more very slowly. At some point they add user-defined types, and initially treat them as another, very flexible type. Once they get used to user-defined types, they realize all the others are unnecessary special cases which could be expressed as ordinary (i.e. user-defined) types. Python went through this (painfully, because its built-in types had special semantics); Scheme will soon, since R7RS is standardising define-record-type. (I'm not sure if the Schemers realize this; several WG1ers objected to a proposed record? operator on grounds of strong abstraction, but none mentioned that it should return true for all types, including built-in ones.)
  • Interfaces are regularized. It's easier to add a feature than to understand its relation to the rest of the language, so many features have ad-hoc interfaces, which are later replaced with uniform ones. This seems to happen to collection libraries a lot, although this might be simply because they're large and popular.
  • Newly fashionable features are added in haste and repented later. How many languages have poorly-designed bolted-on “object systems”? Sometimes parts of the new features are later extended to the rest of the language; sometimes they're made obsolete by better-designed replacements; sometimes they're abandoned as not useful.
  • Support for libraries and large programs is often added later. Newborn languages have few libraries and no large programs, so there's little need for features like modules or library fetching or separate compilation. When they're later needed (or perceived to be needed), they're added, often as kludges like Common Lisp's packages and C's header files, or even as external tools like C's linker (and makefiles) and Clojure's leiningen. This may be changing: libraries are now considered important enough that new languages usually have modules, at least.
  • And, of course, languages grow. It's much easier to add a feature than remove one.

Many of these trajectories are easy to trace, because they leave trails of obsolete features kept around for compatibility.

What other pieces of history repeat themselves?

Specification vs. documentation (vs. Kernel)

Language specs are the most prestigious form of language documentation, and to many language designers the most familiar, so people describing new languages sometimes do so in the form of a spec. They write of what implementations “should” and “must not” do, as if there were any implementations, and of portability, as if there were any code to port, and with a language lawyer's opaque rigor, as if there were any language lawyers to defend against.

This is a waste, of course. Unless you're standardizing a language that's already popular enough to have multiple implementations, you should be writing documentation of what the existing implementation does, not a specification of what implementations should do. And if it's a research language, your audience is researchers, not users, so you should focus on elucidating your new ideas, not on thoroughly documenting the mundane parts of the language.

This is what I thought when I read the Kernel spec. But after only three years, there are some eighteen other implementations of vau-based languages, and at least two (klisp and klink) aim to comply with the Kernel spec. Fexpr lisps are so simple that a full implementation is within the scope of a casual personal project. The other objections may still apply, but at least Kernel has multiple implementations to standardize.

Research cultures

John Regehr identifies some virtues of the culture of operating systems research, which are lacking in the culture of programming languages research:

The best argument is a working system. The more code, and the more results, the better. Something that is clearly a toy isn’t convincing. It is not necessary to build an abstract model, conduct a user study, prove soundness, prove correctness, or show any kind of asymptotic bound. In fact, if you want to do these things it may be better to do them elsewhere.

Yes. A working system, especially one that can support diverse applications without major pain, shows that you got everything important approximately right. A proof only shows that you got one thing right, and it's easy to be mistaken about whether that thing is important. So a working system can be stronger evidence that you got it right than a proof.

The style of exposition is simple and direct; this follows from the previous point. I have seen cases where a paper from the OS community and a paper from the programming languages community are describing almost exactly the same thing (probably a static analyzer) but the former paper is super clear whereas the latter is incredibly difficult to figure out.

I too find operating systems papers easier to read than language papers — and I'm from the languages community, so this isn't just a matter of familiarity. Systems researchers write as if they're trying to communicate, but language researchers write as if they're trying to make their results look like fancy academic research. This is a common failure mode (since researchers are evaluated partly on how difficult their work looks), and somehow it's become part of the standard style in language research, leading well-meaning writers to produce impenetrable papers.

It wasn't always so. Why has systems research kept these virtues while language research has descended into theory and impenetrability? How can this be reversed?

Stylistic workarounds

Some C programmers carefully include a comma on the last element of an array initializer:

int tau_digits[] = { 6,
  2, 8, 3,
  1, 8, 5,
};

Similarly, some Lispers put an extra newline at the end of any list that might grow:

(defun execute-instruction (ins vm)
  (declare (type instruction ins) (type vm vm) (optimize speed))
  (case (opcode ins)
    (nop)
    (constant (write-register vm (destination-register ins) (literal ins)))
    (add (write-register vm (destination-register ins) (+ (read-register vm (source-register-1 ins)) (read-register vm (source-register-2 ins)))))
  ))

Some C++ programmers write member initializers like this:

Class::Class()
: field1(arg)
, field2(arg)
, field3(arg)
{ /* ... */ }

The rationale, in all three cases, is to make diffs clearer. A very common change to such lists is to add new elements, usually at the end. If they're written in the usual way, the diff will show the comma added to the previous item, even though there's no substantive change to that line:

int tau_digits[] = { 6,
   2, 8, 3,
-  1, 8, 5
+  1, 8, 5,
+  3, 0, 7
 };

If that line is long, it may not be obvious that it hasn't changed:

 (defun execute-instruction (ins vm)
   (declare (type instruction ins) (type vm vm) (optimize speed))
   (case (opcode ins)
     (nop)
     (constant (write-register vm (destination-register ins) (literal ins)))
-    (add (write-register vm (destination-register ins) (+ (read-register vm (source-register-1 ins)) (read-register vm (source-register-2 ins)))))))
+    (add (write-register vm (destination-register ins) (+ (read-register vm (source-register-1 ins)) (read-register vm (source-register-2 ins)))))
+    (subtract (write-register vm (destination-register ins) (- (read-register vm (source-register-1 ins)) (read-register vm (source-register-2 ins)))))))

This is a weakness of our diff tools: they operate on lines, not characters or tokens or trees, so they overstate the scope of the change. Since it's not convenient to fix the tools, we change our programs to avoid the problem. So far this is all reasonable. But when such a workaround becomes a stylistic prescription, it's easy to forget where it came from, and how narrow the conditions that justified it were. If our diff tools improve (which is not unlikely, since so many people suffer this itch), so this workaround is no longer necessary, how long will we keep doing it? In twenty years, will programmers still be taught to format lists (of whatever sort) specially, even though it no longer serves any practical purpose?

What other common stylistic prescriptions are workarounds for bugs in tools?

I can't think of controversial opinions either

James Hague is right:

I read 20 Controversial Programming Opinions, and I found myself nodding "yes, yes get to the interesting stuff." And then, after "less code is better than more," it was over. It was like reading a list of controversial health tips that included "eat your veggies" and "don't be sedentary." In an effort to restore a bit of spark to the once revolutionary software development world, I present some opinions that are hopefully more legitimately controversial.

But James' alternative opinions are (except for the ones about social institutions) hardly more controversial. At best they're offensive to a vocal minority, e.g.:

Purely functional programming doesn't work, but if you mix in a small amount of imperative code then it does.

This is taken for granted in the Lisp and ML communities, and even among Haskellers it's not an unusual opinion. It's heresy to the functional puritans, so it's controversial in the sense that it generates argument, but not in the sense that informed people think it's very unlikely to be true.

When I try to generate controversial opinions, I get similar results (and I'm not giving any examples, because they were embarrassingly insipid). Instead of bold heresies, I say things I expect other people to disagree with quantitatively, but not qualitatively (although I'm probably underestimating the degree of disagreement). Apparently it's hard to distinguish one's model of the truth from one's model of other people's opinions.

Autoconversion is like weak typing

A while ago, I made a simple mistake using zlib:

char *file = "/some/path";
gzFile f = gzopen(file, "wb");
if (f) {
  gzprintf(file, "something\n");
  /* ... */
}

This is a static type error, of course: gzprintf expects a gzFile, not a char*, and they have different C types. Unfortunately, C doesn't detect it, because (in older versions of zlib) gzFile happens to be a typedef for void*, and in C char*, like all pointers, can autoconvert to void*. By accident of memory layout, this didn't crash at runtime either; it simply did nothing, causing me some puzzlement when this one line was missing from the otherwise intact output.

In addition to providing a puzzle, this mistake sheds light on some confusing terminology. Like most terms related to type, “weak typing” has several meanings. I usually reserve it for languages with typechecks that don't always apply, either because they can be suppressed by casts, or because some common1 operations don't provide them. But it's also used for the unrelated concept2 of autoconversion.

I thought this was just an arbitrary reuse of a term, but the autoconversion of char* to gzFile demonstrates a common meaning: autoconversion, like casts and missing type checks, means some type errors aren't detected. They usually cause conversions instead of blind misinterpretation of data (though not in this case), but from the user's point of view, the effect is the same: the type checks are unreliable. If you accidentally do arithmetic on a Perl string and it autoconverts to zero, it's small comfort that the conversion is type-safe — the program still continues happily computing incorrect results after a type error. So it makes sense to use the same term for any system of type checks that overlooks some errors. It's confusing to call completely different mechanisms by the same name, but the result is the same: both autoconversion and casts make type checks weak.

1 If some rare, exotic operations circumvent typechecks, that's OK, because you won't use them by accident.

2 And sometimes for even more unrelated concepts like dynamic typing, but this usage is usually either careless or pejorative. When people want to talk about dynamic typing, they don't call it “weak typing”.

“Types considered harmful” considered insightful

Provocative titles work. When a famous type theorist like Benjamin Pierce gives a talk titled Types Considered Harmful, it gets my attention. (Eventually. Years later. After someone links me to it.) This talk describes a contract-checking system, which wouldn't be very interesting in its own right, title or no, but it also includes a bunch of insights about typechecking. Here are the good bits:

First (and this is not new), why does type inference find bugs? It's not because there's anything special about types per se, but simply because it entails enough analysis that inconsistencies in the program are likely to produce contradictions.

  • Attempting to prove any nontrivial theorem about your program will expose lots of bugs
  • The particular choice of theorem makes little difference!
  • Typechecking is good because it proves lots and lots of little theorems about your program

(“Proving theorems” sounds scary, but if you describe it as “checking properties” it means the same thing and is much less intimidating.)

Second, it acknowledges an oft-ignored cost of static typing: it makes languages much more complicated.

Fancy types make the programmer's head hurt
  • Type systems – especially very precise ones – force programmers to think rigorously about what they are doing
  • This is good... up to a point!
    • Do we want languages where a PhD* is required to understand the library documentation?
* two PhDs for Haskell
Fancy types make my head hurt
  • Complex type systems can lead to complex language definitions
  • Easy to blow the overall complexity budget

The notion of a complexity budget ought to be more widely used. In addition to the limitations of designer and implementor effort, there's a limit to how much users will learn about a language. Once the language's complexity reaches this limit, additional features are not very useful, because the users who are supposed to benefit from them don't know about them.

Third, it has a useful interpretation of the relationship between typechecking, contract checking (in the particular form described in the talk), and tests:

  • Types ~ General but weak theorems (usually checked statically)
  • Contracts ~ General and strong theorems, checked dynamically for particular instances that occur during regular program operation
  • Unit tests ~ Specific and strong theorems, checked quasi-statically on particular “interesting instances”

These two dimensions of generality and strength are easy to reuse:

  • A dynamic typecheck is specific (because it checks only one case) and weak (because all it proves is that the value has the right type) and checked dynamically.
  • The purity of core Haskell is another general and weak theorem — or rather assumption, as it's not checked at all.
  • Traditional assertions are specific, of variable strength, and checked dynamically.
  • Model checkers prove general strong theorems statically.
  • The “typeful” style often used in ML and Haskell, where more distinctions are moved into types, is an attempt to make the typechecker's weak theorems stronger.

The title is not just for provocation. It's also a summary of the talk's apparent thesis. Pierce doesn't say this explicitly in any of the slides, so maybe I'm reading my own conclusion into them, but he seems to be saying: proving theorems about programs is good, but having no choice about which theorems to prove is bad. What if the theorem that's forced on you is hard to prove, or easy to prove but not by the methods the checker knows? What if it's not even true, or shouldn't be true? A typechecker (or any obligatory checker) is “considered harmful” because it chooses the theorems for you, and you can't refuse it when it chooses poorly.

Other uses for Unicode

Unicode identifiers are impractical because users can't type them. But languages can safely use Unicode characters for many other purposes. For example:

  • Prompts. Doesn't every language need a distinctive prompt? Why not adopt a Unicode character, so the prompt won't look like anything users type?
  • Alternate names: If you allow one symbol to have multiple names, the pretty-printer can use Unicode while humans type in ASCII. Doing a lot of this would interfere with learning, but basics like printing Haskell \ x -> x as λ x → x shouldn't hurt much.
  • Alternate reader syntax: Similarly, print can use Unicode characters in place of the least satisfactory ASCII ones. Wouldn't it be nice if vectors printed with proper angle brackets instead of #(...)?
  • Unreadable objects: If an object's printed representation can't be read back in, it need not be typed by humans, so it can use arbitrary characters. This might improve the printed representations of functions, classes, promises, etc., which often print unhelpfully at the REPL.
  • Non-ASCII art: Languages (and libraries) often print everything as text, even things like tables that shouldn't be, because that's the only output interface they can rely on. Unicode can help make this less ugly. (Box-drawing characters!)
  • Documentation and error messages: you can use whatever you want here.

REPLs, by the way, should exploit terminal control before worrying about Unicode. Small improvements like coloring and a syntax-aware line editor make a big difference to usability. Not everyone has an IDE or wants to use it for everything.

Number syntax extensions

I approve of James Hague's proposal to allow numbers with k as a suffix, so 64k is 65536. However, there is that binary vs. decimal issue.

The IEC/ISO binary prefixes (Ki, Mi, Gi, etc.) have not been widely adopted, and for good reason: unlike the regular SI prefixes, they don't represent anything in spoken language. They come with proposed pronunciations, but these are too awkward (especially when followed by ‘byte’) to be accepted — ‘kibibyte’ sounds like a mockery of the SI prefixes, not an addition to them. As long as there are no binary prefixes in speech, there will be little demand for abbreviations that aren't short for anything.

There is (was?) another proposed set of binary prefixes which suffer this problem slightly less: bk, bM, bG, etc. They resemble the spoken ‘binary kilo-’, ‘binary mega-’, etc., so they might have some chance of adoption. Maybe programming languages should use them, so 1bk = 1024 and 1k = 1000. Or maybe they should unapologetically use 1k = 1024 — who needs powers of 10 anyway?

Some other, less controversial extensions to number syntax:

  • Allow flonums to omit zeroes before or after the decimal point: 1. and .1 should be acceptable as floats. (In Common Lisp, 1. is an integer, for historical reasons.)
  • Separators, as in Ada and Ruby and rather a lot of languages nowadays: 10⁹ can be written as 1_000_000_000. (Or maybe as 1,000,000,000, but I'd rather avoid comma, to avoid confusion over whether a string like 1,234.5 is 1234.5 or 1.2345.) This is especially useful in REPL output, so you don't have to read unpunctuated ten-digit numbers. (Usability hint: don't print separators for four- or five-digit numbers; they're not needed there.)
  • Exponents in integers: 1e9 should read as the integer 1000000000, not as a float. (This is a good substitute for decimal SI suffixes.) Negative exponents could read as ratios if available, or else as flonums.
  • Fractional SI suffixes: if 1k is 1024 or 1000, might 1m be 1/1000 (or 1/1024)?
  • Radix with ‘r’: 2r1101011 instead of #2r1101011, saving a dispatching character. (There is virtually no demand for radices over 16, so don't bother supporting them. If you want base 36, you'd probably like base 64 even better.)
  • Scheme-style complex numbers (-2+3i, 3.033-198.26i) may be easier to read than CL-style ones, although this seldom matters.

I suppose none of this is very important.

Chaining comparisons

Mathematical notation often uses more than one comparison operator in a row:

0 ≤ x < n

Anyone who tries that in C is in for a surprise (and possibly some frustrating debugging, since this sort of bug is easy to not see), but some languages do support it. I know seven ways to do it:

  • N-ary comparison: In most Lisps, comparison operators can take more than two arguments. This doesn't solve the whole problem, since it only covers chains that use two calls to one operator, or that can easily be rewritten to use one operator (e.g. 0 ≤ x < n as 0 ≤ x ≤ n - 1), but it helps, and it doesn't complicate the kernel. It doesn't work well with infix, though.
  • Combinator: Something like chain a f b g c = f a b && g b c enables writing (chain 0 ≤ x < n), if you don't mind the distraction of an explicit combinator call (there's a sketch after this list). In an infix language where it's awkward to pass comparison operators as arguments, you may be able to improve on this by making chain a macro.
  • Macros: Comparison operators can be macros which examine their arguments to see if they're calls to other comparison operators, and if so, expand to chain them. This requires a language with macros and infix (and in which booleans aren't possible arguments to comparisons), which explains why I don't know of any language that does it. In such a language, though, it's an easy way to provide fully general chaining in user code, without kernel support, at the price of turning all the comparison operators into macros. If the language has a feature that rewrites function calls, like a more reliable version of Common Lisp's compiler-macros, then that price can be avoided.
  • Multiple infix: Some languages (e.g. Smalltalk, although it doesn't cover this case) support parsing 0 ≤ x < n as a call to a single ≤< operator with three arguments. In such languages, you can define that operator to be equivalent to (0 ≤ x && x < n). This doesn't scale well to more operators than the basic six, but if you already have multiple infix, it's easy to cover all the common cases of chaining.
  • Associativity option: If you have user-defined infix operators with a choice of left, right, or no associativity, you can add chaining as a fourth associativity option. This is what Perl 6 does.
  • Separate channels: In Icon, comparison operators give their Boolean result by success or failure, leaving the return value available for something else. So they return their right argument, and (0 ≤ x) < n works. This requires unusual language semantics, though.
  • Special case: Python simply has a special case for this, just as it has other special cases to make Boolean expressions look more familiar (e.g. is not).
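
Here's a minimal sketch of the combinator approach in Scheme, where comparison operators are already first-class procedures (chain and in-range? are names invented for this sketch):

(define (chain a f b g c)
  (and (f a b) (g b c)))

(define (in-range? x n)
  (chain 0 <= x < n))   ; reads almost like 0 ≤ x < n

;; For single-operator chains, Scheme's n-ary comparisons already
;; suffice: (<= 0 x 9).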

The language-design pattern behind chaining is to redefine an implausible construct (in this case, comparing the result of a comparison) to express some more useful meaning instead. Usually the implausible construct is an error case — many language features assign meaning to otherwise meaningless operations, and this isn't hard to get right, because there's no question about whether the construct really was implausible.

It's much harder to apply in cases like chaining, where the implausible construct is not actually invalid: it's hard to tell how unlikely it is, and if it appears anyway (as e.g. a == (b < c) occasionally does), the semantic irregularity hurts. Irregularity is a problem even when redefining error cases, because it can produce incoherent semantics, like + overloaded for arithmetic and catenation. So this is not an easy pattern to follow, but it's one that has produced useful results.

Sinful total division

You know how annoyingly often you want to display a ratio — a percentage or framerate or some such — and you don't care what happens if you divide by zero, because the result isn't meaningful then anyway? But you don't want it to crash, so you end up writing things like this, over and over...

printf("did %d things (%d%% hard) at %d/s\n",
       n, n ? nhard * 100 / n : 0, dt ? n / dt : 0)

If you're writing it over and over, it should be a function, of course:

div0 a b = if (zero? b) 0 (a / b)
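
In C, for instance, the same function turns the example above into plain calls (div0 and report are names used only for this sketch):

#include <stdio.h>

static int div0(int a, int b) { return b != 0 ? a / b : 0; }

static void report(int n, int nhard, int dt) {
    printf("did %d things (%d%% hard) at %d/s\n",
           n, div0(nhard * 100, n), div0(n, dt));
}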

This is a sinful function: it doesn't help you do the Right Thing, only helps you be sloppy more conveniently. Programmers tend to feel guilty about writing such functions, and especially about including them in public interfaces, lest they encourage someone else to be sloppy. Often that guilt deters us from writing them at all, even when we plan to be sloppy anyway, so there would be no harm in it. Language designers avoid such features even more scrupulously, and lament the ones they're forced to include for backward compatibility. Even if div0 is frequently useful, you won't see it in many languages' standard libraries1.

On the other hand, a large class of language features are valuable because they support seemingly sloppy practices. Garbage collection lets you leak memory with impunity, dynamic redefinition lets you modify a program while it's running, automated tests let you use code you can't prove correct — so rejecting features that promote sloppiness is not a very good heuristic. I suspect a better understanding is possible here, but I don't have it yet.

1 Although there probably is some language which lacks a good way to report exceptions, and therefore has div0 as the standard division operator.

Lessons from Fortress

Guy Steele says the Fortress project is “wrapping up”, apparently because further implementation work is not expected to be very enlightening. I suppose this is a good time to get round to blogging about it...

Alan Eliasen asks for commentary and lessons learned. Here are some things I've learned from Fortress:

  • If your language includes a flashy experimental syntax, that will dominate perceptions of it, even by knowledgeable and interested people. Fortress includes a lot of potentially useful work about compiling a flexible dynamically-typed language efficiently, but who notices?
  • If you describe generic functions as dynamically resolved overloading, they no longer look weird to O-O programmers. Indeed, they hardly look like a feature at all; they look like the natural way overloading should work. You can even get away with adding them to a non-Lisp language!
  • There's surprisingly little connection between language features and target audience. If the Fortress documentation didn't say it was intended as a replacement for Fortran, I would never have guessed. The mathematical syntax and many numeric types hint at this, but not strongly enough to overcome the dynamic typing and generic functions.
  • Batch compilation is a reasonable thing. Dynamism is traditionally associated with other forms of power and flexibility, but it's OK to do without it (or offer a batch mode that disables it) for the sake of optimization.
  • It's nice to clearly document what features of your language are new or unusual or special, for the convenience of readers who are interested in it as research. Otherwise they may overlook your new ideas among the rest, especially if the language is a complex one like Fortress.
  • Features that express something nonlocally can also have local versions. IIRC, old versions of Fortress had a dispatch form that had the same semantics as method dispatch, but in a typecase-like form without the nonlocality of method definition. (It doesn't seem to exist in the current version, though. Maybe I'm hallucinating and it never existed.)
  • The stated purpose of a language affects how interesting I think it is. Fortress is mostly a compilation research project, but it claims to be a replacement for Fortran, which makes it sound much less interesting than it really is.

If most of these lessons are about documentation and perception rather than about the language itself, that's because I've never used Fortress, only read its documentation and some of the blog posts (which, unfortunately, have vanished, along with most of the website, but may be restored someday).

current-jiffy is not usable portably

In addition to current-second, the R7RS draft adds a timer function, current-jiffy. Obviously the Theoretical Right Thing for a timer is a high-resolution version of the same linear timescale used for dates, which current-second can (in an ideal implementation) provide. But in some applications (especially the rather important one of profiling), the timer can be called very frequently, so it must avoid allocation, which means either returning a fixnum or modifying some sort of mutable timer object. This is the point of current-jiffy.1

The main use case is something like this:

;;; Evaluate the body. Return multiple values: the time taken in
;;; seconds, followed by whatever values the body returned.
;;; This returns the time in seconds, not jiffies, for human
;;; readability. It returns the time as a number, rather than
;;; printing it somewhere, so it can be used programmatically.
(define-syntax time
  (syntax-rules ()
    ((time forms ...)
     (let ((start (current-jiffy)))
       (let-values ((vals (begin forms ...)))
         (apply values (/ (modulo (- (current-jiffy) start) jiffy-modulus)
                          (jiffies-per-second))
                       vals))))))
> (time (/ (factorial 100) (factorial 99)))
0.0124525432
100 ;hopefully

Presumably R7RS-large will include such a macro. (time? elapsed-time? profile? bench?) But in the current R7RS-small, users can't portably write it, or do anything else with jiffies reliably, because they can't deal with overflow.

High-precision fixnum timers overflow fast. A 32-bit microsecond clock, for instance, overflows every 71 minutes, and a 30-bit fixnum every 17 minutes. This is too frequent to ignore, so all subtraction and comparison of jiffies needs to be modulo the word size, so all programs using current-jiffy need to know jiffy-modulus. In theory they could determine it by experiment, but that takes a while and is imprecise, so in practice portable code can't use current-jiffy for anything but setting random seeds.

This could be fixed by adding jiffy-modulus (jiffy-limit? max-jiffies?) to the standard. But since current-second already covers timing fairly well, it might be better to leave current-jiffy to R7RS-large. Efficient timers are a useful feature, but not one that belongs in Small Scheme.

By the way, shouldn't jiffies-per-second be a constant, not a function? It's in the section called “Standard Procedures”, which does not describe a numeric constant, but it's better to change the title of the section (Standard Definitions? Standard Procedures and Constants?) than to change the language to fit the title.

1IIUC it originally had another point: to provide a linear timescale even if current-second didn't. But now that current-second is supposed to provide TAI, current-jiffy is no longer needed for this, and even if current-second only provides disguised POSIX time, its nonlinearities are infrequent enough for most applications to ignore.

The Scheme that specifies TAI is not the eternal Scheme

The R7RS draft fixes a longstanding hole by adding the simplest possible time function: current-second, which returns the time since 1970 in seconds.

Unfortunately, it specifies that the timescale used is TAI. This is overspecification. The advantage of TAI over POSIX time is linearity (POSIX time jumps backward at leap seconds, so it's ambiguous, and time arithmetic gives incorrect results), but R7RS acknowledges that most implementations will compute it by simply subtracting a constant from their system clock. In addition to not being correct TAI and being gratuitously incompatible with POSIX, the resulting timescale remains ambiguous and nonlinear, so it has no advantage over POSIX time. Why specify a feature when you know it won't work?

I agree that the POSIX timescale is messed up, but this is not something languages can fix. As long as time is provided by the operating system, the choice of timescale is up to the operating system.

Also, time standards have historically been replaced fairly often, so Scheme risks specifying an obsolete standard which no one else uses. It's safer to simply leave the timescale and epoch unspecified, and let current-second be the system clock, whatever it is.

Oft-omitted features should be optional

Language implementors struggle with specifications. Those writing to a spec generally want to follow it, but unlike the authors of the spec, they face problems of implementability. When a standard specifies a feature that is too difficult to implement, or that interferes with performance or other desiderata, some implementors will simply omit it, whatever the language standard says.

Scheme has two features that are particularly annoying to implementors, because they affect how procedure calls are compiled: call-with-current-continuation and tail call optimization. Many implementations, particularly those that aim to interoperate with another language, don't implement them. In JScheme and Kawa, continuations only work outward: call/cc is call/ec. Kawa optimizes local tail recursion, but not tail calls in general. Their implementors would have preferred to follow R5RS, but they decided it was not practical to do so — they felt performance and compatibility with Java were more important than strict compliance with the standard.

To reduce the burden on implementors, the Scheme Reports have long declared some features optional. Most of the numeric tower is optional, and various standard procedures have been marked as optional in some versions. This has been a great help to implementors, who have been able to implement Scheme without worrying about little-used features like complex arithmetic, and has probably contributed to the large number of implementations.

Full continuations and nonlocal tail-call optimization are also little-used features — despite their general utility, few programs rely on either. Implementors know this, which is why they're willing to omit them, even if the standard does not. Users do not appear to consider this omission a serious flaw; JScheme and Kawa are generally considered serious, full-featured implementations, not incomplete toys. In existing practice, call-with-current-continuation and tail call optimization are optional.

Oleg has already proposed making call/cc officially optional, and I agree. Implementors and users treat it as optional, so the standard should too.

In Oleg's proposal, both call/cc and dynamic-wind are optional, and can be omitted entirely. This is unnecessarily disruptive, as it breaks the many programs that use call/cc only for escape continuations and dynamic-wind only for unwind-protect. I suggest a more conservative change: require call/cc to provide at least escape continuations, but make full reusable continuations optional. (There should be a cond-expand feature for this — full-continuations? full-call/cc? reusable-continuations? indefinite-extent-continuations? Just call-with-current-continuation?) This preserves compatibility for almost all programs, while removing one of the greatest burdens on implementors, so there will continue to be many good implementations of Scheme, even as it standardizes other implementation challenges like define-record-type and parameters and exceptions.
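
For example, this common kind of use touches only the escape-continuation subset, so it would keep working under the conservative change (find-first is a name made up for this sketch):

(define (find-first pred lst)
  (call-with-current-continuation
    (lambda (return)
      (for-each (lambda (x)
                  (when (pred x) (return x)))
                lst)
      #f)))

(find-first odd? '(2 4 5 6))   ; => 5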

The same goes for other features that are difficult to implement. If neither implementors nor users think a feature is necessary, the language standard should acknowledge this, rather than condemn part of its community as noncompliant.

Kernel, library, and primitives

It's often useful to distinguish between parts of a language based on whether they can be expressed in user code:

  • The kernel is the part the compiler has to know about.
  • The library is the part that can be implemented in user code.
  • Primitives are the parts that are provided ready-made by the implementation rather than implemented in user code.

This is an obvious and possibly familiar distinction, but it's underused. It should be part of the standard vocabulary for talking about languages.

Some examples in Scheme:

  • let is library, because it can be implemented as a macro (see the sketch just after this list). (It could be implemented as a macro in Common Lisp too, but for historical reasons it's in the kernel.)
  • procedure? is primitive, because while it could be implemented in user code, there's nothing it could be implemented in terms of. In a language with type-of, it's library.
  • The Form Without A Name (the function call operator) is kernel, because there's no way to define it, or anything like it, in user code. Even in Racket, where #%app can be redefined, there is no way for user code to define such a construct.
  • lambda, named-lambda and case-lambda can express each other as macros. At least one of them needs to be predefined, but the choice is arbitrary. Since the compiler needs to know about that one, it's kernel and the other two are library.
  • Full call-with-current-continuation is kernel, not just a primitive, because it can't be implemented without changing how function calls are compiled. Not really; see comments.
  • cons is a primitive in R5RS, because there's no way to define new datatypes. (Defining pairs as tagged vectors doesn't count, even if you redefine all vector operations to reject them, because existing code will still see them as vectors.) In R6RS, it can be library, because it could be defined with define-record-type. (However, it's often still primitive, for efficiency or for historical reasons.)
  • syntax-rules is mostly library, but requires some primitive support. define-syntax is kernel, but in a language with a hook to define new code-transformation passes in user code (kind of like Perl's source filters), it could be library.
  • The distinction between kernel and library has nothing to do with what names are standard or in the default namespace. map is standard and predefined but is pure library; if define-values or set! is provided in a separate module, it's still kernel, because it requires compiler support.
  • Garbage collection is kernel, because it requires compiler (and runtime) support. Not really; see comments.
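
To make the first example above concrete, here is the usual way (unnamed) let can be defined as a macro. This is a sketch: it ignores named let, and uses my-let to avoid shadowing the built-in.

(define-syntax my-let
  (syntax-rules ()
    ((_ ((name value) ...) body1 body2 ...)
     ((lambda (name ...) body1 body2 ...) value ...))))

(my-let ((x 1) (y 2)) (+ x y))   ; => 3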

Some things to say using these terms:

  • Library has to be implemented in terms of something, so primitives are necessary. There are often additional primitives for efficiency (especially in slow implementations), but most primitives add functionality that would otherwise be unavailable.
  • When two constructs can express each other, either one could be primitive; the choice is often arbitrary.
  • A foreign function interface can be seen as an endless source of primitives. This is one of the reasons FFIs are so important for new languages: they compensate for a shortage of primitives.
  • Library, unlike the other two parts, is potentially portable between implementations. If a language provides a good reference implementation of its library (as most SRFIs do), this makes it much easier to implement.
  • Complexity in the kernel is much more expensive than in primitives or library, for both implementors (because it complicates the compiler) and users (because they can't understand the kernel according to the same rules as user code). Primitives, however, aren't necessarily more expensive than library.

Expressive features are particularly interesting when they allow the language to express things in library that would otherwise have to be in the kernel. Such features can often pay for their complexity by simplifying the kernel. For example:

  • In most languages, all abstract syntax (and concrete syntax too, for that matter) is kernel. In languages with macros, most abstract syntax can be library, leaving only a few special forms in the kernel. In a fexpr language, even the special forms are mere primitives, because the compiler doesn't have to know anything about them. (It might still know, for efficiency, but it doesn't have to.) Kernel has a very small kernel.
  • Exposing compiler data structures such as environments allows user code to do things otherwise possible only in the kernel. In Scheme, there is no programmatic way to create definitions, so define and define-values are kernel. In Common Lisp, or in a Scheme extended with a define! procedure that creates top-level definitions, top-level define and define-values can be implemented as macros.
  • Optimization is usually kernel, but Common Lisp and GHC can express some optimizations in user code. (This is a powerful and underappreciated capability.)
  • ML's typechecking is kernel, because there's no way for user code to define it. In a language which allows user-defined program analyses (are there any?), it could be library. (This feature would fit nicely with user-defined optimizations.)

Caveats

There are really two orthogonal distinctions here: kernel vs. library is about whether the compiler needs to know about the feature; primitive vs. derived is about how it was implemented. However, derived kernel is usually impossible (what, are you redefining eval?), and the distinction from primitive kernel is not very interesting, so I don't bother.

The distinction between kernel and library is often more important than that between library and primitives. In these cases it's convenient to lump primitives and library together.

Some features require runtime but not compiler support, like fixnums or weak pointers or unboxed numeric arrays. These form a class (“runtime features”?) between primitives and kernel. I didn't include this class in the list above because I think it's less important, and can usually be lumped in with kernel or primitives.

I'm inconsistent about whether “primitive” refers only to features that must be built in, or to all those that are built in to a particular implementation.

Maps vs. map

In functional languages, map traditionally refers to everyone's favourite1 high-order function. C++ and Java, on the other hand, follow a common mathematical usage: a “map” is a dictionary (an associative array, if anyone still uses that term). Clojure and Scala, being of mixed parentage, have both: their map is the high-order function, and their dictionaries also have names containing Map. This is only a little bit confusing, but both concepts are so important that they ought to have unambiguous names.

I usually prefer to reserve map for the function, and dismiss the use of the same name for dictionaries because it's less familiar and it doesn't make much sense. But map doesn't make much sense as a name for the function either. You can make excuses for it (it maps old elements of the list to new ones? it maps2 the function across the list?) but they're not very compelling; you can make equally good excuses for many other functions. The name is standard, but it's not a very good one.

Meanwhile, dictionaries are a common concept in need of a good name. “Dictionary” is four syllables, and “associative array” six (and misleading). “Hash” is too specific (and also a little misleading, since it ought to mean a hash function). But “map” is short, already somewhat well-known, and if not clear then at least not actively misleading. So maybe I should give up and call them maps, and find some other name for map-the-function.

Is this outrageous?

There's no obvious better name. image might be clear to mathematicians, but it's opaque to everyone else. elementwise is too long. each is comfortable, but suggests for-each (as in Ruby) or every. Repurposed prepositions like across or through are opaque, and tend to collide with natural-language uses (consider how annoying the C++/Java this is in speech). Symbolic names are even more opaque, not to mention unpronounceable. Perhaps it acquired the name map for lack of a good alternative.

It might be possible to avoid the issue by making the function unnecessary. If scalar operations are overloaded to operate elementwise on collections, as in APL, explicit map would be much less common, so its name would be less important. It wouldn't be rare enough to ignore, though; many of its calls use operations that aren't scalar-only, or at least aren't obviously so, so they would still be written with explicit map.

Another alternative is to merge it with some other function. If collections are seen as functions, map is almost equivalent to compose (especially in a lazy language). If compose on a collection does map, then there's no need for a separate map function, especially if compose has a short name like . or ∘. However, I have some difficulty understanding f . xs as map f xs. Composition and elementwise application may nearly coincide in meaning, but they're conceptually quite different.
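
A minimal Common Lisp sketch of the collections-as-functions reading; as-fn and compose2 are invented names, purely for illustration.

;; View a list as a function from index to element. Composing F with that
;; function yields the same "collection" you'd get by mapping F over the list.
(defun as-fn (list)
  (lambda (i) (nth i list)))

(defun compose2 (f g)
  (lambda (x) (funcall f (funcall g x))))

;; (funcall (compose2 #'1+ (as-fn '(10 20 30))) 2)   => 31
;; (funcall (as-fn (mapcar #'1+ '(10 20 30))) 2)     => 31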

1Except possibly for compose.

2This use of “map” as a verb is presumably derived from the function, so it's a poor excuse.

Divides?

Here's another trivial convenience function:

divides? : int → int → boolean
divides? a b = mod b a == 0

This makes divisibility tests read more like the mathematical a | b notation:

prime? n = n ≥ 2 && not (some (2 .. isqrt n) (divides? _ n))

Argument order and name are debatable. (I find the ? suffix surprisingly uncomfortable, perhaps because divides? is a binary mathematical predicate, so I expect it to be a single-character infix operator like < ≤ ∈ ⊂, not an “ordinary” function.)

I initially wrote divides? off as a feature useful only in toy problems. It certainly comes up a lot in Project Euler problems, since those (as befits their name) are heavy on number theory. But even in real code, many (most?) uses of mod are divisibility tests. So divides? may be worth having, because it expresses this directly rather than via a slightly awkward use of mod.

(Real-world divisibility tests, incidentally, are often used to do something every nth iteration, e.g. updating a progress indicator, flushing a buffer, updating a cache. This pattern is so common that it might deserve an abstraction of its own. However, it's often a mistake — many periodic operations should be periodic in time, not number of iterations. So there's another useful operation: periodically interval timer.)
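
For concreteness, a rough Common Lisp sketch of both flavours; make-every-nth and make-periodically are invented names.

;; Iteration-count version: returns a thunk that's true on every Nth call.
(defun make-every-nth (n)
  (let ((count 0))
    (lambda () (zerop (mod (incf count) n)))))

;; Time-based version: true at most once per INTERVAL seconds, no matter
;; how fast or slow the surrounding loop runs.
(defun make-periodically (interval)
  (let ((next 0))
    (lambda ()
      (let ((now (get-internal-real-time)))
        (when (>= now next)
          (setf next (+ now (round (* interval internal-time-units-per-second))))
          t)))))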

Pointer arithmetic can be safe

Advocates of memory-safe languages sometimes contrast them with C by saying that they don't have pointers, or (when someone points out how impractical they'd be if that were really true) that they don't have pointer arithmetic. This is supposed to make them safer. Because pointer arithmetic is unsafe, right?

Not necessarily. Pointer arithmetic in C happens to be unsafe, but this is not a problem with pointer arithmetic, only with C — or rather with its conventional implementation technique. C pointers are usually implemented as raw addresses, and pointer arithmetic as simple arithmetic on those addresses. The C standard, however, doesn't require this. It only requires pointer arithmetic (and comparisons) to work on pointers into the same array (or one past its end), and it leaves dereferencing anything outside those bounds undefined. It doesn't require bounds checking, but it doesn't prohibit it either. So it's possible to make a conforming C implementation with bounds checking on all pointer operations.

This has been done. Zeta-C (source here) was a C compiler for Lisp machines, which don't support unsafe array access at all. Scott Burson, the author, explains how it handled this:

All pointers were represented as pairs of an array and an index

Pointer arithmetic operated on the index, leaving the array intact. Pointer dereferences used the index with the ordinary, safe array operations, so all pointer dereferences were bounds-checked. Since Zeta-C also fixed C's other memory-safety problems (free did nothing, uninitialized variables were not garbage, and casts could not forge pointers), it was a memory-safe implementation of C. This was part of its attraction — people didn't use it because they wanted to run C programs on Lisp machines, but because they wanted to debug their C programs in a safe implementation with the Lisp machine's excellent tools.
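
The representation is easy to sketch (in Common Lisp here, since the target was a Lisp machine); this is only an illustration of the idea, not Zeta-C's actual code.

;; A "fat pointer": an array plus an index into it.
(defstruct (ptr (:constructor make-ptr (array index)))
  array
  index)

;; Pointer arithmetic touches only the index; the array comes along unchanged.
(defun ptr+ (p n)
  (make-ptr (ptr-array p) (+ (ptr-index p) n)))

;; Dereferences go through AREF, so (with safety on) every access is bounds-checked.
(defun ptr-ref (p)
  (aref (ptr-array p) (ptr-index p)))

(defun (setf ptr-ref) (value p)
  (setf (aref (ptr-array p) (ptr-index p)) value))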

C programmers are well aware that memory unsafety is their biggest problem, and many other tools have been written to deal with it, but few of them recreate this feature. The technique is known to implementors (often by the name “fat pointers”) and is available as a patch for GCC. But it's not considered a standard part of the C debugging toolkit, even though it's easier to implement, and has a smaller performance cost, than commonly used tools like valgrind. I don't understand why. Wouldn't it be nice if your C compiler had a --safe mode which eliminated most memory safety bugs?

Update December 2013: The big problem with fat pointers is that they're incompatible with ABIs that use raw pointers, and virtually all interesting C programs make system or library calls through such an ABI. So a practical implementation of fat pointers needs to support raw pointers too, which adds complexity and greatly reduces the benefit.

Scott also tells a cute story about portability:

If you looked real closely, there were lots of little corners of C semantics where ZETA-C was not correct. In practice, however, one very rarely tripped over any of these.

For instance, I used Lisp integers for C int and long. This meant bignums would be created automatically, as usual in Lisp. Technically this is not a correct C implementation (even though I don't think the standard specifically says that the length of int and long shall be finite, one can take this as implied) but it very rarely ran into trouble. The only such case I remember, which was rather amusing, was a program that did something like

  int i;
  for (i = 1; i; i <<= 1) ...

(shifting a 1 bit left repeatedly, expecting it to fall off the left end of the word).

Who expects their C programs to run on a machine with infinite word size?

“Irritants”

I laughed when I first saw the arglist for R6RS's error:

(error who msg irritant1 ...)

There are irritants in the Scheme oyster, and error pearls them away for later reference. Cute!

“Irritant” is not a new term; it's existed in Scheme culture for a long time. The earliest reference I can find is in a Maclisp Scheme info file last modified in 1985, and it has turned up occasionally on the rrrs-authors mailing list. I haven't found it in any non-Scheme manuals (yet). Is it a Scheme-only term? Does anyone know where it originated?

For a while, the cuteness blinded me to a possible confusion: “irritants” suggests that the arguments should be the cause of the error — that they should be in some sense wrong or invalid. But they're only informative, and are often innocent indicators, not the causes of the irritation.

A format string and arguments is probably better, because it makes for clearer messages; or simply a string, in a language with string interpolation. Even if neither calls for a cute name.
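
Common Lisp's error already works this way when given a format control string; a small invented example of the difference it makes to the message:

;; CHECK-PORT is hypothetical; the point is the format string in the ERROR call,
;; which produces a complete sentence instead of a message plus bare irritants.
(defun check-port (port)
  (unless (and (integerp port) (<= 1 port 65535))
    (error "~S is not a valid port number (expected an integer in 1-65535)" port)))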

Weak symbol tables in Common Lisp?

Christophe Rhodes asks, about Common Lisp:

something that has been suggested a few times is that packages should intern their symbols in a hash table which is weak: dear cleverweb, is there any way in which this can be detected? Think like pfdietz

Disappearing symbols can be detected by find-symbol...

CL-USER> 'foo
FOO
CL-USER> (setf * nil) ;get rid of the REPL's reference
NIL
CL-USER> (gc :full t)
NIL
CL-USER> (find-symbol "FOO")
NIL

...or by counting the symbols in a package (which might plausibly happen as part of a show-all-packages feature)...

CL-USER> 'foo
FOO
CL-USER> (loop for s being the symbols of "CL-USER" count t)
1265
CL-USER> (progn (setf ** nil) (gc :full t))
NIL
CL-USER> (loop for s being the symbols of "CL-USER" count t)
1264

...or by the apparent change in the home package of a symbol when a shadowing symbol disappears:

CL-USER> (defpackage "EXPORTER" (:export "FOO"))
#<PACKAGE "EXPORTER">
CL-USER> (defpackage "IMPORTER")
#<PACKAGE "IMPORTER">
CL-USER> (intern "FOO" "IMPORTER")
IMPORTER::FOO
NIL
CL-USER> (use-package "EXPORTER" "IMPORTER")
T
CL-USER> (intern "FOO" "IMPORTER")
IMPORTER::FOO
:INTERNAL
CL-USER> (progn (setf * nil) (gc :full t))
NIL
CL-USER> (intern "FOO" "IMPORTER")
EXPORTER:FOO
:INHERITED

I don't think any of these are really problems. It's already dangerous to attach any meaning to whether a symbol exists, since it might be unpredictably created by read. Unpredictable disappearances aren't any more confusing.

But instead of “thinking like pfdietz”, what about thinking like a user? There's a much more important problem here: symbols have definitions!

CL-USER> (defun foo ())
FOO
CL-USER> (progn (setf * nil) (gc :full t))
NIL
CL-USER> (foo)
Error: FOO has no fdefinition

CL symbols combine two roles: that of interned strings and that of top-level bindings. Top-level bindings are exceedingly important things, and the symbol table is often the only reference to them. So CL can't GC unreferenced symbols. :( And indeed, SBCL, Clisp and ABCL all hold their symbols strongly.

In languages with pure symbols, however, this is not a problem, so weak symbol tables are common (so much so that it never occurred to me that they might be impossible in CL!). They're present in Racket, OpenDylan, Clojure (for keywords) and even Java (for String.intern).

It might be safe to hold symbols weakly, even in CL, when they have no interesting properties: when they're not boundp or fboundp, have nothing on their plists, and aren't external. These are the symbols that, if GCed and interned again, would be recreated exactly as they were. (Keywords could be held weakly even if boundp, since intern would recreate that too.)

This would allow garbage collection of symbols which have only been read, but protect those that have been used for anything else. This could be implemented by keeping a separate weak hashtable for “plain” symbols, and moving them to the regular table when they become “interesting” (on set, (setf fdefinition), (setf get), etc.).
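
A sketch of that two-table scheme. Weak hash tables aren't standard CL, so the :weakness argument below is an implementation extension (as in SBCL), and the function names are invented.

;; Plain symbols live in a value-weak table and can be collected; interesting
;; ones are moved to the strong table and kept forever.
(defvar *interesting-symbols* (make-hash-table :test 'equal))
(defvar *plain-symbols* (make-hash-table :test 'equal :weakness :value))

(defun weak-intern (name)
  (or (gethash name *interesting-symbols*)
      (gethash name *plain-symbols*)
      (setf (gethash name *plain-symbols*) (make-symbol name))))

;; Called from SET, (SETF FDEFINITION), (SETF GET), EXPORT, etc.
(defun note-interesting (symbol)
  (let ((name (symbol-name symbol)))
    (when (eq symbol (gethash name *plain-symbols*))
      (remhash name *plain-symbols*)
      (setf (gethash name *interesting-symbols*) symbol))))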

Is this a good idea? Is it worth the complexity? Is this all obvious and the real question is whether there's anything in the spec that forbids it?

When was ML invented?

Most sources say that ML, the language which introduced Hindley-Milner type inference, was invented in 1973. Its type system, however, was not described until 1977 or 1978. Milner's A Theory of Type Polymorphism in Programming says:

the polymorphic type discipline which we discuss here has been incorporated in the LCF metalanguage ML [2, 3], and has been in use for nearly 2 years. The compile-time type checker for this language has proved to be a valuable filter which traps a significant proportion of programming errors.

The paper was written in 1977 and revised in 1978, so “nearly 2 years” means 1975 or 1976. (References 2 and 3 are LCF and ML documentation, from 1977 and 1978; neither is on the net.) The 1972 LCF documentation (warning: bad partial OCR) doesn't mention the metalanguage, so I suspect the date of 1973 refers to when it was first added.

Without its distinctive type system, ML would have been an uninteresting ISWIM derivative (with dynamic type, or monomorphic static type?), hardly recognizable as the ancestor of all modern statically typed languages. So we should date it not from its first, rudimentary version, but from the introduction of its most important feature, circa 1975, or from its publication, in 1977.

Update August 2015: Uninteresting? What about exceptions and pattern-matching?

Ill-phrased slogans don't go right

“Well-typed programs don't go wrong.” This slogan for ML-style static typing is infamous, since it's true only under an extraordinarily narrow definition of “go wrong”. This definition is not used for any other purpose, so it's virtually impossible for an innocent listener to arrive at the “correct” interpretation. For propaganda purposes, this is a feature, as it allows the user to make an outrageous claim and then blame its falsity on the audience's ignorance.

The slogan is a little bit justifiable in its original context. Robin Milner introduced it (along with polymorphic type inference) in his 1978 paper A Theory of Type Polymorphism in Programming, as the heading “Well-Typed Expressions Do Not Go Wrong”. The original use refers to a toy language in which the only possible runtime errors are type errors, so the typechecker really does prove the absence of all errors. Well-typed expressions, in that language, can loop forever or run out of memory, but they can't have semantic errors: they can't “go wrong”.

Milner must have known it was misleading, though.

Square in R7RS

It's one of the most trivial library functions ever:

(define (square x) (* x x))

But I've written it many times, and I'm not alone. So it was a pleasant surprise to see that WG1 recently voted to add it to R7RS.

The request gave the (weak) reason that it could have an optimized method for bignums, but voter comments suggest they were more interested in symmetry with sqrt. They also added make-list for symmetry with make-vector. I don't think symmetry is a good reason in either case, let alone sufficient reason to add features to Small Scheme, but square is attractive as a convenience.

It's not the most important feature (and it probably belongs in Big Scheme, not Small Scheme), but it's a small step forward. Every language that aims to be convenient should have square in its standard library.

“However, the compiler arranges for it to work anyway.”

There are some things you don't want to hear from a language manual...

You might expect this not to work if it was compiled and res was not declared special, since non-special compiled variables are not represented as symbols. However, the compiler arranges for it to work anyway.

...especially not in the section on a low-level primitive like, say, pointers.

That's from page 110 of the 1979 Lisp Machine manual (20 MB of interesting history).

Unlike most lisps, Lisp Machine Lisp had pointers, called “locatives”: first-class places, implemented as (unboxed, IIUC!) pointers. One of their uses was to refer to variables by pointing to their names' value cells. But local variables generally don't live in their names' value cells, so locatives on them do nothing useful (and potentially something harmful). Apparently there was a workaround for this: the compiler recognized these locatives as intended to refer to local variables, and replaced them with something else.

Isn't it nice using a language with clean, well-specified semantics?

What do you mean by “regex”?

What's a regular expression?

The “official” definition is the literal one: an expression denoting a regular grammar. In this sense, a Parsec expression like many1 digit is a regex (equivalent to [0-9]+). But this is quite at odds with actual usage. No one calls Parsec expressions regexes when they happen to describe regular grammars! They're not regexes because the language isn't a regex language.

So is a regex an expression in a language that can only express regular grammars? No: Perl's regexes can call arbitrary Perl code, and thus can express non-regular grammars. But no one hesitates to call them regexes — they're the canonical implementation!

Like many Lisp-based regex libraries, CL-PPCRE offers a sexpr representation, which looks like (:greedy-repetition 1 nil :digit). It's rarely used because it's verbose, but if I saw one in the wild, I probably wouldn't call it a regex, because it doesn't have the distinctive syntax. CL-PPCRE's variant recursive-regex can express arbitrary recursive grammars, like Perl, but it still has “regex” in its name — not only as a joke, but because even when they're recursive, its expressions are still in regex syntax.

In common usage, regular expressions are defined by their syntax, not by the grammars they express.

Like any language, the regex language shares its name with its extensions: any very terse parsing language with the standard metacharacters is called a regex, regardless of what other features it has. Perl s/.../.../ expressions, for instance, are sometimes referred to as regexes, which is wrong both theoretically and in Perl terms, but right in terms of common usage. They contain regex-like syntax, so they're regexes, and the distinction between matching and substitution is not salient enough to earn them a different name. Even glob-expressions are sometimes called regexes — usually as a result of confusion, but it's an understandable confusion, because they share a metacharacter-rich syntax, and some of the metacharacters are even the same.

This focus on syntax sounds like a superficial trait distracting attention from the underlying ideas. It's really the other way around. Classification of grammars is seldom of much concern to anyone but parser theorists (and implementors, and sometimes users, of parsing libraries), but syntax is a constant source of problems for everyone who uses regexes. Consider JWZ's famous malediction (directed at himself or at emacs?):

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

The part of regexes that makes them a problem is the syntax: the metacharacters and consequent escaping, the sensitivity to single-character mistakes, the difficulty of extracting results (i.e. captures), the lack of a way to name and reuse parts of expressions. Not the regular grammar! So common usage makes exactly the right distinction. Syntax is what makes regular expressions; their (in)expressiveness is a superficial detail.

Return of the Lisp Machine package system features

Xach pointed out that Nikodemus Siivola recently added support for reading package::(arbitrary form) in SBCL.

I had heard that Zetalisp had this feature, but apparently it's older than that — this was how packages originally worked. The 1979 Lisp Machine manual says:

The colon character (":") has a special meaning to the Lisp reader. When the reader sees a colon preceded by the name of a package, it will read in the next Lisp object with package bound to that package.
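
Concretely, with the generalized prefix (as in the original reader, or SBCL's extension), and assuming a package FOO exists:

(defpackage "FOO" (:use "CL"))

;; Standard CL accepts the prefix only on a single symbol:
'foo::bar                  ; reads as FOO::BAR

;; The generalized prefix reads the whole next form with *PACKAGE* bound to FOO:
'foo::(bar (baz 1) quux)   ; reads as (FOO::BAR (FOO::BAZ 1) FOO::QUUX)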

I don't know why CL degeneralized the package prefix to only work on symbols. The only reason I've heard is that a misplaced colon could accidentally cause the next form to be read in the wrong package, but that doesn't sound more dangerous than other typos like stray parentheses.

Update August 2015: It's more dangerous because unlike most typos, it has a read-time side effect: it pollutes the other package with lots of unwanted symbols.

Old Lisp manuals are a fascinating mix of futuristic and primitive. Lisp Machine Lisp also had hierarchical packages: package names were symbols, not strings, and could therefore live in other packages. But there are no earmuffs on package, nor on any other special variables; apparently they hadn't been invented yet.