
Antedating “datatype” all the way to Plankalkül

Previously I speculated that the word “datatype” might have been used in computing before 1958. In response, dvt found a precedent from 1945! It's Konrad Zuse's very early language Plankalkül (Plan Calculus). Zuse's notes pervasively use the words Angabentyp and Angabenart, without bothering to define them. Modern German uses “Daten” instead of “Angaben”, but the terms are otherwise unchanged: “Datentyp” and “Datenart”.

Plankalkül was the world's first programming language, and it begins from first principles: the only primitive type is the bit, charmingly called a “Ja-Nein-Wert” (yes-no-value). It builds everything else out of arrays and tuples. The section on datatypes begins:

Angaben und ihre Darstellung [Data and its representation]

Die auftretenden Angaben können mannigfacher Art sein. Z.B. J.-N.-Werte, Zahlen, Listen usw. [The data given can be of various types, e.g. Y-N-values, numbers, lists etc.]

[...]

Die Unterscheidung der einzelnen Angabenarten soll nun wie folgt formalisiert werden [The distinction between the various datatypes will now be formalized as follows]:

Angaben-Strukturen [Data structures]

Unter Struktur einer Angabe wird der komponentenmäßige Aufbau einer Angabe ohne Hinblick auf die Bedeutung der einzelnen Fälle und Komponenten verstanden. [The structure of a datum is the component composition of a datum without regard to the meaning of the individual instances and components.]

Wir haben Angaben von starrer und von variabler Struktur. Wir führen nun Angabenstrukturzeichen ein, welche jeder Angabe zugeordnet sind. Diese werden mit S und einer Kennzahl bezeichnet. Die Entwicklung der zusammengesetzten Strukturen erfolgt dann durch „Strukturgleichungen“ aus einfachen (bereits definierten) Strukturen. [We have data of fixed and of variable structure. We now introduce data structure symbols, which are assigned to each datum. These are denoted by S and an ID number. The development of composite structures then follows by “structure equations” from simple (already defined) structures.]

So wird dem einfachen Ja-Nein-Wert das Strukturzeichen S0 zugeordnet. Eine Folge von n J-N-Werten hat dann die Struktur S1.n. Es gilt die Strukturgleichung: [Thus the structure symbol S0 is assigned to the simple yes-no value. Then a sequence of n yes-no values has the structure S1.n. The structural equation applies:]

S1.n = n × S0

Durch Verfolgung der Strukturgleichungen ist es jederzeit möglich, den Aufbau einer Angabe zu ermitteln, auch wenn dieser sehr kompliziert ist. [By following the structure equations, it is possible at any time to determine the composition of a datum, even when it is very complex.]

Plankalkül was never implemented (well, not until 1975), but Zuse wrote enough code in it to discover the need for generics, and duly invented them:

Wir brauchen noch „unbestimmte“ Strukturzeichen. Wollen wir z.B. andeuten, daß eine Angabe aus einer Liste von n Gliedern besteht, ohne die Struktur des Gliedes im einzelnen festzulegen, so schreiben wir: n × σ. [We also need “undefined” structure symbols. If we want to indicate, for example, that a datum consists of a list of n elements without specifying the structure of the elements in detail, we write: n × σ.]

Für σ kann dann ein beliebiges Strukturzeichen eingesetzt werden. [For σ any structure symbol can be used.]

¤ × σ: Ist das allgemeinste Strukturzeichen einer Liste. (Struktur der Glieder und Zahl der Glieder offen gelassen). [Is the most general structure symbol of a list. (Structure of elements and number of elements left open.)]
¤ × 2σ: Ist die Struktur einer Paarliste, bei der die Glieder der einzelnen Paare von gleicher Struktur σ sind. [Is the structure of a pair-list where the elements of each pair are of the same structure σ.]
¤ × (σ, τ): Ist die Struktur einer Paarliste bei der die Vorderglieder die Struktur σ, und die Hinterglieder die Struktur τ haben. [Is the structure of a pair-list where the front elements have the structure σ and the back elements have the structure τ.]
2 × n × σ: Ist keine Paarliste, sondern ein Paar von Listen. [Is not a pair-list, but a pair of lists.]

Array indexes, incidentally, are zero-based:

Es sei noch darauf aufmerksam gemacht, daß bei einer aus n Gliedern bestehenden Angabe der höchste Index der Komponenten gleich n − 1 ist, da die Komponentennumerierung mit 0 beginnt. [It should be pointed out that for a datum consisting of n elements, the highest index of the components is equal to n − 1, as the component numbering begins with 0.]

Separately from data structures, Plankalkül supports constraints on which values can actually be used:

Eine Angaben-Beschränkung liegt vor, wenn die volle Variabilität der zu einer Angabenart gehörenden Struktur nicht voll ausgenutzt ist. Z.B. können Dezimalziffern durch 4 J.N.-Werte dargestellt werden. Es werden jedoch nur 10 von den 16 möglichen Variationen ausgenutzt. [A data-restriction is present when the full variability of the structure belonging to a datatype is not fully used. E.g. decimal digits can be represented by 4 bits. However, only 10 of the 16 possible variations are used.]

In solchen Fällen wird durch eine Beschränkungsformel angegeben, welche Fälle der Struktur in den Definitionsbereich der Angabenart fallen. Eine solche Formel wird mit B und einer Kennzahl bezeichnet. [In such cases, a restriction formula specifies which cases of the structure fall within the defined range of the datatype. Such a formula is denoted by B and an ID number.]

“Typ” and “Art” are synonyms, so they're ripe for distinction by anyone who wants words for two concepts. Zuse does: Angabentypen are optional annotations distinct from both structures and restrictions, while Angabenarten bundle all three together:

Angabentypen [Datatypes]

Den gleichen Strukturen und Beschränkungsformeln können Angaben verschiedener Bedeutung zugeordnet sein. (Z.B. x = und y = Koordinaten). Im allgemeinen ist es nicht nötig, diese zu unterscheiden. Ist dies jedoch vorteilhaft, so werden Typenbezeichnungen eingeführt. Z.B. T1, T7 usw. [The same structures and restriction-formulas can be assigned to data of different meaning. (E.g. x = and y = coordinates). In general it is not necessary to distinguish them. If it is advantageous, however, type-designations will be introduced. E.g. T1, T7 etc.]

Angabenart [Datatype]

Jeder Angabenart ist eine Struktur und evtl. eine Beschränkung bzw. eine Typenbezeichnung zugeordnet. Darüber hinaus kann eine Angabenart noch durch spezielle Bedeutungen der Komponenten gekennzeichnet sein. (Z.B. Zahlen in halblogarithmischer Form, vergl. Zahlenrechnungen S. 119 ff). [Each datatype is assigned a structure and possibly a restriction or type-designation. In addition, a datatype can be further characterized by specific meanings of the components. (E.g. numbers in semi-logarithmic [=floating-point] form, see Numerical Calculations, p.119 ff.)]

Alle diese Kennzeichnungen können dann unter einem Angabenzeichen A zusammengefaßt werden. Ist eine Angabe durch ein A-Zeichen z.B. A10 gekennzeichnet, so ist die besondere Kennzeichnung der Struktur usw. nicht erforderlich, da diese in A10 mit enthalten ist. [All these identifiers can be combined under one data symbol A. If a datum is marked with an A-symbol, e.g. A10, the specific identifier of the structure etc. is not required, as it is included in A10.]

Angabenart-Zeichen können jedoch auch einer Gruppe analoger Angabenarten verschiedener Struktur zugeordnet sein. Z.B. können Zahlen durch verschiedene Strukturen (z.B. Dual-Zahlen, Dez.-Zahlen) dargestellt werden. Jedoch kann ein allgemeines Zeichen (z.B. A8 vergl. Zahlenrechnen S. 121) eingeführt werden, welches lediglich besagt, daß es sich um eine Zahl handelt, ohne ihre Struktur im einzelnen festzulegen. [Datatype symbols can, however, also be assigned to a group of analogous datatypes of different structures. E.g. numbers can be represented by various structures (e.g. binary numbers, decimal numbers). However, a generic symbol (e.g. see A8, Numerical Calculations, p.121) can be introduced which only says that it is a number, without specifying its structure in detail.]

Wir führen entsprechend σ ein unbestimmtes Angabenartzeichen α ein. [We introduce an undefined datatype symbol α corresponding to σ.]

With abstract types in 1945, Plankalkül's type system is ahead of its time. So is its support for predicate calculus, which is worth a post of its own. Less exotically, it has the basic features of languages a decade later: (one-armed) conditionals, loops, function calls, and the assignment statement (written left-to-right).

One feature of Plankalkül is conspicuously primitive. All of the symbols for data structures, restrictions, constants, variables, and so on are not named but numbered. It's like Intercal but 27 years earlier!

Zuse noticed that it was confusing to have so many numbers with so many different meanings, and tried to distinguish them with a unique two-dimensional syntax:

Die Zeilendarstellung [The line format]

Um die zu einer Angabe gehörenden verschiedenen Kennzeichnungen, wie Variablen-Index, Komponentenangabe, Angabenart bzw. Struktur usw. übersichtlich darstellen zu können, werden diese einzelnen Kennzeichnungen je verschiedenen Zeilen einer Formel zugeordnet. [To be able to show the various identifiers belonging to a datum, such as variable index, component data, datatype or structure etc., these individual identifiers are assigned to different lines of a formula.]

Wir haben zunächst die Hauptzeile, in welcher die Formel in der bisher üblichen Art dargestellt wird. [First we have the main line in which the formula is shown in the usual way.]

Die nächste Zeile dient der Unterscheidung der verschiedenen Variablen, welche durch den „Variablen-Index“ erfolgt. (V ). Eine weitere Zeile dient der Kennzeichnung der Komponenten der durch die Zeile 1 und 2 gekennzeichneten Variablen. (Komponentenindex K.) [The next line serves to distinguish the different variables, which is done by the “variable index” (V). Another line serves to identify the components of the variables indicated by lines 1 and 2. (Component index K.)]

Es wird also z.B. der Ausdruck [Thus e.g. the expression]

K1(V3) Komponente 1 von V3 [Component 1 of V3]

wie folgt geschrieben [is written as follows]:

V
3
1

bzw. [or] K2.3(Z4) =

Z
4
2.3

In modern notation, those are V3[1] and Z4[2, 3].

Weitere Zeilen können der Kennzeichnung der Struktur und Angabenart bzw. der Beschränkung und dem Typ dienen. [Further lines may be used to indicate the structure and datatype, or the restriction and the type.]

Im allgemeinen wird entweder die Angabe der Struktur oder der Angabenart genügen. (S = Index bzw. A = Index) [In general either the specification of the structure or of the datatype will suffice. (S-index or A-index.)]

z.B. [e.g.]

Z
4
2.3
0

bedeutet: „Z4, Komponente 2.3”. Der Wert ist von der Struktur S0. [means: “Z4, component 2.3”. The value is of the structure S0.]

Die Strukturangabe bzw. Angabenart – Angabe bezieht sich dabei auf die Komponente. [The structure specification or datatype specification refers to the component.]

Die einzelnen Zeilen werden durch Vorsetzen der Buchstaben V, K, S bzw. A vor die Zeilen der Formel gekennzeichnet: [The individual lines are identified by prefixing the letters V, K, S or A before the lines of the formula:]

  | Z ^ Z
V | 4   2
K | 2.3
S | 0   0

Wird von einer Angabe keine Komponente gebildet, so bleibt der Komponenten-index frei. [If no component is established for a datum, the component index remains empty.]

Das Zeichen A kann stets an Stelle des Zeichens S gesetzt werden; aber im allgemeinen nicht umgekehrt. Die für Strukturen bereits definierten Kennzahlen dürfen dann nicht mehr für Angabenarten benutzt werden: (Z.B. gibt es nur eine Struktur S0, S1.n und die Zeichen A0, A1.n sind mit diesen Strukturzeichen identisch.) [The symbol A can always be used in place of S, but in general not vice versa. The ID numbers already defined for structures may then no longer be used for datatypes: (E.g. there is only one structure S0, S1.n and the symbols A0, A1.n are identical to these structure symbols.)]

If only Zuse had thought of giving them names! But he was trying to solve a different problem, of typography:

Mit Hilfe dieser Darstellung ist es leicht möglich, die einzelnen Angabenarten zu unterscheiden. Es ist nicht mehr wie bisher in der Mathematik nötig, verschiedene Zeichenarten für verschiedene Angabenarten heranzuziehen. (Z.B. deutsche Buchstaben für Vektoren.) Ein solches Verfahren wäre im allgemeinen Plankalkül nicht anwendbar, da die Zahl der verschiedenen Angabenarten innerhalb der gleichen Rechenpläne bzw. Plangruppen derartig mannigfaltig sein kann, daß die zur Verfügung stehenden Zeichenarten nicht ausreichen. [With the help of this representation it is easily possible to distinguish the individual datatypes. It is no longer necessary, as hitherto in mathematics, to draw up different types of symbols for different datatypes. (E.g. German letters for vectors.) Such a method would not be practical for general plan calculus, as the number of different datatypes in one program or program-group can be so many that the available types of symbols are not enough.]

Constanten [Constants]

Den einzelnen Angabenarten, Typen bzw. Strukturen können Constanten zugeordnet werden, denen spezielle Bedeutung zukommt. Eine Constante ist ein bestimmter Fall aus der Menge der möglichen Variationen einer Angabenart bzw. Struktur. Sie werden mit C und einer Kennzahl bezeichnet. [To the individual datatypes, types or structures constants can be assigned which have special significance. A constant is a particular case from the set of possible variations of a datatype or structure. They are denoted by C and an ID number.]

In addition to constants, Plankalkül distinguishes three kinds of variables (input, intermediate, and output). Since all four can be used in the same context, the symbols C, V, Z and R must appear on every variable reference to distinguish them, so the two-dimensional syntax is not helping much. It's also difficult to transcribe, so I'll stop here rather than trying to translate all 180 pages.

I don't know if Plankalkül was known to the designers of later programming languages, or if it had any influence. But its casual usage of the words “Angabentyp” and “Angabenart” suggests they were already established in 1945.

A brief history of “type”

The word “type” has a variety of meanings in programming languages, which are often a focus of confusion and contention. Here's a history of its use, focusing on particularly influential languages and papers.

1956: Fortran “modes”

The term “type” was apparently not yet established in 1956, because the Fortran manual speaks of integer and floating-point “modes” instead. It has something called “statement types”, but those are what are now called syntactic forms: assignment, conditional, do-loop, etc.

The 1963 Fortran II manual speaks of “two types of constants” (integer and floating-point), but this seems to be just the English word. When it talks about these types in more detail, it calls them “modes”, e.g. “arguments presented by the CALL statement must agree in number, order, mode, and array size with the corresponding arguments in the SUBROUTINE statement”. (Evidently the terms “formal” and “actual” parameters weren't established yet either.)

1958-63: Algol

Algol is one of the most influential languages in history. It introduced if ... then ... else, the int n declaration syntax, and semicolons. It also popularized the term “type”. The Algol 58 report defines type declarations on variables in terms of the “type” and “class” of values:

Type declarations serve to declare certain variables, or functions, to represent quantities of a given class, such as the class of integers or class of Boolean values. [...] Throughout the program, the variables, or functions named by the identifiers I, are constrained to refer only to quantities of the type indicated by the declarator.

The Algol 60 report is more consistent:

The various “types” (integer, real, Boolean) basically denote properties of values. The types associated with syntactic units refer to the values of these units.

Note that types are explicitly a property of values, not variables or expressions. But does “basically” mean someone thought otherwise, or just that this isn't a formal definition?

1967: Strachey's Fundamental Concepts

Chris Strachey's Fundamental Concepts in Programming Languages was an influential set of lecture notes that established a bunch of common terms. It defines types thus:

Most programming languages deal with more than one sort of object—for example with integers and floating point numbers and labels and procedures. We shall call each of these a different type and spend a little time examining the concept of type and trying to clarify it.

Strachey takes it for granted that types can be static or dynamic, and prefers static typing only for reasons of efficiency (which was, after all, of overwhelming importance in 1967):

It is natural to ask whether type is an attribute of an L-value or of an R-value—of a location or of its content. The answer to this question turns out to be a matter of language design, and the choice affects the amount of work, which can be done when a program is compiled as opposed to that which must be postponed until it is run.

Strachey does not mention type theory, because no one had yet realized that it could be applied to programs. That changed in the next year.

1968: type theory

James Morris was the first to apply type theory to programming languages, in his 1968 Lambda-calculus models of programming languages. “A system of types and type declarations is developed for the lambda-calculus and its semantic assumptions are identified. The system is shown to be adequate in the sense that it permits a preprocessor to check formulae prior to evaluation to prevent type errors.”

He begins by explaining what types are and why they matter, using the term in the usual programming-languages sense:

In general, the type system of a programming language calls for a partitioning of the universe of values presumed for the language. Each subset of this partition is called a type.

From a purely formal viewpoint, types constitute something of a complication. One would feel freer with a system in which there was only one type of object. Certain subclasses of the universe may have distinctive properties, but that does not necessitate an a priori classification into types. If types have no official status in a programming language, the user need not bother with declarations or type checking. To be sure, he must know what sorts of objects he is talking about, but it is unlikely that their critical properties can be summarized by a simple type system (e.g., prime numbers, ordered lists of numbers, ages, dates, etc.).

Nevertheless, there are good, pragmatic reasons for including a type system in the specifications of a language. The basic fact is that people believe in types. A number is a different kind of thing from a pair of numbers; notwithstanding the fact that pairs can be represented by numbers. It is unlikely that we would be interested in the second component of 3 or the square root of < 2,5 >. Given such predispositions of human language users, it behooves the language designer to incorporate distinctions between types into his language. Doing so permits an implementer of the language to choose different representations for different types of objects, taking advantage of the limited contexts in which they will be used.

Even though a type system is presumably derived from the natural prejudices of a general user community, there is no guarantee that the tenets of the type system will be natural to individual programmers. Therefore it is important that the type restrictions be simple to explain and learn. Furthermore, it is helpful if the processors of the language detect and report on violations of the type restrictions in programs submitted to them. This activity is called type-checking.

Then he switches without explanation to talking about static checkers, e.g.:

We shall now introduce a type system which, in effect, singles out a decidable subset of those wfes that are safe; i.e., cannot give rise to ERRORs. This will disqualify certain wfes which do not, in fact, cause ERRORS and thus reduce the expressive power of the language.

So the confusion between programming-language and type-theory senses of the word began with the very first paper to use the latter.

1968: APL

APL-360 was the most popular dialect of APL. Its manual doesn't use the word “type”; it speaks of “representations” of numbers. But it considers these an implementation detail, not an important part of its semantics.

APL has a lot of unique terminology — monad and dyad for unary and binary operators, adverb and conjunction for high-order operators, and so on — so it's not surprising that it has its own word for types too.

1970: Pascal

Wirth's 1970 definition of Pascal is, as usual, plain-spoken: “The type of a variable essentially defines the set of values that may be assumed by that variable.” (But there's that “essentially”, like Algol's “basically”.)

1970-73: Lisp belatedly adopts the term

Like Fortran, early Lisps used the word “type”, but only in its ordinary English sense, never as a technical term. AIM-19, from 1960 or 1961, speaks of “each type of LISP quantity”, but doesn't use “type” unqualified. Similarly, the 1962 Lisp 1.5 Manual uses the word for various purposes, but not as an unqualified term for datatypes. The most common use is for function types (subr vs. fsubr); there are “types of variables” (normal, special, common), but datatypes were not, apparently, considered important enough to talk about. They might not have even been seen as a single concept — there are awkward phrases like “bits in the tag which specify that it is a number and what type it is”, which would be simpler with a concept of datatypes.

This changed in the early 1970s. The 1967 AIM-116a and 1970 AIM-190 still don't use “type”, but the 1973 Maclisp manual and 1974 Moonual do, and it consistently means “data type”. Most tellingly, they have typep, so the term was solidly ensconced in the name of a fundamental operator.

1973: Types are not (just) sets

By 1973, the definition of types as sets of values was standard enough that James Morris wrote a paper arguing against it: “Types are not sets”. Well, not just sets. He was talking about static typechecking, and argued that checking for abstraction-safety is an important use of it. The abstract explains:

The title is not a statement of fact, of course, but an opinion about how language designers should think about types. There has been a natural tendency to look to mathematics for a consistent, precise notion of what types are. The point of view there is extensional: a type is a subset of the universe of values. While this approach may have served its purpose quite adequately in mathematics, defining programming language types in this way ignores some vital ideas. Some interesting developments following the extensional approach are the ALGOL-68 type system, Scott's theory, and Reynolds' system. While each of these lend valuable insight to programming languages, I feel they miss an important aspect of types. Rather than worry about what types are I shall focus on the role of type checking. Type checking seems to serve two distinct purposes: authentication and secrecy. Both are useful when a programmer undertakes to implement a class of abstract objects to be used by many other programmers. He usually proceeds by choosing a representation for the objects in terms of other objects and then writes the required operations to manipulate them.

1977: ML and modern static typing

ML acquired its type system in about 1975 and was published in 1977. Until this point, the application of type theory to programming languages had been theoretical, and therefore had little influence. ML made it practical, which has probably contributed a lot to the terminological confusion.

ML's theoretical support (along with the misleading slogan “well-typed expressions do not go wrong”) came out in the 1978 paper A Theory of Type Polymorphism in Programming, which, despite being about type theory, speaks of types containing values:

Some values have many types, and some have no type at all. In fact “wrong” has no type. But if a functional value has a type, then as long as it is applied to the right kind (type) of argument it will produce the right kind (type) of result—which cannot be “wrong”!

Now we wish to be able to show that—roughly speaking—an Exp expression evaluates (in an appropriate environment) to a value which has a type, and so cannot be wrong. In fact, we can give a sufficient syntactic condition that an expression has this robust quality; the condition is just that the expression has a “well-typing” with respect to the environment, which means that we can assign types to it and all its subexpressions in a way which satisfies certain laws.

The short version

So here's the very brief history of “type” in programming languages:

  1. It wasn't used at all until 1958.
  2. Types as sets of values: Algol-58.
  3. The type-theory sense: Morris 1968.

These may not be the earliest uses. I got most of the old manuals from Paul McJones' collection, which is a good place to look for more. I welcome antedatings.

I'm also curious about the term “datatype”, which might plausibly be ancestral to “type”. I could find no uses of it older than “type”, but I may be looking in the wrong field. Statistical data processing is much older than computing, and has dealt with datatypes for a long time. Might the terms “datatype” and “type” have originated there?

Update August 2015: Jamie Andrews said much the same seven months earlier.

Update June 2017: In HN comments, dvt found “datatype” in 1945, in Plankalkül.

Incorrect optimization in 1963

Floating-point users today are accustomed (or resigned, sometimes) to compilers that make invalid optimizations by assuming all arithmetic is mathematically correct instead of rounding. The situation used to be worse. A 1963 IBM Fortran II manual warns that it did this for integers too:

FORTRAN assumes that mathematically equivalent expressions are computationally equivalent. Hence, a sequence of consecutive multiplications, consecutive divisions, consecutive additions, or consecutive subtractions, not grouped by parentheses will be reordered, if necessary, to minimize the number of storage accesses in the object program.

Although the assumption concerning mathematical and computational equivalence is virtually true for floating point expressions, special care must be taken to indicate the order of fixed point multiplication and division, since fixed point arithmetic in FORTRAN is “greatest integer” arithmetic (i.e., truncated or remainderless). Thus, the expression

5*4/2

which by convention is taken to mean [(5 × 4)/2], is computed in a FORTRAN object program as

(5/2)*4

i.e., it is computed from left to right after permutation of the operands to minimize storage accesses.

The result of a FORTRAN computation in this case would be 8. On the other hand, the result of the expression (5 × 4)/2 is 10. Therefore, to insure accuracy of fixed point multiplication and division, it is suggested that parentheses be inserted into the expression involved.

(Reordering “to minimize the number of storage accesses” is pointless in a constant expression, but apparently the optimizer did it anyway.)

If this reordering can be prevented by redundant parentheses, then parentheses don't only affect parsing; they change semantics by introducing a barrier against algebraic transformations!

Giving parentheses this additional meaning has an unfortunate effect: other optimizations can no longer ignore them. The manual continues by describing one such problem:

One important type of optimization, involving common subexpressions, takes place only if the expression is suitably written. For example, the arithmetic statement

Y = A*B*C + SINF (A*B)

will cause the object program to compute the product A*B twice. An efficient object program would compute the product A*B only once. The statement is correctly written

Y = (A*B) * C + SINF (A*B)

By parenthesizing the common subexpression, A*B will be computed only once in the object program.

In general, when common subexpressions occur within an expression, they should be parenthesized.

There is one case in which it is not necessary to write the parentheses, because FORTRAN will assume them to be present. These are the type discussed in “Hierarchy of operations,” and need not be given. Thus

Y = A*B+C+SINF (A*B)

is, for optimization purposes, as suitable as

Y = (A*B)+C+SINF (A*B)

I'm not sure whether the problem is simply that A*B*C does not contain the subexpression A*B, or that the CSE lifter sees it but can't merge it with (A*B) because they're not equivalent in all contexts.

Optimizers today still have limitations, and still make invalid transformations, but they've become much more subtle!

What happened to “manifest” and “latent”?

Chris Strachey has remarkably influential lecture notes. His 1967 Fundamental Concepts in Programming Languages introduced or popularized a lot of now-standard terminology: r-value and l-value, first-class, polymorphism (ad-hoc and parametric), and maybe parametric type.

It also introduced some terms which didn't catch on, among them manifest and latent:

We call attributes which can be determined at compile time in this way manifest; attributes that can only be determined by running the program are known as latent.

These are the concepts now called “static” and “dynamic”. I'm not sure why Strachey bothered to introduce his own words for them, since the standard ones already existed, and he was evidently more comfortable with them — when he discusses types on the same page, he consistently uses “dynamic”, not “latent”. (Was “dynamic typing” already a standard term by 1967?) Maybe he reserved “static” and “dynamic” for behaviour, and wanted different words for the time when a property could be determined.

He acknowledges that the boundary between static and dynamic is fuzzy, and explains why it's useful anyway:

The distinction between manifest and latent properties is not very clear cut and depends to a certain extent on questions of taste. Do we, for example, take the value of 2 + 3 to be manifest or latent? There may well be a useful and precise definition—on the other hand there may not. In either case at present we are less interested in the demarkation problem than in properties which are clearly on one side or other of the boundary.

I wish more academics dared to do that.

Neither “manifest” nor “latent” caught on, and they might have been forgotten like most new coinages — but decades later, both have been resurrected with new meanings in connection with type. “Manifest typing” now refers to languages that require type declarations — an important concept that lacked a short name. “Manifest” is readily reinterpretable as “appearing in source”, and while it might confuse people who remember the old sense, we are few. Less usefully, “latent typing” serves as a euphemism for “dynamic typing” among type-theory partisans (bizarrely, as the word they object to is “type”, not “dynamic”, but at least it avoids using the terminology of the savages). In neither case does Strachey's original meaning survive; if you speak of some property other than type as “manifest” or “latent”, most proglang researchers will not understand.

Where do closures come from?

Common Lisp's function form is usually described as a device for switching between namespaces: it evaluates its argument in the “function” namespace instead of the normal “variable” namespace.

Older sources have a completely different idea: they say function makes closures. The Hyperspec says:

If name is a lambda expression, then a lexical closure is returned.

and

function creates a closure of the lambda expression

Both of these lines were inherited from CLtL, so this is not a new interpretation, nor one incompatible with the best of knowledge. What's going on?

To begin with, these two interpretations of function aren't observably different in portable Common Lisp. The only portable way to get a closure is by (function (lambda ...)) or by macros like defun that might expand to it. (A (lambda ...) expression expands to (function (lambda ...)), because unlike all other special forms, lambda is in the function namespace, but that's just a historical quirk.) The only way to use lambda without function is ((lambda ...) ...), which has the same semantics regardless of whether it makes a closure. So portable code can't tell the difference.
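To make that concrete, here's a minimal sketch (the names make-adder and make-adder* are mine, purely for illustration): in portable code, both constructions below yield closures, and nothing observable says which operator “really” made them.

  ;; A closure via FUNCTION wrapped around a lambda expression.
  (defun make-adder (x)
    (function (lambda (y) (+ x y))))

  ;; The same thing via the LAMBDA macro, which expands to the form above.
  (defun make-adder* (x)
    (lambda (y) (+ x y)))

  ;; LAMBDA in the operator position, the one use that needs no FUNCTION;
  ;; it behaves the same whether or not a closure is ever created.
  ((lambda (y) (+ y 1)) 4)              ; => 5

  ;; The two constructors yield indistinguishable closures.
  (funcall (make-adder 3) 4)            ; => 7
  (funcall (make-adder* 3) 4)           ; => 7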

Implementation-specific extensions can. If compile is extended to non-null lexical environments, it will make closures out of lambda-expressions without any help from function. Or if there's a named-lambda form that makes closures, it's unnecessarily complex to attribute the closure in (function (lambda ...)) to function.

So Common Lisp culture favors the simpler interpretation: lambda makes closures, and function is a mere namespacing operator.

Like so many oddities of CL, the old interpretation comes from Lisp Machine Lisp. The 1984 Lisp Machine Manual introduces function by saying it “has two distinct, though related, meanings.” The first is to get a symbol's function definition, and the second is to make a closure:

(let (a)
  (mapcar (function (lambda (x) (push x a))) l))
passes mapcar a specially designed closure made from the function represented by (lambda (x) (push x a)). When mapcar calls this closure, the lexical environment of the function form is put again into effect, and the a in (push x a) refers properly to the binding made by this let.

These two meanings were reflected in implementations. Guy Steele's reference interpreter (in the CL mailing list archive) doesn't bother to make a closure for ((lambda ...) ...), only for (function (lambda ...)). But when optimizing compilers became the norm, it no longer seemed silly (or inefficient) for lambda to always make a closure, so reinterpreting function as a namespacing operator made sense.

Surprisingly, this is not the first time function has been reinterpreted. The Pitmanual says Maclisp's function didn't make closures — it took a different form, *function, to even partially do that. function was equivalent to quote, except that in compiled code it would make a compiled function instead of just a lambda-expression — it permitted compilation but didn't change scoping. When Lisp Machine Lisp changed it to make closures, that was largely backward compatible, since most lambdas were intended to use lexical scope anyway. (I'm not sure when compilers started to use lexical scope — was that in Maclisp?)

I don't think any other language construct has had so many unrelated meanings over the years, let alone done so while preserving the meaning of existing code. function was originally a hint to the compiler, then a way to make closures, and then a namespacing operator. Its history probably ends there, since most new lisps eschew multiple namespaces and omit function rather than repurpose it, but three unrelated meanings is impressive.

“There need be no real danger of it ever becoming a drudge”

Jonathan Edwards quotes a beautiful piece of naive optimism from Turing:

The process of constructing instruction tables should be very fascinating. There need be no real danger of it ever becoming a drudge, for any processes that are quite mechanical may be turned over to the machine itself.

- Turing, A. M., 1946, Proposed electronic calculator, report for National Physical Laboratory, Teddington

Admittedly Turing was speculating, and he's sort of right; programming is more amenable to automation than other things humans do. But he must have been disappointed to discover how difficult and slow the process of turning processes over to the machine is.

It's tempting to believe there's a sort of incompleteness here: as the ability of formal systems to express propositions always grows faster than their ability to prove them (more optimistically, their ability to ask questions always grows faster than their ability to answer them), does the ability of languages to generate repetitive tasks always grow faster than their ability to automate the repetition?

Wrt expressiveness, no: programming languages don't necessarily face an incompleteness problem because they don't need to be consistent: they need only express programs, not completely avoid expressing errors. (And even if they did, mere incompleteness does not imply drudgery; if programming generated endless novel problems, it would hardly be a drudge.) But most of the repetition in programming is about understanding and transforming code. This does need to be consistent, so increased expressive power doesn't necessarily help. Any increase in the space of programs that can be expressed is an increase in the space of programs that must be excluded to understand a program — and it's typically undecidable which features affect which code, so this can't always be done by machine. QED?

I'm not convinced by this argument. Incompleteness and undecidability are familiar reasons for things to be impossible, so they make good excuses, but I doubt they have anything to do with the existence of drudgery in programming. Most of the repetitive analyses we do are amenable to automation — if not in the general case, then at least in most common cases. But we don't try very hard. We write tools for many purposes, but program understanding has not traditionally been a popular one; we hardly use tools for it unless they're imposed on us by being built into a language. Is the real problem here that we don't mind the challenging drudgery of understanding?

Pointer arithmetic can be safe

Advocates of memory-safe languages sometimes contrast them with C by saying that they don't have pointers, or (when someone points out how impractical they'd be if that were really true) that they don't have pointer arithmetic. This is supposed to make them safer. Because pointer arithmetic is unsafe, right?

Not necessarily. Pointer arithmetic in C happens to be unsafe, but this is not a problem with pointer arithmetic, only with C — or rather with its conventional implementation technique. C pointers are usually implemented as raw addresses, and pointer arithmetic as simple arithmetic on those addresses. The C standard, however, doesn't require this. It only requires pointer arithmetic (and comparisons) on pointers to the elements of an array (or one past the end), and it does not specify the behavior of pointer dereferences that don't point into the array. It doesn't require bounds checking, but it doesn't prohibit it either. So it's possible to make a conforming C implementation with bounds-checking on all pointer operations.

This has been done. Zeta-C (source here) was a C compiler for Lisp machines, which don't support unsafe array access at all. Scott Burson, the author, explains how it handled this:

All pointers were represented as pairs of an array and an index

Pointer arithmetic operated on the index, leaving the array intact. Pointer dereferences used the index with the ordinary, safe array operations, so all pointer dereferences were bounds-checked. Since Zeta-C also fixed C's other memory-safety problems (free did nothing, uninitialized variables were not garbage, and casts could not forge pointers), it was a memory-safe C compiler. This was part of its attraction — people didn't use it because they wanted to run C programs on Lisp machines, but because they wanted to debug their C programs in a safe implementation with the Lisp machine's excellent tools.
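Here's a minimal sketch of that representation in Common Lisp (hypothetical names, not Zeta-C's actual code): a pointer is an array plus an index, arithmetic moves only the index, and every dereference is an ordinary aref, which a safe Lisp implementation bounds-checks.

  ;; Sketch only: a "C pointer" as an array/index pair.
  (defstruct c-ptr
    array    ; the underlying storage
    index)   ; offset into it

  ;; Pointer arithmetic touches only the index; the array stays intact.
  (defun c-ptr+ (ptr n)
    (make-c-ptr :array (c-ptr-array ptr)
                :index (+ (c-ptr-index ptr) n)))

  ;; Dereference: an ordinary, bounds-checked array access.
  (defun c-ptr-deref (ptr)
    (aref (c-ptr-array ptr) (c-ptr-index ptr)))

  ;; Assignment through the pointer (*p = v) goes through the same check.
  (defun (setf c-ptr-deref) (value ptr)
    (setf (aref (c-ptr-array ptr) (c-ptr-index ptr)) value))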

C programmers are well aware that memory unsafety is their biggest problem, and many other tools have been written to deal with it, but few of them recreate this feature. The technique is known to implementors (often by the name “fat pointers”) and is available as a patch for GCC. But it's not considered a standard part of the C debugging toolkit, even though it's easier to implement, and has a smaller performance cost, than commonly used tools like valgrind. I don't understand why. Wouldn't it be nice if your C compiler had a --safe mode which eliminated most memory safety bugs?

Update December 2013: The big problem with fat pointers is that they're incompatible with ABIs that use raw pointers, and virtually all interesting C programs make system or library calls through such an ABI. So a practical implementation of fat pointers needs to support raw pointers too, which adds complexity and greatly reduces the benefit.

Scott also tells a cute story about portability:

If you looked real closely, there were lots of little corners of C semantics where ZETA-C was not correct. In practice, however, one very rarely tripped over any of these.

For instance, I used Lisp integers for C int and long. This meant bignums would be created automatically, as usual in Lisp. Technically this is not a correct C implementation (even though I don't think the standard specifically says that the length of int and long shall be finite, one can take this as implied) but it very rarely ran into trouble. The only such case I remember, which was rather amusing, was a program that did something like

  int i;
  for (i = 1; i; i <<= 1) ...

(shifting a 1 bit left repeatedly, expecting it to fall off the left end of the word).

Who expects their C programs to run on a machine with infinite word size?

“Irritants”

I laughed when I first saw the arglist for R6RS's error:

(error who msg irritant1 ...)

There are irritants in the Scheme oyster, and error pearls them away for later reference. Cute!

“Irritant” is not a new term; it's existed in Scheme culture for a long time. The earliest reference I can find is in a Maclisp Scheme info file last modified in 1985, and it has turned up occasionally on the rrrs-authors mailing list. I haven't found it in any non-Scheme manuals (yet). Is it a Scheme-only term? Does anyone know where it originated?

For a while, the cuteness blinded me to a possible confusion: “irritants” suggests that the arguments should be the cause of the error — that they should be in some sense wrong or invalid. But they're only informative, and are often innocent indicators, not the causes of the irritation.

A format string and arguments is probably better, because it makes clearer messages. Or simply a string, in a language with string interpolation. Even though this doesn't call for a cute name.

When was ML invented?

Most sources say that ML, the language which introduced Hindley-Milner type inference, was invented in 1973. Its type system, however, was not described until 1977 or 1978. Milner's A Theory of Type Polymorphism in Programming says:

the polymorphic type discipline which we discuss here has been incorporated in the LCF metalanguage ML [2, 3], and has been in use for nearly 2 years. The compile-time type checker for this language has proved to be a valuable filter which traps a significant proportion of programming errors.

The paper was written in 1977 and revised in 1978, so “nearly 2 years” means 1975 or 1976. (References 2 and 3 are LCF and ML documentation, from 1977 and 1978; neither is on the net.) The 1972 LCF documentation (warning: bad partial OCR) doesn't mention the metalanguage, so I suspect the date of 1973 refers to when it was first added.

Without its distinctive type system, ML would have been an uninteresting ISWIM derivative (with dynamic type, or monomorphic static type?), hardly recognizable as the ancestor of all modern statically typed languages. So we should date it not from its first, rudimentary version, but from the introduction of its most important feature, circa 1975, or from its publication, in 1977.

Update August 2015: Uninteresting? What about exceptions and pattern-matching?

Ill-phrased slogans don't go right

“Well-typed programs don't go wrong.” This slogan for ML-style static typing is infamous, since it's true only under an extraordinarily narrow definition of “go wrong”. This definition is not used for any other purpose, so it's virtually impossible for an innocent listener to arrive at the “correct” interpretation. For propaganda purposes, this is a feature, as it allows the user to make an outrageous claim and then blame its falsity on the audience's ignorance.

The slogan is a little bit justifiable in its original context. Robin Milner introduced it (along with polymorphic type inference) in his 1978 paper A Theory of Type Polymorphism in Programming, as the heading “Well-Typed Expressions Do Not Go Wrong”. The original use refers to a toy language in which the only possible runtime errors are type errors, so the typechecker really does prove the absence of all errors. Well-typed expressions, in that language, can loop forever or run out of memory, but they can't have semantic errors: they can't “go wrong”.

Milner must have known it was misleading, though.

“However, the compiler arranges for it to work anyway.”

There are some things you don't want to hear from a language manual...

You might expect this not to work if it was compiled and res was not declared special, since non-special compiled variables are not represented as symbols. However, the compiler arranges for it to work anyway.

...especially not in the section on a low-level primitive like, say, pointers.

That's from page 110 of the 1979 Lisp Machine manual (20 MB of interesting history).

Unlike most lisps, Lisp Machine Lisp had pointers, called “locatives”: first-class places, implemented as (unboxed, IIUC!) pointers. One of their uses was to refer to variables by pointing to their names' value cells. But local variables generally don't live in their names' value cells, so locatives on them do nothing useful (and potentially something harmful). Apparently there was a workaround for this: the compiler recognized these locatives as intended to refer to local variables, and replaced them with something else.

Isn't it nice using a language with clean, well-specified semantics?

Return of the Lisp Machine package system features

Xach pointed out that Nikodemus Siivola recently added support for reading package::(arbitrary form) in SBCL.

I had heard that Zetalisp had this feature, but apparently it's older than that — this was how packages originally worked. The 1979 Lisp Machine manual says:

The colon character (":") has a special meaning to the Lisp reader. When the reader sees a colon preceded by the name of a package, it will read in the next Lisp object with package bound to that package.

I don't know why CL degeneralized the package prefix to only work on symbols. The only reason I've heard is that a misplaced colon could accidentally cause the next form to be read in the wrong package, but that doesn't sound more dangerous than other typos like stray parentheses.

Update August 2015: It's more dangerous because unlike most typos, it has a read-time side effect: it pollutes the other package with lots of unwanted symbols.

Old Lisp manuals are a fascinating mix of futuristic and primitive. Lisp Machine Lisp also had hierarchical packages: package names were symbols, not strings, and could therefore live in other packages. But there are no earmuffs on package, nor on any other special variables; apparently they hadn't been invented yet.

How typep got its name

Paul McJones has a large collection of old Lisp manuals, which shed a lot of light on historical questions like how unary typep got its irregular name. The earliest reference I've found is in a 1973 Maclisp manual, which describes it thus, immediately after the other type predicates:

typep  SUBR 1 arg
  typep is a general type-predicate.  It returns an atomic symbol
  describing the type of its argument, chosen from the list
    (fixnum flonum bignum list symbol string random)
  symbol means atomic symbol.  Random is for all types that don't
  fit in any other category.

A “general type-predicate”? This wording suggests typep got its name because it was seen as a generalization of the regular type predicates. This makes sense: as Maclisp acquired an increasingly unwieldy number of type predicates, its maintainers would want to merge them into one operation, and would probably think of that operation as “a general type-predicate”, even though it's not actually a predicate.

(That random type is probably just an implementation detail showing through, but it's still adorable. Rudimentary type systems are to languages what tiny clumsy paws are to kittens.)

Japanese Lisp, Forth, and historical contingency

Yusuke Shinyama speculates: what would Lisp look like if it had been invented by speakers of a consistently postfix language like Japanese? Might it be written postfix instead of prefix?

Maybe so. But this is superficial. The car of a Lisp form is special because it's the head, not because it's first; as long as there are head and rest operators, it makes no difference whether the head is stored first or last or even in the middle. So while a postfix Lisp looks different, this is only a superficial matter of syntax; Lisp would work the same way if it had been invented in Japan.

Forth, though, might be dramatically affected — not in nature but in timing. Despite its simplicity, Forth appeared rather late: it was developed in the 1960s and not publicized until 1970, which was too late to become part of the cultural foundation of computing. I suspect this was an anomaly; Forth is so simple that it could easily have been discovered earlier, had anyone bothered to explore postfix notation. Speakers of a postfix natural language have an obvious example to encourage them. (Postfix mathematical notation would be an even stronger encouragement. IIUC Japanese arithmetic words are infix, so a Japanese-inspired notation would also be infix; postfix arithmetic could arise more naturally in a natlang where arithmetic operators are postpositions, and postpositional phrases follow their head noun, but this is not a common combination.)

If Forth had been known by the mid-1950s, it could have outcompeted Fortran to become the canonical high-level language. This would have exerted significant pressure on hardware: machines would be designed to run Forth, much as they're designed to run C today, so there would be a lot of stack machines. Since Forth makes such a good assembly language for such machines, there would be less pressure to develop other high-level languages. Programmers accustomed to its simplicity and flexibility and convenience would see all proposed alternatives as unusable and crippled and insanely complex, so other language families could go unexplored. Forth and its descendants might rule computing unchallenged for decades, until some long-delayed development made tree languages competitive.

History could be different. Lisp, Fortran, Algol — all the great old language families — might not exist, if the pioneers of computing had spoken a head-last natural language and found Forth first.

What's so cool about APL?

Why does APL have such a devoted following, despite its strange appearance? What have its users, since the 1960s, seen in it that made them embrace such an unpopular language?

I'm not one of those fanatical APLers, but I think I understand why. Imagine the year is 1965. All computing is on mainframes, and the only high-level languages you've ever seen are Fortran and Algol-60. And then one day you meet APL, and discover:

  • It has a read-eval-print loop: you can type expressions in and see the results immediately, without running a compiler. It's a remarkably powerful calculator, in the days before calculators were common. (This may account for its popularity in finance.)
  • It's mostly functional: most operations return a result rather than requiring you to specify a destination to modify, so you can easily combine many operations in one expression.
  • It has built-in collections — specifically multidimensional arrays, but any decent collection support would do as well. We take collections for granted nowadays, at least in languages higher-level than C, but this wasn't always so. There's a reason many early languages (not just APL and Lisp) were built around a collection type.
  • It has high-order operations: map is (often) implicit, and reduce, scan, and Cartesian product are single characters.
  • It's terse, and not just because of its one-character names. You really can say in a line of APL what would take a page of Fortran.

Under these circumstances, wouldn't you be amazed by the powerful new language? Wouldn't you become a faithful user, and for decades wonder why all other languages were so uselessly primitive?