Kernel, library, and primitives

It's often useful to distinguish between parts of a language based on whether they can be expressed in user code:

  • The kernel is the part the compiler has to know about.
  • The library is the part that can be implemented in user code.
  • Primitives are the parts that are provided by the implementation, but could be expressed in user code.

This is an obvious and possibly familiar distinction, but it's underused. It should be part of the standard vocabulary for talking about languages.

Some examples in Scheme:

  • let is library, because it can be implemented as a macro. (It could be implemented as a macro in Common Lisp too, but for historical reasons it's in the kernel.)
  • procedure? is primitive, because while it could be implemented in user code, there's nothing it could be implemented in terms of. In a language with type-of, it's library.
  • The Form Without A Name (the function call operator) is kernel, because there's no way to define it, or anything like it, in user code. Even in Racket, where #%app can be redefined, there is no way for user code to define such a construct.
  • lambda, named-lambda and case-lambda can express each other as macros. At least one of them needs to be predefined, but the choice is arbitrary. Since the compiler needs to know about that one, it's kernel and the other two are library.
  • Full call-with-current-continuation is kernel, not just a primitive, because it can't be implemented without changing how function calls are compiled. Not really; see comments.
  • cons is a primitive in R5RS, because there's no way to define new datatypes. (Defining pairs as tagged vectors doesn't count, even if you redefine all vector operations to reject them, because existing code will still see them as vectors.) In R6RS, it can be library, because it could be defined with define-record-type. (However, it's often still primitive, for efficiency or for historical reasons.)
  • syntax-rules is mostly library, but requires some primitive support. define-syntax is kernel, but in a language with a hook to define new code-transformation passes in user code (kind of like Perl's source filters), it could be library.
  • The distinction between kernel and library has nothing to do with what names are standard or in the default namespace. map is standard and predefined but is pure library; if define-values set! is provided in a separate module, it's still kernel, because it requires compiler support.
  • Garbage collection is kernel, because it requires compiler (and runtime) support. Not really; see comments.

Some things to say using these terms:

  • Library has to be implemented in terms of something, so primitives are necessary. There are often additional primitives for efficiency (especially in slow implementations), but most primitives add functionality though would be otherwise unavailable.
  • When two constructs can express each other, either one could be primitive; the choice is often arbitrary.
  • A foreign function interface can be seen as an endless source of primitives. This is one of the reasons FFIs are so important for new languages: they compensate for a shortage of primitives.
  • Library, unlike the other two parts, is potentially portable between implementations. If a language provides a good reference implementation of its library (as most SRFIs do), this makes it much easier to implement.
  • Complexity in the kernel is much more expensive than in primitives or library, for both implementors (because it complicates the compiler) and users (because they can't understand the kernel according to the same rules as user code). Primitives, however, aren't necessarily more expensive than library.

Expressive features are particularly interesting when they allow the language to express things in library that would otherwise have to be in the kernel. Such features can often pay for their complexity by simplifying the kernel. For example:

  • In most languages, all abstract syntax (and concrete syntax too, for that matter) is kernel. In languages with macros, most abstract syntax can be library, leaving only a few special forms in the kernel. In a fexpr language, even the special forms are mere primitives, because the compiler doesn't have to know anything about them. (It might still know, for efficiency, but it doesn't have to.) Kernel has a very small kernel.
  • Exposing compiler data structures such as environments allows user code to do things otherwise possible only in the kernel. In Scheme, there is no programmatic way to create definitions, so define and define-values are kernel. In Common Lisp, or in a Scheme extended with a define! procedure that creates top-level definitions, top-level define and define-values can be implemented as macros.
  • Optimization is usually kernel, but Common Lisp and GHC can express some optimizations in user code. (This is a powerful and underappreciated capability.)
  • ML's typechecking is kernel, because there's no way for user code to define it. In a language which allows user-defined program analyses (are there any?), it could be library. (This feature would fit nicely with user-defined optimizations.)

Caveats

There are really two orthogonal distinctions here: kernel vs. library is about whether the compiler needs to know about the feature; primitive vs. derived is about how it was implemented. However, derived kernel is usually impossible (what, are you redefining eval?), and the distinction from primitive kernel is not very interesting, so I don't bother.

The distinction between kernel and library is often more important than that between library and primitives. In these cases it's convenient to lump primitives and library together.

Some features require runtime but not compiler support, like fixnums or weak pointers or unboxed numeric arrays. These form a class (“runtime features”?) between primitives and kernel. I didn't include this class in the list above because I think it's less important, and can usually be lumped in with kernel or primitives.

I'm inconsistent about whether “primitive” refers only to features that must be built in, or to all those that are built in to a particular implementation.

3 comments:

  1. This distinction makes more sense for an implementation than for a language. For example, you say `call/cc` is kernel because it changes the way procedure calls are compiled, but that's only on the assumption that JSR/RET is the "right" way to compile code from which call/cc is a deviation. At most we can say that `call/cc` is or is not kernel on a specific implementation: on Chicken the compiler knows nothing about it, because everything is CPS-converted.

    I can't imagine why you say `define-values` is primitive: it can be implemented as a syntax-rules macro that expands into `define` and `call-with-values`. A definition appears in the current R7RS draft.

    ReplyDelete
  2. Oh, call/cc and (precise) garbage collection affect the compiler only in that they constrain implementation strategies in ways that change the most likely choice. So they're really runtime features that happen to affect some compilers; they're not kernel in the sense that any interpreter must know about them.

    As for define-values: I completely overlooked state! Or is there a non-stateful way to implement it? I think there can't be, because any approach that saves the values somewhere before defining the names needs to get rid of the reference to avoid leaking memory, and there's no stateless way to make a temporary across several definitions. But minor language extensions (e.g. a variant of let that doesn't introduce a definition contour) could change this.

    Maybe that's what you meant about the distinction making more sense for implementations than languages. If each implementation implements a slightly different language, their kernel/library/primitive boundaries may be quite different.

    ReplyDelete
    Replies
    1. Well, it's a question. Canonically "if" is primitive in Scheme and "cond" is derived, but it's obvious that it could have been otherwise (and is, in Elisp; CL exposes the primitivity of "if" via the behavior of "macroexpand"). Would an implementation that does so be an implementation of not-Scheme? I don't believe it would, even though "cond" would be kernel and "if" would be library in such an implementation.

      Delete

It's OK to comment on old posts.