When a language spec leaves the behavior of some operation unspecified, there are several things an implementation can do:
- Signal an error in the usual way (whatever that is).
- Extend the language by defining a useful meaning.
- Crash, i.e. report an unrecoverable error.
- Return an arbitrary value.
- Break safety by e.g. corrupting memory.
- Choose behavior unpredictably. Some C compilers now do this, to the horror of their users.
Traditionally, when a spec leaves some behavior unspecified, it's completely unspecified, with no constraints at all on what implementations can do. This maximizes implementor freedom, but minimizes the amount of behaviour users can rely on. This sometimes forces them into contortions to stay within the specified language, or leads them to write nonportable code without realizing it. Even worse, implementors sometimes take lack of specification as a license for arbitrarily perverse behaviour.
A spec can reduce these problems by leaving behavior only partially unspecified. Here are some options, in roughly increasing order of unspecifiedness:
- Signals an error
- The meaning of this operation is undefined — so undefined that implementations must detect it and report it. This provides maximum safety for users, but no freedom for implementors. (This isn't actually unspecified behaviour, but it's pragmatically similar.)
- Signals an error unless extended
- Implementations must detect the undefined behavior, but they have the option of giving it some useful definition instead of signaling an error. For example, in a language without complex numbers, `(sqrt -2)` might be specified to signal an error, but an implementation that does have complex numbers could make it return one. In Scheme, `(map - (vector 1 2 3))` might be specified to signal an error (because the vector is not a list) unless `map` is extended to work on other sequence types. This lets implementors extend where they want to while preserving safety everywhere else, so it's a good default for languages that aim to be safe.
- Unspecified value
- The operation will return normally and safely, but the result is unspecified, often with constraints such as a type. For example, C's `INT_MAX` is an unspecified integer at least 32767. In Scheme, the result of `(exact? (/ 1 2))` is unspecified but must be a boolean.
- Unspecified but safe
- The language's basic safety guarantees continue to apply, but behavior is otherwise unspecified. For example, the result of arithmetic overflow in many languages is unspecified: it might signal an error, it might overflow into bignums or flonums or `+Inf`, it might be modulo some constant, or it might return `nil` or nonsense. But it won't corrupt memory or crash.
- Unspecified but implementationally unsurprising
- The behaviour is not specified, but it should make sense in terms of some underlying model. For example, many languages do not specify what sort of pathnames their file operations accept, except that they should be those of the host system. C does not specify that the result of falling off the end of an array or dereferencing `NULL` is to blindly attempt to access that address, but that's what users expect.
- Unspecified and unsafe
- The language's usual safety guarantees no longer apply. Anything might happen, including crashes or corruption. In particular:
  - Unspecified but consistent
  - The implementation may choose whatever semantics it likes, but it must preserve those semantics when optimizing. It may not assume the operation won't happen, or choose semantics unpredictably.
  - Unspecified and unpredictable
  - Behavior is completely unspecified, and the compiler may do whatever it likes, even if it's inconsistent and doesn't make sense in terms of the underlying implementation. Avoid this! As John Regehr puts it, “A compiler that is very smart at recognizing and silently destroying [code with unspecified behavior] becomes effectively evil, from the developer’s point of view.”
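A few of the safe options above can be sketched in Python. This is only an illustration, not a claim about how any particular language spec is written. It relies on two facts about Python's standard library: `math.sqrt` raises `ValueError` for negative input, while `cmath.sqrt` returns a complex number. The names `sqrt_strict`, `sqrt_extended`, `add_checked`, and `add_wrapping`, and the 32-bit width, are hypothetical and chosen just for the example.

```python
import cmath
import math

# "Signals an error": the operation has no meaning, and that is reported.
def sqrt_strict(x):
    return math.sqrt(x)  # raises ValueError when x < 0


# "Signals an error unless extended": an implementation that has complex
# numbers gives the same call a useful meaning instead of an error.
def sqrt_extended(x):
    return math.sqrt(x) if x >= 0 else cmath.sqrt(x)


# "Unspecified but safe": overflow of a hypothetical 32-bit signed integer.
# Both functions below are permitted behaviors; neither corrupts memory.
INT_BITS = 32  # assumed width, for illustration only
INT_MIN = -(1 << (INT_BITS - 1))
INT_MAX = (1 << (INT_BITS - 1)) - 1


def add_checked(a, b):
    """One permitted behavior: signal an error on overflow."""
    result = a + b
    if not INT_MIN <= result <= INT_MAX:
        raise OverflowError("integer overflow")
    return result


def add_wrapping(a, b):
    """Another permitted behavior: wrap modulo 2**INT_BITS."""
    return (a + b - INT_MIN) % (1 << INT_BITS) + INT_MIN
```

A program that only assumes "unspecified but safe" has to work with either `add_checked` or `add_wrapping`. A program that assumes "signals an error" can rely on catching the exception, which is exactly the extra guarantee the stricter options buy.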
These options are combinations of simpler constraints on behavior: safety; normal return vs. signaling an error; predictability; consistency with the underlying implementation. What other constraints, or combinations thereof, are useful?
Update 15 December: See also John Regehr's "When is Undefined Behavior OK?"