Many happy returns

Some languages can return multiple values from a function, just as they can pass multiple arguments to a function. I used to think this was an important language feature. Functions that return more than one result are common, after all, so every language has a way to express them — by returning structures, or side-effecting reference parameters, or CPS — or, instead of these workarounds, by supporting multiple return values directly. It complicates the language kernel a little, but it makes code cleaner, right?

No, it doesn't. There are several reasons for a function to return multiple values. None of them require direct language support, and for most of them, it's not even helpful.

Secondary values

Some functions return one useful value, plus some secondary values that aren't interesting to most callers. For instance, Common Lisp's two-argument floor returns a quotient and a remainder, but usually you just want the quotient. This is where built-in multiple values shine: they can automatically ignore the extra value, so the common case Just Works, with no effort from either the caller or the multivalued function.

CL-USER> (format nil "π is about ~S" (floor 355 113))
"π is about 3"
CL-USER> (floor 355 113)
3
16

Returning and destructuring a tuple doesn't handle secondary values well. When the caller wants all the values, it's fine (and equivalent to built-in multiple values), but in the more common case where the caller wants only the primary value, it forces them to explicitly extract one component. This can often be done with a convenience function like left (l, r) = l, but it still adds noise.

If your language supports overloading on return type, you can make the function return either the primary value or the structure — whichever the caller wants. This is a rare and difficult feature, though.

Returning secondary values by side effect works surprisingly well. If the caller doesn't want the secondary values, it's no trouble, since the arguments to which to write them can be optional. When the caller does want them, it requires binding a variable, which typically forces the expression into a block, which is occasionally a problem for functional code. It's not very verbose, though:

int floor(int a, int b, int *remainder = NULL);

int remainder;
int quotient = floor(a, b, &remainder);

Now for my favorite: continuation-passing style. It has a bad reputation because it's associated with total CPS transformation, in which every continuation becomes an explicit lambda, which is onerous and unreadable. That is indeed bad. If you pass explicit continuations only where needed, however, it's no worse than other uses of λ.

CPS provides a way to handle secondary values at least as well as built-in multiple values do, but without language support. With no explicit continuation, the function returns the primary value, but if a continuation is provided, it receives all the values:

imaginary-lisp> (floor 355 113)
3
imaginary-lisp> (floor 355 113 (λ (quot rem) rem))
16

Success and failure

Often a secondary value encodes success or failure, as in Common Lisp's gethash, read, or macroexpand-1, or any of the many Go functions that return an error as their second value. This means callers who don't care about errors can simply ignore the extra value, while those who do can still get it.

(multiple-value-bind (val present?) (gethash vars varname)
  (if present? val (error "Unbound variable: ~S" varname)))

file, err := os.Open("/some/path")
if err != nil {
    panic(err)
}

Returning a structure + pattern matching handles this cleanly and safely: you simply return a different structure for each continuation. Usually this is something like Haskell's Maybe or Either:

case (Data.Map.Strict.lookup vars varname) of
  Nothing → error "Unbound variable: ~S" varname
  Just val → val

case foo of
  Left err → error err
  Right x → x

This can be a little verbose, but the verbosity can sometimes be eliminated by operators that automatically propagate the failure case, such as Haskell's various monad operators.

CPS has an even cleaner way to handle this: success and failure are two different continuations, so the function can simply take an optional continuation for handling errors, or two explicit continuations (of which the success continuation is often the identity function):

(slurp filename
  (fn (err) (error "Unable to open ~S: ~S" filename err)))

(gethash vars varname i (fn () (error "Unbound variable")))

This multiple-continuation style is often used in Smalltalk, where it's particularly convenient because of Smalltalk's terse lambdas. Toward Leakage Containment (Julia Lawall and Dan Friedman, 1992) recommended it for Scheme, but got little attention, perhaps because it used the nonsensical name “continuation-constructing style” (and didn't mention it in the title). I call it multi-CPS and find it very convenient — often more so than catching exceptions.

Complex returns

Some functions really do have multiple values to return — sometimes lots of them, like Common Lisp's get-setf-expansion, with five values, or get-decoded-time, with nine. These functions tend to be awkward to use regardless of how you receive the return values, but the least awkward way is to return a structure, because then at least you're not forced to name each result individually. That's why most languages do this for times:

time_t now = time(NULL);
tm *decoded = localtime(&now);
printf("The year is %d.\n", tm.tm_year + 1900);

This is less painful than Common Lisp's multiple-return-value approach, which often forces you to bind more values than you care about:

(multiple-value-bind (sec min hour day month year)
                     (get-decoded-time)
  (format t "The year is ~S.~%" year))

setf expanders are a similarly complex result that ought to be a structure.

(multiple-value-bind (temps forms svars writeform readform)
                     (get-setf-expansion x e)
  ...)

Simpler cases, like partitioning a collection, are adequately handled by tuples (if you have destructuring) or CPS (if you don't).

Why so much trouble?

Expression languages get much of their elegance by encoding dataflow in program structure. Each expression has one parent, and can therefore easily specify what to do with one return value. When there is more than one, there's not enough room in the structure to express what to do with each value, so we have to specify it in some more other way.

I still think multiple return values are important, but I no longer think they require special language support. There are plenty of good alternatives, and improving a language's support for those alternatives (by e.g. optimizing destructuring or λ) is easier than complicating the core semantics, and more likely to be useful for other purposes.

5 comments:

  1. Why the trouble? Because multiple values can be more easily implemented more efficiently than most of the alternatives. (They can be stack-allocated without any restrictions, and moving things from heap to stack allocation is an important optimization.)

    ReplyDelete
    Replies
    1. Yeah, optimization might justify them (although some ML compilers do this by optimizing out tuples that are immediately destructured, so they're not the only way). But I used to think expressiveness alone required built-in multiple values, and I no longer do.

      Delete
  2. Indeed, and in Scheme at least it is an error to receive more or fewer values than the continuation expects (except in the case where the continuation expects none, when any number may be returned). This means they can be allocated on the stack, or passed to a continuation as in Chicken, or even consed up into a list with a marker element, as Chibi does (where space and simplicity count for more than speed).

    ReplyDelete
    Replies
    1. It also means they don't automatically ignore secondary values, which makes them much less useful. They're not effortless like Common Lisp's.

      They also used to be verbose, but let-values has fixed that.

      Delete
  3. Common Lisp's approach is nice because of its flexibility. The following example of yours is better written with MULTIPLE-VALUE-LIST:

    (multiple-value-bind (sec min hour day month year)
    (get-decoded-time)
    (format t "The year is ~S.~%" year))

    (format t "The year is ~S.~%" (sixth (multiple-value-list (get-decoded-time))))

    A list is a suitable structure for collecting multiple return values.

    One of the prominent advantages of multiple return values baked into the language in a way that is difficult or impossible to add after-the-fact is the choice of using it alongside the easier-to-add approaches.

    P.S. It would be nice to comment without training Google image-recognition software.

    ReplyDelete

It's OK to comment on old posts.