“Dynamic programming” obscures a simple concept

I'm surprised at how many commenters on my previous post (here and at Reddit and HN) think it's a mistake to identify dynamic programming with memoization. I thought this was uncontroversial: that the central idea of DP is simply to make naive recursive algorithms efficient by memoizing. Evidently a lot of readers think otherwise.

I think they've been misled by the traditional presentation of DP, which bundles a lot of optimizations with that central idea. The recipe usually goes something like this:

  1. Generate a naive recursive algorithm.
  2. Memoize it by saving partial results in an array.
  3. Invert control, so the table of results is explicitly assembled bottom-up instead of as a byproduct of top-down recursion.
  4. Analyze the dependencies among the partial results, and if possible, simplify the saved results to avoid storing unnecessary data.

Notice something wrong here? All the action happens in the first two steps. Well, semantically all the action happens in the first step; even the memoization is only an optimization. But what an optimization! It's what makes impractical algorithms practical; the remaining steps don't change much. The inversion of control usually contributes nothing, so I'm puzzled that some commenters describe it as the distinctive feature of DP; it looks like an artifact of the imperative past to me. Simplifying the results is valuable when it works (it can change the space complexity), but it only applies to some problems; usually it yields little or nothing. Memoization is the important part; the rest is optional.
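
To make that concrete, here's a minimal Clojure sketch of the recipe applied to the usual Fibonacci example (the code and names are mine, purely for illustration). The memoized version already has the improved time complexity; steps 3 and 4 only restructure it, though step 4 does shrink the space.

;; Step 1: the naive recursive algorithm -- exponential time.
(defn fib-naive [n]
  (if (< n 2)
    n
    (+ (fib-naive (- n 1)) (fib-naive (- n 2)))))

;; Step 2: memoize it. clojure.core/memoize caches results by argument,
;; so each subproblem is computed once; the algorithm itself is unchanged.
(def fib-memo
  (memoize
   (fn [n]
     (if (< n 2)
       n
       (+ (fib-memo (- n 1)) (fib-memo (- n 2)))))))

;; Step 3: invert control and assemble the table of results bottom-up.
;; (get (fib-table n) n) is the nth Fibonacci number.
(defn fib-table [n]
  (reduce (fn [t i] (conj t (+ (t (- i 1)) (t (- i 2)))))
          [0 1]
          (range 2 (inc n))))

;; Step 4: dependency analysis shows only the last two entries matter,
;; so the table collapses to two variables.
(defn fib-iter [n]
  (loop [i 0, a 0, b 1]
    (if (= i n) a (recur (inc i) b (+ a b)))))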

Many programmers learn memoization only as a part of this recipe, and therefore call it by the same name. This is why, when people speak of “a dynamic programming algorithm”, they often mean a memoized one — the process has given its name to its main technique. This isn't very confusing; I wouldn't deprecate the common usage on that account. But I do think it's unfortunate that many programmers know the simple concept of memoization only as part of a larger, less coherent bundle.

What if the concepts were taught in a different order? Imagine you had learned memoization of recursive algorithms as an independent concept, before you had ever heard of dynamic programming. When you later meet the rest of the dynamic programming package — the iterativizing, the compact representations of partial results, the analysis of dependencies — what would you think of it? Would you see it as forming a single coherent idea, or as a collection of optimizations to memoized recursion — potentially useful, sure, but not intrinsic to the concept?

It would be the latter, of course: that's how you understand other complex groups of concepts. Take garbage collection, for example. It can fairly be described as just “finding the live objects by tracing the object graph”. There is much more to it, of course — many algorithms, some of which don't look much like tracing, and many optimizations and implementation tricks. But we don't define garbage collection as necessarily including all of these. We define it by its central concept, and treat the others as optional elaborations on that.

We should treat the concepts of dynamic programming the same way. Rather than try to include them all in one package, and refer to them only as a whole, we should let the central idea stand on its own, without reference to its optimizations. When you make a naive recursive algorithm practical by memoizing it, there's no need to describe the result as a “dynamic programming algorithm”; “memoized” already identifies the technique. Why use the vaguer term when you don't have to?

Some people see DP differently: not as a method of implementing algorithms but as a method of discovering them. To them, it's about analyzing dependencies in recursive algorithms, because this can lead to discovering a simpler or more efficient algorithm. The other concepts, like memoization, are optional. (If I understand correctly, from this point of view memoization is simply the general case of saving results blindly, regardless of how they're used — a useful tool, but not conceptually important.) This is a reasonable concept, but it's not the one I'm talking about; I'm more interested in implementing algorithms than discovering them. Maybe there's a divide here between theorists, who care more about discovery, and practitioners, who care more about handy tools. I'm on the practical side, so I find the discovery method less interesting than a simple tool for making naive algorithms efficient, because it's less often useful.

In an attempt to forestall the most obvious misunderstanding: I do not claim “dynamic programming”, as commonly used, refers only to memoization; it often refers to a recipe which happens to include memoization as its main ingredient. But that main ingredient is much more important than the rest of the recipe. We should give our attention not to the garnishes, but to the simple base on top of which everything else is mere optimization.

Why “dynamic programming”?

I've always hated the term “dynamic programming”. It's big and imposing, making a simple concept sound far more intimidating than it should. And it's completely opaque: if you didn't know the term, would you ever guess that it meant memo functions? (Yes, memoization is the right concept; see the followup.)

I always supposed it was historical, and must have made sense, in some context, at some point. Groping for some way to remember the term, I came up with a fable: if you see the cached results as part of the program, then dynamic programming involves augmenting the program at runtime. So I imagined memoization had initially been seen (or even implemented) as a form of self-modifying code, and that explained the term.

But it turns out it never made even that much sense. Richard Bellman, who coined it, explains:

An interesting question is, ‘Where did the name, dynamic programming, come from?’ The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word, research. I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term, research, in his presence. You can imagine how he felt, then, about the term, mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons. I decided therefore to use the word, ‘programming.’ I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying—I thought, let’s kill two birds with one stone. Let’s take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it’s impossible to use the word, dynamic, in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It’s impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities.

“Dynamic programming” never meant anything; it was nothing but a political euphemism. But somehow it became attached to one of the useful concepts Bellman discovered, and to this day, students of computer science struggle to learn that simple idea, because it's burdened with a nonsense name.

Please call it “memoization” instead. And tell your students.

(Via Wikipedia, of all places. It's not usually a good source for CS information, but it's pretty good about linking to historical sources, or at least, as in this case, to an article quoting one.)

(“Dynamic” does have some negatively connoted uses nowadays, in particular in the static type theory community — partly from the perennial flamewar rivalry with dynamic typing, and partly because the main practical purpose of static analysis is to eliminate dynamic uncertainty, so “dynamic” suggests failure.)

Several people have quoted Russell and Norvig: “This cannot be strictly true, because his first paper using the term (Bellman, 1952) appeared before Wilson became Secretary of Defense in 1953.” Perhaps the antipathy to research did not originate with Wilson, or perhaps Bellman initially intended the term to make sense, and only later adopted it as camouflage.

Some things never change

Mike Taylor is having a bad initial experience with Scheme. In particular:

As I continue to work my way through the exercises (using Scheme itself at least for now), I run into a problem: for exercise 2.9.3 I need set-car!, which the purity-fascist maintainers of PLT Scheme removed in release 4.0. No doubt there is a secret incantation to undo this change, but I can’t immediately see what it is.

As he has already figured out, this is because he's using an R5RS book with an R6RS-ish implementation, and DrScheme's R6RS-era dialect is not a superset of R5RS — not even of the commonly taught parts of R5RS. Such conspicuous backward incompatibilities occasionally strike other languages (Perl, certainly, and hasn't it happened to Python at some point?), but it's more embarrassing in a language like Scheme, which is supposed to be old and elegant and stable.

Correction: as Eli Barzilay points out, it's supposed to be an R6RS book. The use of set-c*r! without the necessary (import (rnrs mutable-pairs)) is probably an oversight, made easy by the fact that it still works in most R6RS implementations. So part of the problem here is that the book is buggy: it uses features that aren't in the language it's teaching. Also, DrScheme's main language isn't R6RS but a different dialect which also happens to lack set-car!.

It's also unfortunate in a language which already has a reputation for being impractical. Mike correctly attributed the problem to “purity-fascist maintainers”, but how many other students, on seeing that Scheme's most popular educational implementation doesn't (by default) support the examples used in popular tutorials, have concluded (or confirmed their prejudice) that Lisp is not useful outside of the ivory tower?

(I don't particularly mind immutable lists, but they're a strange thing to break backward compatibility over.)

Flattening list construction

Most beginning programmers expect lists (and other sequences) to behave something like this:

a = [1, 2, 3]
b = [4]
[a, b] => [1, 2, 3, 4]
[a, 5, b, 6] => [1, 2, 3, 5, 4, 6]

They expect the list construction operator to catenate lists, but not non-lists. As long as lists can't contain other lists (which rarely occurs to beginners), this is a well-behaved operator. Languages which don't allow lists of lists often have it as their default list construction operator: Perl, APL, and R, for example.

Languages which do allow lists of lists generally reject flattening construction on the grounds that it's unpredictable. Which is true — it's easy to write list functions using it that will work perfectly until they encounter nested lists, whereupon they quietly, mysteriously flatten them.

The trouble is, it's a very convenient operator. It's common (especially at the REPL) to construct a list ad hoc, out of some preexisting pieces, and when you do, you generally don't care which elements came prepackaged in lists and which were solitary. That's irrelevant; you just want to assemble them all into a list without worrying about where they came from.

For example, recently I was trying to control the undead menace with Clojure, and wrote this:

(def units (concat (for [y (range 1 20 3)]
                     (spawn zombie 1 y))
                   (list (spawn pixie 10 10)
                         (spawn pixie 15 10))))

I nearly forgot to include that list, because it was irrelevant detail. What I really wanted to write was something like this:

(def units (enlist (for [y (range 1 20 3)]
                     (spawn zombie 1 y))
                   (spawn pixie 10 10)
                   (spawn pixie 15 10)))

In a dynamically typed language, it's easy to define enlist (a sketch follows), but most languages don't provide it, because it can't be made to work consistently for all inputs. I think this is a loss, because it's so very convenient when it does work.
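
For concreteness, here's one way a Clojure enlist might look. It's a minimal sketch, assuming anything sequential? counts as a list to be spliced in, and it flattens only one level; the name and the predicate choice are mine.

;; One possible enlist: splice in anything sequential, wrap everything else.
(defn enlist [& args]
  (mapcat #(if (sequential? %) % (list %)) args))

;; (enlist [1 2 3] 5 [4] 6)  => (1 2 3 5 4 6)
;; The catch: a "solitary" argument that happens to be a list gets spliced
;; too, so (enlist [1 2] [3 4]) can never yield ([1 2] [3 4]). That's the
;; quiet flattening, and the inconsistency, described above.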