Efficiency above all else

I hack C++ at work. When it comes to collections, this means using the Standard Template Library, which can be frustrating, because STL was designed with efficiency in mind, and convenience decidedly not in mind. For instance, many of STL's collection operations are defined not on collections but on pairs of iterators, which are to arbitrary collections as pointers are to arrays. That's so STL can handle arrays in the familiar, efficient way, but the extra abstraction makes them even more inconvenient than pointers. Iterator code is full of things like if (find(collection.begin(), collection.end(), x) != collection.end()) ..., but there are no versions that simply take collections instead, even though they're easy to define and they make for much clearer code. That's just not considered important in C++ culture.

Sometimes the obsession with efficiency makes for some strange omissions. STL includes a stack class, but it doesn't support pop. It has a pop method, but that just removes the top element; it doesn't actually return it, which is almost always what you want. What's going on? The STL documentation explains:

One might wonder why pop() returns void, instead of value_type. That is, why must one use top() and pop() to examine and remove the top element, instead of combining the two in a single member function? In fact, there is a good reason for this design. If pop() returned the top element, it would have to return by value rather than by reference: return by reference would create a dangling pointer. Return by value, however, is inefficient: it involves at least one redundant copy constructor call. Since it is impossible for pop() to return a value in such a way as to be both efficient and correct, it is more sensible for it to return no value at all and to require clients to use top() to inspect the value at the top of the stack.

This is a peculiar inversion: stack doesn't have the most useful form of pop, not because the language is too weak to express it, but because it's too powerful. Since C++ has types with copy constructors, which might be inefficient to return from pop, nothing can be returned - not even the primitive types which can be returned efficiently, and which (especially pointers) are most common in collections anyway.

OK, stack really ought to just support both operations. Just as all the functions accepting pairs of iterators should be overloaded to accept collections as well. But these are matters of convenience, not efficiency, so in C++ they don't matter.

15 comments:

  1. The other reason for pop/top being the way they are is for exception safety[1].

    With the current design, both functions have the strong exception safety guarantee. If pop() returned an element, then it could only support the basic guarantee.

    Many years ago, before exception safety was well understood, Tom Cargill once posted a challenge [2] to design an implementation of his stack class that was exception safe. Turns out that it's pretty much impossible with the interface he specified; you need one such as that in the STL (for value-based containers).

    [1] http://www.boost.org/community/exception_safety.html
    [2] http://www.informit.com/content/images/020163371x/supplements/Exception_Handling_Article.html

    ReplyDelete
  2. I'd say it is too weak for this. In D, where you have garbage collector, such problems don't exist - functions can return pointers without risk.

    ReplyDelete
  3. I don't understaind why each time when is a talk about C++ someone have to mention D. Guys, GC is not world saver; there are other things languages should have, thought.

    > For instance, many of STL's collection operations are defined not on collections but on pairs of iterators

    Yes, since that is the only way to abstract access to various containers. AFAIK, Stepanov (et. all) spent a years researching a way to unify access to various non-similar containers; interator pattern was (and is) still the only way for that.

    Opposite, each collection would have different implementation of the same operations, which introduce a lot of duplications. The best example of this is a string class.

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete
  5. The standard library is merely a set of very basic tools meant to solve very general problems.
    If you have a specific situation (e.g. you need a "pop" member function that pops the element and returns it, and you don't care about the exception-safety problems that generates), it's pretty trivial to do it.

    Anyway, you don't usually use "naked" containers, but create you own containers, that do something meaningful (to you) with your data, and use SL's containers to do the hard work.


    Also, note that there are lots of different languages, and C++ is not the easiest or the more elegant. So, why would you choose C++? Because you need efficiency -- you need a program that's as fast as if it was written in C, without the hassle of managing the memory yourself.

    ReplyDelete
  6. If you want versions of the standard library algorithms such as find that take containers instead of iterators visit:
    stlab.adobe.com

    ReplyDelete
  7. You "hack" C++ at work? Yeah, okay.

    ReplyDelete
  8. To anonymous,

    "Hack" in this context is generally used to describe working with code in a role that doesn't make you a core or sole developer of a new application. It can be contributing patches, fixing bugs, writing plug-ins, maintaining old code, or thrashing up prototypes. In the FOSS community it can also mean projects you are working on before they break out into wider community project. Hacking is an informal, fun, looser form of development.

    ReplyDelete
  9. > Yes, since that is the only way to abstract access to various containers. AFAIK, Stepanov (et. all) spent a years researching a way to unify access to various non-similar containers; interator pattern was (and is) still the only way for that.

    John: Oleg Kiselyov has done some interesting stuff on converting enumerators to cursors and vice versa: http://okmij.org/ftp/Haskell/#fold-stream

    ReplyDelete
  10. "Since C++ has types with copy constructors, which might be inefficient to return from pop, nothing can be returned"

    The inefficiency has nothing to do with the fact that C++ has types with copy constructors. It has to do with the fact that a copy must be performed.

    Your criticisms of the iterator pattern is also unfounded.

    If you want a collection that is more convenient to use, then 1) don't use an STL data structure, 2) wrap an STL data structure and add whatever you want, or 3) use a different language. There are some very convenient languages out there for less performance-critical apps.

    ReplyDelete
  11. Exception safety, like efficiency, is only a problem for types with nontrivial {con,de}structors. Primitive types (which is what were in the stacks that prompted this post) don't need it, but they still pay the price.

    google: The standard library is merely a set of very basic tools meant to solve very general problems.

    Yeah, that's what I'm complaining about. STL provides the foundation of a collection library, but doesn't put a useful interface on it. The stack is particularly disappointing: it's a convenience wrapper around another class, to make it clearer that only stack operations are being used, but it's missing one of the most basic of those operations. C++ is halfway to having a useful collection library, but only halfway.

    This is a shame because it seriously hurts the language's utility. C++ could easily have a collection library that doesn't require messing with iterators; it could have stacks that work correctly for the primitive types and still support the efficient, exception-safe operations for other types - but these things aren't considered important. Several people have said things along the lines of "use a different language", and they're right, but it doesn't have to be that way. C++ may never be a great high-level language, but it could be a lot better than it is.

    ReplyDelete
  12. Anonymous: The inefficiency has nothing to do with the fact that C++ has types with copy constructors. It has to do with the fact that a copy must be performed.

    For small types without copy constructors, copying is cheap, and for those that fit in a register, it's one instruction. In languages where every value is like that, no one worries about the cost of returning the popped value. But in the presence of copy constructors, returning the value can be arbitrarily expensive. Of course a constructorless type can be large enough to be expensive, but those are much rarer than types with expensive constructors.

    ReplyDelete
  13. John: I don't understand why each time when is a talk about C++ someone have to mention D.

    Probably because D exists to answer this sort of complaint. It gives up a little of C++'s low-level control to gain a lot in convenience. It's a good trade for many applications, and I suspect it would get a lot more attention if it weren't for uncertainty about the tools - there is now a free compiler, but I didn't know that until I looked it up, and I don't know how well it works in practice.

    AFAIK, Stepanov (et. all) spent a years researching a way to unify access to various non-similar containers; interator pattern was (and is) still the only way for that.

    I don't object to the existence of iterators in C++, only to the necessity of using them everywhere. Iterators are for expressing portions of collections (and for iterating through them when you don't have λ, of course), but most collection operations are on whole collections, and STL is missing those for no obvious reason.

    ReplyDelete
  14. "stlab.adobe.com"

    I looked, but the 'Adam and eve architecture' was just a bit too cutesy for me.

    ReplyDelete
  15. All talk of exception safety is folly when undefined behavior is the true elephant in the room.

    ReplyDelete

It's OK to comment on old posts.