Arcane Sentiment: February 2010

Tim Jasko wrote (over a year ago; sorry):

This got me thinking about how other languages tend to lend themselves to certain vulnerabilities. We're all aware of buffer overflow exploits; these are due almost entirely to the fact that C/C++ force you to waste time managing memory.

Most buffer overflows are due to C, but this isn't because of memory management, except in the very general sense that C doesn't allocate string space automatically. C's vulnerability to buffer overflows is almost entirely due to a peculiar fault of its standard library: many of its string functions do not operate on its standard string type.

C's standard mutable string type is a null-terminated fixed-size buffer. This isn't a great choice, for well-known reasons, but it isn't a vulnerability in itself, and it is at least easy to implement. But surprisingly, C's standard library doesn't implement it properly. It has a variety of operations that write to string buffers, but many of them take no buffer size and so behave as if the destination were infinite, which makes them unsafe on ordinary C strings. These include strcpy, strcat, sprintf, *scanf with %s (without a field width), and gets. Sadly, these functions have short names and are considered canonical, so they're used in far more than the rare cases where they're safe.

Furthermore, of the operations that do take a length, several break the invariants of the type in error cases or are easy to misuse. strncpy omits the null terminator when the source doesn't fit, leaving an invalid string in the buffer. strncat always terminates its result, but its length argument is the maximum number of characters to append, not the size of the buffer, so passing the buffer size can still overflow. snprintf terminates its output in C99 (when the size is nonzero), but some pre-C99 implementations, such as Microsoft's _snprintf, do not. The callers of these functions are supposed to check their return values or lengths to discover whether the result fits in the buffer (and if not, give up or try again with a bigger one), but as usual, most callers don't check, and in the worst case they receive an invalid string instead of a merely truncated one. So even switching to the length-aware string operations doesn't eliminate buffer overflow problems. (This may contribute to C programmers' tendency to use the unsafe ones.)

There are only a very few C functions that correctly write strings to buffers, preserving their invariants even on failure. In fact, the only ones I can think of are fgets and C99's snprintf.
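The difference between a truncated string and an invalid one can be checked directly. What follows is a minimal sketch, assuming a C99- or C++11-conforming snprintf; the helper names (is_terminated, strncpy_leaves_valid_string, snprintf_leaves_valid_string) and the 8-byte buffer are made up for illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Illustrative helper (not a standard function): true if buf[0..size)
// contains a null terminator, i.e. holds a valid C string.
bool is_terminated(const char* buf, std::size_t size) {
    return std::memchr(buf, '\0', size) != nullptr;
}

// Copy src into an 8-byte buffer with strncpy and report whether the
// buffer still holds a valid string afterwards.
bool strncpy_leaves_valid_string(const char* src) {
    char buf[8];
    std::memset(buf, 'x', sizeof buf);  // no stray zero bytes to mask the problem
    std::strncpy(buf, src, sizeof buf);
    return is_terminated(buf, sizeof buf);
}

// Same experiment with snprintf, assuming C99/C++11 semantics.
// *truncated is set when the return value says the output did not fit.
bool snprintf_leaves_valid_string(const char* src, bool* truncated) {
    char buf[8];
    std::memset(buf, 'x', sizeof buf);
    int needed = std::snprintf(buf, sizeof buf, "%s", src);
    *truncated = needed < 0 || static_cast<std::size_t>(needed) >= sizeof buf;
    return is_terminated(buf, sizeof buf);
}
```

With a source longer than seven characters, the strncpy version should report an unterminated buffer, while the snprintf version should report a valid string plus a truncation flag that the caller is still free to ignore.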

As Dan Bernstein put it:

I've mostly given up on the standard C library. Many of its facilities, particularly stdio, seem designed to encourage bugs.

Fortunately this problem is unique to C's standard library. There's no reason other languages at the same level of abstraction, or even other C libraries, need to have unsafe string operations. It's easy to write correct string operations, even in C, even for null-terminated strings in fixed-size buffers — and security-conscious C programmers often do. There's no reason but history for any language, even C, to have language vulnerabilities.
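As a rough illustration of how little it takes, here is a sketch of a copy operation that always leaves a valid string in the destination and reports truncation. The name copy_string and its signature are hypothetical, chosen for this example; it resembles the strlcpy found in some C libraries but is not meant to reproduce it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not in the standard library): copy src into
// dst[0..size), always leaving a null-terminated string in dst.
// Returns true if src fit completely, false if it was truncated.
bool copy_string(char* dst, std::size_t size, const char* src) {
    assert(size > 0);  // need room for at least the terminator
    std::size_t len = std::strlen(src);
    bool fits = len < size;
    std::size_t n = fits ? len : size - 1;  // leave one byte for '\0'
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return fits;
}
```

A caller that ignores the return value gets a truncated string, which may be wrong but is never an invalid buffer.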

Autoconversion of types is a handy thing, and easy to miss when you don't have it. This is especially so in languages like C++ that have many static type distinctions, but don't do many of the conversions automatically. C++ has a built-in autoconversion mechanism, but it's too dangerous to use much, and the alternative of manually overloading every operation with type-converting wrappers is too tedious for anything but a few basic types, so in practice C++ is a non-autoconverting language.

That's the standard excuse, anyway. C++ actually does provide quite a lot of autoconversions, but some of them are bizarrely unhelpful. Take the most obvious autoconversion, for example: converting integers to strings. I ran into this one today:

string s;
int n = 40;
s = n;

A naïve (or even not-so-naïve) user might expect this to set s to "40", but it's actually "(", the character with ASCII code 40. Assigning a single integer to a string is treated as assignment of a single character (the int is implicitly converted to char), not the decimal representation of the integer. But this is only for assignment; the constructor doesn't accept a single integer at all. I can think of some excuses for this (characters are sometimes (e.g. by getc) represented as ints; string should be consistent with other collections; base 10 is arbitrary; conversions between semantically different types shouldn't happen automatically; STL is supposed to provide a solid foundation, not be easy to use), but user expectation is so strong here that I'm still surprised.
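The behavior is easy to reproduce. This sketch wraps the assignment above in a function and sets it beside an explicit conversion. It assumes an ASCII-compatible character set and a C++11 or later compiler for std::to_string; the function names assign_int and decimal are invented for the example.

```cpp
#include <cassert>
#include <string>

// Assign an int directly to a string. This selects
// std::string::operator=(char) after an implicit int -> char conversion.
std::string assign_int(int n) {
    std::string s;
    s = n;
    return s;
}

// Explicit conversion to the decimal representation (C++11 and later).
std::string decimal(int n) {
    return std::to_string(n);
}
```

Under those assumptions, assign_int(40) yields a one-character string and decimal(40) yields the two-character string a user would probably have expected.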

I'm also surprised I've never run into this before. I habitually use printf-like functions for generating strings, but not always, so you'd think I'd encounter this more than once a decade, especially since += has the same surprising overloading. Do I really never generate strings from single integers? Is this most obvious of autoconversions really so rare in practice?
