C's safety problem

Tim Jasko wrote (over a year ago; sorry):

This got me thinking about how other languages tend to lend themselves to certain vulnerabilities. We're all aware of buffer overflow exploits; these are due almost entirely to the fact that C/C++ force you to waste time managing memory.

Most buffer overflows are due to C, but this isn't because of memory management, except in the very general sense that C doesn't allocate string space automatically. C's vulnerability to buffer overflows is almost entirely due to a peculiar fault of its standard library: many of its string functions do not operate on its standard string type.

C's standard mutable string type is a null-terminated string in a fixed-size buffer. This isn't a great choice, for well-known reasons, but it isn't a vulnerability in itself, and it is at least easy to implement. But surprisingly, C's standard library doesn't implement it consistently. It has a variety of operations that write to string buffers, but many of them take no buffer size and so behave as if the buffer were infinite, which makes them unsafe on ordinary C strings. These include strcpy, strcat, sprintf, *scanf with an unbounded %s, and gets. Sadly, these functions have short names and are considered canonical, so they're used in far more than the rare cases where they're safe.
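To make the mismatch concrete: none of the functions listed above is told how big the destination is, so the callee cannot know where the buffer ends. The sketch below shows one of the rare ways to use such a function safely, by giving scanf's %s an explicit field width. The names first_word and WORD_CAP are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* For comparison, the unsafe functions take no destination size:
 *     char *strcpy(char *dest, const char *src);
 *     char *gets(char *s);
 * The callee cannot tell where the buffer ends. */

enum { WORD_CAP = 8 };  /* hypothetical buffer size for this sketch */

/* Hypothetical helper: read the first whitespace-delimited word of
 * `input` into `word`. Returns 1 if a word was read, 0 otherwise. */
int first_word(const char *input, char word[WORD_CAP])
{
    assert(input != NULL && word != NULL);
    /* A bare "%s" would write as many bytes as the input contains.
     * "%7s" stops after WORD_CAP - 1 characters and then appends
     * the '\0', so the write always fits in WORD_CAP bytes. */
    return sscanf(input, "%7s", word) == 1;
}
```

Note that the width has to be written into the format string by hand and kept in step with the buffer size, which is part of why the bare %s is so much more common.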

Furthermore, of the operations that do take a buffer size, several break the invariants of the type in error cases or are easy to misuse. In particular, strncpy omits the null terminator whenever the source is at least as long as the buffer, leaving an invalid string, and it gives the caller no indication that this happened. Implementations of snprintf that predate C99 (Microsoft's _snprintf, for example) do the same; C99's snprintf always terminates, but it still truncates silently unless the caller checks its return value. strncat always terminates too, but its size argument is the maximum number of characters to append, not the size of the buffer, so passing the buffer size can overflow it. Callers are supposed to check whether the result fit in the buffer (and if not, give up or try again with a bigger one), but as usual, most don't, and in the worst case they receive an invalid string instead of a merely truncated one. So even switching to the length-aware string operations doesn't eliminate buffer overflow problems. (This may contribute to C programmers' tendency to use the unsafe ones.)
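A minimal sketch of the failure mode and of the check callers are supposed to make. It assumes a C99-conforming snprintf, which always writes a terminator when the size is nonzero and returns the length the full result would have had. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 1 if buf[0..cap-1] contains a '\0',
 * i.e. if the buffer holds a valid C string. */
int is_terminated(const char *buf, size_t cap)
{
    return memchr(buf, '\0', cap) != NULL;
}

/* Copy with strncpy and report whether the result is a valid string.
 * When strlen(src) >= cap, strncpy fills the buffer and writes no '\0'. */
int strncpy_leaves_valid_string(char *dst, size_t cap, const char *src)
{
    strncpy(dst, src, cap);
    return is_terminated(dst, cap);
}

/* Copy with snprintf and check the return value, as callers should.
 * Returns 1 if all of src fit, 0 if it was truncated or an error occurred.
 * Assumes C99 semantics; pre-C99 variants behave differently. */
int copy_checked(char *dst, size_t cap, const char *src)
{
    assert(cap > 0);
    int n = snprintf(dst, cap, "%s", src);
    return n >= 0 && (size_t)n < cap;
}
```

With a 4-byte buffer, strncpy_leaves_valid_string reports success for "hi" and failure for "hello", while copy_checked leaves the terminated string "hel" and returns 0 for "hello".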

There are only a very few C functions that correctly write strings to buffers, preserving their invariants even on failure. In fact, the only ones I can think of are fgets and a C99-conforming snprintf.

As Dan Bernstein put it:

I've mostly given up on the standard C library. Many of its facilities, particularly stdio, seem designed to encourage bugs.

Fortunately this problem is unique to C's standard library. There's no reason other languages at the same level of abstraction, or even other C libraries, need to have unsafe string operations. It's easy to write correct string operations, even in C, even for null-terminated strings in fixed-size buffers — and security-conscious C programmers often do. There's no reason but history for any language, even C, to have language vulnerabilities.
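As a sketch of how little it takes, here is one way to write a copy that respects the buffer size and keeps the string valid even on overflow. The name str_copy and its return convention are assumptions made for illustration, not a standard interface; BSD's strlcpy is a well-known function in the same spirit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical function: copy src into dst[0..cap-1].
 * The result is always null-terminated. Returns 1 if all of src fit,
 * 0 if it was truncated. dst and src must not overlap. */
int str_copy(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    size_t len = strlen(src);
    /* Copy at most cap - 1 characters, leaving room for the '\0'. */
    size_t n = len < cap ? len : cap - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return len < cap;
}
```

A caller that ignores the return value still gets a truncated but valid string, which is the property the standard functions above fail to guarantee.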


  1. I don't agree that it is easy, for certain definitions of easy, to write correct code in C for manipulating null-terminated strings in fixed-size buffers, for a simple reason: if you can't make the buffer length travel with the string, either via static typing or dynamically, then normal procedural abstraction will tend to hide the length and lead to errors later.

  2. "these functions have short names and are considered canonical"

    Interesting. Somehow it had never occurred to me before that short names might be more likely to be considered canonical. But it rings true.


It's OK to comment on old posts.