A sound bug finder is an unsound correctness prover

This account of Coverity, a commercial bug-finding tool for C and C++, illustrates a peculiar attitude common in the program-checking field:

we were also unsound. Our product did not verify the absence of errors but rather tried to find as many of them as possible. Unsoundness let us focus on handling the easiest cases first, scaling up as it proved useful. We could ignore code constructs that led to high rates of false-error messages (false positives) or analysis complexity, in the extreme skipping problematic code entirely (such as assembly statements, functions, or even entire files). Circa 2000, unsoundness was controversial in the research community, though it has since become almost a de facto tool bias for commercial products and many research projects.

Most program checkers prove theorems about programs. In particular, most aim to prove programs correct in some respect (e.g. type safety). A theorem prover is sound iff all the theorems it proves are true. So a correctness-prover that claims a buggy program is correct is unsound, but one that rejects a correct program is not. People in the program-checking field are accustomed to this, so they habitually think soundness = proving the absence of bugs.

But a bug-finder doesn't aim to prove correctness. Instead, it aims to prove incorrectness: to prove the presence of bugs. It's sound iff all the bugs it reports are real bugs — that is, if it has no false positives. False negatives (overlooking bugs) are OK, because they don't make its claims incorrect.

Unfortunately, most interesting properties are undecidable, so a checker can't be sound at both bug-finding and correctness-proving, unless its claims are very weak.

So Coverity did the right thing, in theory as well as practice, when they focused on suppressing false positives. Their bug finder was unsound, but it was unsound because it reported spurious errors, not because it missed some real bugs.

Addendum: bug finders in languages

The most visible bug finders (especially in academia) are those, like the ML typechecker, that try to prove something about the program, and report a bug if they fail. These are unsound as bug finders, since they sometimes report nonexistent bugs. Unfortunately, bug finding is their main use, so their standard of soundness does not fit.

This is particularly problematic for checkers that are built in to a compiler, and don't just complain but prevent programs from running. (This is part of why they're so visible — if the checker makes mistakes you can't ignore, you have to be aware of it.) It's hard (especially in theory) to justify a compiler that rejects correct programs. Sound bugfinders don't have this problem.

No comments:

Post a Comment

It's OK to comment on old posts.