# CBMC as proof Think of CBMC as a tool for doing mathematical proof. CBMC does an exhaustive search for issues affecting functional correctness and memory safety. If CBMC finds an issue, [as we have seen](debugging.md), then CBMC has likely found a bug. But if CBMC does *not* find any issues, then you can rest assured that the issues like memory safety that you are asking CBMC to check for will not occur in your program with the inputs you are asking CBMC to check. We note, however, that using CBMC may require bounding the size of the inputs you are asking CBMC to check. It may be necessary, for example, to assume that packets coming off the network are of size at most 1024 bytes in order to fit the problem into CBMC. We refer to assumptions like this as *proof assumptions*. It is important to remember that proof assumptions exists, and to communicate them clearly to the reader or consumer of any proof you write with CBMC. Let's look again at [memory-safety.c](examples/simple/memory-safety.c) ```c #define SIZE 20 char buffer[SIZE]; char read_buffer(int i) { return buffer[i]; } char read_pointer(int i) { return *(buffer + i); } int main() { int index; read_buffer(index); read_pointer(index); } ``` that we found to have [memory safety issues](debugging.md#memory-safety) in the last section. The problem was obviously the lack of bounds checking in the code. Let's fix that. ## Memory safety proof Let's try adding bounds checking. Consider [memory-safety1.c](examples/simple/memory-safety1.c) ```c #define SIZE 20 char buffer[SIZE]; char read_buffer(int i) { if (i < SIZE) return buffer[i]; return '\0'; } char read_pointer(int i) { if (i < SIZE) return *(buffer + i); return '\0'; } int main() { int index; read_buffer(index); read_pointer(index); } ``` We probably want better error handling than just returning `NULL`, but let's run CBMC to prove memory safety: ```bash cbmc memory-safety1.c --bounds-check --pointer-check ``` ``` ** Results: memory-safety1.c function read_buffer [read_buffer.array_bounds.1] line 10 array 'buffer' lower bound in buffer[(signed long int)i]: FAILURE [read_buffer.array_bounds.2] line 10 array 'buffer' upper bound in buffer[(signed long int)i]: SUCCESS memory-safety1.c function read_pointer [read_pointer.pointer_dereference.1] line 14 dereference failure: pointer outside object bounds in buffer[(signed long int)i]: FAILURE ** 2 of 3 failed (2 iterations) VERIFICATION FAILED ``` Oh, nuts. We failed. We added bounds checking only for the upper bound. ## Memory safety proof 2.0 Let's try adding bounds checking for both the upper and lower bounds. Consider [memory-safety2.c](examples/simple/memory-safety2.c) ```c #define SIZE 20 char buffer[SIZE]; char read_buffer(int i) { if (0 <= i && i < SIZE) return buffer[i]; return '\0'; } char read_pointer(int i) { if (0 <= i && i < SIZE) return *(buffer + i); return '\0'; } int main() { int index; read_buffer(index); read_pointer(index); } ``` and run CBMC ```bash cbmc memory-safety2.c --bounds-check --pointer-check ``` ``` ** 0 of 3 failed (1 iterations) VERIFICATION SUCCESSFUL ``` and declare victory. ## Memory safety proof harness Let's pivot back to the notion of a proof harness before we declare complete victory. Back when we introduced [CBMC as a form of unit testing](unit-testing.md), we had a function [quartile](quartile.c) under test and a [unit test](unit-test.c) for `quartile`. The unit test set up the software environment for the function under test and invoked the function. The function `quartile` was so simple that all the unit test had to do was initialize the argument `x` passed to `quartile` and invoke `quartile` on `x`. Then we removed the initialization of `x` so that CBMC could consider all possible integer values for the argument `x`. We transformed the unit test into what we call a *proof harness* for the function `quartile` and used that with CBMC. We routinely find ourselves with a library and wanting to use CBMC to prove that all of the public entry points in the library's API are memory safe. Our style is to write a memory safety proof for each entry point independently. We write one proof harness for each entry point. The purpose of the proof harness is to model the software environment in which the entry point will be called. The proof harness will always have to model the inputs to the entry point. But the proof harness might also have to model some aspects of the global state if, for example, there is a system table accessed by the entry point. In the ideal world, a proof harness will construct arbitrary, unconstrained values for every variable and data structure in the software environment of an entry point, and invoke the entry point. In the real world, the entry point makes some assumptions about its inputs. It may, at the very least, assume that the system table is well-formed in the sense that it satisfies some data structure invariant. In the ideal world, the proof author will talk with the code author and convince the developer to write more bullet-proof code. In the real world, the proof author will model these assumptions in the proof harness to prove memory safety. But now these assumptions become part of the proof: CBMC has demonstrated that the entry point is memory safe, but only when the assumptions described in the proof harness are true when the entry point is invoked. A developer might want to include in the testing framework some checks that these assumptions really are true at all call sites calling the entry point. Or a developer might want to use CBMC to prove this... But let's return to our memory safety example and transform it into something you are more likely to encounter in the real world. ### The library First, the buffer and the accessor functions are likely to be part of a library. Here is [library.h](examples/simple/library.h) ```c char read_buffer(int i); char read_pointer(int i); ``` and [library.c](examples/simple/library.c) ```c #include "library.h" #define SIZE 20 char buffer[SIZE]; char read_buffer(int i) { if (0 <= i && i < SIZE) return buffer[i]; return '\0'; } char read_pointer(int i) { if (0 <= i && i < SIZE) return *(buffer + i); return '\0'; } ``` ### The proof harnesses Second, we write one proof harness for each entry point. Here is [read_buffer_harness.c](examples/simple/read_buffer_harness.c) ```c #include "library.h" harness() { int index; read_buffer(index); } ``` and [read_pointer_harness.c](examples/simple/read_pointer_harness.c) ```c #include "library.h" harness() { int index; read_pointer(index); } ``` Notice that we have changed the function name from `main` to `harness` in these proof harnesses. This is not essential, it is just a matter of style, but it does avoid name conflicts with a real `main` that might exist in the real environment. ### The proof Now we can run the proofs just as before, giving both the library and the proof harness on the command line, and also indicating to CBMC that the function to test is now called `harness` and not `main`: ```bash cbmc library.c read_buffer_harness.c --bounds-check --pointer-check --function harness cbmc library.c read_pointer_harness.c --bounds-check --pointer-check --function harness ``` In both cases: ``` ** 0 of 3 failed (1 iterations) VERIFICATION SUCCESSFUL ``` Success!! ### Building code for proof One last thing. In the real world, the library is not this simple. There is probably a build process to compile the source code into a library using a compiler like `gcc`. One approach is to build the library using `goto-cc` in place of `gcc`. The compiler `goto-cc` is intended to be a drop-in replacement for `gcc`. Using `goto-cc` should be as easy as building with `CC=goto-cc` in place of `CC=gcc`. The difference is that `goto-cc` produces an intermediate representation for the program called a "goto program" or "goto binary" that is used by CBMC for model checking. When we run `cbmc` directly on a source file, CBMC is invoking `goto-cc` on the source file before model checking, but we can run `goto-cc` ourselves to build the goto program, and run CBMC on that. Our approach, however, is not to build the entire library. There are performance and complexity issues that we won't go into here related to symbolic execution and constraint solving. We usually build only the portions of the library that are required to prove that a particular entry point is memory safe. Using our simple library above to illustrate, we would build and prove memory safe the entry point `read_buffer` as follows: ```bash goto-cc library.c -o library.goto goto-cc read_buffer_harness.c -o read_buffer_harness.goto goto-cc library.goto read_buffer_harness.goto -o read_buffer.goto --function harness cbmc read_buffer.goto --bounds-check --pointer-check ``` The first two lines build the source files, the third links them together (and identifies `harness` as the entry point), and the fourth line runs CBMC on the result. The process for `read_pointer` is similar. Now, finally, and forever: ``` ** 0 of 3 failed (1 iterations) VERIFICATION SUCCESSFUL ``` Success!!