# Proof harnesses This section is about writing proof harnesses for CBMC. Our [proof harness overview](proof.md#memory-safety-proof-harness) has already introduced the notion of a proof harness, and this section goes into greater detail about the issues to consider when writing a harness. The purpose of a proof harness is to build a model of the environment of the function under test. The purpose of this model is to collect in one place all of the assumptions required for the proof to hold. The proof harness is responsible for: * [Modeling input](#modeling-input) * [Modeling global state](#modeling-global-state) * [Modeling interfaces](#modeling-interfaces) * [Allocating data](#allocating-data) * [Validating data](#validating-data) * [Proof assumptions](#proof-assumptions) ## Modeling input For simple functions, the model of the environment is just the input to the function. Suppose we have a library [library1.c](examples/harness/library1.c) ```c #include int alpha1(int a) { return a+1; } int beta1(int b) { int bb = alpha1(b); assert(bb == b); } ``` and we want to test function `beta1`. The input to function `beta1` is an integer `b`. We write a proof harness [harness1.c](examples/harness/harness1.c) ```c int beta1(int b); int main() { int x; beta1(x); } ``` that allocates an unconstrained integer named `x` and invokes `beta1` on `x`. We run CBMC on the result with ```bash goto-cc -o library1.goto library1.c goto-cc -o harness1.goto harness1.c goto-cc -o proof1.goto library1.goto harness1.goto cbmc proof1.goto ``` and we see a violation of the assertion `bb == b`. ```bash ** Results: library1.c function beta1 [beta1.assertion.1] line 9 assertion bb == b: FAILURE ** 1 of 1 failed (2 iterations) VERIFICATION FAILED ``` Correcting this assertion to `bb == b + 1` fixes the problem. ## Modeling global state Sometimes a function depends on global state. A common example is low-level system code that depends on a system table, and we want to know the code works no matter what has been stored in the system table. Suppose we have a library [library2.c](examples/harness/library2.c) ```c #include int counter = 0; int alpha2(int a) { counter = counter + 1; return a+counter; } int beta2(int b) { int bb = alpha2(b); assert(bb == b + 1); } ``` and we want to test function `beta2`. This time the function `beta2` depends on its input `b`, but it also depends on a global variable `counter` that is initialized to `0` but is changed with each invocation of `beta2`. We want to know that `beta2` always works, no matter how many times `beta2` has been invoked in the past. We do this by writing a proof harness that chooses an unconstrained value for both the function input `b` and the global variable `counter`. We write a proof harness [harness2.c](examples/harness/harness2.c) ```c extern int counter; int beta2(int b); int main() { int cnt; counter = cnt; int x; beta2(x); } ``` We run CBMC on the result with ```bash goto-cc -o library2.goto library2.c goto-cc -o harness2.goto harness2.c goto-cc -o proof2.goto library2.goto harness2.goto cbmc proof2.goto ``` and see a violation of the assertion `bb == b + 1`. ```bash ** Results: library2.c function beta2 [beta2.assertion.1] line 12 assertion bb == b + 1: FAILURE ** 1 of 1 failed (2 iterations) VERIFICATION FAILED ``` We correct this to `bb >= b + 1` and we are done. ## Modeling interfaces Sometimes a function depends on an implementation of an interface or a library API that we don't care to test. A common example is a function that depends on a network communication protocol like HTTP with `send` and `receive` methods. We want our code to work independent of the protocol implementation, so we replace the implementations of the functions in the protocol interface with stubs that over-approximate the behaviors of the implementations. Suppose we have a library [library3.c](examples/harness/library3.c) ```c #include void send(int msg); int receive(); int alpha(int x) { int y = receive(); assert(y > 0); return y+1; } ``` that depends on `send` and `receive` methods from a network interface. We stub out the network interface with [network3.c](examples/harness/network3.c) ```c int receive() { // model receiving an unconstrained integer value from the network int msg; return msg; } void send(int msg) { // model sending an integer over the network (nothing to do) return; } ``` We write the proof harness [harness3.c](examples/harness/harness3.c) ```c int alpha(int x); int main() { int x; alpha(x); } ``` We run CBMC with ```bash goto-cc -o network3.goto network3.c goto-cc -o library3.goto library3.c goto-cc -o harness3.goto harness3.c goto-cc -o proof3.goto network3.goto library3.goto harness3.goto cbmc proof3.goto ``` and we get a failure of the assertion `y > 0`. Suppose the mistake is in our model of the network. Suppose our network only transmits positive integers. Then we can fix our stub of the `receive` method to return not an unconstrained integer but an unconstrained positive integer. We change our stubs for the network to [network3a.c](examples/harness/network3a.c) ```c int receive() { // model receiving an unconstrained POSITIVE integer value from the network int msg; __CPROVER_assume(msg > 0); return msg; } void send(int msg) { // model sending an integer over the network (nothing to do) return; } ``` We run CBMC with ```bash goto-cc -o network3a.goto network3a.c goto-cc -o library3.goto library3.c goto-cc -o harness3.goto harness3.c goto-cc -o proof3.goto network3a.goto library3.goto harness3.goto cbmc proof3.goto ``` and declare victory at the successful verification. ## Allocating data For projects where functions take pointers to data on the heap, we recommend writing functions to allocate unconstrained values on the heap of the appropriate types for use in proof harnesses. Consider a string buffer defined by ```c typedef struct { size_t length; char* buffer; } strbuf; ``` and consider a function that takes a pointer to a string buffer and caches the fifth character in the buffer into a global variable ```c char cache; void cache_fifth_char(strbuf* str) { cache = str->buffer[4]; } ``` Let's try to prove that this function is memory safe (although, obviously, it is not). Memory safety means that we never dereference an invalid pointer and we never try to access data outside the bounds of a valid object. We can do this by calling CBMC with the flags `--pointer-check` and `--bounds-check`. Let's prove this even in the case where malloc can fail and return NULL instead of a pointer. The default model of malloc in CBMC never fails, but we can change to a model of malloc that can fail and return NULL by calling CBMC with the flags `--malloc-may-fail` and `--malloc-fail-null`. The function `cache_fifth_char` takes a pointer to a string buffer on the heap, so let write a function that allocates a string buffer. ```c strbuf* strbuf_allocate(size_t length) { strbuf* str = malloc(sizeof(strbuf)); if (str == NULL) return NULL; str->length = length; str->buffer = malloc(length); return str; } ``` Now we can write a proof harness for the function ```c int main() { size_t len; strbuf* str = strbuf_allocate(len); cache_fifth_char(str); } ``` Pulling all of this together into a single file [strbuf1.c](examples/harness/strbuf1.c), we can run cbmc on this file with ```bash cbmc --pointer-check --bounds-check --malloc-may-fail --malloc-fail-null strbuf1.c ``` and the result is a slew of verification failures including ```bash [cache_fifth_char.pointer_dereference.1] line 7 dereference failure: pointer NULL in str->buffer: FAILURE [cache_fifth_char.pointer_dereference.7] line 7 dereference failure: pointer NULL in str->buffer[(signed long int)4]: FAILURE [cache_fifth_char.pointer_dereference.11] line 7 dereference failure: pointer outside object bounds in str->buffer[(signed long int)4]: FAILURE ``` The first is saying that `str` may be NULL, the second is saying that `str->buffer` may be NULL, and the third is saying that `str->buffer` may not have 5 character in it. We are obviously going to have to make some assumptions about the string buffer before we can prove that the function is memory safe. ## Validating data For projects that require allocating objects of various types on the heap, we recommend writing predicates that say what well-formed objects of these types look like. In the case of our example, a valid pointer to a string buffer must be nonnull, and a valid string buffer must have a buffer that is nonnull. ```c bool strbuf_is_valid(strbuf* str) { if (str == NULL) return false; if (str->buffer == NULL) return false; return true; } ``` With this predicate, we can write a proof harness that assumes the string buffer allocated on the heap is valid before calling the function. We can also assert that the string buffer remains valid after calling the function. And we can assume that the buffer has at least 5 characters in it. In fact, we can prove the this remains true after the function invocation, meaning that the function preserves the validity of its input. ```c int main() { size_t len; strbuf* str = strbuf_allocate(len); __CPROVER_assume(strbuf_is_valid(str)); __CPROVER_assume(str->length > 4); cache_fifth_char(str); __CPROVER_assert(strbuf_is_valid(str), "String buffer remains valid"); __CPROVER_assert(str->length > 4, "String buffer remains length >4"); } ``` Pulling all of this together into [strbuf2.c](examples/harness/strbuf2.c), we can run CBMC ```bash cbmc --pointer-check --bounds-check --malloc-may-fail --malloc-fail-null strbuf2.c ``` and rejoice in the successful validation. ## Proof assumptions Notice that in our last proof harness, we worked hard to make as explicit as possible all of the assumptions required for memory safety to hold: The pointer must be a valid pointer to a valid string buffer with at least five characters in it. It is the responsibility of the programmer to make certain that these assumptions are true when the function is called. All bets are off when the function is called in a weaker context where some of these assumptions are false. The memory safety proof of the function does not hold in that context since the assumptions it is based on do not hold.