`haybale`: Symbolic execution of LLVM IR, written in Rust

haybale is a general-purpose symbolic execution engine written in Rust. It operates on LLVM IR, which allows it to analyze programs written in C/C++, Rust, Swift, or any other language which compiles to LLVM IR. In this way, it may be compared to [KLEE], as it has similar goals, except that haybale is written in Rust and makes some different design decisions. That said, haybale makes no claim of being at feature parity with KLEE.

Okay, but what is a symbolic execution engine?

A symbolic execution engine is a way of reasoning - rigorously and mathematically - about the behavior of a function or program. It can reason about all possible inputs to a function without literally brute-forcing every single one. For instance, a symbolic execution engine like haybale can answer questions like:

Are there any inputs to (some function) that cause it to return 0? What are they?
Is it possible for this loop to execute exactly 17 times?
Can this pointer ever be NULL?

Symbolic execution engines answer these questions by converting each variable in the program or function into a mathematical expression which depends on the function or program inputs. Then they use an SMT solver to answer questions about these expressions, such as the questions listed above.

Getting started

1. Install

haybale is on crates.io, so you can simply add it as a dependency in your Cargo.toml:

toml [dependencies] haybale = "0.3.2"

haybale also depends (indirectly) on the LLVM 9 and Boolector libraries, which must both be available on your system. See the [llvm-sys] or [boolector-sys] READMEs for more details and instructions.

2. Acquire bitcode to analyze

Since haybale operates on LLVM bitcode, you'll need some bitcode to get started. If the program or function you want to analyze is written in C, you can generate LLVM bitcode (*.bc files) with clang's -c and -emit-llvm flags:

bash clang -c -emit-llvm source.c -o source.bc

For debugging purposes, you may also want LLVM text-format (*.ll) files, which you can generate with clang's -S and -emit-llvm flags:

bash clang -S -emit-llvm source.c -o source.ll

If the program or function you want to analyze is written in Rust, you can likewise use rustc's --emit=llvm-bc and --emit=llvm-ir flags.

Note that in order for haybale to print source-location information (e.g., source filename and line number) in error messages and backtraces, the LLVM bitcode will need to include debuginfo. You can ensure debuginfo is included by passing the -g flag to clang, clang++, or rustc when generating bitcode.

3. Create a Project

A haybale [Project] contains all of the code currently being analyzed, which may be one or more LLVM modules. To get started, simply create a Project from a single bitcode file:

rust let project = Project::from_bc_path(&Path::new("/path/to/file.bc"))?;

For more ways to create Projects, including analyzing entire libraries, see the [Project documentation].

4. Use built-in analyses

haybale currently includes two simple built-in analyses: [get_possible_return_values_of_func()], which describes all the possible values a function could return for any input, and [find_zero_of_func()], which finds a set of inputs to a function such that it returns 0. These analyses are provided both because they may be of some use themselves, but also because they illustrate how to use haybale.

For an introductory example, let's suppose foo is the following C function:

c int foo(int a, int b) { if (a > b) { return (a-1) * (b-1); } else { return (a + b) % 3 + 10; } }

We can use find_zero_of_func() to find inputs such that foo will return 0:

rust match find_zero_of_func("foo", &project, Config::default()) { Ok(None) => println!("foo can never return 0"), Ok(Some(inputs)) => println!("Inputs for which foo returns 0: {:?}", inputs), Err(e) => panic!("{}", e), // use the pretty Display impl for errors }

Writing custom analyses

haybale can do much more than just describe possible function return values and find function zeroes. In this section, we'll walk through how we could find a zero of the function foo above without using the built-in find_zero_of_func(). This will illustrate how to write a custom analysis using haybale.

ExecutionManager

All analyses will use an [ExecutionManager] to control the progress of the symbolic execution. In the code snippet below, we call [symex_function()] to create an ExecutionManager which will analyze the function foo - it will start at the top of the function, and end when the function returns. In between, it will also analyze any functions called by foo, as necessary and depending on the [Config] settings.

rust let mut em = symex_function("foo", &project, Config::<BtorBackend>::default());

Here it was necessary to not only specify the default haybale configuration, as we did when calling find_zero_of_func(), but also what "backend" we want to use. The default BtorBackend should be fine for most purposes.

Paths

The ExecutionManager acts like an Iterator over paths through the function foo. Each path is one possible sequence of control-flow decisions (e.g., which direction do we take at each if statement) leading to the function returning some value. The function foo in this example has two paths, one following the "true" branch and one following the "false" branch of the if.

Let's examine the first path through the function:

rust let result = em.next().expect("Expected at least one path");

In the common case, result contains the function return value on this path, as a Boolector [BV] (bitvector) wrapped in the [ReturnValue] enum. Since we know that foo isn't a void-typed function (and won't throw an exception or abort), we can simply unwrap the ReturnValue to get the BV:

rust let retval = match result { Ok(ReturnValue::Return(r)) => r, Ok(ReturnValue::ReturnVoid) => panic!("Function shouldn't return void"), Ok(ReturnValue::Throw(_)) => panic!("Function shouldn't throw an exception"), Ok(ReturnValue::Abort) => panic!("Function shouldn't panic or exit()"), ...

result could also be an Err describing an [Error] which was encountered while processing the path. In this case, we could just ignore the error and keep calling next() to try to find paths which didn't have errors. Or we could get information about the error like this:

rust ... Err(e) => panic!("{}", em.state().full_error_message_with_context(e)), };

This gets information about the error from the program State, which we'll discuss next. But for the rest of this tutorial, we'll assume that we got the Ok result, and at this point retval is a BV representing the function return value on the first path.

States

For each path, the [ExecutionManager] provides not only the final result of the path (either aReturnValue or an Error), but also the final program [State] at the end of that path. We can get immutable access to the State with state(), or mutable access with mut_state().

rust let state = em.mut_state(); // the final program state along this path

To test whether retval can be equal to 0 in this State, we can use state.bvs_can_be_equal():

rust let zero = state.zero(32); // The 32-bit constant 0 if state.bvs_can_be_equal(&retval, &zero)? { println!("retval can be 0!"); }

Getting solutions for variables

If retval can be 0, let's find what values of the function parameters would cause that. First, we'll add a constraint to the State requiring that the return value must be 0:

rust retval._eq(&zero).assert();

and then we'll ask for solutions for each of the parameters, given this constraint:

```rust // Get a possible solution for the first parameter. // In this case, from looking at the text-format LLVM IR, we know the variable // we're looking for is variable #0 in the function "foo". let a = state.getasolutionforirname(&String::from("foo"), Name::from(0))? .expect("Expected there to be a solution") .as_u64() .expect("Expected solution to fit in 64 bits");

// Likewise the second parameter, which is variable #1 in "foo" let b = state.getasolutionforirname(&String::from("foo"), Name::from(1))? .expect("Expected there to be a solution") .as_u64() .expect("Expected solution to fit in 64 bits");

println!("Parameter values for which foo returns 0: a = {}, b = {}", a, b); ```

Alternately, we could also have gotten the parameter BVs from the ExecutionManager like this:

```rust let abv = em.parambvs()[0].clone(); let bbv = em.parambvs()[1].clone();

let a = em.state().getasolutionforbv(&abv)? .expect("Expected there to be a solution") .asu64() .expect("Expected solution to fit in 64 bits");

let b = em.state().getasolutionforbv(&bbv)? .expect("Expected there to be a solution") .asu64() .expect("Expected solution to fit in 64 bits");

println!("Parameter values for which foo returns 0: a = {}, b = {}", a, b); ```

Documentation

Full documentation for haybale can be found here, or of course you can generate local documentation with cargo doc --open.

Compatibility

Currently, haybale only supports LLVM 9. A version of haybale supporting LLVM 8 is available on the llvm-8 branch of this repo; it is approximately at feature parity with haybale 0.2.1, and will likely be stuck at that point indefinitely unless there is demand for additional backported features.

haybale works on stable Rust, and requires Rust 1.40+.

Under the hood

haybale is built using the Rust [llvm-ir] crate and the [Boolector] SMT solver (via the Rust [boolector] crate).

Changelog

Version 0.3.2 (Feb 28, 2020)

New option [Config.max_callstack_depth] allows you to limit the callstack depth for an analysis - automatically ignoring calls of LLVM functions which would exceed that callstack depth. The default for this setting is no limit, matching the previous behavior of haybale.
New option [Config.max_memcpy_length] allows you to limit the maximum size of memcpy, memset, and memmove operations. The default for this setting is no limit, matching the previous behavior of haybale.
New method [FunctionHooks::add_default_hook()] allows you to supply a "default hook" which will be used when no other definition or hook is found for a function call. If no default hook is provided, this will result in a FunctionNotFound error, just as it did previously.
Performance improvements for analyzing calls of function pointers.
Improved a few error messages.

Version 0.3.1 (Feb 5, 2020)

Fix some broken links in the README and docs. No functional changes.

Version 0.3.0 (Feb 5, 2020)

Solver timeouts: - New setting [Config.solver_query_timeout] controls the maximum amount of time haybale will spend on a single solver query before returning Error::SolverError. This setting defaults to 300 seconds (5 minutes). The setting can also be disabled entirely, which results in the same behavior as previous versions of haybale (no time limit on solver queries).

Error handling: - The errors returned by ExecutionManager.next() are now haybale::Errors instead of Strings, allowing callers to more easily handle different kinds of errors different ways. To get a string representation of the Error, .to_string() gives the short description, while [State.full_error_message_with_context()] gives the full description which previously was returned by ExecutionManager.next(). The usage example in the README has been updated accordingly. - The toplevel function [find_zero_of_func()] now returns a Result, with the error type being String. - New setting [Config.squash_unsats] controls whether Error::Unsats are silently squashed (the default behavior, and the behavior of previous versions of haybale), or returned to the user. For more details, see the docs on that setting.

Logging, error messages, backtraces, etc: - haybale now prints source-location information (e.g., source filename and line number) in error messages and backtraces when it is available. Similarly, the HAYBALE_DUMP_PATH environment variable now has the options LLVM, SRC, and BOTH. For more details on all of this, see [Config.print_source_info]. - You can also now disable printing the LLVM module name along with LLVM location info in error messages, backtraces, path dumps, and log messages. For more details, see [Config.print_module_name]. - haybale will now by default autodetect when C++ or Rust demangling is appropriate for the Project, unless a different setting is chosen in [Config.demangling]. - Numeric constants representing BV values in log messages, HAYBALE_DUMP_VARS dumps, etc are now all printed in hexadecimal (previously binary, or an inconsistent mix of binary and hexadecimal).

Function hooks and intrinsics: - Built-in support for LLVM arithmetic-with-overflow intrinsics. - Built-in support for LLVM saturating-arithmetic intrinsics. - Built-in support for the llvm.assume intrinsic, with an associated setting [Config.trust_llvm_assumes]. - Built-in support for the llvm.bswap intrinsic with argument sizes 48 or 64 bits (previously only supported 16 or 32 bits). - Default hooks for a number of Rust standard-library functions which always panic, such as core::result::unwrap_failed(). - New module hook_utils contains the implementations of memset and memcpy used by the corresponding built-in hooks. These are now publically available for use in custom hooks for other functions.

Changes to data structures and traits: - The Location and PathEntry structs have been refactored to include source-location information when it is available, to be capable of indicating basic block terminators in addition to normal instructions, and to support some internal refactoring. - The [backend::BV] trait has a new required method, get_solver(), which returns a SolverRef of the appropriate type. (This is similar to the same method on the backend::Memory trait.) - Saturating-arithmetic methods (signed and unsigned addition and subtraction) are now available on [backend::BV], with default implementations in terms of the other trait methods. That means that these come "for free" once the required trait methods are implemented. - zero_extend_to_bits() and sign_extend_to_bits() are also now available as trait methods on [backend::BV], with default implementations in terms of the other trait methods. Previously they were private utility functions in haybale. - Many other structures have had minor changes and improvements, including some small breaking changes.

Compatibility: - Updated boolector dependency to crate version 0.3.0, which requires Boolector version 3.1.0 (up from 3.0.0). - This version of haybale now requires Rust 1.40+, up from 1.36+ for previous versions of haybale.

Version 0.2.1 (Jan 15, 2020)

New HAYBALE_DUMP_PATH and HAYBALE_DUMP_VARS environment-variable options
- HAYBALE_DUMP_PATH: if set to 1, then on error, haybale will print a description of the path to the error: every LLVM basic block touched from the top of the function until the error location, in order.
- HAYBALE_DUMP_VARS: if set to 1, then on error, haybale will print the latest value assigned to each variable in the function containing the error.
New setting Config.demangling allows you to apply C++ or Rust demangling to function names in error messages and backtraces
Support hooking calls to inline assembly, with some limitations inherited from [llvm-ir] (see comments on [FunctionHooks::add_inline_asm_hook()])
Built-in support for (the most common cases of) the llvm.bswap intrinsic
Other tiny tweaks - e.g., downgrade one panic to a warning

Version 0.2.0 (Jan 8, 2020)

Support LLVM extractvalue and insertvalue instructions
Support LLVM invoke, resume, and landingpad instructions, and thus C++ throw/catch. Also provide built-in hooks for some related C++ ABI functions such as __cxa_throw(). This support isn't perfect, particularly surrounding the matching of catch blocks to exceptions: haybale may explore some additional paths which aren't actually valid. But all actually valid paths should be found and explored correctly.
Since functions can be called not only with the LLVM call instruction but also with the LLVM invoke instruction, function hooks now receive a &dyn IsCall object which may represent either a call or invoke instruction.
haybale now uses LLVM 9 rather than LLVM 8. See the "Compatibility" section in the README.
Improvements for Projects containing C++ and/or Rust code:
- For the function-name arguments to [symex_function()], [get_possible_return_values_of_func()], [find_zero_of_func()], and [Project::get_func_by_name()], you may now pass either the (mangled) function name as it appears in LLVM (as was supported previously), or the demangled function name. That is, you can pass in "foo::bar" rather than "_ZN3foo3barE".
- Likewise, you may add function hooks based on the demangled name of the hooked function. See [FunctionHooks::add_cpp_demangled()] and [FunctionHooks::add_rust_demangled()].
- Also, llvm-ir versions 0.3.3 and later contain an important bugfix for parsing LLVM bitcode generated by rustc. haybale 0.2.0 uses llvm-ir 0.4.1.
The [ReturnValue] enum now has additional options Throw, indicating an uncaught exception, and Abort, indicating a program abort (e.g. Rust panic, or call to C exit()).
Relatedly, haybale now has built-in hooks for the C exit() function and for Rust panics (and for a few more LLVM intrinsics).
haybale also now contains a built-in [generic_stub_hook] and [abort_hook] which you can supply as hooks for any functions which you want to ignore the implementation of, or which always abort, respectively. See docs on the [function_hooks] module.
[Config.initial_mem_watchpoints] is now a HashMap instead of a HashSet of pairs.

Version 0.1.3 (Jan 1, 2020)

Memory watchpoints: specify a range of memory addresses, and get a log message for any memory operation which reads or writes any data in that range. See [State::add_mem_watchpoint()].
Convenience methods on [State] for constructing constant-valued BVs (rather than having to use the corresponding methods on BV and pass state.solver): bv_from_i32(), bv_from_u32(), bv_from_i64(), bv_from_u64(), bv_from_bool(), zero(), one(), and ones().
Some internal code refactoring to prepare for 0.2.0 features

Version 0.1.2 (Dec 18, 2019)

New method [Project::get_inner_struct_type_from_named()] which handles opaque struct types by searching the entire Project for a definition of the given struct
Support memory reads of size 1-7 bits (in particular, reads of LLVM i1)
Performance optimization: during State initialization, global variables are now only allocated, and not initialized until first use (lazy initialization). This gives the SMT solver fewer memory writes to think about, and helps especially for large Projects which may contain many global variables that won't actually be used in a given analysis.
Minor bugfixes and improved error messages

Version 0.1.1 (Nov 26, 2019)

Changes to README text only; no functional changes.

Version 0.1.0 (Nov 25, 2019)

Initial release!

haybale: Symbolic execution of LLVM IR, written in Rust