Necessist

Run tests with statements and method calls removed to help identify broken tests

Necessist currently supports Anchor (TS), Foundry, Go, Hardhat (TS), and Rust.


Installation

System requirements:

Install pkg-config and sqlite3 development files on your system, e.g., on Ubuntu:

```sh
sudo apt install pkg-config libsqlite3-dev
```

Install Necessist from [crates.io]:

```sh
cargo install necessist
```

Install Necessist from [github.com]:

```sh
cargo install --git https://github.com/trailofbits/necessist --branch release
```

Overview

Necessist iteratively removes statements and method calls from tests and then runs them. If a test passes with a statement or method call removed, it could indicate a problem in the test. Or worse, it could indicate a problem in the code being tested.
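The loop described above can be sketched in a few lines. This is a toy model, not Necessist's implementation: statements are plain strings, and `toy_runner` is a hypothetical stand-in for building and running the real test.

```rust
/// Outcome of running a test with one statement removed (toy model).
#[derive(Debug, PartialEq)]
enum Outcome {
    Passed,
    Failed,
}

/// Removes each statement in turn and re-runs the test on what remains.
/// Returns the indices of statements whose removal still lets the test pass.
fn removable_statements<F>(statements: &[&str], run_test: F) -> Vec<usize>
where
    F: Fn(&[&str]) -> Outcome,
{
    let mut suspicious = Vec::new();
    for i in 0..statements.len() {
        // Build the test body with statement `i` left out.
        let mutated: Vec<&str> = statements
            .iter()
            .enumerate()
            .filter(|(j, _)| *j != i)
            .map(|(_, s)| *s)
            .collect();
        if run_test(&mutated) == Outcome::Passed {
            suspicious.push(i);
        }
    }
    suspicious
}

/// Hypothetical runner: the toy test passes only if the vector is both
/// declared and pushed to. Sorting has no effect on the toy result.
fn toy_runner(stmts: &[&str]) -> Outcome {
    if stmts.contains(&"let mut v = Vec::new();") && stmts.contains(&"v.push(1);") {
        Outcome::Passed
    } else {
        Outcome::Failed
    }
}

fn main() {
    let body = ["let mut v = Vec::new();", "v.push(1);", "v.sort();"];
    for i in removable_statements(&body, toy_runner) {
        println!("test passes without: {}", body[i]);
    }
}
```

In this toy run, only `v.sort();` is reported, which is the kind of statement worth a second look.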

Example

This example is from [rust-openssl]. The verify_untrusted_callback_override_ok test checks that a failed certificate validation can be overridden by a callback. But if the callback were never called (e.g., because of a failed connection), the test would still pass. Necessist reveals this fact by showing that the test passes without the call to set_verify_callback:

```rust
#[test]
fn verify_untrusted_callback_override_ok() {
    let server = Server::builder().build();

    let mut client = server.client();
    client
        .ctx()
        .set_verify_callback(SslVerifyMode::PEER, |_, x509| { //
            assert!(x509.current_cert().is_some());           // Test passes without this call
            true                                              // to `set_verify_callback`.
        });                                                   //

    client.connect();
}
```

Following this discovery, a flag was [added to the test] to record whether the callback is called. The flag must be set for the test to succeed:

```rust
#[test]
fn verify_untrusted_callback_override_ok() {
    static CALLED_BACK: AtomicBool = AtomicBool::new(false); // Added

    let server = Server::builder().build();

    let mut client = server.client();
    client
        .ctx()
        .set_verify_callback(SslVerifyMode::PEER, |_, x509| {
            CALLED_BACK.store(true, Ordering::SeqCst);        // Added
            assert!(x509.current_cert().is_some());
            true
        });

    client.connect();
    assert!(CALLED_BACK.load(Ordering::SeqCst));              // Added
}
```

Comparison to conventional mutation testing


Conventional mutation testing tries to identify gaps in test coverage, whereas Necessist tries to identify bugs in existing tests.

Conventional mutation testing tools (such as [universalmutator]) randomly inject faults into source code and check whether the code's tests still pass. If they do, it could mean the code's tests are inadequate.

Notably, conventional mutation testing is about finding deficiencies in the set of tests as a whole, not in individual tests. That is, for any given test, randomly injecting faults into the code is not especially likely to reveal bugs in that test. This is unfortunate since some tests are more important than others, e.g., because ensuring the correctness of some parts of the code is more important than others.

By comparison, Necessist's approach of iteratively removing statements and method calls does target individual tests, and thus can reveal bugs in individual tests.

Of course, there is overlap in the sets of problems the two approaches can uncover, e.g., a failure to find an injected fault could indicate a bug in a test. Nonetheless, for the reasons just given, we see the two approaches as complementary, not competing.

Theoretical motivation


The following criterion (*) comes close to describing the statements that Necessist aims to remove:

The notion that (*) tries to capture is: a statement that affects a subsequently asserted condition. In this section, we explain and motivate this choice, and briefly discuss alternatives. For concision, we focus on statements, but the remarks in this section apply to method calls as well.

Consider a test through the lens of [predicate transformer semantics]. A test is a function with no inputs or outputs. Thus, an alternative procedure for determining whether a test passes is the following. Starting with True, iteratively work backwards through the test's statements, computing the weakest precondition of each. If the precondition arrived at for the test's first statement is True, then the test passes. If the precondition is False, the test fails.

Now, imagine we were to apply this procedure, and consider a statement S that violates (*). We argue that it might not make sense to remove S:

Conversely, consider a statement S that satisfies (*). Here is why it might make sense to remove S. Think of S as shifting the set of valid environments, rather than constraining them. More precisely, if S's weakest precondition P does not imply Q, and if Q is satisfiable, then there is an assignment to P and Q's free variables that satisfies both P and Q. If such an assignment results from each environment in which S is actually executed, then the necessity of S is called into question.

The main utility of (*) is in helping to select the statements that Necessist ignores. That is, if we imagine a predicate transformer semantics for one of Necessist's supported languages, and a statement S in that language, we can ask: would S satisfy (*)? If not, then Necessist should likely ignore S.

But (*) has other nice consequences. For example, the rule that the last statement in a test should be ignored follows from (*). To see this, note that such a statement's postcondition Q is always True. Thus, if the statement doesn't change the context, then its weakest precondition necessarily implies Q.

Having said all this, (*) doesn't quite capture what Necessist actually does. Consider a statement like x -= 1. Necessist will remove such a statement unconditionally, but (*) says maybe Necessist shouldn't. Assuming [overflow checks] are enabled, computing this statement's weakest precondition would look something like the following:

{ Q[(x - 1)/x] ^ x >= 1 } x -= 1; { Q }

Note that x -= 1 does not change the context, and that Q[(x - 1)/x] ^ x >= 1 could imply Q. For example, if Q does not contain x, then Q[(x - 1)/x] = Q and Q ^ x >= 1 implies Q.
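The implication above can be checked mechanically on a toy model. The sketch below is only an illustration under stated assumptions: predicates are closures over a single variable `x` (stored as `i64` so that `x - 1` is always representable), implication is checked by brute force over a small range, and all names (`Pred`, `wp_checked_dec`, `implies_on`, and the two sample postconditions) are hypothetical.

```rust
use std::ops::RangeInclusive;

/// A predicate over the single test variable `x`.
type Pred = Box<dyn Fn(i64) -> bool>;

/// Weakest precondition of `x -= 1` (overflow checks enabled, `x` unsigned)
/// with respect to postcondition `q`, i.e. Q[(x - 1)/x] ^ x >= 1.
fn wp_checked_dec(q: Pred) -> Pred {
    Box::new(move |x: i64| x >= 1 && q(x - 1))
}

/// Checks `p => q` by brute force over a finite range of `x` values.
fn implies_on(range: RangeInclusive<i64>, p: &Pred, q: &Pred) -> bool {
    range.into_iter().all(|x| !p(x) || q(x))
}

/// A postcondition that does not mention `x` (modeled as a constant).
fn q_without_x() -> Pred {
    Box::new(|_: i64| true)
}

/// A postcondition that does mention `x`: `x == 0`.
fn q_x_is_zero() -> Pred {
    Box::new(|x: i64| x == 0)
}

fn main() {
    // Q without x: the precondition is `x >= 1 ^ Q`, which implies Q.
    let p = wp_checked_dec(q_without_x());
    println!("{}", implies_on(0..=10, &p, &q_without_x())); // true

    // Q is `x == 0`: the precondition is `x == 1`, which does not imply Q.
    let p = wp_checked_dec(q_x_is_zero());
    println!("{}", implies_on(0..=10, &p, &q_x_is_zero())); // false
}
```

In the first case the precondition implies Q, which is the situation discussed above. In the second case it does not, because `x == 1` satisfies the precondition but not Q.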

A question one can then ask is: should Necessist remove this statement? Put another way, should Necessist's current behavior be adjusted, or should (*) be adjusted?

One way to look at this question is: which statements are worth removing, i.e., which statements are "interesting?" As implied above, (*) considers a statement "interesting" if it affects a subsequently asserted condition. Agreeing with this notion and that (*) adequately captures it are reasons to keep (*) and adjust Necessist's behavior.

But there are other possible, useful definitions of "interesting statement" upon which one could base an argument for adjusting (*). The following example is due to @2over12. Instead of weakest preconditions, one could consider [strongest postconditions]. For example, computing the strongest postcondition of x -= 1 would look something like the following:

{ P } x -= 1; { (exists x')[P[x'/x] ^ x' >= 1 ^ x = x' - 1] }

One could then consider a statement "interesting" if its strongest postcondition contains "interesting clauses" as determined by heuristics. @2over12 notes that a common source of bugs in tests is unintended side effects (e.g., if x -= 1 were unintended). As already noted, (*) might not catch such bugs, but the just mentioned strongest postcondition scheme might.

Other possible, useful definitions of "interesting statement" could involve frameworks besides [Hoare logic] entirely.

To be clear, Necessist does not apply (*) formally, e.g., Necessist does not actually compute weakest preconditions. The current role of (*) is to help guide which statements Necessist should ignore, and (*) seems to do well in that role. As such, we leave revision of (*) to future work.

Usage

```
Usage: necessist [OPTIONS] [TEST_FILES]... [-- <ARGS>...]

Arguments:
  [TEST_FILES]...  Test files to mutilate (optional)
  [ARGS]...        Additional arguments to pass to each test command

Options:
      --allow <WARNING>        Silence <WARNING>; --allow all silences all warnings
      --default-config         Create a default necessist.toml file in the project's root directory
      --deny <WARNING>         Treat <WARNING> as an error; --deny all treats all warnings as errors
      --dump                   Dump sqlite database contents to the console
      --dump-candidates        Dump removal candidates and exit (for debugging)
      --framework <FRAMEWORK>  Assume testing framework is <FRAMEWORK> [possible values: anchor-ts, auto, foundry, go, hardhat-ts, rust]
      --no-dry-run             Do not perform dry runs
      --no-sqlite              Do not output to an sqlite database
      --quiet                  Do not output to the console
      --reset                  Discard sqlite database contents
      --resume                 Resume from the sqlite database
      --root <ROOT>            Root directory of the project under test
      --timeout <TIMEOUT>      Maximum number of seconds to run any test; 60 is the default, 0 means no timeout
      --verbose                Show test outcomes besides passed
  -h, --help                   Print help
  -V, --version                Print version
```

Output

By default, Necessist outputs to the console only when tests pass. Passing --verbose causes Necessist to instead output all of the removal outcomes below.

| Outcome      | Meaning (With the statement/method call removed...) |
| ------------ | --------------------------------------------------- |
| passed       | The test(s) built and passed.                       |
| timed-out    | The test(s) built but timed-out.                    |
| failed       | The test(s) built but failed.                       |
| nonbuildable | The test(s) did not build.                          |
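As a rough illustration of the outcomes above, the following sketch classifies one run and decides whether it would be shown on the console. It is not Necessist's code: `RunResult`, `classify`, and `shown_on_console` are hypothetical names, and the order of the checks is an assumption.

```rust
/// The four removal outcomes listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Passed,
    TimedOut,
    Failed,
    Nonbuildable,
}

/// Hypothetical summary of one test run with a candidate removed.
struct RunResult {
    built: bool,
    timed_out: bool,
    succeeded: bool,
}

/// Maps a run to an outcome: build failures first, then timeouts,
/// then the test result itself.
fn classify(run: &RunResult) -> Outcome {
    if !run.built {
        Outcome::Nonbuildable
    } else if run.timed_out {
        Outcome::TimedOut
    } else if run.succeeded {
        Outcome::Passed
    } else {
        Outcome::Failed
    }
}

/// By default only `passed` is shown; `--verbose` shows every outcome.
fn shown_on_console(outcome: Outcome, verbose: bool) -> bool {
    verbose || outcome == Outcome::Passed
}

fn main() {
    let run = RunResult { built: true, timed_out: false, succeeded: false };
    let outcome = classify(&run);
    println!("{:?}, shown by default: {}", outcome, shown_on_console(outcome, false));
}
```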

By default, Necessist outputs to both the console and to an sqlite database. For the latter, a tool like [sqlitebrowser] can be used to filter/sort the results.

Details

Generally speaking, Necessist will not attempt to remove a statement if it is one of the following:

Similarly, Necessist will not attempt to remove a method call if:

Also, for some frameworks, certain statements and methods are ignored. Click on a framework to see its specifics.

Anchor TS

Ignored functions

Ignored methods

Foundry

In addition to the below, the Foundry framework ignores:

Ignored functions

Go

In addition to the below, the Go framework ignores:

Ignored methods*

* This list is based primarily on [testing.T]'s methods. However, some methods with commonplace names are omitted to avoid colliding with other types' methods.

Hardhat TS

The ignored functions and methods are the same as for Anchor TS above.

Rust

Ignored macros

Ignored methods*

* This list is essentially the watched trait and inherent methods of Dylint's [unnecessary_conversion_for_trait] lint, with the following additions:

Configuration files

A configuration file allows one to tailor Necessist's behavior with respect to a project. The file must be named necessist.toml, appear in the project's root directory, and be [toml] encoded. The file may contain one or more of the options listed below.

Patterns

A pattern is a string composed of letters, numbers, ., _, or *. Each character, other than *, is treated literally and matches itself only. A * matches any string, including the empty string.
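The matching rule just described can be sketched as a small function. This is an illustration only, not Necessist's matcher: `pattern_matches` is a hypothetical name, and the sketch assumes a pattern must match the whole string rather than a substring.

```rust
/// Returns true if `pattern` matches all of `text`, where `*` matches any
/// string (including the empty string) and every other character matches
/// only itself.
fn pattern_matches(pattern: &str, text: &str) -> bool {
    let t: Vec<char> = text.chars().collect();
    // dp[j] is true if the pattern consumed so far matches text[..j].
    let mut dp = vec![false; t.len() + 1];
    dp[0] = true;
    for pc in pattern.chars() {
        if pc == '*' {
            // `*` may absorb any number of characters, so a match at a
            // shorter prefix extends to every longer prefix.
            for j in 1..=t.len() {
                dp[j] = dp[j] || dp[j - 1];
            }
        } else {
            // A literal consumes exactly one equal character. Iterate
            // downwards so dp[j - 1] still holds the previous step's value.
            for j in (1..=t.len()).rev() {
                dp[j] = dp[j - 1] && t[j - 1] == pc;
            }
            dp[0] = false;
        }
    }
    dp[t.len()]
}

fn main() {
    println!("{}", pattern_matches("assert*", "assert_eq")); // true
    println!("{}", pattern_matches("a.b", "aXb")); // false: `.` is literal
}
```

Note that `.` is treated literally here, unlike in a regular expression.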

The following are examples of patterns:

Notes:

Paths

A path is a sequence of identifiers separated by periods (.). Consider this example (from [Chainlink]):

```sol
operator.connect(roles.oracleNode).signer.sendTransaction({
  to: operator.address,
  data,
}),
```

In the above, operator.connect and signer.sendTransaction are paths.

Note, however, that paths like operator.connect are ambiguous:

By default, Necessist ignores such a path if it matches either an ignored_functions or ignored_methods pattern. Setting the ignored_path_disambiguation option above to Function or Method causes Necessist to ignore the path only if it matches an ignored_functions or ignored_methods pattern (respectively).
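The disambiguation rule can be summarized as a small decision function. This sketch is an illustration, not Necessist's code. The variant name `Either` for the default behavior is an assumption, as is reading the second pattern list as the one for methods (which the Method setting suggests). `should_ignore` and its boolean inputs are hypothetical.

```rust
/// Possible settings of the disambiguation option. `Function` and `Method`
/// come from the text above; `Either` is an assumed name for the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IgnoredPathDisambiguation {
    Either,
    Function,
    Method,
}

/// Decides whether an ambiguous path should be ignored, given whether it
/// matched a function pattern and whether it matched a method pattern.
fn should_ignore(
    mode: IgnoredPathDisambiguation,
    matches_function_pattern: bool,
    matches_method_pattern: bool,
) -> bool {
    match mode {
        // Default: a match against either list is enough.
        IgnoredPathDisambiguation::Either => matches_function_pattern || matches_method_pattern,
        // Treat the path as a function: only function patterns count.
        IgnoredPathDisambiguation::Function => matches_function_pattern,
        // Treat the path as a method: only method patterns count.
        IgnoredPathDisambiguation::Method => matches_method_pattern,
    }
}

fn main() {
    // A path that matched only a method pattern.
    for mode in [
        IgnoredPathDisambiguation::Either,
        IgnoredPathDisambiguation::Function,
        IgnoredPathDisambiguation::Method,
    ] {
        println!("{:?}: ignored = {}", mode, should_ignore(mode, false, true));
    }
}
```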

Limitations

Semantic versioning policy

We reserve the right to change what syntax Necessist ignores by default, and to consider such changes non-breaking.

Goals

References

License

Necessist is licensed and distributed under the AGPLv3 license. Contact us if you're looking for an exception to the terms.