Iai-Callgrind

High-precision and consistent benchmarking framework/harness for Rust

Iai-Callgrind is a benchmarking framework and harness that uses Callgrind to provide extremely accurate and consistent measurements of Rust code, making it perfectly suited to run in environments like a CI.

This crate started as a fork of the great Iai crate rewritten to use Valgrind's Callgrind instead of Cachegrind but also adds a lot of other improvements and features.

Table of Contents

Features

Precision: High-precision measurements allow you to reliably detect very small optimizations of your code
Consistency: Iai-Callgrind can take accurate measurements even in virtualized CI environments
Performance: Since Iai-Callgrind only executes a benchmark once, it is typically a lot faster to run than benchmarks measuring the execution and wall time
Regression: Iai-Callgrind reports the difference between benchmark runs to make it easy to spot detailed performance regressions and improvements.
Profiling: Iai-Callgrind generates a Callgrind profile of your code while benchmarking, so you can use Callgrind-compatible tools like callgrind_annotate or the visualizer kcachegrind to analyze the results in detail
Stable-compatible: Benchmark your code without installing nightly Rust

Installation

In order to use Iai-Callgrind, you must have Valgrind installed. This means that Iai-Callgrind cannot be used on platforms that are not supported by Valgrind.

To start with Iai-Callgrind, add the following to your Cargo.toml file:

toml [dev-dependencies] iai-callgrind = "0.5.0"

To be able to run the benchmarks you'll also need the iai-callgrind-runner binary installed somewhere in your $PATH, for example with

shell cargo install --version 0.5.0 iai-callgrind-runner

There's also the possibility to install the binary somewhere else and point the IAI_CALLGRIND_RUNNER environment variable to the absolute path of the iai-callgrind-runner binary like so:

shell cargo install --version 0.5.0 --root /tmp iai-callgrind-runner IAI_CALLGRIND_RUNNER=/tmp/bin/iai-callgrind-runner cargo bench --bench my-bench

When updating the iai-callgrind library, you'll also need to update iai-callgrind-runner and vice-versa or else the benchmark runner will exit with an error.

Benchmarking

iai-callgrind can be used to benchmark libraries or binaries. Library benchmarks benchmark functions and methods of a crate and binary benchmarks benchmark the executables of a crate. The different benchmark types cannot be intermixed in the same benchmark file but having different benchmark files for library and binary benchmarks is no problem. More on that in the following sections. For a quickstart and examples of benchmarking libraries see the Library Benchmark Section and for executables see the Binary Benchmark Section.

Library Benchmarks

Use this scheme if you want to micro-benchmark specific functions of your crate's library.

Quickstart

Add

toml [[bench]] name = "my_benchmark" harness = false

to your Cargo.toml file and then create a file with the same name in benches/my_benchmark.rs with the following content:

```rust use iaicallgrind::{blackbox, main};

fn fibonacci(n: u64) -> u64 { match n { 0 => 1, 1 => 1, n => fibonacci(n-1) + fibonacci(n-2), } }

[inline(never)] // required for benchmarking functions

fn iaibenchmarkshort() -> u64 { fibonacci(black_box(10)) }

[inline(never)] // required for benchmarking functions

fn iaibenchmarklong() -> u64 { fibonacci(black_box(30)) }

main!(iaibenchmarkshort, iaibenchmarklong); ```

Note that it is important to annotate the benchmark functions with #[inline(never)] or else the rust compiler will most likely try to optimize this function and inline it. Callgrind is function (name) based and uses function calls within the benchmarking function to collect counter events. Not inlining this function serves the additional purpose to reduce influences of the surrounding code on the benchmark function.

Now you can run this benchmark with cargo bench --bench my_benchmark in your project root and you should see something like this:

text my_benchmark::bench_fibonacci_short Instructions: 1727 L1 Data Hits: 621 L2 Hits: 0 RAM Hits: 1 Total read+write: 2349 Estimated Cycles: 2383 my_benchmark::bench_fibonacci_long Instructions: 26214727 L1 Data Hits: 9423880 L2 Hits: 0 RAM Hits: 2 Total read+write: 35638609 Estimated Cycles: 35638677

In addition, you'll find the callgrind output in target/iai/my_benchmark, if you want to investigate further with a tool like callgrind_annotate. Now, if running the same benchmark again, the output will report the differences between the current and the previous run. Say you've made change to the fibonacci function, then you might see something like this:

text my_benchmark::bench_fibonacci_short Instructions: 2798 (+62.01506%) L1 Data Hits: 1006 (+61.99678%) L2 Hits: 0 (No Change) RAM Hits: 1 (No Change) Total read+write: 3805 (+61.98382%) Estimated Cycles: 3839 (+61.09945%) my_benchmark::bench_fibonacci_long Instructions: 16201590 (-38.19661%) L1 Data Hits: 5824277 (-38.19661%) L2 Hits: 0 (No Change) RAM Hits: 2 (No Change) Total read+write: 22025869 (-38.19661%) Estimated Cycles: 22025937 (-38.19654%)

Examples

For examples see also the benches folder.

Skipping setup code

Usually, all function calls in the benchmark function itself are attributed to the event counts. It's possible to pass additional arguments to Callgrind and something like below will eliminate the setup code from the final metrics:

```rust use iaicallgrind::{blackbox, main}; use my_library;

[exportname = "somespecialid::expensivesetup"]

[inline(never)]

fn expensive_setup() -> Vec { // some expensive setup code to produce a Vec }

[inline(never)]

fn test() { mylibrary::calltofunction(blackbox(expensive_setup())); }

main!( callgrindargs = "toggle-collect=somespecialid::expensivesetup"; functions = test ); ```

and then run the benchmark for example with

shell cargo bench --bench my_bench

See also Skip setup code example for an in-depth explanation.

Binary Benchmarks

Use this scheme to benchmark one or more binaries of your crate. If you really like to it's possible to benchmark any executable file in the PATH or any executable specified with an absolute path. This may be useful if you want to compare the runtime of your crate with an existing tool.

It's also possible to run functions of the benchmark file before and after all benchmarks or to setup and teardown any benchmarked binary.

Temporary Workspace and other important default behavior

Per default, all binary benchmarks and the before, after, setup and teardown functions are executed in a temporary directory. See the sandbox for a deeper explanation and how to control and change this behavior.

Also, the environment variables of benchmarked binaries are cleared before the benchmark is run. See also opts how to change this behavior.

Quickstart

Assuming the name of the crate's binary is benchmark-tests, add

toml [[bench]] name = "my_binary_benchmark" harness = false

to your Cargo.toml file and then create a file with the same name in benches/my_binary_benchmark.rs with the following content:

```rust use iai_callgrind::main;

/// This method is run before a benchmark

[inline(never)] // required

fn setup() { println!("setup benchmark-tests") }

/// This method is run after a benchmark

[inline(never)] // required

fn teardown() { println!("teardown benchmark-tests"); }

main!( setup = setup; teardown = teardown; run = cmd = "benchmark-tests", id = "two_args", args = ["one", "two"]; ); ```

You're ready to run the benchmark with cargo bench --bench my_binary_benchmark. Although an id is optional it is good practice to specify it. The rest of the procedure is the same as with Library Benchmarks.

Description

Binary Benchmark Arguments
- run
  - cmd
  - args
  - opts
  - envs
- sandbox
- options
- before, after, setup, teardown
- fixtures

The main macro for binary benchmarks allows the following top-level arguments:

rust main!( options = "--callgrind-argument=yes"; before = function_running_before_all_benchmarks; after = function_running_after_all_benchmarks; setup = function_running_before_any_benchmark; teardown = function_running_after_any_benchmark; sandbox = true; fixtures = "path/to/fixtures"; run = cmd = "benchmark-tests", args = []; )

Here, benchmark-tests is an example of the name of the binary of a crate and it is assumed that the function_running_before_all_benchmarks ... functions are defined somewhere in the same file of the main macro. All top-level arguments must be separated by a ;. However, only run is mandatory. All other top-level arguments (like options, setup etc.) are optional.

`run` (Mandatory)

The run argument can be specified multiple times separated by a ; but must be given at least once. It takes the following arguments:

`cmd` (Mandatory)

This argument is allowed only once and specifies the name of one of the executables of the benchmarked crate. The path of the executable is discovered automatically, so the name of the [[bin]] as specified in the crate's Cargo.toml file is sufficient. The auto discovery supports running the benchmarks with different profiles.

Although not the main purpose of iai-callgrind, it's possible to benchmark any executable in the PATH or specified with an absolute path.

`args` (Mandatory)

The args argument must be specified at least once containing the arguments for the benchmarked cmd. It can be an empty array [] to run to the cmd without any arguments. Each args argument can optionally be named with an id and it is good practice to do so with a short and descriptive string.

Specifying args multiple times (separated by a ,) like so:

rust main!( run = cmd = "benchmark-tests", id = "long", args = ["something"], id = "short", args = ["other"] )

is a short-hand for specifying run with the same cmd, opts and envs arguments multiple times:

rust main!( run = cmd = "benchmark-tests", id = "long", args = ["something"]; run = cmd = "benchmark-tests", id = "short", args = ["other"] )

The output of a bench run with ids could look like:

text test_bin_bench long:benchmark-tests something Instructions: 322637 (No Change) L1 Data Hits: 106807 (No Change) L2 Hits: 708 (No Change) RAM Hits: 3799 (No Change) Total read+write: 433951 (No Change) Estimated Cycles: 565949 (No Change) test_bin_bench short:benchmark-tests other Instructions: 155637 (No Change) L1 Data Hits: 106807 (No Change) L2 Hits: 708 (No Change) RAM Hits: 3799 (No Change) Total read+write: 433951 (No Change) Estimated Cycles: 565949 (No Change)

If no ids are specified each benchmark will be enumerated and shown with a simple number. The same is true for the file name of the output of callgrind.

`opts` (Optional)

opts is optional and can be specified once for every run and cmd:

rust main!( run = cmd = "benchmark-tests", opts = Options::default().env_clear(false), args = ["something"]; )

Here, env_clear(false) specifies to keep the environment variables when running the cmd with callgrind.

The currently available options are:

env_clear(bool): If true clear the environment variables before running the benchmark (Default: true)
current_dir(path: PathBuf): Set the working directory of the cmd (Default: Unchanged). If running the benchmark with sandbox = true, and path is relative then this new directory must be contained in the sandbox.
entry_point(&str): Per default the counting of events starts right at the start of the binary and stops when it finished execution. It may desirable to start the counting for example when entering the main function (but can be any function) and stop counting when leaving the main function of the executable. The entry_point could look like benchmark_tests::main for a binary with the name benchmark-tests (Note that hyphens are replaced with an underscore by callgrind). See also the documentation of toggle-collect and Limiting the range of collected events
exit_with(ExitWith): Usually, if one benchmark exits with a non-zero exit code, the whole benchmark run fails and stops. If you expect the exit code of your benchmarked binary to be different from 0, you can set the expected exit code with this option. This option takes an ExitWith enum:
- ExitWith::Success
- ExitWith::Failure
- ExitWith::Code(i32)

For example $ /bin/stat 'file does not exist' exits with 1 if the path of the argument does not exist and specifying ExitWith::Code(1) (or ExitWith::Failure) let's the benchmark pass:

rust main!( run = cmd = "/bin/stat", opts = Options::default().exit_with(ExitWith::Code(1)), args = ["file does not exist"]; )

`envs` (Optional)

envs may be used to set environment variables available in the cmd. This argument is optional and can be specified once for every cmd. There must be at least one KEY=VALUE pair or KEY present in the array:

rust main!( run = cmd = "benchmark-tests", envs = ["MY_VAR=SOME_VALUE", "MY_OTHER_VAR=VALUE"], args = ["something"]; )

Environment variables specified in the envs array are usually KEY=VALUE pairs. But, if env_clear is true (what is the default), single KEYs are environment variables to pass-through to the cmd. The following will pass-through the PATH variable although the environment is cleared (here given explicitly with the Options although it is the default)

rust main!( run = cmd = "benchmark-tests", envs = ["PATH"], opts = Options::default().env_clear(true), args = []; )

Pass-through environment variables are ignored if they don't exist in the root environment.

`sandbox` (Optional)

Per default, all binary benchmarks and the before, after, setup and teardown functions are executed in a temporary directory.

rust main!( sandbox = true; run = cmd = "benchmark-tests", opts = Options::default().env_clear(false), args = ["something"]; )

This temporary directory will be created and selected before the before function is run and removed after the after function has finished. The fixtures argument let's you copy your fixtures into that directory, so you have access to your fixtures. If you want to access other directories within the benchmarked package's directory, you need to specify absolute paths or set the sandbox argument to false.

Another reason for using a temporary directory as workspace is, that the length of the path where a benchmark is executed may have an influence on the benchmark results. For example, running the benchmark in your repository /home/me/my/repository and someone else's repository located under /home/someone/else/repository may produce different results only because the length of the first path is shorter. To run benchmarks as deterministic as possible across different systems, the length of the path should be the same wherever the benchmark is executed. This crate ensures this property by using the tempfile crate which creates the temporary directory in /tmp with a random name like /tmp/.tmp12345678. This ensures that the length of the directory will be the same on all unix hosts where the benchmarks are run.

`options` (Optional)

A , separated list of strings which contain options for all callgrind invocations and therefore benchmarked cmds (Including benchmarked before, after, setup and teardown functions).

rust main!( options = "--zero-before=benchmark_tests::main"; run = cmd = "benchmark-tests", args = []; )

See also Passing arguments to callgrind and the documentation of Callgrind

`before`, `after`, `setup`, `teardown` (Optional)

Each of the before, after, setup and teardown top-level arguments is optional. If given, this argument must specify a function of the benchmark file. These functions are meant to setup and cleanup the benchmarks. Each function is invoked at a different stage of the benchmarking process.

before: This function is run once before all benchmarked cmds
after: This function is run once after all benchmarked cmds
setup: This function is run once before any benchmarked cmd
teardown: This function is run once after any benchmarked cmd

```rust use iai_callgrind::main;

[inline(never)] // necessary

fn setupmybenchmark() { // For example, create a file }

[inline(never)] // necessary

fn teardownmybenchmark() { // For example, delete a file }

main!( setup = setupmybenchmark; teardown = teardownmybenchmark; run = cmd = "benchmark-tests", args = []; ) ```

Per default, these functions are not benchmarked, but this behavior can be changed by specifying the optional bench argument with a value of true after the function name.

rust main!( setup = setup_my_benchmark, bench = true; run = cmd = "benchmark-tests", args = []; )

Note that setup and teardown functions are benchmarked only once the first time they are invoked, much like the before and after functions. However, these functions are run as usual before or after any benchmark. Benchmarked before, after etc. functions follow the same rules as benchmark functions of library benchmarks.

`fixtures` (Optional)

The fixtures argument specifies a path to a directory containing fixtures which you want to be available for all benchmarks and the before, after, setup and teardown functions. Per default, the fixtures directory will be copied as is into the workspace directory of the benchmark and following symlinks is switched off. The fixtures argument takes an additional argument follow_symlinks = bool. If set to true and your fixtures directory contains symlinks, these symlinks are resolved and instead of the symlink the target file or directory will be copied into the fixtures directory.

Relative paths are interpreted relative to the benchmarked package. In a multi-package workspace this'll be the package name of the benchmark. Otherwise, it'll be the workspace root.

rust main!( setup = setup_my_benchmark; fixtures = "my_fixtures"; run = cmd = "benchmark-tests", args = []; )

Here, the directory my_fixtures in the root of the package under test will be copied into the temporary workspace (for example /tmp/.tmp12345678). So, the setup function setup_my_benchmark and the benchmark of benchmarks-tests can access a fixture test_1.txt with a relative path like my_fixtures/test_1.txt

An example with follow_symlinks = true:

rust main!( setup = setup_my_benchmark; fixtures = "my_fixtures", follow_symlinks = true; run = cmd = "benchmark-tests", args = []; )

Note the fixtures argument will be ignored, if sandbox is set to false.

Examples

See the testbinbench benchmark file of this project for an example.

Features and differences to Iai

This crate is built on the same idea like the original Iai, but over the time applied a lot of improvements. The biggest difference is, that it uses Callgrind under the hood instead of Cachegrind.

More stable metrics

Iai-Callgrind has even more precise and stable metrics across different systems. It achieves this by

only counting events of function calls within the benchmarking function. This behavior virtually encapsulates the benchmark function and separates the benchmark from the surrounding code.
separating the iai library with the main macro from the actual runner. This is the reason for the extra installation step of iai-callgrind-runner but before this separation even small changes in the iai library had effects on the benchmarks under test.

Below a local run of one of the benchmarks of this library

shell $ cd iai-callgrind $ cargo bench --bench test_regular_bench test_regular_bench::bench_empty Instructions: 0 L1 Data Hits: 0 L2 Hits: 0 RAM Hits: 0 Total read+write: 0 Estimated Cycles: 0 test_regular_bench::bench_fibonacci Instructions: 1727 L1 Data Hits: 621 L2 Hits: 0 RAM Hits: 1 Total read+write: 2349 Estimated Cycles: 2383 test_regular_bench::bench_fibonacci_long Instructions: 26214727 L1 Data Hits: 9423880 L2 Hits: 0 RAM Hits: 2 Total read+write: 35638609 Estimated Cycles: 35638677

For comparison here the output of the same benchmark but in the github CI:

text test_regular_bench::bench_empty Instructions: 0 L1 Data Hits: 0 L2 Hits: 0 RAM Hits: 0 Total read+write: 0 Estimated Cycles: 0 test_regular_bench::bench_fibonacci Instructions: 1727 L1 Data Hits: 621 L2 Hits: 0 RAM Hits: 1 Total read+write: 2349 Estimated Cycles: 2383 test_regular_bench::bench_fibonacci_long Instructions: 26214727 L1 Data Hits: 9423880 L2 Hits: 0 RAM Hits: 2 Total read+write: 35638609 Estimated Cycles: 35638677

There's no difference (in this example) what makes benchmark runs and performance improvements of the benchmarked code even more comparable across systems. However, the above benchmarks are pretty clean and you'll most likely see some very small differences in your own benchmarks.

Cleaner output of Valgrind's annotation tools

The now obsolete calibration run needed with Iai has just fixed the summary output of Iai itself, but the output of cg_annotate was still cluttered by the setup functions and metrics. The callgrind_annotate output produced by Iai-Callgrind is far cleaner and centered on the actual function under test.

Rework the metrics output

The statistics of the benchmarks are mostly not compatible with the original Iai anymore although still related. They now also include some additional information:

text test_regular_bench::bench_fibonacci_long Instructions: 26214732 L1 Data Hits: 9423880 L2 Hits: 0 RAM Hits: 2 Total read+write: 35638609 Estimated Cycles: 35638677

There is an additional line Total read+write which summarizes all event counters above it and the L1 Accesses line changed to L1 Data Hits. So, the (L1) Instructions (reads) and L1 Data Hits are now separately listed.

In detail:

Total read+write = Instructions + L1 Data Hits + L2 Hits + RAM Hits.

The formula for the Estimated Cycles hasn't changed and uses Itamar Turner-Trauring's formula from https://pythonspeed.com/articles/consistent-benchmarking-in-ci/:

Estimated Cycles = (Instructions + L1 Data Hits) + 5 × (L2 Hits) + 35 × (RAM Hits)

For further details about how the caches are simulated and more, see the documentation of Callgrind

Colored output and logging

The metrics output is colored per default but follows the value for the CARGO_TERM_COLOR environment variable. Disabling colors can be achieved with setting this environment variable to CARGO_TERM_COLOR=never.

This library uses env_logger and the default logging level WARN. Currently, env_logger is only used to print some warnings and debug output, but to set the logging level to something different set the environment variable RUST_LOG for example to RUST_LOG=DEBUG. The logging output is colored per default but follows the setting of CARGO_TERM_COLOR. See also the documentation of env_logger.

Passing arguments to Callgrind

It's now possible to pass additional arguments to callgrind separated by -- (cargo bench -- CALLGRIND_ARGS) or overwrite the defaults, which are:

--I1=32768,8,64
--D1=32768,8,64
--LL=8388608,16,64
--cache-sim=yes (can't be changed)
--toggle-collect=*BENCHMARK_FILE::BENCHMARK_FUNCTION
--collect-atstart=no
--compress-pos=no
--compress-strings=no

Note that toggle-collect won't be overwritten by any additional toggle-collect argument but instead will be passed to Callgrind in addition to the default value. See the Skipping setup code section for an example of how to make use of this.

It's also possible to pass arguments to callgrind on a benchmark file level with the alternative form of the main macro

rust main!( callgrind_args = "--arg-with-flags=yes", "arg-without-flags=is_ok_too" functions = func1, func2 )

Incomplete list of other minor improvements

The output files of Callgrind are now located under a subdirectory under target/iai to avoid overwriting them in case of multiple benchmark files.

What hasn't changed

Iai-Callgrind cannot completely remove the influences of setup changes. However, these effects shouldn't be significant anymore.

Credits

Iai-Callgrind is forked from https://github.com/bheisler/iai and was originally written by Brook Heisler (@bheisler).

License

Iai-Callgrind is like Iai dual licensed under the Apache 2.0 license and the MIT license.

Iai-Callgrind

Table of Contents

Features

Installation

Benchmarking

Library Benchmarks

Quickstart

[inline(never)] // required for benchmarking functions

[inline(never)] // required for benchmarking functions

Examples

Skipping setup code

[exportname = "somespecialid::expensivesetup"]

[inline(never)]

[inline(never)]

Binary Benchmarks

Temporary Workspace and other important default behavior

Quickstart

[inline(never)] // required

[inline(never)] // required

Description

run (Mandatory)

cmd (Mandatory)

args (Mandatory)

opts (Optional)

envs (Optional)

sandbox (Optional)

options (Optional)

before, after, setup, teardown (Optional)

[inline(never)] // necessary

[inline(never)] // necessary

fixtures (Optional)

Examples

Features and differences to Iai

More stable metrics

Cleaner output of Valgrind's annotation tools

Rework the metrics output

Colored output and logging

Passing arguments to Callgrind

Incomplete list of other minor improvements

What hasn't changed

See also

Credits

License

`run` (Mandatory)

`cmd` (Mandatory)

`args` (Mandatory)

`opts` (Optional)

`envs` (Optional)

`sandbox` (Optional)

`options` (Optional)

`before`, `after`, `setup`, `teardown` (Optional)

`fixtures` (Optional)