This repo contains a suite of tools called `cbdr`. You can install it like so:
cargo install cbdr
`cbdr sample` takes a list of benchmarks in the form `name:program`.
It randomly selects a benchmark, runs the program, reports the execution time,
and loops. The output is CSV-formatted and goes on forever. To get an idea
of what it looks like, let's pass it through `head` (to limit it) and `column`
(to format it):
```
$ cbdr sample "md5:md5sum $BIG_FILE" "sha1:sha1sum $BIG_FILE" "sha256:sha256sum $BIG_FILE" | head | column -ts,
Warming up md5...
Warming up sha1...
Warming up sha256...
benchmark  sys_time  user_time  wall_time
md5        0.01      0.12       0.131099686
sha1       0.03      0.13       0.155063893
sha256     0.02      0.32       0.344348186
sha256     0.01      0.32       0.335235973
md5        0         0.13       0.128056813
md5        0.01      0.13       0.130115718
md5        0.02      0.1        0.131369468
sha1       0         0.15       0.149611563
md5        0.01      0.12       0.128339435
```
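The sampling loop described above can be sketched in a few lines of Python. This is only an illustration of the idea (pick a benchmark at random, time one run, emit a row, repeat), not how `cbdr sample` is actually implemented. The names `parse_benchmark` and `sample` are made up, the program is assumed to be split shell-style, only wall time is measured, and the loop is bounded by an `iterations` argument instead of running forever.

```python
import random
import shlex
import subprocess
import time


def parse_benchmark(spec):
    """Split a 'name:program' spec into (name, argv) at the first colon."""
    name, _, program = spec.partition(":")
    return name.strip(), shlex.split(program)


def sample(specs, iterations):
    """Yield (name, wall_seconds) for randomly chosen benchmarks.

    Hypothetical sketch: picking at random on every iteration interleaves
    the benchmarks, so slow drift in the machine's state is spread across
    all of them instead of landing on whichever one happened to run last.
    """
    benchmarks = [parse_benchmark(s) for s in specs]
    for _ in range(iterations):
        name, argv = random.choice(benchmarks)
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        yield name, time.perf_counter() - start
```

Printing `benchmark,wall_time` followed by one `name,seconds` line per yielded pair gives output in the same general shape as the CSV above.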
`cbdr analyze` takes the output of `cbdr sample` and summarizes the differences
between the benchmarks.
```
$ cbdr analyze
           sha1            sha256          difference (99.9% CI)
sys_time   0.011 ± 0.008   0.012 ± 0.008   [ -12.0% ..  +20.0%]
user_time  0.138 ± 0.009   0.334 ± 0.013   [+139.6% .. +143.4%]
wall_time  0.149 ± 0.006   0.345 ± 0.011   [+129.8% .. +132.5%]
samples    410             422
```
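If you want to poke at the raw CSV yourself, the "mean ± standard deviation" columns are easy to approximate. The sketch below assumes the CSV has a `benchmark` column and a `wall_time` column as in the sample output; the function name `summarize` is made up, and `cbdr analyze` may compute its numbers differently.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize(csv_text, column="wall_time"):
    """Return {benchmark: (mean, sample std dev, count)} for one column.

    Each benchmark needs at least two rows, because the sample standard
    deviation is undefined for a single measurement.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["benchmark"]].append(float(row[column]))
    return {
        name: (statistics.mean(xs), statistics.stdev(xs), len(xs))
        for name, xs in groups.items()
    }
```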
Let's take a look at this line:
md5 sha1 difference (99.9% CI)
wall_time 0.128 ± 0.006 0.149 ± 0.006 [ +15.9% .. +18.0%]
So, we have a set of wall-time measurements for "md5" (let's call the set X)
and a set of wall-time measurements for "sha1" (let's call it Y).
First we get the summary statistics for those sets:
mean(X) = 0.128 secs and std_dev(X) = 0.006 secs
mean(Y) = 0.149 secs and std_dev(Y) = 0.006 secs
Then we get an estimate of the difference in the means:
confidence_interval[mean(Y) - mean(X)] = [+0.02035 secs, +0.02304 secs]
But it's expressed as a percentage of the first mean:
confidence_interval[mean(Y) - mean(X)] / mean(X) = [+15.9%, +18.0%]
So you can understand the "[+15.9% .. +18.0%]" as meaning this: if you know how long md5 takes to run on average, then to get sha1's average running time you need to add on an extra x%, where x is somewhere between 15.9% and 18%.
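To make the arithmetic concrete, here is a rough Python sketch of how such an interval could be computed from summary statistics. It assumes a large-sample normal approximation with unequal variances, and it treats mean(X) as exact when converting to a percentage; `cbdr analyze` may well compute its interval differently. The function name `diff_ci_percent` is made up for illustration.

```python
import math
from statistics import NormalDist


def diff_ci_percent(mean_x, sd_x, n_x, mean_y, sd_y, n_y, confidence=0.999):
    """Interval for mean(Y) - mean(X), as a percentage of mean(X).

    Assumption: with hundreds of samples per benchmark, a normal quantile
    is a reasonable stand-in for the exact one.
    """
    # Standard error of the difference between two independent means.
    std_err = math.sqrt(sd_x**2 / n_x + sd_y**2 / n_y)
    # Two-sided critical value, e.g. about 3.29 for 99.9% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = mean_y - mean_x
    lo, hi = diff - z * std_err, diff + z * std_err
    return 100 * lo / mean_x, 100 * hi / mean_x
```

For example, `diff_ci_percent(0.128, 0.006, 400, 0.149, 0.006, 400)`, with a hypothetical 400 samples per benchmark, gives an interval of roughly +15% to +17.5%. That is close to the one in the table but not identical, since the inputs here are rounded.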
Let's look at the table comparing md5 to sha1. Judging by wall-clock time, sha1 looks like it's about 17% slower than md5, doesn't it? So we go ahead and tell our colleagues:
I benchmarked sha1 against md5 and found that it was 17% slower.
Well, that's fine, but it's much better if we report the confidence interval instead!
By wall-clock time, sha1 was 16-18% slower than md5 (α=99.9%).
Now our colleagues have some idea of how good the benchmark was. Note how
we also reported the confidence level (the α above). `cbdr analyze` lets you
customize this parameter with the `--significance` option, but it defaults
to 99.9%.
Now let's compare how much time md5sum and sha1sum spend inside the kernel. The difference is somewhere between -15% and +18%. Not only is this interval quite wide (implying a lot of variance), but it also straddles 0%. This means there isn't enough evidence to assert a real difference.
The difference in system time was within noise (α=99.9%)
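The reporting rule used in these examples is simple enough to write down as code. The sketch below is just one way to phrase it; the function name `describe` and the exact wording of the output are made up, and it labels the interval "99.9% CI" in the same way the table header does.

```python
def describe(metric, lo_pct, hi_pct, confidence_pct=99.9):
    """Turn a percentage interval into a one-line, honest summary."""
    # An interval that contains 0% gives no evidence of a real difference.
    if lo_pct <= 0.0 <= hi_pct:
        return f"The difference in {metric} was within noise ({confidence_pct}% CI)"
    direction = "higher" if lo_pct > 0 else "lower"
    # Report magnitudes in ascending order, whatever the sign.
    small, large = sorted((abs(lo_pct), abs(hi_pct)))
    return f"{metric} was {small:.1f}-{large:.1f}% {direction} ({confidence_pct}% CI)"
```

Here `describe("system time", -15, 18)` reports noise, while `describe("wall time", 15.9, 18.0)` reports a 15.9-18.0% increase.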
Remember kids: practise statistical responsibility when communicating benchmark results!
You can pipe the output of `cbdr sample` directly into `cbdr analyze` to
see the confidence intervals change live as they're updated by new data.
$ cbdr sample --timeout=30s [benchmarks] | cbdr analyze
(Note: It's tempting to start a benchmark, watch the results come in, and hit Ctrl-C once you think you've collected enough. If you do this, the risk of biasing your results is very high. Decide your stopping point before you start the benchmark!)
If the command lines you're passing in are long, you might find it more convenient to write them one-per-line and pass them into xargs:
$ xargs cbdr sample >results.csv <<EOF
"md5: md5sum $BIG_FILE"
"sha1: sha1sum $BIG_FILE"
"sha256: sha256sum $BIG_FILE"
EOF
`cbdr plot` produces a Vega-Lite specification for visualising benchmark
results.
$ cbdr plot <results.csv | vl2png | feh -
This can be useful for getting an idea of how Gaussian your results are.
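If you'd like a number to go with the picture, skewness and excess kurtosis are two rough indicators: both are close to zero for Gaussian data. The sketch below is a plain-Python illustration and is not something `cbdr` provides; the function name is made up.

```python
import statistics


def skewness_and_excess_kurtosis(xs):
    """Population skewness and excess kurtosis of a list of measurements.

    Both are near 0 for Gaussian data. A clearly positive skewness is
    common for timings, where occasional slow runs form a long right tail.
    """
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    z = [(x - mean) / sd for x in xs]
    skew = statistics.fmean([v**3 for v in z])
    kurt = statistics.fmean([v**4 for v in z]) - 3.0
    return skew, kurt
```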
`cbdr` can make use of custom measurement scripts. For instance, there's one
in `bench_helpers/` which calls out to `perf stat` to get some fancier metrics:
./bench_helpers/perf-bench.sh 'git ls-files'
{
"task_clock": 4.90,
"cpu_utilization": 0.600,
"context_switches": 1,
"cpu_migrations": 0,
"page_faults": 164,
"cycles": 3875762,
"instructions": 3859594,
"branches": 881081,
"branch_misses": 19461
}
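Judging by the example above, a measurement script takes the command to run as a single argument and prints a flat JSON object of numeric metrics. Under that assumption (it is inferred from the `perf-bench.sh` output, not from a documented contract), a hypothetical helper could look like the sketch below. The names `measure` and `report` and the choice of metrics are made up, and the `resource` module is only available on Unix.

```python
import json
import resource
import shlex
import subprocess
import time


def measure(command):
    """Run `command` once and return a flat dict of numeric metrics."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(shlex.split(command), stdout=subprocess.DEVNULL, check=True)
    wall = time.perf_counter() - start
    # Child usage is accumulated once the child has been waited for,
    # so the before/after difference belongs to this one run.
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "wall_time": wall,
        "user_time": after.ru_utime - before.ru_utime,
        "sys_time": after.ru_stime - before.ru_stime,
        "page_faults": (after.ru_minflt + after.ru_majflt)
        - (before.ru_minflt + before.ru_majflt),
    }


def report(command):
    """Format the metrics as the JSON text a helper script would print."""
    return json.dumps(measure(command), indent=2)
```

A script built on this would finish with something like `print(report(sys.argv[1]))`.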
You can tell `cbdr` to use such a script with the `-b` flag:
```
% cbdr sample -b bench_helpers/perf-bench.sh find 'git ls-files' | cbdr analyze
Warming up find...
Warming up git ls-files...
                  find                        git ls-files               difference (99.9% CI)
branch_misses     126177.219 ± 4491.457       18969.948 ± 309.242        [ -85.3% ..  -84.6%]
branches          7124422.062 ± 10084.975     880096.793 ± 4663.410      [ -87.7% ..  -87.6%]
context_switches  1.096 ± 1.557               0.117 ± 0.356              [-103.3% ..  -75.4%]
cpu_migrations    0.003 ± 0.058               0.000 ± 0.000              [   NaN% ..    NaN%]
cpu_utilization   0.947 ± 0.010               0.654 ± 0.022              [ -31.2% ..  -30.7%]
cycles            44041299.653 ± 979069.887   4108837.580 ± 190263.302   [ -90.9% ..  -90.5%]
instructions      36138254.480 ± 59172.510    3854234.597 ± 24108.835    [ -89.4% ..  -89.3%]
page_faults       146.617 ± 2.193             164.714 ± 1.883            [ +12.2% ..  +12.5%]
task_clock        14.901 ± 3.714              1.434 ± 0.560              [ -92.8% ..  -88.0%]
samples           1189                        1268
```