# Triple buffering in Rust

Requires rustc 1.46+.

## What is this?

This is an implementation of triple buffering written in Rust. You may find it useful for the following class of thread synchronization problems:

- There is one producer thread and one consumer thread.
- The producer wants to update a shared memory value periodically.
- The consumer wants to access the latest update from the producer at any time.

The simplest way to use it is as follows:

```rust
// Create a triple buffer
use triple_buffer::TripleBuffer;
let buf = TripleBuffer::new(&0);

// Split it into an input and output interface, to be respectively sent to
// the producer thread and the consumer thread
let (mut buf_input, mut buf_output) = buf.split();

// The producer can move a value into the buffer at any time
buf_input.write(42);

// The consumer can access the latest value from the producer at any time
let latest_value_ref = buf_output.read();
assert_eq!(*latest_value_ref, 42);
```

In situations where moving the original value away and being unable to modify it on the consumer's side is too costly, such as if creating a new value involves dynamic memory allocation, you can use a lower-level API which allows you to access the producer and consumer's buffers in place and to precisely control when updates are propagated:

```rust
// Create and split a triple buffer
use triple_buffer::TripleBuffer;
let buf = TripleBuffer::new(&String::with_capacity(42));
let (mut buf_input, mut buf_output) = buf.split();

// Mutate the input buffer in place
{
    // Acquire a reference to the input buffer
    let input = buf_input.input_buffer();

    // In general, you don't know what's inside of the buffer, so you should
    // always reset the value before use (this is a type-specific process).
    input.clear();

    // Perform an in-place update
    input.push_str("Hello, ");
}

// Publish the above input buffer update
buf_input.publish();

// Manually fetch the buffer update from the consumer interface
buf_output.update();

// Acquire a mutable reference to the output buffer
let output = buf_output.output_buffer();

// Post-process the output value before use
output.push_str("world!");
```

## Give me details! How does it compare to alternatives?

Compared to a mutex:

- The producer and consumer never block each other and can work simultaneously (accesses are wait-free).
- Only a single producer and a single consumer are supported.
- Memory usage is higher, as three copies of the payload are kept around.

Compared to the read-copy-update (RCU) primitive from the Linux kernel:

- Memory usage is fixed and no dynamic allocation or deferred reclamation is needed.
- Only a single consumer is supported.

Compared to sending the updates on a message queue:

- Memory usage is bounded no matter how fast the producer writes.
- The consumer only observes the latest value and may miss intermediate updates.
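To make that last contrast concrete, here is a minimal sketch of the queue-based pattern that triple buffering replaces, using `std::sync::mpsc` from the standard library (illustrative only, not taken from this crate's documentation):

```rust
use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel();

    // The producer sends a burst of updates...
    for i in 0..10 {
        tx.send(i).unwrap();
    }

    // ...and a consumer that only wants the latest value must drain the
    // backlog, whose memory footprint grows with every unread message.
    // A triple buffer instead overwrites the stale value in place,
    // using a constant amount of memory.
    let mut latest = None;
    while let Ok(value) = rx.try_recv() {
        latest = Some(value);
    }
    assert_eq!(latest, Some(9));
}
```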

In short, triple buffering is what you're after in scenarios where a shared memory location is updated frequently by a single writer, read by a single reader who only wants the latest version, and you can spare some RAM.
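As an illustration of that scenario, here is a minimal cross-thread usage sketch, assuming only the high-level `write()`/`read()` API shown above:

```rust
use std::thread;
use triple_buffer::TripleBuffer;

fn main() {
    // Create a triple buffer and split it into its two halves
    let buf = TripleBuffer::new(&0u64);
    let (mut buf_input, mut buf_output) = buf.split();

    // Producer thread: publishes an increasing sequence of values
    let producer = thread::spawn(move || {
        for i in 1..=100u64 {
            buf_input.write(i);
        }
    });

    // Consumer thread: polls the latest value whenever convenient.
    // It may skip intermediate updates, but values never go backwards.
    let consumer = thread::spawn(move || {
        let mut last = 0u64;
        while last < 100 {
            let current = *buf_output.read();
            assert!(current >= last);
            last = current;
        }
    });

    producer.join().unwrap();
    consumer.join().unwrap();
}
```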

## How do I know your unsafe lock-free code is working?

By running the tests, of course! Unfortunately, this is currently harder than I'd like it to be.

First of all, we have sequential tests, which are very thorough but obviously do not check the lock-free/synchronization part. You run them as follows:

```
$ cargo test
```

Then we have concurrent tests where, for example, a reader thread continuously observes the values from a rate-limited writer thread, and makes sure that it can see every single update without any incorrect value slipping in the middle.

These tests are more important, but also harder to run, because one must first check some assumptions:

- The machine must have at least two physical CPU cores, so that the producer and consumer threads actually run in parallel.
- No other CPU-intensive process should be running in the background, as it would distort the timing-sensitive checks.

Taking these constraints and the relatively long run time (~10-20 s) into account, the concurrent tests are ignored by default. To run them, make sure nothing is eating CPU in the background and do:

```
$ cargo test --release -- --ignored --nocapture --test-threads=1
```
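For illustration, here is a strongly simplified sketch of what such a concurrent test can look like; the actual tests in this crate are more thorough, and the rate limiting and bounds used here are arbitrary assumptions:

```rust
use std::{thread, time::Duration};
use triple_buffer::TripleBuffer;

#[test]
#[ignore] // Timing-sensitive: needs an otherwise idle, multi-core machine
fn reader_sees_every_update() {
    const MAX: u64 = 1_000;
    let (mut input, mut output) = TripleBuffer::new(&0u64).split();

    // Rate-limited writer: publishes the sequence 1..=MAX slowly enough
    // for a spinning reader to catch every value
    let writer = thread::spawn(move || {
        for i in 1..=MAX {
            input.write(i);
            thread::sleep(Duration::from_millis(1));
        }
    });

    // Reader: spins on the buffer and checks that no incorrect value
    // slips in and that no update from the sequence is skipped
    let mut last = 0;
    while last < MAX {
        let current = *output.read();
        assert!(current == last || current == last + 1, "bad value: {}", current);
        last = current;
    }
    writer.join().unwrap();
}
```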

Finally, we have benchmarks, which allow you to test how well the code is performing on your machine. We are now using criterion for said benchmarks, which means that to run them, you can simply do:

```
$ cargo bench
```

These benchmarks exercise the worst-case scenario of u8 payloads, where synchronization overhead dominates as the cost of reading and writing the actual data is only 1 cycle. In real-world use cases, you will spend more time updating buffers and less time synchronizing them.

However, due to the artificial nature of microbenchmarking, the benchmarks must exercise two scenarios which are respectively overly optimistic and overly pessimistic:

  1. In uncontended mode, the buffer input and output reside on the same CPU core, which underestimates the overhead of transferring modified cache lines from the L1 cache of the source CPU to that of the destination CPU.
  2. In contended mode, the benchmarked half of the triple buffer operates under maximal load from the other half, which is much busier than what will actually be observed in real-world workloads.

Therefore, consider these benchmarks' timings as orders of magnitude of the best and the worst that you can expect from triple-buffer, with actual performance falling somewhere in between these two numbers depending on your workload.
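For reference, here is a minimal sketch of what an uncontended-mode benchmark can look like with criterion; this is an illustrative reconstruction, not the crate's actual benchmark code:

```rust
// benches/uncontended.rs
use criterion::{criterion_group, criterion_main, Criterion};
use triple_buffer::TripleBuffer;

fn uncontended(c: &mut Criterion) {
    // Both halves stay on the same thread (hence the same CPU core), so
    // modified cache lines never migrate: an optimistic lower bound
    let (mut input, mut output) = TripleBuffer::new(&0u8).split();
    c.bench_function("uncontended write+read", |b| {
        b.iter(|| {
            input.write(42);
            *output.read()
        })
    });
}

criterion_group!(benches, uncontended);
criterion_main!(benches);
```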

On an Intel Core i3-3220 CPU @ 3.30GHz, typical results are as follows:

## License

This crate is distributed under the terms of the MPLv2 license. See the LICENSE file for details.

More relaxed licensing (Apache, MIT, BSD...) may also be negotiated in exchange for a financial contribution. Contact me for details at knightsofni AT gmx DOTCOM.