NonBlockingMutex
NonBlockingMutex
NonBlockingMutex
is currently the fastest way to do
expensive calculations under lock, or do cheap operations
under lock when concurrency/load/contention is very high -
see benchmarks in directory benches
and run them with
bash
cargo bench
bash
cargo add non_blocking_mutex
```rust use nonblockingmutex::NonBlockingMutex; use std::thread::{available_parallelism};
let maxconcurrentthreadcount = availableparallelism().unwrap().get();
let nonblockingmutex = NonBlockingMutex::new(maxconcurrentthreadcount, 0); nonblockingmutex.runiffirstorscheduleon_first(|mut state| { *state += 1; }); ```
NonBlockingMutex
NonBlockingMutex
forces first thread to enter synchronized block to
do all tasks(including added while it is running,
potentially running forever if tasks are being added forever)
It is more difficult to continue execution on same thread after
synchronized logic is run, you need to schedule continuation on some
scheduler when you want to continue after end of synchronized logic
in new thread or introduce other synchronization primitives,
like channels, or WaitGroup
-s, or similar
NonBlockingMutex
performs worse than std::sync::Mutex
when
concurrency/load/contention is low
Similar to std::sync::Mutex
, NonBlockingMutex
doesn't guarantee
order of execution, only atomicity of operations is guaranteed
See benchmark logic in directory benches
and reproduce results by running
bash
cargo bench
NonBlockingMutex
performs only a little bit slower than Mutex
when there is only 1 thread and 1 operation
(because NonBlockingMutex
doesn't Box
and store in ShardedQueue
first operation in loop)
| benchmarkname | time | |:----------------------------------------|----------:| | incrementoncewithoutmutex | 0.228 ns | | incrementonceundernonblockingmutex | 9.445 ns | | incrementonceundermutexblockingly | 8.851 ns | | incrementonceundermutex_spinny | 10.603 ns |
With higher contention(caused by long time under lock in our case,
but can also be caused by higher CPU count), NonBlockingMutex
starts to perform better than std::sync::Mutex
| Benchmark name | Operation count per thread | Spin under lock count | Concurrent thread count | averagetime | |:------------------------------------------------|---------------------------:|----------------------:|------------------------:|-------------:| | incrementundernonblockingmutexconcurrently | 1000 | 0 | 24 | 3.408 ms | | incrementundermutexblockinglyconcurrently | 1000 | 0 | 24 | 1.072 ms | | incrementundermutexspinnyconcurrently | 1000 | 0 | 24 | 4.376 ms | | incrementundernonblockingmutexconcurrently | 10000 | 0 | 24 | 42.584 ms | | incrementundermutexblockinglyconcurrently | 10000 | 0 | 24 | 14.960 ms | | incrementundermutexspinnyconcurrently | 10000 | 0 | 24 | 94.658 ms | | incrementundernonblockingmutexconcurrently | 1000 | 10 | 24 | 12.280 ms | | incrementundermutexblockinglyconcurrently | 1000 | 10 | 24 | 8.345 ms | | incrementundermutexspinnyconcurrently | 1000 | 10 | 24 | 34.977 ms | | incrementundernonblockingmutexconcurrently | 10000 | 10 | 24 | 70.013 ms | | incrementundermutexblockinglyconcurrently | 10000 | 10 | 24 | 84.143 ms | | incrementundermutexspinnyconcurrently | 10000 | 10 | 24 | 349.07 ms | | incrementundernonblockingmutexconcurrently | 1000 | 100 | 24 | 44.670 ms | | incrementundermutexblockinglyconcurrently | 1000 | 100 | 24 | 47.335 ms | | incrementundermutexspinnyconcurrently | 1000 | 100 | 24 | 117.570 ms | | incrementundernonblockingmutexconcurrently | 10000 | 100 | 24 | 378.230 ms | | incrementundermutexblockinglyconcurrently | 10000 | 100 | 24 | 801.090 ms | | incrementundermutexspinnyconcurrently | 10_000 | 100 | 24 | 1200.400 ms |
First thread, which calls NonBlockingMutex::run_if_first_or_schedule_on_first
,
atomically increments task_count
, and,
if thread was first to increment task_count
from 0 to 1,
first thread immediately executes first task,
and then atomically decrements task_count
and checks if task_count
changed from 1 to 0. If task_count
changed from 1 to 0 -
there are no more tasks and first thread can finish execution loop,
otherwise first thread gets next task from task_queue
and runs task,
then decrements tasks count after it was run and repeats check if
task_count
changed from 1 to 0 until there are no more tasks left.
Not first threads also atomically increment task_count
,
do check if they are first, Box
task and push task Box
to task_queue
This design allows us to avoid lock contention, but adds ~constant time
of Box
-ing task and putting task Box
into concurrent task_queue
, and
incrementing and decrementing task_count
, so when lock contention is low,
NonBlockingMutex
performs worse than std::sync::Mutex
,
but when contention is high
(because we have more CPU-s or because we want to do expensive
operations under lock), NonBlockingMutex
performs better
than std::sync::Mutex