`ShardedQueue`

ShardedQueue is a concurrent queue highly specialized for the needs of some schedulers and of NonBlockingMutex

ShardedQueue is a light-weight concurrent queue that uses spin locking and reduces lock contention through sharding
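
To make the sharding idea concrete, here is a minimal sketch using only the standard library. It is not the crate's implementation: `ShardedQueueSketch` is a hypothetical name, each shard is guarded by a `std::sync::Mutex` rather than a spin lock, and the round-robin shard assignment through two atomic counters is an assumption made for illustration. Only the method names `new`, `push_back`, and `pop_front_or_spin` mirror the ones used elsewhere in this README.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for illustration; not the crate's code.
pub struct ShardedQueueSketch<T> {
    /// `shard_count - 1`, where `shard_count` is a power of two, so that
    /// `index & mask` is a cheap replacement for `index % shard_count`.
    mask: usize,
    /// Monotonic counters that assign each pop/push to a shard round-robin.
    head_index: AtomicUsize,
    tail_index: AtomicUsize,
    /// Every shard has its own lock, so threads rarely contend on the same one.
    /// (Assumption: `Mutex` is used here instead of a spin lock to avoid `unsafe`.)
    shards: Vec<Mutex<VecDeque<T>>>,
}

impl<T> ShardedQueueSketch<T> {
    pub fn new(max_concurrent_thread_count: usize) -> Self {
        let shard_count = max_concurrent_thread_count.max(1).next_power_of_two();
        Self {
            mask: shard_count - 1,
            head_index: AtomicUsize::new(0),
            tail_index: AtomicUsize::new(0),
            shards: (0..shard_count)
                .map(|_| Mutex::new(VecDeque::new()))
                .collect(),
        }
    }

    pub fn push_back(&self, item: T) {
        // Pick the next shard in round-robin order, then lock only that shard.
        let shard = self.tail_index.fetch_add(1, Ordering::Relaxed) & self.mask;
        self.shards[shard].lock().unwrap().push_back(item);
    }

    /// Waits until the shard assigned to this call holds an item.
    /// The caller must ensure that a matching `push_back` has happened
    /// or will happen, otherwise this loops forever.
    pub fn pop_front_or_spin(&self) -> T {
        let shard = self.head_index.fetch_add(1, Ordering::Relaxed) & self.mask;
        loop {
            // The lock guard is released at the end of this `if let` statement.
            if let Some(item) = self.shards[shard].lock().unwrap().pop_front() {
                return item;
            }
            std::hint::spin_loop();
        }
    }
}
```

With as many shards as concurrent threads, two threads only contend when their counters map to the same shard at the same moment, which is the effect sharding is meant to achieve.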

Note that although it may seem that FIFO order is guaranteed, it is not. Consider this scenario: several producers trigger long resizes of very large shards, all but the last one, and enough time passes for those resizes to finish. Then one producer triggers a long resize of the last shard while all other threads keep producing and consuming until they end up spinning on the last shard. There is no guarantee which thread acquires the spin lock first, so we cannot even guarantee that ShardedQueue::pop_front_or_spin acquires the lock before ShardedQueue::push_back on its first attempt

Note that this queue does not track its length, because the increment/decrement logic depends on the use case, as does the handling of the transition from 1 to 0 or back. In some cases, such as NonBlockingMutex, an action is not even added to the queue when the count reaches 1; it runs immediately on the same thread. In other cases the count may even go negative to optimize hot paths: in some schedulers it is cheaper to restore the count to a correct state than to enforce that it never goes negative
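
As one possible illustration of keeping the length outside the queue, the sketch below pairs a caller-owned `AtomicUsize` with a queue that knows nothing about it. Everything here is hypothetical: `CountedQueue`, `push`, `try_pop`, and `len` are names invented for this example, and a `Mutex<VecDeque<T>>` stands in for the real queue so that the snippet is self-contained. The policy shown (reserve an item by decrementing the counter, and only then pop) is just one of the many policies the paragraph above has in mind.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical wrapper: the caller owns the counter, the queue does not.
pub struct CountedQueue<T> {
    len: AtomicUsize,
    /// Stand-in for a queue that does not track its own length.
    queue: Mutex<VecDeque<T>>,
}

impl<T> CountedQueue<T> {
    pub fn new() -> Self {
        Self {
            len: AtomicUsize::new(0),
            queue: Mutex::new(VecDeque::new()),
        }
    }

    pub fn push(&self, item: T) {
        self.queue.lock().unwrap().push_back(item);
        // Increment only after the item is stored, so the counter never
        // exceeds the number of items actually present in the queue.
        self.len.fetch_add(1, Ordering::Release);
    }

    /// Returns `None` instead of waiting when the counter says "empty".
    pub fn try_pop(&self) -> Option<T> {
        let mut current = self.len.load(Ordering::Acquire);
        loop {
            if current == 0 {
                return None;
            }
            // Reserve one item by decrementing the counter first.
            match self.len.compare_exchange_weak(
                current,
                current - 1,
                Ordering::Acquire,
                Ordering::Acquire,
            ) {
                // The reservation succeeded, so an item is guaranteed to exist.
                Ok(_) => return self.queue.lock().unwrap().pop_front(),
                Err(actual) => current = actual,
            }
        }
    }

    pub fn len(&self) -> usize {
        self.len.load(Ordering::Acquire)
    }
}
```

Because the counter lives outside the queue, a different use case can swap in its own rules (for example, running the first action inline instead of enqueueing it) without touching the queue itself.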

Examples

```rust
use sharded_queue::ShardedQueue;
use std::cell::UnsafeCell;
use std::fmt::{Debug, Display, Formatter};
use std::marker::PhantomData;
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicUsize, Ordering};

pub struct NonBlockingMutex<'captured_variables, State: ?Sized> {
    task_count: AtomicUsize,
    task_queue: ShardedQueue<Box<dyn FnOnce(MutexGuard<State>) + Send + 'captured_variables>>,
    unsafe_state: UnsafeCell<State>,
}

impl<'captured_variables, State> NonBlockingMutex<'captured_variables, State> {
    #[inline]
    pub fn new(max_concurrent_thread_count: usize, state: State) -> Self {
        Self {
            task_count: Default::default(),
            task_queue: ShardedQueue::new(max_concurrent_thread_count),
            unsafe_state: UnsafeCell::new(state),
        }
    }

    /// Please don't forget that order of execution is not guaranteed. Atomicity of operations is guaranteed,
    /// but order can be random
    #[inline]
    pub fn run_if_first_or_schedule_on_first(
        &self,
        run_with_state: impl FnOnce(MutexGuard<State>) + Send + 'captured_variables,
    ) {
        if self.task_count.fetch_add(1, Ordering::Acquire) != 0 {
            self.task_queue.push_back(Box::new(run_with_state));
        } else {
            // If we acquired first lock, run should be executed immediately and run loop started
            run_with_state(unsafe { MutexGuard::new(self) });
            // Note that if `fetch_sub` != 1
            // => some thread entered first if block in method
            // => `ShardedQueue::push_back` is guaranteed to be called
            // => `ShardedQueue::pop_front_or_spin` will not deadlock while spins until it gets item
            //
            // Notice that we run action first, and only then decrement count
            // with releasing(pushing) memory changes, even if it looks otherwise
            while self.task_count.fetch_sub(1, Ordering::Release) != 1 {
                self.task_queue.pop_front_or_spin()(unsafe { MutexGuard::new(self) });
            }
        }
    }
}

/// [Send], [Sync], and [MutexGuard] logic was taken from [std::sync::Mutex]
/// and [std::sync::MutexGuard]
///
/// these are the only places where `T: Send` matters; all other
/// functionality works fine on a single thread.
unsafe impl<'captured_variables, State: Send> Send
    for NonBlockingMutex<'captured_variables, State>
{
}
unsafe impl<'captured_variables, State: Send> Sync
    for NonBlockingMutex<'captured_variables, State>
{
}

/// Code was mostly taken from [std::sync::MutexGuard], it is expected to protect [State]
/// from moving out of synchronized loop
pub struct MutexGuard<
    'captured_variables,
    'non_blocking_mutex_ref,
    State: ?Sized + 'non_blocking_mutex_ref,
> {
    non_blocking_mutex: &'non_blocking_mutex_ref NonBlockingMutex<'captured_variables, State>,
    /// Adding it to ensure that [MutexGuard] implements [Send] and [Sync] in same cases
    /// as [std::sync::MutexGuard] and protects [State] from going out of synchronized
    /// execution loop
    ///
    /// todo remove when this error is no longer actual
    ///  negative trait bounds are not yet fully implemented; use marker types for now [E0658]
    _phantom_unsend: PhantomData<std::sync::MutexGuard<'non_blocking_mutex_ref, State>>,
}

// todo uncomment when this error is no longer actual
//  negative trait bounds are not yet fully implemented; use marker types for now [E0658]
// impl<'captured_variables, 'non_blocking_mutex_ref, State: ?Sized> !Send
//     for MutexGuard<'captured_variables, 'non_blocking_mutex_ref, State>
// {
// }
unsafe impl<'captured_variables, 'non_blocking_mutex_ref, State: ?Sized + Sync> Sync
    for MutexGuard<'captured_variables, 'non_blocking_mutex_ref, State>
{
}

impl<'captured_variables, 'non_blocking_mutex_ref, State: ?Sized>
    MutexGuard<'captured_variables, 'non_blocking_mutex_ref, State>
{
    unsafe fn new(
        non_blocking_mutex: &'non_blocking_mutex_ref NonBlockingMutex<'captured_variables, State>,
    ) -> Self {
        Self {
            non_blocking_mutex,
            _phantom_unsend: PhantomData,
        }
    }
}

impl<'captured_variables, 'non_blocking_mutex_ref, State: ?Sized> Deref
    for MutexGuard<'captured_variables, 'non_blocking_mutex_ref, State>
{
    type Target = State;

    #[inline]
    fn deref(&self) -> &State {
        unsafe { &*self.non_blocking_mutex.unsafe_state.get() }
    }
}

impl<'captured_variables, 'non_blocking_mutex_ref, State: ?Sized> DerefMut
    for MutexGuard<'captured_variables, 'non_blocking_mutex_ref, State>
{
    #[inline]
    fn deref_mut(&mut self) -> &mut State {
        unsafe { &mut *self.non_blocking_mutex.unsafe_state.get() }
    }
}

impl<'captured_variables, 'non_blocking_mutex_ref, State: ?Sized + Debug> Debug
    for MutexGuard<'captured_variables, 'non_blocking_mutex_ref, State>
{
    #[inline]
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        // Dereference down to `State` so that formatting does not recurse into this impl
        Debug::fmt(&**self, f)
    }
}

impl<'captured_variables, 'non_blocking_mutex_ref, State: ?Sized + Display> Display
    for MutexGuard<'captured_variables, 'non_blocking_mutex_ref, State>
{
    #[inline]
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        (**self).fmt(f)
    }
}
```

Benchmarks

| benchmark_name                            | operation_count_per_thread | concurrent_thread_count | average_time |
|-------------------------------------------|---------------------------:|------------------------:|-------------:|
| sharded_queue_push_and_pop_concurrently   |                      1_000 |                      24 |    3.1980 ms |
| crossbeam_queue_push_and_pop_concurrently |                      1_000 |                      24 |    5.3154 ms |
| queue_mutex_push_and_pop_concurrently     |                      1_000 |                      24 |    6.4846 ms |
| sharded_queue_push_and_pop_concurrently   |                     10_000 |                      24 |    37.245 ms |
| crossbeam_queue_push_and_pop_concurrently |                     10_000 |                      24 |    49.234 ms |
| queue_mutex_push_and_pop_concurrently     |                     10_000 |                      24 |    69.207 ms |
| sharded_queue_push_and_pop_concurrently   |                    100_000 |                      24 |    395.12 ms |
| crossbeam_queue_push_and_pop_concurrently |                    100_000 |                      24 |    434.00 ms |
| queue_mutex_push_and_pop_concurrently     |                    100_000 |                      24 |    476.59 ms |