Fast zero-configuration single-binary simple queue service.
Currently, queued supports Linux only.
queued requires persistent storage, and it's preferred to provide a block device directly (e.g. `/dev/my_block_device`) to bypass the file system. Alternatively, a standard file can be used too (e.g. `/var/lib/queued/data`). In either case, the entire device/file will be used, and it must have a size that's a multiple of 1024 bytes.
```
cargo install queued
queued --device /dev/my_block_device --format
queued --device /dev/my_block_device
```
🌐 POST localhost:3333/push
```json
{ "contents": "Hello, world!" }
```
✅ 200 OK
```json
{ "index": 190234 }
```

🌐 POST localhost:3333/poll
```json
{ "visibility_timeout_secs": 30 }
```
✅ 200 OK
```json
{
  "message": {
    "contents": "Hello, world!",
    "created": "2023-01-03T12:00:00Z",
    "index": 190234,
    "poll_count": 1,
    "poll_tag": "f914659685fcea9d60"
  }
}
```

🌐 POST localhost:3333/delete
```json
{ "index": 190234, "poll_tag": "f914659685fcea9d60" }
```
✅ 200 OK
```json
{}
```
On a machine with an Intel Core i5-12400 CPU, Samsung 970 EVO Plus 1TB NVMe SSD, and Linux 5.17 kernel, queued manages around 70,000 operations (push, poll, or delete) per second with 4,096 concurrent clients.
At the API layer, only a successful response (i.e. 2xx) means that the request has been successfully persisted to disk. Assume any interrupted or failed requests did not safely get stored, and retry as appropriate. Changes are strongly consistent and immediately visible to all other callers.
Internally, queued records a hash of persisted data (including metadata and data of messages), to verify integrity when starting the server. It's recommended to use error-detecting-and-correcting durable storage when running in production, like any other stateful workload.
To back up, stop the process and take a copy of the contents of the file/device. Using compression can reduce bandwidth (when transferring) and storage usage.
`GET /healthz` returns the current build version.
`GET /metrics` returns metrics in the Prometheus format:
```
queued_available 0 1672977507603
queued_empty_poll 8192 1672977507603
queued_io_sync 147600 1672977507603
queued_io_sync_background_loops 722814 1672977507603
queued_io_sync_delayed 2868573 1672977507603
queued_io_sync_lock_hold_us 61899209 1672977507603
queued_io_sync_lock_holds 3722814 1672977507603
queued_io_sync_longest_delay_us 31622905 1672977507603
queued_io_sync_shortest_delay_us 22888216 1672977507603
queued_io_sync_triggered_by_bytes 0 1672977507603
queued_io_sync_triggered_by_time 147600 1672977507603
queued_io_sync_us 8005024654 1672977507603
queued_io_write_bytes 263888890 1672977507603
queued_io_write 3000000 1672977507603
queued_io_write_us 54866298628 1672977507603
queued_missing_delete 0 1672977507603
queued_successful_delete 1000000 1672977507603
queued_successful_poll 1000000 1672977507603
queued_successful_push 1000000 1672977507603
queued_suspended_delete 0 1672977507603
queued_suspended_poll 0 1672977507603
queued_suspended_push 0 1672977507603
queued_vacant 1000000 1672977507603
```
`POST /suspend` can suspend specific API endpoints, useful for temporary debugging or emergency intervention without stopping the server. It takes a request body like:
```json
{
  "delete": true,
  "poll": true,
  "push": false
}
```
Set a property to `true` to disable that endpoint, and `false` to re-enable it. Disabled endpoints will return `503 Service Unavailable`. Use `GET /suspend` to get the currently suspended endpoints.
queued is a standard Rust project, and does not require any special build tools or system libraries.
There are calls to `pread` and `pwrite`, so it won't build for targets without those.
As the design and functionality are quite simple, I/O tends to become the bottleneck at scale (at smaller throughputs, the performance is more than enough). This is important to know when profiling and optimising. For example, with CPU flamegraphs, it may appear that the `write` syscall is the dominant cost (e.g. kernel and file system locks), but if queued is compiled with the `unsafe_fsync_none` feature, performance can increase dramatically, indicating that the CPU flamegraphs were missing I/O from the picture; off-CPU flamegraphs may be more useful. This is expected given the nature of queue service workloads: a very high rate of writes (queues are expected to have high throughput, and every operation like push, poll, and delete is a write), each of which is small (message contents are usually small) and lands in non-contiguous areas (messages get deleted, updated, and retried at varying durations, so the storage layout tends towards high fragmentation without algorithmic rebalancing or frequent defragmentation).
Clients in `example-client` can help with running synthetic workloads for stress testing, performance tuning, and profiling.
As I/O becomes the main focus for optimisation, keep in mind:

- `write` syscall data is immediately visible to all `read` syscalls in all threads and processes.
- `write` syscalls can be reordered, unless `fdatasync`/`fsync` is used, which acts as both a barrier and cache-flusher. This means that a fast sequence of `write` (1: create) -> `read` (2: inspect) -> `write` (3: update) can actually cause 1 to clobber 3. Ideally there would be two different APIs for creating a barrier and flushing the cache.