A multithreaded and single threaded string interner that allows strings to be cached with a minimal memory footprint,
associating them with a unique [key] that can be used to retrieve them at any time. A [Rodeo
] allows O(1)
internment and resolution and can be turned into a [RodeoReader
] to allow for contention-free resolutions
with both key to str and str to key operations. It can also be turned into a [RodeoResolver
] with only
key to str operations for the lowest possible memory usage.
For single-threaded workloads [Rodeo
] is encouraged, while multi-threaded applications should use [ThreadedRodeo
].
Both of these are the only way to intern strings, but most applications will hit a stage where they are done interning
strings, and at that point is where the choice between [RodeoReader
] and [RodeoResolver
]. If the user needs to get
keys for strings still, then they must use the [RodeoReader
] (although they can still transfer into a [RodeoResolver
])
at this point. For users who just need key to string resolution, the [RodeoResolver
] gives contention-free access at the
minimum possible memory usage. Note that to gain access to [ThreadedRodeo
] the multi-threaded
feature is required.
| Interner | Thread-safe | Intern String | str to key | key to str | Contention Free | Memory Usage |
| ----------------- | :---------: | :-----------: | :--------: | :--------: | :-------------: | :----------: |
| [Rodeo
] | ❌ | ✅ | ✅ | ✅ | N/A | Medium |
| [ThreadedRodeo
] | ✅ | ✅ | ✅ | ✅ | ❌ | Most |
| [RodeoReader
] | ✅ | ❌ | ✅ | ✅ | ✅ | Medium |
| [RodeoResolver
] | ✅ | ❌ | ❌ | ✅ | ✅ | Least |
By default lasso
has one dependency, hashbrown
, and only [Rodeo
] is exposed. Hashbrown is used since the [raw_entry
api] is currently unstable in the standard library's hashmap.
The raw hashmap API is used for custom hashing within the hashmaps, which works to dramatically reduce memory usage
To make use of [ThreadedRodeo
], you must enable the multi-threaded
feature.
multi-threaded
- Enables [ThreadedRodeo
], the interner for multi-threaded tasksahasher
- Use [ahash
]'s RandomState
as the default hasherno-std
- Enables no_std
+ alloc
support for [Rodeo
] and [ThreadedRodeo
]
ahasher
- no_std
hashing functionserialize
- Implements Serialize
and Deserialize
for all Spur
types and all internersinline-more
- Annotate external apis with #[inline]
```rust use lasso::Rodeo;
let mut rodeo = Rodeo::default(); let key = rodeo.getorintern("Hello, world!");
// Easily retrieve the value of a key and find the key for values asserteq!("Hello, world!", rodeo.resolve(&key)); asserteq!(Some(key), rodeo.get("Hello, world!"));
// Interning the same string again will yield the same key let key2 = rodeo.getorintern("Hello, world!");
assert_eq!(key, key2); ```
```rust use lasso::ThreadedRodeo; use std::{thread, sync::Arc};
let rodeo = Arc::new(ThreadedRodeo::default()); let key = rodeo.getorintern("Hello, world!");
// Easily retrieve the value of a key and find the key for values asserteq!("Hello, world!", rodeo.resolve(&key)); asserteq!(Some(key), rodeo.get("Hello, world!"));
// Interning the same string again will yield the same key let key2 = rodeo.getorintern("Hello, world!");
assert_eq!(key, key2);
// ThreadedRodeo can be shared across threads let moved = Arc::clone(&rodeo); let hello = thread::spawn(move || { asserteq!("Hello, world!", moved.resolve(&key)); moved.getor_intern("Hello from the thread!") }) .join() .unwrap();
asserteq!("Hello, world!", rodeo.resolve(&key)); asserteq!("Hello from the thread!", rodeo.resolve(&hello)); ```
```rust use lasso::Rodeo;
// Rodeo and ThreadedRodeo are interchangeable here let mut rodeo = Rodeo::default();
let key = rodeo.getorintern("Hello, world!"); assert_eq!("Hello, world!", rodeo.resolve(&key));
let reader = rodeo.into_reader();
// Reader keeps all the strings from the parent asserteq!("Hello, world!", reader.resolve(&key)); asserteq!(Some(key), reader.get("Hello, world!"));
// The Reader can now be shared across threads, no matter what kind of Rodeo created it ```
```rust use lasso::Rodeo;
// Rodeo and ThreadedRodeo are interchangeable here let mut rodeo = Rodeo::default();
let key = rodeo.getorintern("Hello, world!"); assert_eq!("Hello, world!", rodeo.resolve(&key));
let resolver = rodeo.into_resolver();
// Resolver keeps all the strings from the parent assert_eq!("Hello, world!", resolver.resolve(&key));
// The Resolver can now be shared across threads, no matter what kind of Rodeo created it ```
Sometimes you want your keys to only inhabit (or not inhabit) a certain range of values so that you can have custom [niches]. This allows you to pack more data into what would otherwise be unused space, which can be critical for memory-sensitive applications.
```rust use lasso::{Key, Rodeo};
// First make our key type, this will be what we use as handles into our interner
struct NicheKey(u32);
// This will reserve the upper 255 values for us to use as niches const NICHE: usize = 0xFF000000;
// Implementing Key
is unsafe and requires that anything given to try_from_usize
must produce the
// same usize
when into_usize
is later called
unsafe impl Key for NicheKey {
fn into_usize(self) -> usize {
self.0 as usize
}
fn try_from_usize(int: usize) -> Option<Self> {
if int < NICHE {
// The value isn't in our niche range, so we're good to go
Some(Self(int as u32))
} else {
// The value interferes with our niche, so we return `None`
None
}
}
}
// To make sure we're upholding Key
's safety contract, let's make two small tests
fn valueinrange() { let key = NicheKey::tryfromusize(0).unwrap(); asserteq!(key.intousize(), 0);
let key = NicheKey::try_from_usize(NICHE - 1).unwrap();
assert_eq!(key.into_usize(), NICHE - 1);
}
fn valueoutofrange() { let key = NicheKey::tryfromusize(NICHE); assert!(key.isnone());
let key = NicheKey::try_from_usize(u32::max_value() as usize);
assert!(key.is_none());
}
// And now we're done and can make Rodeo
s or ThreadedRodeo
s that use our custom key!
let mut rodeo: Rodeo
FromIterator
```rust use lasso::Rodeo; use core::iter::FromIterator;
// Works for both Rodeo
and ThreadedRodeo
let rodeo = Rodeo::from_iter(vec![
"one string",
"two string",
"red string",
"blue string",
]);
assert!(rodeo.contains("one string")); assert!(rodeo.contains("two string")); assert!(rodeo.contains("red string")); assert!(rodeo.contains("blue string")); ```
```rust use lasso::Rodeo; use core::iter::FromIterator;
// Works for both Rodeo
and ThreadedRodeo
let rodeo: Rodeo = vec!["one string", "two string", "red string", "blue string"]
.into_iter()
.collect();
assert!(rodeo.contains("one string")); assert!(rodeo.contains("two string")); assert!(rodeo.contains("red string")); assert!(rodeo.contains("blue string")); ```
Benchmarks were gathered with Criterion.rs
OS: Windows 10
CPU: Ryzen 9 3900X at 3800Mhz
RAM: 3200Mhz
Rustc: Stable 1.44.1
| Method | Time | Throughput |
| :--------------------------- | :-------: | :----------: |
| resolve
| 1.9251 μs | 13.285 GiB/s |
| try_resolve
| 1.9214 μs | 13.311 GiB/s |
| resolve_unchecked
| 1.4356 μs | 17.816 GiB/s |
| get_or_intern
(empty) | 60.350 μs | 433.96 MiB/s |
| get_or_intern
(filled) | 57.415 μs | 456.15 MiB/s |
| try_get_or_intern
(empty) | 58.978 μs | 444.06 MiB/s |
| try_get_or_intern (filled)
| 57.421 μs | 456.10 MiB/s |
| get
(empty) | 37.288 μs | 702.37 MiB/s |
| get
(filled) | 55.095 μs | 475.36 MiB/s |
| Method | Time | Throughput |
| :--------------------------- | :-------: | :----------: |
| try_resolve
| 1.9282 μs | 13.264 GiB/s |
| resolve
| 1.9404 μs | 13.181 GiB/s |
| resolve_unchecked
| 1.4328 μs | 17.851 GiB/s |
| get_or_intern
(empty) | 38.029 μs | 688.68 MiB/s |
| get_or_intern
(filled) | 33.650 μs | 778.30 MiB/s |
| try_get_or_intern
(empty) | 39.392 μs | 664.84 MiB/s |
| try_get_or_intern (filled)
| 33.435 μs | 783.31 MiB/s |
| get
(empty) | 12.565 μs | 2.0356 GiB/s |
| get
(filled) | 26.545 μs | 986.61 MiB/s |
| Method | Time | Throughput |
| :--------------------------- | :-------: | :----------: |
| resolve
| 1.9014 μs | 13.451 GiB/s |
| try_resolve
| 1.9278 μs | 13.267 GiB/s |
| resolve_unchecked
| 1.4449 μs | 17.701 GiB/s |
| get_or_intern
(empty) | 32.523 μs | 805.27 MiB/s |
| get_or_intern
(filled) | 30.281 μs | 864.88 MiB/s |
| try_get_or_intern
(empty) | 31.630 μs | 828.00 MiB/s |
| try_get_or_intern (filled)
| 31.002 μs | 844.78 MiB/s |
| get
(empty) | 12.699 μs | 2.0141 GiB/s |
| get
(filled) | 29.220 μs | 896.28 MiB/s |
| Method | Time (1 Thread) | Throughput (1 Thread) | Time (24 Threads) | Throughput (24 Threads) |
| :--------------------------- | :-------------: | :-------------------: | :---------------: | :---------------------: |
| resolve
| 54.336 μs | 482.00 MiB/s | 364.27 μs | 71.897 MiB/s |
| try_resolve
| 54.582 μs | 479.82 MiB/s | 352.67 μs | 74.261 MiB/s |
| get_or_intern
(empty) | 266.03 μs | 98.447 MiB/s | N\A | N\A |
| get_or_intern
(filled) | 103.04 μs | 254.17 MiB/s | 441.42 μs | 59.331 MiB/s |
| try_get_or_intern
(empty) | 261.80 μs | 100.04 MiB/s | N\A | N\A |
| try_get_or_intern
(filled) | 102.61 μs | 255.25 MiB/s | 447.42 μs | 58.535 MiB/s |
| get
(empty) | 80.346 μs | 325.96 MiB/s | N\A | N\A |
| get
(filled) | 92.669 μs | 282.62 MiB/s | 439.24 μs | 59.626 MiB/s |
| Method | Time (1 Thread) | Throughput (1 Thread) | Time (24 Threads) | Throughput (24 Threads) |
| :--------------------------- | :-------------: | :-------------------: | :---------------: | :---------------------: |
| resolve
| 22.261 μs | 1.1489 GiB/s | 265.46 μs | 98.658 MiB/s |
| try_resolve
| 22.378 μs | 1.1429 GiB/s | 268.58 μs | 97.513 MiB/s |
| get_or_intern
(empty) | 157.86 μs | 165.91 MiB/s | N\A | N\A |
| get_or_intern
(filled) | 56.320 μs | 465.02 MiB/s | 357.13 μs | 73.335 MiB/s |
| try_get_or_intern
(empty) | 161.46 μs | 162.21 MiB/s | N\A | N\A |
| try_get_or_intern
(filled) | 55.874 μs | 468.73 MiB/s | 360.25 μs | 72.698 MiB/s |
| get
(empty) | 43.520 μs | 601.79 MiB/s | N\A | N\A |
| get
(filled) | 53.720 μs | 487.52 MiB/s | 360.66 μs | 72.616 MiB/s |
| Method | Time (1 Thread) | Throughput (1 Thread) | Time (24 Threads) | Throughput (24 Threads) |
| :--------------------------- | :-------------: | :-------------------: | :---------------: | :---------------------: |
| try_resolve
| 17.289 μs | 1.4794 GiB/s | 238.29 μs | 109.91 MiB/s |
| resolve
| 19.833 μs | 1.2896 GiB/s | 237.05 μs | 110.48 MiB/s |
| get_or_intern
(empty) | 130.97 μs | 199.97 MiB/s | N\A | N\A |
| get_or_intern
(filled) | 42.630 μs | 614.35 MiB/s | 301.60 μs | 86.837 MiB/s |
| try_get_or_intern
(empty) | 129.30 μs | 202.55 MiB/s | N\A | N\A |
| try_get_or_intern
(filled) | 42.508 μs | 616.12 MiB/s | 337.29 μs | 77.648 MiB/s |
| get
(empty) | 28.001 μs | 935.30 MiB/s | N\A | N\A |
| get
(filled) | 37.700 μs | 694.68 MiB/s | 292.15 μs | 89.645 MiB/s |
| Method | Time (1 Thread) | Throughput (1 Thread) | Time (24 Threads) | Throughput (24 Threads) |
| :------------------ | :-------------: | :-------------------: | :---------------: | :----------------------: |
| resolve
| 1.9398 μs | 13.185 GiB/s | 4.3153 μs | 5.9269 GiB/s |
| try_resolve
| 1.9315 μs | 13.242 GiB/s | 4.1956 μs | 6.0959 GiB/s |
| resolve_unchecked
| 1.4416 μs | 17.741 GiB/s | 3.1204 μs | 8.1964 GiB/s |
| get
(empty) | 38.886 μs | 673.50 MiB/s | N\A | N\A |
| get
(filled) | 56.271 μs | 465.42 MiB/s | 105.12 μs | 249.14 MiB/s |
| Method | Time (1 Thread) | Throughput (1 Thread) | Time (24 Threads) | Throughput (24 Threads) |
| :------------------ | :-------------: | :-------------------: | :---------------: | :---------------------: |
| resolve
| 1.9404 μs | 13.181 GiB/s | 4.1881 μs | 6.1069 GiB/s |
| try_resolve
| 1.8932 μs | 13.509 GiB/s | 4.2410 μs | 6.0306 GiB/s |
| resolve_unchecked
| 1.4128 μs | 18.103 GiB/s | 3.1691 μs | 8.0703 GiB/s |
| get
(empty) | 11.952 μs | 2.1399 GiB/s | N\A | N\A |
| get
(filled) | 27.093 μs | 966.65 MiB/s | 56.269 μs | 465.44 MiB/s |
| Method | Time (1 Thread) | Throughput (1 Thread) | Time (24 Threads) | Throughput (24 Threads) |
| :------------------ | :-------------: | :-------------------: | :---------------: | :---------------------: |
| resolve
| 1.8987 μs | 13.471 GiB/s | 4.2117 μs | 6.0727 GiB/s |
| try_resolve
| 1.9103 μs | 13.389 GiB/s | 4.2254 μs | 6.0529 GiB/s |
| resolve_unchecked
| 1.4469 μs | 17.677 GiB/s | 3.0923 μs | 8.2709 GiB/s |
| get
(empty) | 12.994 μs | 1.9682 GiB/s | N\A | N\A |
| get
(filled) | 29.745 μs | 880.49 MiB/s | 52.387 μs | 499.93 MiB/s |
| Method | Time (1 Thread) | Throughput (1 Thread) | Time (24 Threads) | Throughput (24 Threads) |
| :------------------ | :-------------: | :-------------------: | :---------------: | :---------------------: |
| resolve
| 1.9416 μs | 13.172 GiB/s | 3.9114 μs | 6.5387 GiB/s |
| try_resolve
| 1.9264 μs | 13.277 GiB/s | 3.9289 μs | 6.5097 GiB/s |
| resolve_unchecked
| 1.6638 μs | 15.372 GiB/s | 3.1741 μs | 8.0578 GiB/s |