A library for working with binaries and strings. It tries to avoid heap allocations and memory copies whenever possible by automatically choosing a reasonable strategy: the stack for small binaries, a static-lifetime binary, or reference counting. It's easy to use (no lifetimes; the binary type is sized), `Send + Sync` is optional (thus no synchronization overhead), it provides optional serde support, and it has a similar API for strings and binaries. Custom binary/string types can be implemented for fine-tuning.
Libraries that provide similar functionality:
Licensed under either of
at your option.
```toml
[dependencies]
abin = "*"
```
```rust
use std::iter::FromIterator;
use std::ops::Deref;

use abin::{AnyBin, AnyStr, Bin, BinFactory, NewBin, NewStr, Str, StrFactory};

fn usage_basics() {
    // static binary / static string
    let static_bin: Bin = NewBin::from_static("I'm a static binary, hello!".as_bytes());
    let static_str: Str = NewStr::from_static("I'm a static binary, hello!");
    assert_eq!(&static_bin, static_str.as_bin());
    assert_eq!(static_str.as_str(), "I'm a static binary, hello!");
    // non-static (but small enough to be stored on the stack)
    let hello_bin: Bin = NewBin::from_iter([72u8, 101u8, 108u8, 108u8, 111u8].iter().copied());
    let hello_str: Str = NewStr::copy_from_str("Hello");
    assert_eq!(&hello_bin, hello_str.as_bin());
    assert_eq!(hello_str.as_ref() as &str, "Hello");

    // operations for binaries / strings

    // length (number of bytes / number of utf-8 bytes)
    assert_eq!(5, hello_bin.len());
    assert_eq!(5, hello_str.len());
    // is_empty
    assert_eq!(false, hello_bin.is_empty());
    assert_eq!(false, hello_str.is_empty());
    // as_slice / as_str / deref / as_bin
    assert_eq!(&[72u8, 101u8, 108u8, 108u8, 111u8], hello_bin.as_slice());
    assert_eq!("Hello", hello_str.as_str());
    assert_eq!("Hello", hello_str.deref());
    assert_eq!(&hello_bin, hello_str.as_bin());
    // slice
    assert_eq!(
        NewBin::from_static(&[72u8, 101u8]),
        hello_bin.slice(0..2).unwrap()
    );
    assert_eq!(NewStr::from_static("He"), hello_str.slice(0..2).unwrap());
    // clone
    assert_eq!(hello_bin.clone(), hello_bin);
    assert_eq!(hello_str.clone(), hello_str);
    // compare
    assert!(NewBin::from_static(&[255u8]) > hello_bin);
    assert!(NewStr::from_static("Z") > hello_str);

    // convert string into binary and binary into string
    let hello_bin_from_str: Bin = hello_str.clone().into_bin();
    assert_eq!(hello_bin_from_str, hello_bin);
    let hello_str_from_bin: Str = AnyStr::from_utf8(hello_bin.clone()).expect("invalid utf8!");
    assert_eq!(hello_str_from_bin, hello_str);

    // convert into Vec<u8> / String
    assert_eq!(
        Vec::from_iter([72u8, 101u8, 108u8, 108u8, 111u8].iter().copied()),
        hello_bin.into_vec()
    );
    assert_eq!("Hello".to_owned(), hello_str.into_string());
}
```
Interfaces:
* `Bin`: Binary (it's a struct).
* `SBin`: Synchronized binary (it's a struct).
* `Str`: String (`type Str = AnyStr<Bin>`).
* `SStr`: Synchronized string (`type SStr = AnyStr<SBin>`).
Factories provided by the default implementation:
* `NewBin`: Creates `Bin`.
* `NewSBin`: Creates `SBin`.
* `NewStr`: Creates `Str`.
* `NewSStr`: Creates `SStr`.
See also:
* `AnyBin`: Trait implemented by `Bin` and `SBin`.
* `AnyStr`: See `Str` and `SStr`; string backed by either `Bin` or `SBin`.
* `BinFactory`: Factory trait implemented by `NewBin` and `NewSBin`.
* `StrFactory`: Factory trait implemented by `NewStr` and `NewSStr`.
See the example tests:
* `Vec<u8>` and `String`.
* A `Cow` that works with types that don't implement `ToOwned`.
* `Boo` with serde.
* Synchronized (`Send + Sync`) and non-synchronized binaries / strings.

It's quite young (development started in October 2020). The main functionality has been implemented. Things I might do:

* `loom` / more tests.

There are already other crates with similar functionality, why another one? / Features

This crate provides some features that cannot be found in other crates (or not all of them):
* `Bin`/`Str` to `Bin`/`Str`) (usually zero-allocation / zero-copy).
* `Bin`/`Str` to `&[u8]`/`&str`).

Why `NewBin`, `NewStr`? What's this?

Why `let string = NewStr::from_static("Hello")` instead of just `let string = Str::from_static("Hello")` (or implementing `From<&str> for Str`)? This is due to the decision to decouple the interface from the implementation. `Str` is the interface, whereas `NewStr` is the factory of the built-in implementation. This library is designed to be extensible; you can provide your own implementation, tweaked for your use case.
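
The split between interface and factory can be illustrated without the crate itself. The following is a minimal, self-contained sketch of the idea, not `abin`'s actual code: `SketchStr`, `SketchFactory`, `DefaultFactory` and `greeting` are hypothetical names invented for this illustration.

```rust
use std::rc::Rc;

/// Hypothetical interface type: callers only ever see this.
#[derive(Debug, Clone, PartialEq)]
pub enum SketchStr {
    Static(&'static str),
    Shared(Rc<str>),
}

impl SketchStr {
    pub fn as_str(&self) -> &str {
        match self {
            SketchStr::Static(s) => *s,
            SketchStr::Shared(s) => &**s,
        }
    }
}

/// Hypothetical factory trait: decides *how* a value is constructed.
pub trait SketchFactory {
    fn from_static(value: &'static str) -> SketchStr;
    fn copy_from_str(value: &str) -> SketchStr;
}

/// One possible factory: keeps static data as-is, copies everything else.
pub struct DefaultFactory;

impl SketchFactory for DefaultFactory {
    fn from_static(value: &'static str) -> SketchStr {
        SketchStr::Static(value)
    }

    fn copy_from_str(value: &str) -> SketchStr {
        SketchStr::Shared(Rc::from(value))
    }
}

/// Generic code selects the factory through a type parameter; the returned
/// type is the same no matter which factory built the value.
pub fn greeting<F: SketchFactory>(name: &str) -> SketchStr {
    if name.is_empty() {
        F::from_static("Hello!")
    } else {
        F::copy_from_str(&format!("Hello, {}!", name))
    }
}
```

Because construction goes through the factory and not through the interface type, a second factory with a different allocation strategy could be swapped in without touching code that only consumes `SketchStr`.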
How does the default implementation (`NewBin` / `NewStr`) work?

* Small binaries are stored on the stack: up to `3 * sizeof(word) - 1` bytes; that's 23 bytes on a 64-bit platform. For reference, the string `Hello, world!` only takes 13 bytes and could easily be stored on the stack.
* Reference-counted binaries (`GivenVecConfig`). The reference counter is stored inside the vector data. This has these advantages:
  * A `Bin` can be created from a `Vec<u8>` without allocation (if the `Vec<u8>` has some capacity left for the reference counter), something which is not possible using `Rc<[u8]>`.
  * Unlike `Rc<Vec<u8>>`, no second indirection is introduced.

The only difference between `NewBin` and `NewSBin` is the reference-counted binaries: an `SBin` created by `NewSBin` has a synchronized reference counter (`AtomicUsize`).
Note: The same statements also apply to strings (since strings are backed by the binary implementation).
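
To make the strategy selection more concrete, here is a rough, self-contained sketch. It is not the crate's real layout: `SketchBin`, `INLINE_CAPACITY` and the methods below are hypothetical, the spare byte is assumed to hold the length, and a plain `Rc<[u8]>` stands in for the in-vector reference counter described above.

```rust
use std::mem::size_of;
use std::rc::Rc;

/// Inline capacity from the text: three words minus one byte.
/// In this sketch the remaining byte holds the length.
pub const INLINE_CAPACITY: usize = 3 * size_of::<usize>() - 1;

pub enum SketchBin {
    /// Small payloads live directly inside the value (no heap allocation).
    Inline { len: u8, data: [u8; INLINE_CAPACITY] },
    /// Static data is referenced, never copied.
    Static(&'static [u8]),
    /// Everything else is shared via reference counting.
    Shared(Rc<[u8]>),
}

impl SketchBin {
    pub fn from_static(data: &'static [u8]) -> Self {
        SketchBin::Static(data)
    }

    pub fn copy_from_slice(data: &[u8]) -> Self {
        if data.len() <= INLINE_CAPACITY {
            let mut buf = [0u8; INLINE_CAPACITY];
            buf[..data.len()].copy_from_slice(data);
            SketchBin::Inline { len: data.len() as u8, data: buf }
        } else {
            SketchBin::Shared(Rc::from(data))
        }
    }

    pub fn as_slice(&self) -> &[u8] {
        match self {
            SketchBin::Inline { len, data } => &data[..*len as usize],
            SketchBin::Static(data) => *data,
            SketchBin::Shared(data) => &**data,
        }
    }

    pub fn is_inline(&self) -> bool {
        matches!(self, SketchBin::Inline { .. })
    }
}
```

With this sketch, `SketchBin::copy_from_slice(b"Hello")` ends up in the `Inline` variant, while a 64-byte payload falls through to the reference-counted variant.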
What operations are allocation-free / zero-copy?
It's not documented (in text), and of course it depends on the implementation, but for the default implementation (`NewBin`/`NewSBin`/`NewStr`/`NewSStr`) there's a test; see `tests/no_alloc_guarantees.rs`.
Also see these two tests for single-allocation guarantee:
I want to write my own implementation, how to?
There's currently no documentation, but you can use the default implementation for reference. It's found in the module `implementation`.
Why `Boo` and not `Cow`?

`Cow` requires `where B: 'a + ToOwned`. This does not work with this crate, since the implementation is separated from the interface. Say we have `&[u8]` (borrowed): to convert that to owned (`Bin` or `SBin`), the implementation has to be known. I don't want `Cow` to contain information about the implementation.
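
A borrowed-or-owned type without the `ToOwned` bound could look roughly like the sketch below. This is only an illustration under assumptions: `SketchBoo` and `into_owned_with` are hypothetical names and not necessarily how the crate's `Boo` is defined.

```rust
/// Hypothetical borrowed-or-owned type. Unlike `Cow`, the owned type is a
/// separate type parameter, so no `ToOwned` bound ties it to the borrowed type.
pub enum SketchBoo<'a, TBorrowed: ?Sized, TOwned> {
    Borrowed(&'a TBorrowed),
    Owned(TOwned),
}

impl<'a, TBorrowed: ?Sized, TOwned> SketchBoo<'a, TBorrowed, TOwned> {
    /// The caller supplies the conversion and thereby picks the
    /// implementation used to create the owned value.
    pub fn into_owned_with<F>(self, to_owned: F) -> TOwned
    where
        F: FnOnce(&TBorrowed) -> TOwned,
    {
        match self {
            SketchBoo::Borrowed(borrowed) => to_owned(borrowed),
            SketchBoo::Owned(owned) => owned,
        }
    }
}
```

The type itself carries no knowledge of how `&[u8]` becomes an owned binary; that decision is deferred to the call site, which is the property the paragraph above asks for.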
Aren't `Bin` and `Str` huge (stack-size)?

`Bin` and `Str` have a size of 4 words and are word-aligned. Yes, that's not small, but for reference, a `Vec<u8>` already takes 3 words (pointer, length and capacity).
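
The word counts for the standard types can be checked with `std::mem::size_of`. The helper `size_in_words` below is a hypothetical name used only for this sketch; the language does not formally guarantee these layouts, but they hold for current Rust toolchains.

```rust
use std::mem::size_of;
use std::rc::Rc;

/// Stack size of `T`, measured in machine words.
pub fn size_in_words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

/// Prints the stack footprint of a few standard types for comparison.
pub fn print_sizes() {
    println!("Vec<u8>:  {} words", size_in_words::<Vec<u8>>());
    println!("String:   {} words", size_in_words::<String>());
    println!("Rc<[u8]>: {} words", size_in_words::<Rc<[u8]>>());
}
```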
What is re-integration?
Say we have this code (pseudocode):
```
let large_binary_from_network: Vec<u8> = ...;
let bin: Bin = ...;                  // owns the data received from the network
let slice_of_that_bin: &[u8] = ...;  // a sub-slice borrowed from `bin`

// it's now possible to re-integrate that `slice_of_that_bin` into the `bin` it was sliced from.
// re-integration converts the borrowed type `&[u8]` (`slice_of_that_bin`) into an owned
// type (`Bin`) without memory-allocation or memory-copy.
let bin_re_integrated: Bin = bin.try_re_integrate(slice_of_that_bin).unwrap();
```
This is useful if you want to de-serialize to owned (without using `Boo`) using serde. When deserializing a type, we get `slice_of_that_bin` from serde; using re-integration it's possible to get an owned binary (`Bin`) without allocation.

Technical detail: It checks whether `slice_of_that_bin` lies within the memory range of `bin`; if so, it increments the reference count of `bin` by one, and the returned binary (`bin_re_integrated`) is then just a sliced reference to `bin`.
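
The range check described in the technical detail can be sketched in plain Rust. This is a simplified illustration, not the crate's implementation: `SketchSharedBin` is a hypothetical type, it uses `Rc<[u8]>` as the shared buffer, and (unlike the crate) its constructor copies the vector once.

```rust
use std::ops::Range;
use std::rc::Rc;

/// Hypothetical shared buffer plus the window of it that this value exposes.
#[derive(Clone)]
pub struct SketchSharedBin {
    buffer: Rc<[u8]>,
    range: Range<usize>,
}

impl SketchSharedBin {
    pub fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        SketchSharedBin { buffer: Rc::from(data), range: 0..len }
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.buffer[self.range.clone()]
    }

    pub fn ref_count(&self) -> usize {
        Rc::strong_count(&self.buffer)
    }

    /// Returns an owned view if `slice` points into this value's memory.
    pub fn try_re_integrate(&self, slice: &[u8]) -> Option<SketchSharedBin> {
        let own = self.as_slice();
        let own_start = own.as_ptr() as usize;
        let own_end = own_start + own.len();
        let start = slice.as_ptr() as usize;
        let end = start.checked_add(slice.len())?;
        if start < own_start || end > own_end {
            return None; // not part of this buffer: the caller has to copy
        }
        let offset = self.range.start + (start - own_start);
        Some(SketchSharedBin {
            buffer: Rc::clone(&self.buffer), // bumps the reference count
            range: offset..offset + slice.len(),
        })
    }
}
```

A slice taken from the value itself is re-integrated by bumping the reference count, whereas a slice from an unrelated allocation yields `None`.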
Name `abin`?

It's named after the trait `AnyBin`.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
See CONTRIBUTING.md.