ustr

Fast, FFI-friendly string interning.

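A Ustr (Unique str) is a lightweight handle representing a static, immutable entry in a global string cache, allowing for cheap copies and comparisons, hashing via a precomputed hash, and passing strings straight to FFI (see Usage).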

The downside is no strings are ever freed, so if you're creating lots and lots of strings, you might run out of memory. On the other hand, War and Peace is only 3MB, so it's probably fine.
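To make the trade-off concrete, here is a minimal, std-only sketch of the general interning idea: a global set of leaked strings plus a copyable handle that compares by pointer. The names `CACHE`, `Handle` and `intern` are hypothetical and exist only for this illustration; this is not how ustr is implemented (ustr uses its own hash map ported from OIIO).

```rust
use std::collections::HashSet;
use std::sync::{Mutex, OnceLock};

// Hypothetical global cache: a set of leaked, 'static string slices.
static CACHE: OnceLock<Mutex<HashSet<&'static str>>> = OnceLock::new();

/// Hypothetical handle: a copyable reference to one cache entry.
#[derive(Clone, Copy, Debug)]
pub struct Handle(&'static str);

impl PartialEq for Handle {
    // Equal handles point at the same cache entry, so comparing the
    // pointers is enough; the characters are never inspected.
    fn eq(&self, other: &Self) -> bool {
        std::ptr::eq(self.0, other.0)
    }
}

impl Eq for Handle {}

impl Handle {
    pub fn as_str(&self) -> &'static str {
        self.0
    }
}

/// Returns the handle for `text`, storing a copy only on first sight.
pub fn intern(text: &str) -> Handle {
    let mut cache = CACHE
        .get_or_init(|| Mutex::new(HashSet::new()))
        .lock()
        .unwrap();
    if let Some(&existing) = cache.get(text) {
        return Handle(existing);
    }
    // Leak the allocation so the slice lives for the rest of the program.
    // This is the "no strings are ever freed" downside in miniature.
    let leaked: &'static str = Box::leak(text.to_owned().into_boxed_str());
    cache.insert(leaked);
    Handle(leaked)
}
```

Interning the same text twice yields handles that compare equal and share one allocation, while memory use only ever grows with the number of distinct strings.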

This crate is based on OpenImageIO's (OIIO) ustring but it is not binary-compatible (yet). The underlying hash map implementation is directly ported from OIIO.

Usage

```rust
use ustr::{Ustr, ustr};

// Creation is quick and easy using either Ustr::from or the ustr short
// function and only one copy of any string is stored
let h1 = Ustr::from("hello");
let h2 = ustr("hello");

// Comparisons and copies are extremely cheap
let h3 = h1;
assert_eq!(h2, h3);

// You can pass straight to FFI
let len = unsafe { libc::strlen(h1.as_char_ptr()) };
assert_eq!(len, 5);

// For best performance when using Ustr as key for a HashMap or HashSet,
// you'll want to use the precomputed hash. To make this easier, just use
// the UstrMap and UstrSet exports:
use ustr::UstrMap;

// Key type is always Ustr
let mut map: UstrMap<usize> = UstrMap::default();
map.insert(h1, 17);
assert_eq!(*map.get(&h1).unwrap(), 17);
```
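As a rough illustration of why a precomputed hash helps, the std-only sketch below pairs a key that carries its own hash with a hasher that simply passes that value through, so a map lookup never rescans the string. `PrehashedKey`, `PassThroughHasher` and `PrehashedMap` are hypothetical names for this example, and the FNV-1a hash is an assumption picked for brevity, not necessarily what ustr uses.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Hypothetical key that carries a hash computed once, at creation time.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PrehashedKey {
    pub hash: u64,
    pub text: &'static str,
}

impl PrehashedKey {
    pub fn new(text: &'static str) -> Self {
        // FNV-1a (64-bit), used here only because it is short.
        let mut hash: u64 = 0xcbf29ce484222325;
        for byte in text.bytes() {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(0x100000001b3);
        }
        Self { hash, text }
    }
}

impl Hash for PrehashedKey {
    // Feed only the stored hash to the hasher, never the string bytes.
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.hash);
    }
}

/// Hasher that returns the single u64 it was given, unchanged.
#[derive(Default)]
pub struct PassThroughHasher(u64);

impl Hasher for PassThroughHasher {
    fn write(&mut self, _bytes: &[u8]) {
        unreachable!("prehashed keys only call write_u64");
    }

    fn write_u64(&mut self, value: u64) {
        self.0 = value;
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Map whose lookups reuse the hash stored in the key.
pub type PrehashedMap<V> =
    HashMap<PrehashedKey, V, BuildHasherDefault<PassThroughHasher>>;
```

A `PrehashedMap<i32>` created with `PrehashedMap::default()` behaves like an ordinary `HashMap`; the only difference is that hashing a key costs one `u64` copy regardless of string length.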

By enabling the "serialize" feature you can serialize individual Ustrs or the whole cache with serde.

```rust
use ustr::{Ustr, ustr};

let user = ustr("serialization is fun!");
let json = serde_json::to_string(&user).unwrap();
let u_de: Ustr = serde_json::from_str(&json).unwrap();

assert_eq!(user, u_de);
```

Since the cache is global, use the ustr::DeserializedCache dummy object to drive the deserialization.

```rust
use ustr::ustr;

ustr("Send me to JSON and back");
let json = serde_json::to_string(ustr::cache()).unwrap();

// ... some time later ...
let _: ustr::DeserializedCache = serde_json::from_str(&json).unwrap();
assert_eq!(ustr::num_entries(), 1);
assert_eq!(
    ustr::string_cache_iter().collect::<Vec<_>>(),
    vec!["Send me to JSON and back"]
);
```

Calling from C/C++

If you are writing a library that uses ustr and want users to be able to create Ustrs to pass to your API from C, add ustr_extern.rs to your crate and use include/ustr.h or include/ustr.hpp for function declarations.
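For readers unfamiliar with exposing Rust to C, the following std-only sketch shows the general shape of a C-callable function that accepts a NUL-terminated string. The function `example_name_len` is hypothetical and is not taken from ustr_extern.rs; it only illustrates the pointer handling such a wrapper typically needs.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical C-callable entry point: returns the byte length of a
/// NUL-terminated string, or 0 for a null pointer or invalid UTF-8.
/// A real export would also carry a `no_mangle` attribute so the C linker
/// can find the symbol; it is omitted to keep this sketch short.
pub extern "C" fn example_name_len(name: *const c_char) -> usize {
    if name.is_null() {
        return 0;
    }
    // SAFETY: the caller promises that `name` points at a valid,
    // NUL-terminated string that stays alive for the duration of the call.
    let c_str = unsafe { CStr::from_ptr(name) };
    match c_str.to_str() {
        // A real wrapper would intern `s` here and return a handle.
        Ok(s) => s.len(),
        Err(_) => 0,
    }
}
```

On the C side this would be declared as something like `size_t example_name_len(const char* name);`, which is the role the headers in include/ play for the real exports.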

Changelog

Changes since 0.9

and thanks to virtualritz:

Changes since 0.8

Changes since 0.7

Changes since 0.6

Changes since 0.5

Changes since 0.4

Changes since 0.3

Changes since 0.2

Speed

Ustrs are significantly faster to create than string-interner or string-cache. Creating 100,000 cycled copies of ~20,000 path strings of the form:

    /cgi-bin/images/admin
    /modules/templates/cache
    /libraries/themes/wp-includes
    ... etc.

[Figure: raft bench]

Why?

It is common in certain types of applications to use strings as identifiers, but not really do any processing with them. To paraphrase from OIIO's ustring documentation:

Compared to standard strings, Ustrs have several advantages:

On the whole, Ustrs are a really great string representation

Ustrs are not so hot:

Safety and Compatibility

This crate contains a significant amount of unsafe but usage has been checked and is well-documented. It is also run through Miri as part of the CI process.

I use it regularly on 64-bit systems, and it has passed Miri on a 32-bit system as well, but 32-bit is not checked regularly. If you want to use it on 32-bit, please make sure to run Miri and open an issue if you find any problems.

Licence

BSD+ License

Copyright © 2019—2020 Anders Langlands

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Subject to the terms and conditions of this license, each copyright holder and contributor hereby grants to those receiving rights under this license a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except for failure to satisfy the conditions of this license) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer this software, where such license applies only to those patent claims, already acquired or hereafter acquired, licensable by such copyright holder or contributor that are necessarily infringed by:

(a) their Contribution(s) (the licensed copyrights of copyright holders and non-copyrightable additions of contributors, in source or binary form) alone; or

(b) combination of their Contribution(s) with the work of authorship to which such Contribution(s) was added by such copyright holder or contributor, if, at the time the Contribution is added, such addition causes such combination to be necessarily infringed. The patent license shall not apply to any other combinations which include the Contribution.

Except as expressly stated above, no rights or licenses from any copyright holder or contributor is granted under this license, whether expressly, by implication, estoppel or otherwise.

DISCLAIMER

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Contains code ported from OpenImageIO, BSD 3-clause licence.

Contains a copy of Max Woolf's Big List of Naughty Strings, MIT licence.

Contains some strings from SecLists, MIT licence.