🌾 crop

![Latest version] ![CI] ![Docs]

crop is an implementation of a text rope, a data structure designed to be used in applications that need to handle frequent edits to arbitrarily large buffers, such as text editors.

crop's Rope is backed by a B-tree, ensuring that the time complexity of inserting, deleting or replacing a piece of text is always logarithmic in the size of the Rope.

crop places an extreme focus on performance: check out the benchmarks to see how it stacks up against similar projects.

Built with parallelism in mind

Ropes use thread-safe reference counting to share data between threads. Cloning a Rope takes up only 16 extra bytes of memory, and its copy-on-write semantics allow the actual text contents to be cloned incrementally as different clones diverge due to user edits.

This allows to cheaply snapshot a Rope and send to a background thread to perform any IO or CPU-intensive computations, while the main thread is kept responsive and always ready for the next batch of edits.

Example usage

``rust // ARopecan be created either directly from a string or incrementally // using theRopeBuilder`.

let mut builder = RopeBuilder::new();

builder .append("I am a 🦀\n") .append("Who walks the shore\n") .append("And pinches toes all day.\n") .append("\n") .append("If I were you\n") .append("I'd wear some 👟\n") .append("And not get in my way.\n");

let mut rope: Rope = builder.build();

// Ropes can be sliced to obtain RopeSlices. // // A RopeSlice is to a Rope as a &str is to a String: the former in // each pair is a borrowed reference of the latter.

// A Rope can be sliced using either byte offsets:

let byteslice: RopeSlice = rope.byteslice(..32);

asserteq!(byteslice, "I am a 🦀\nWho walks the shore\n");

// or line offsets:

let lineslice: RopeSlice = rope.lineslice(..2);

asserteq!(lineslice, byte_slice);

// We can also get a RopeSlice by asking the Rope for a specific line // index:

assert_eq!(rope.line(5), "I'd wear some 👟");

// We can modify that line by getting its start/end byte offsets:

let start: usize = rope.byteofline(5);

let end: usize = rope.byteofline(6);

// and replacing that byte range with some other text:

rope.replace(start..end, "I'd rock some 👠\n");

assert_eq!(rope.line(5), "I'd rock some 👠");

// Ropes use Arcs to share data between threads, so cloning them is // extremely cheap.

let snapshot: Rope = rope.clone();

// This allows to save a Rope to disk in a background thread while // keeping the main thread responsive.

thread::spawn(move || { let mut file = BufWriter::new(File::create("mylittlepoem.txt").unwrap());

// The text content is stored as separate chunks in the leaves of the
// B-tree.
//
// We can iterate over them using the `Chunks` iterator which yields the
// chunks of the `Rope` as string slices.

for chunk in snapshot.chunks() {
    file.write_all(chunk.as_bytes()).unwrap();
}

}) .join() .unwrap(); ```

Check out the docs for a more in-depth overview of the crate.

Comparison with other ropes

As of April 2023 there are (to my knowledge) 3 rope crates that are still actively maintained: crop, Jumprope and Ropey. The following is a quick (and incomplete) overview of their features and tradeoffs to help you decide which one is best suited for your specific use case.

Speed

Jumprope is currently the fastest rope, beating crop by 0-15% on the real world editing traces provided by [crdt-benchmarks] (the results are listed below). The synthetic benchmarks show a similar story, although Jumprope's performance seems to degrade more rapidly than crop's as the size of the inserted/deleted/replaced text increases, as shown by the {insert,delete,replace}_large benchmarks.

| Dataset | crop (ms) | Jumprope (ms) | Ropey (ms) | std::string::String (ms) | |-----------------|-----------|---------------|------------|----------------------------| | automerge-paper | 14.62 | 12.59 | 44.14 | 108.57 | | rustcode | 3.142 | 2.855 | 8.009 | 13.398 | | sveltecomponent | 1.089 | 1.084 | 3.645 | 1.215 | | seph-blog1 | 7.845 | 7.423 | 23.46 | 22.26 |

Cheap clones

Both crop and Ropey allow their Ropes to be cloned in O(1) in time and space by sharing data between clones, whereas cloning a JumpRope involves re-allocating the actual text contents, just like it would with a regular String.

Unsafe

crop and Ropey use a tiny bit of unsafe code to achieve their performance characteristics. Jumprope is a Rust port of a C codebase and has a lot of unsafe, making "heavy use of unsafe pointers".

Indexing metric

Jumprope and Ropey both use Unicode codepoint offsets (chars in Rust) as their primary indexing metric, with Jumprope also supporting UTF-16 offsets (more on this below). crop uses UTF-8 code unit (aka byte) offsets, just like Rust's Strings.

Line breaks

Both crop and Ropey track line breaks, allowing you to convert between line and byte offsets and to iterate over the lines of their Ropes and RopeSlices. Ropey can be configured to recognize all Unicode line breaks, while crop only recognizes LF and CRLF as line terminators.

Jumprope doesn't currently have any line-based APIs.

UTF-16 support

While all the ropes store text internally as UTF-8, both Jumprope and Ropey have APIs to efficiently convert between UTF-16 code unit offsets and char offsets in O(log n) in the size of the rope.

crop doesn't currently implement those APIs but adding them would be extremely easy. If you need those features in your application please open an issue describing your use case.

Acknowledgements