Integer sets as fast, sorted, integer ranges with full set operations
The integers can be any size ([u8
] to [u128
]) and may be signed ([i8
] to [i128
]). The [set operations] include union
, intersection
, difference
, symmetric difference
, and complement
.
The crate's main struct is [RangeSetBlaze
], a set of integers. See the [RangeSetBlaze
documentation] for details.
Unlike the standard [
BTreeSet
] and [HashSet
], [RangeSetBlaze
] does not store every integer in the set. Rather, it stores sorted & disjoint ranges of integers in a cache-efficient [BTreeMap
]. It differs from other interval libraries -- that we know of -- by offering full set operations and by being optimized for sets of clumpy integers.We can construct a [
RangeSetBlaze
] from unsorted & redundant integers (or ranges). When the inputs are clumpy, construction will be linear in the number of inputs and set operations will be sped up quadratically.
The crate's main trait is [SortedDisjoint
]. It is implemented by iterators of sorted & disjoint ranges of integers. See the [SortedDisjoint
documentation] for details.
With any [
SortedDisjoint
] iterator we can perform set operations in one pass through the ranges and with minimal (constant) memory. It enforces the "sorted & disjoint" constraint at compile time. This trait is inspired by theSortedIterator
trait from the sorted_iter crate. [SortedDisjoint
] differs from its inspiration by specializing on disjoint integer ranges.
See the benchmarks for performance comparisons with other range-related crates.
Generally, for many tasks involving clumpy integers and ranges, RangeSetBlaze
is much faster than alternatives.
The benchmarks are in the benches
directory. To run them, use cargo bench
.
Example 1
Here we take the union (operator “|”) of two [RangeSetBlaze
]'s:
```rust use rangesetblaze::RangeSetBlaze;
// a is the set of integers from 100 to 499 (inclusive) and 501 to 1000 (inclusive) let a = RangeSetBlaze::fromiter([100..=499,501..=999]); // b is the set of integers -20 and the range 400 to 599 (inclusive) let b = RangeSetBlaze::fromiter([-20..=-20,400..=599]); // c is the union of a and b, namely -20 and 100 to 999 (inclusive) let c = a | b; asserteq!(c, RangeSetBlaze::fromiter([-20..=-20,100..=999])); ```
Example 2
In biology, suppose we want to find the intron regions of a gene but we are given only the transcription region and the exon regions.
We create a [RangeSetBlaze
] for the transcription region and a [RangeSetBlaze
] for all the exon regions.
Then we take the difference between the transcription region and exon regions to find the intron regions.
```rust use rangesetblaze::RangeSetBlaze;
let line = "chr15 29370 37380 29370,32358,36715 30817,32561,37380";
// split the line on white space let mut iter = line.split_whitespace(); let chr = iter.next().unwrap();
// Parse the start and end of the transcription region into a RangeSetBlaze let transstart: i32 = iter.next().unwrap().parse().unwrap(); let transend: i32 = iter.next().unwrap().parse().unwrap(); let trans = RangeSetBlaze::fromiter([transstart..=transend]); asserteq!(trans, RangeSetBlaze::from_iter([29370..=37380]));
// Parse the start and end of the exons into a RangeSetBlaze
let exonstarts = iter.next().unwrap().split(',').map(|s| s.parse::
// Use 'set difference' to find the introns let intron = trans - exons; asserteq!(intron, RangeSetBlaze::fromiter([30818..=32357, 32562..=36714])); for range in intron.ranges() { let (start, end) = range.into_inner(); println!("{chr}\t{start}\t{end}"); } ```