A Disjoint-Set data structure (aka Union-Find w/ Rank)

What is Union-Find?

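Suppose you have a collection S of elements e1, e2, ..., en, and wish to group them into disjoint collections using the following operations:

  - union(a, b): merge the collection containing a with the collection containing b.
  - find(a): identify the collection containing a, so that two elements are in the same collection exactly when find gives the same answer for both.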

A Union-Find data structure stores these groups efficiently and implements this API.
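To make the API concrete, here is a minimal, self-contained sketch of a disjoint-set forest with union by rank. It is not the `reunion` crate's implementation: the `DisjointSet` type and its methods are hypothetical, elements are assumed to be the indices `0..n`, and path compression is left out for brevity (union by rank alone already keeps every `find` within O(log n) steps).

```rust
use std::cmp::Ordering;

/// Hypothetical disjoint-set forest over the elements 0..n, using union by
/// rank only (no path compression), to illustrate the union/find API.
struct DisjointSet {
    parent: Vec<usize>, // parent[i] == i means i is the root of its group
    rank: Vec<u8>,      // upper bound on the height of the tree under each root
}

impl DisjointSet {
    /// Creates n singleton groups {0}, {1}, ..., {n - 1}.
    fn new(n: usize) -> Self {
        DisjointSet { parent: (0..n).collect(), rank: vec![0; n] }
    }

    /// Returns the root (representative) of the group containing x.
    fn find(&self, mut x: usize) -> usize {
        while self.parent[x] != x {
            x = self.parent[x];
        }
        x
    }

    /// Merges the groups containing a and b; returns false if they were already one group.
    fn union(&mut self, a: usize, b: usize) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        // Hang the shorter tree under the taller one so trees stay shallow.
        match self.rank[ra].cmp(&self.rank[rb]) {
            Ordering::Less => self.parent[ra] = rb,
            Ordering::Greater => self.parent[rb] = ra,
            Ordering::Equal => {
                self.parent[rb] = ra;
                self.rank[ra] += 1;
            }
        }
        true
    }
}

fn main() {
    let mut ds = DisjointSet::new(7);
    ds.union(2, 1);
    ds.union(4, 3);
    ds.union(6, 5);
    ds.union(1, 5);
    // 1, 2, 5 and 6 now share a root; 3 and 4 form a separate group.
    assert_eq!(ds.find(2), ds.find(6));
    assert_ne!(ds.find(2), ds.find(3));
    println!("1 and 6 in the same group: {}", ds.find(1) == ds.find(6)); // true
}
```

The `union` call on two roots of equal rank is the only place a rank grows, which is what keeps the trees logarithmic in height.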

Note: The variant implemented uses Path Compression to further improve the performance.
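Path compression changes only `find`: once the root is located, every node on the walked path is re-pointed directly at it, so later lookups on those nodes take a single step. The following standalone sketch works on a plain parent array; the `find_compress` helper is hypothetical and not taken from the crate.

```rust
/// Find with path compression on a raw parent array: after locating the
/// root, every node visited on the way is re-pointed directly at that root.
fn find_compress(parent: &mut [usize], x: usize) -> usize {
    // First pass: walk up to the root.
    let mut root = x;
    while parent[root] != root {
        root = parent[root];
    }
    // Second pass: point every node on the path straight at the root.
    let mut node = x;
    while parent[node] != root {
        let next = parent[node];
        parent[node] = root;
        node = next;
    }
    root
}

fn main() {
    // A worst-case chain: 4 -> 3 -> 2 -> 1 -> 0 (0 is the root).
    let mut parent = vec![0, 0, 1, 2, 3];
    assert_eq!(find_compress(&mut parent, 4), 0);
    // After one find, every node on the path points directly at the root.
    assert_eq!(parent, vec![0, 0, 0, 0, 0]);
    println!("{:?}", parent); // [0, 0, 0, 0, 0]
}
```

Combined with union by rank, this is what brings the amortized cost per operation down to the inverse Ackermann bound mentioned in the summary.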

(Some) Applications

Some interesting lecture notes regarding Union-Find.

Usage

Setup

In Cargo.toml, add this crate as a dependency.

```toml
[dependencies]
reunion = { version = "0.1" }
```

API

Example 1

Task: Create a UnionFind data structure of arbitrary size whose elements are of type usize. Then, union a few elements and inspect the state of the data structure after each step.

Solution:

```rust
use reunion::{UnionFind, UnionFindTrait};
use std::collections::HashSet;

fn main() {
    // Create a UnionFind data structure of arbitrary size that contains subsets of usizes.
    let mut uf = UnionFind::<usize>::new();

    println!("Initial state: {}", &uf);
    println!("All elements form their own group (singletons).");
    println!("{:?}", uf.subsets());

    uf.union(2, 1);
    println!("After combining the groups that contain 2 and 1: {}", &uf);
    uf.union(4, 3);
    println!("After combining the groups that contain 4 and 3: {}", &uf);
    uf.union(6, 5);
    println!("After combining the groups that contain 6 and 5: {}", &uf);

    // The expected groups are {1, 2}, {3, 4} and {5, 6}.
    let mut hs1 = HashSet::new();
    hs1.insert(1);
    hs1.insert(2);
    let mut hs2 = HashSet::new();
    hs2.insert(3);
    hs2.insert(4);
    let mut hs3 = HashSet::new();
    hs3.insert(5);
    hs3.insert(6);

    let mut subsets = uf.subsets();
    assert_eq!(subsets.len(), 3);

    assert!(subsets.contains(&hs1));
    assert!(subsets.contains(&hs2));
    assert!(subsets.contains(&hs3));

    uf.union(1, 5);

    println!("After combining the groups that contain 1 and 5: {}", &uf);

    // {1, 2} and {5, 6} are now a single group; {3, 4} is unchanged.
    subsets = uf.subsets();
    assert_eq!(subsets.len(), 2);

    hs3.extend(&hs1);

    assert!(subsets.contains(&hs3));
    assert!(subsets.contains(&hs2));

    // Calling `find` on the clone does not change which elements are
    // grouped together, so the clone still compares equal to the original.
    let mut uf_clone = uf.clone();
    uf_clone.find(2);

    assert_eq!(&uf, &uf_clone);
    println!("{}", &uf);

    // It is possible to iterate over the subsets.
    for partition in uf {
        println!("{:?}", partition);
    }
}
```

Example 2

Task: Create a UnionFind data structure with room for at least 10 elements, whose elements are of type u16.

Note: The capacity only reduces the number of memory reallocations needed as elements are added, so specifying it is entirely optional.

Solution:

```rust
use reunion::{UnionFind, UnionFindTrait};

fn main() {
    // Create a UnionFind data structure with capacity for at least 10 u16 elements.
    let uf2 = UnionFind::<u16>::with_capacity(10);

    println!("{}", uf2);
}
```
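Assuming `with_capacity` here behaves like the standard library constructor of the same name (the README only says it reduces reallocations), the effect can be illustrated with `Vec`: reserving space up front adds no elements, and pushes that fit in the reserved space never reallocate. The `fill_preallocated` helper is hypothetical.

```rust
/// Builds a vector of `n` values after reserving room for all of them up front.
/// Returns the vector together with the capacity observed right after reserving.
fn fill_preallocated(n: u16) -> (Vec<u16>, usize) {
    let mut v: Vec<u16> = Vec::with_capacity(n as usize);
    let reserved = v.capacity(); // at least `n`, while the vector is still empty
    for i in 0..n {
        v.push(i); // fits in the reserved space, so no reallocation happens
    }
    (v, reserved)
}

fn main() {
    let (v, reserved) = fill_preallocated(10);
    println!("len = {}, reserved = {}, capacity now = {}", v.len(), reserved, v.capacity());
}
```

Without the up-front reservation the same pushes would still succeed; they would just trigger a few intermediate reallocations, which matches the note that the capacity is optional.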

Performance

Benchmark

DIY

To benchmark on your machine:

  1. Clone this repository.
  2. Run `cargo bench`.

You should see some output like this:

```
#Find: 2497150, #Union: 1048575, #Total: 3545725, Time: 1.013497126s, Time per operation: 285ns
#Find: 2497150, #Union: 1048575, #Total: 3545725, Time: 1.013323348s, Time per operation: 285ns
#Find: 2497150, #Union: 1048575, #Total: 3545725, Time: 1.012333206s, Time per operation: 285ns
...

Big Merge (20, 10000)   time:   [1.0175 s 1.0190 s 1.0205 s]
                        change: [-0.4773% -0.2721% -0.0647%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  10 (10.00%) high mild
  3 (3.00%) high severe

...
```
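The #Union count above, 1,048,575, is exactly 2^20 - 1, the number of successful unions needed to merge 2^20 singletons into a single group (each successful union reduces the number of groups by one). The sketch below reproduces that count with a small, hypothetical union-find (`Dsu`, union by rank plus path compression) and a pairwise merge order chosen only for illustration; it is not the repository's benchmark, and its timings will differ.

```rust
use std::mem;
use std::time::Instant;

/// Hypothetical compact union-find (union by rank + path compression),
/// used only to illustrate a merge-everything workload.
struct Dsu {
    parent: Vec<usize>,
    rank: Vec<u8>,
}

impl Dsu {
    fn new(n: usize) -> Self {
        Dsu { parent: (0..n).collect(), rank: vec![0; n] }
    }

    /// Returns the root of `x`, re-pointing each visited node at the root.
    fn find(&mut self, x: usize) -> usize {
        let p = self.parent[x];
        if p == x {
            return x;
        }
        let root = self.find(p);
        self.parent[x] = root; // path compression
        root
    }

    /// Unites the groups of `a` and `b`; returns false if already united.
    fn union(&mut self, a: usize, b: usize) -> bool {
        let (mut ra, mut rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        if self.rank[ra] < self.rank[rb] {
            mem::swap(&mut ra, &mut rb);
        }
        self.parent[rb] = ra; // lower-rank root goes under the higher-rank root
        if self.rank[ra] == self.rank[rb] {
            self.rank[ra] += 1;
        }
        true
    }
}

/// Merges 2^k singletons into one group by uniting neighbouring groups in
/// rounds (pairs, then blocks of 4, 8, ...); returns the successful unions.
fn merge_all(k: u32) -> usize {
    let n = 1usize << k;
    let mut dsu = Dsu::new(n);
    let mut unions = 0;
    let mut step = 1;
    while step < n {
        for i in (0..n).step_by(2 * step) {
            if dsu.union(i, i + step) {
                unions += 1;
            }
        }
        step *= 2;
    }
    unions
}

fn main() {
    let start = Instant::now();
    let unions = merge_all(20);
    println!("#Union: {}, Time: {:?}", unions, start.elapsed());
}
```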

Summary

On an AMD Ryzen 9 3900X 12-Core Processor (with many other processes running), working with a UnionFind of size 2^20, a total of 3,545,725 operations take roughly 1 second, or about 285 ns per operation. This is expected: with union by rank and path compression, the amortized cost of each operation is O(alpha(n)), where alpha(n) is the inverse Ackermann function. It grows so slowly that it can be treated as a constant for any practical n, so the operations are effectively O(1).
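The per-operation figure follows directly from the first line of the benchmark output: 2,497,150 finds plus 1,048,575 unions give 3,545,725 operations, and 1.0135 s divided by that total is about 285.8 ns, consistent with the reported 285ns. A small, hypothetical `ns_per_op` helper redoes the arithmetic:

```rust
/// Recomputes the average time per operation, in nanoseconds,
/// from the find/union counts and the total elapsed seconds.
fn ns_per_op(finds: u64, unions: u64, seconds: f64) -> f64 {
    let total = finds + unions;
    seconds * 1e9 / total as f64
}

fn main() {
    let ns = ns_per_op(2_497_150, 1_048_575, 1.013497126);
    println!("{:.1} ns per operation", ns); // prints "285.8 ns per operation"
}
```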