Rjoin

rjoin is a new command line utility for joining records of two files on common fields.

Dual-licensed under MIT or the UNLICENSE.

Documentation

https://docs.rs/rjoin

Installation

The binary name for rjoin is rj.

```bash
$ cargo --version
cargo 0.25.0-nightly (a88fbace4 2017-12-29) # requires nightly channel
$ RUSTFLAGS="-C target-cpu=native" cargo install rjoin
```

Don't forget to add $HOME/.cargo/bin to your PATH.

Why should you use rjoin?

Why should you not use rjoin?

Quick Example

Let's suppose we have the following data:

```bash
$ cat left
color,blue
color,green
color,red
shape,circle
shape,square

$ cat right
altitude,low
altitude,high
color,orange
color,purple
```

To get the lines with the common key:

```bash
$ rj left right
color,blue,orange
color,blue,purple
color,green,orange
color,green,purple
color,red,orange
color,red,purple
```
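The result above is an inner join: every left line is paired with every right line that shares its key. The following is a minimal Rust sketch of those semantics, not rjoin's implementation. It uses a plain hash join and assumes the key is the text before the first comma; the names split_record and inner_join are hypothetical.

```rust
use std::collections::HashMap;

// Assumption: the join key is the text before the first comma and the
// rest of the line is the value (true for the sample files above).
fn split_record(line: &str) -> Option<(&str, &str)> {
    line.split_once(',')
}

// Inner join: for every key present in both inputs, emit one
// "key,left_value,right_value" line per (left, right) pair.
fn inner_join(left: &[&str], right: &[&str]) -> Vec<String> {
    // Index the right-hand values by key, keeping their input order.
    let mut right_by_key: HashMap<&str, Vec<&str>> = HashMap::new();
    for &line in right {
        if let Some((key, value)) = split_record(line) {
            right_by_key.entry(key).or_default().push(value);
        }
    }

    // Walk the left input and pair each line with every matching right value.
    let mut out = Vec::new();
    for &line in left {
        if let Some((key, left_value)) = split_record(line) {
            if let Some(right_values) = right_by_key.get(key) {
                for right_value in right_values {
                    out.push(format!("{},{},{}", key, left_value, right_value));
                }
            }
        }
    }
    out
}

fn main() {
    let left = ["color,blue", "color,green", "color,red", "shape,circle", "shape,square"];
    let right = ["altitude,low", "altitude,high", "color,orange", "color,purple"];
    for line in inner_join(&left, &right) {
        println!("{}", line);
    }
}
```

Run on the sample data, it prints the same six lines as rj left right. A hash join was chosen here only because it needs no assumption about input order; it keeps the whole right input in memory.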

Some comments:

To get the lines from both files whose key has no match in the other file:

```bash
$ rj -lr left right
altitude,low
altitude,high
shape,circle
shape,square
```
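Assuming -l and -r select the unmatched lines of the left and right file respectively, this is an anti-join in each direction. A minimal Rust sketch, not rjoin's code (key_of and unmatched are hypothetical names), is shown below. It sorts the combined result by key only so that it reproduces the order in the example; whether rjoin orders its output this way in general is an assumption.

```rust
use std::collections::HashSet;

// Assumption: the key is the text before the first comma
// (the whole line if there is no comma).
fn key_of(line: &str) -> &str {
    line.split_once(',').map_or(line, |(key, _)| key)
}

// Lines whose key never appears in the other input, taken from both inputs
// and ordered by key. The sort is stable, so lines that share a key keep
// their input order.
fn unmatched<'a>(left: &[&'a str], right: &[&'a str]) -> Vec<&'a str> {
    let left_keys: HashSet<&str> = left.iter().map(|&l| key_of(l)).collect();
    let right_keys: HashSet<&str> = right.iter().map(|&l| key_of(l)).collect();

    let mut out: Vec<&'a str> = left
        .iter()
        .copied()
        .filter(|l| !right_keys.contains(key_of(l)))
        .chain(right.iter().copied().filter(|l| !left_keys.contains(key_of(l))))
        .collect();
    out.sort_by_key(|&l| key_of(l));
    out
}

fn main() {
    let left = ["color,blue", "color,green", "color,red", "shape,circle", "shape,square"];
    let right = ["altitude,low", "altitude,high", "color,orange", "color,purple"];
    for line in unmatched(&left, &right) {
        println!("{}", line);
    }
}
```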

Check the tutorial for a detailed walkthrough.

Contributing

Any kind of contribution (e.g. comments, suggestions, questions, bug reports and pull requests) is welcome.

Why Rust?

Because C eats a bloody lot of mental resources just to avoid shooting myself in the foot, or worse.

Acknowledgments

The CSV parser used in Rjoin is based on the work of Y. Li, N. R. Katsipoulakis, B. Chandramouli, J. Goldstein, and D. Kossmann. Mison: a fast JSON parser for data analytics. In VLDB, 2017.
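One of Mison's key ideas is structural indexing: locating structural characters with bitwise operations on 64-bit words instead of a byte-by-byte state machine. The sketch below is a scalar illustration of that idea applied to CSV separators, not rjoin's parser. It builds one u64 bitmap per 64-byte chunk for a single target byte, then recovers byte offsets with trailing_zeros. It ignores quoted fields (Mison masks out structural characters inside strings using a separate quote bitmap), and a real implementation would fill the bitmaps with SIMD comparisons instead of a per-byte loop. The names char_bitmaps and positions are hypothetical.

```rust
// Build one u64 per 64-byte chunk: bit i is set when byte i of the chunk
// equals `target`. (A SIMD version would compare 16/32 bytes at once.)
fn char_bitmaps(input: &[u8], target: u8) -> Vec<u64> {
    input
        .chunks(64)
        .map(|chunk| {
            let mut bits = 0u64;
            for (i, &b) in chunk.iter().enumerate() {
                if b == target {
                    bits |= 1u64 << i;
                }
            }
            bits
        })
        .collect()
}

// Turn the bitmaps back into byte offsets, visiting only the set bits.
fn positions(bitmaps: &[u64]) -> Vec<usize> {
    let mut out = Vec::new();
    for (word_idx, &word) in bitmaps.iter().enumerate() {
        let mut w = word;
        while w != 0 {
            let bit = w.trailing_zeros() as usize; // index of lowest set bit
            out.push(word_idx * 64 + bit);
            w &= w - 1; // clear the lowest set bit
        }
    }
    out
}

fn main() {
    let data = b"color,blue\nshape,circle\n";
    // Offsets of the field separators: [5, 16]
    println!("{:?}", positions(&char_bitmaps(data, b',')));
}
```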

The SIMD part was shamelessly copied from pikkr.

And finally a big thanks to BurntSushi for his excellent work.