Iterator adaptors for efficient SQL-like joins. The library is documented here.
To use it, add the following to your Cargo.toml:
```toml
[dependencies]
joinkit = "*"
```
and then include the following code in your crate:
```rust
extern crate joinkit;
use joinkit::Joinkit;
```
This crate provides two binaries, `hjoin` and `mjoin`, which can be used to join data on the command line using the Hash Join and Merge Join strategies respectively.
See the documentation to learn more about the join strategies.
You can also run `hjoin --help` or `mjoin --help` to learn about their usage.
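
To make the difference between the two strategies concrete, here is a minimal sketch of an inner join done both ways. It uses only the standard library and is not joinkit's implementation or API: the function names `hash_join_inner` and `merge_join_inner`, the `(key, value)` tuple layout, and the assumption that keys are unique within each sorted input of the merge join are all illustrative choices.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Hash join sketch: index the right input by key in a HashMap, then
/// stream the left input and look each key up. Neither input has to be
/// sorted, but the whole right side is held in memory.
fn hash_join_inner<'a>(
    left: &[(&'a str, &'a str)],
    right: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str, &'a str)> {
    let mut index: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for &(key, value) in right {
        index.entry(key).or_default().push(value);
    }
    let mut out = Vec::new();
    for &(key, lval) in left {
        if let Some(rvals) = index.get(key) {
            for &rval in rvals {
                out.push((key, lval, rval));
            }
        }
    }
    out
}

/// Merge join sketch: both inputs must already be sorted on the key
/// (assumed unique per input here). Two cursors advance in lockstep,
/// so only the current row of each input is needed at any time.
fn merge_join_inner<'a>(
    left: &[(&'a str, &'a str)],
    right: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str, &'a str)> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < left.len() && j < right.len() {
        match left[i].0.cmp(right[j].0) {
            Ordering::Less => i += 1,    // left key has no partner yet
            Ordering::Greater => j += 1, // right key has no partner yet
            Ordering::Equal => {
                out.push((left[i].0, left[i].1, right[j].1));
                i += 1;
                j += 1;
            }
        }
    }
    out
}

fn main() {
    let left = [("a", "L1"), ("b", "L2"), ("d", "L4")];
    let right = [("b", "R2"), ("c", "R3"), ("d", "R4")];
    let expected = vec![("b", "L2", "R2"), ("d", "L4", "R4")];
    assert_eq!(hash_join_inner(&left, &right), expected);
    assert_eq!(merge_join_inner(&left, &right), expected);
    println!("{:?}", expected);
}
```

The trade-off in this sketch is that the hash variant accepts unsorted input at the cost of holding one side in memory, while the merge variant needs sorted input but works in a single streaming pass.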
Prepare test data:
```bash
datapath=/tmp/join
if ! [[ -d $datapath ]]; then mkdir -p $datapath; fi
cd $datapath
gawk 'BEGIN{n=20;for(i=0;i
gawk 'BEGIN{n=20;for(i=0;i
gawk 'BEGIN{n=1000000;for(i=0;i
gawk 'BEGIN{n=1000000;for(i=0;i
```
Clone the repository and build the binaries:
```bash
cd ~/some/local/path
git clone https://github.com/milancio42/joinkit.git
cd joinkit
cargo build --release
cd target/release
```

Inner Join

The output contains only the rows whose key is present in both input files.
Note that `hjoin` loads the right input data into a `HashMap`.
```bash
./hjoin -1 1 -2 1 $datapath/left-char-20 $datapath/right-char-20
# in order to join on numeric data, use '-u' flag to convert a string to 'u64' (or '-i' to 'i64')
./hjoin -1 1-u -2 1-u $datapath/left-num-20 $datapath/right-num-20
```
The first command is equivalent to each of the following:
```bash
./hjoin -1 1 -2 1 -m inner -R $'\n' -F ',' $datapath/left-char-20 $datapath/right-char-20
./hjoin -1 1 -2 1 --mode inner --in-rec-sep $'\n' --in-fieldsep ',' --out-rec-sep $'\n' --out-field-sep ',' $datapath/left-char-20 $datapath/right-char-20
./hjoin -1 1 -2 1 --mode inner --in-rec-sep-left $'\n' --in-rec-sep-right $'\n' --in-fieldsep-left ',' --in-fieldsep-right ',' --out-rec-sep $'\n' --out-field-sep ',' $datapath/left-char-20 $datapath/right-char-20
```
Since both input files are sorted on the join key, we can get the same results using `mjoin`:
```bash
./mjoin -1 1 -2 1 $datapath/left-char-20 $datapath/right-char-20
```
When the join key consists of several columns, their order matters: for example, the join key in the left file can be composed of the second and the first column, whereas the join key in the right file is composed of the first and the second column.

Left Exclusive Join

The output contains only the rows whose key is present exclusively in the left input file.
```bash
./hjoin -1 1 -2 1 -m left-excl $datapath/left-char-20 $datapath/right-char-20
./mjoin -1 1 -2 1 -m left-excl $datapath/left-char-20 $datapath/right-char-20
```

Left Outer Join

The output is the union of the inner join and the left exclusive join.
```bash
./hjoin -1 1 -2 1 -m left-outer $datapath/left-char-20 $datapath/right-char-20
./mjoin -1 1 -2 1 -m left-outer $datapath/left-char-20 $datapath/right-char-20
```

Right Exclusive Join

The output contains only the rows whose key is present exclusively in the right input file.
Note that with `hjoin` the output follows the `HashMap`'s internal ordering, which very likely differs from that of the input.
```bash
./hjoin -1 1 -2 1 -m right-excl $datapath/left-char-20 $datapath/right-char-20
./mjoin -1 1 -2 1 -m right-excl $datapath/left-char-20 $datapath/right-char-20
```

Right Outer Join

The output is the union of the inner join and the right exclusive join.
Note that with `hjoin` the output follows the `HashMap`'s internal ordering, which very likely differs from that of the input.
```bash
./hjoin -1 1 -2 1 -m right-outer $datapath/left-char-20 $datapath/right-char-20
./mjoin -1 1 -2 1 -m right-outer $datapath/left-char-20 $datapath/right-char-20
```

Full Outer Join

The output is the union of the left exclusive join, the inner join and the right exclusive join.
Note that with `hjoin` the output follows the `HashMap`'s internal ordering, which very likely differs from that of the input.
```bash
./hjoin -1 1 -2 1 -m full-outer $datapath/left-char-20 $datapath/right-char-20
./mjoin -1 1 -2 1 -m full-outer $datapath/left-char-20 $datapath/right-char-20
```

Performance

TODO

Licence

Joinkit is licensed under the MIT license.