Command-line utility to execute commands in parallel and aggregate their output.
Similar interface to GNU Parallel or xargs, but implemented in Rust and tokio.
* Supports running commands read from stdin or input files similar to xargs.
* Supports `:::` syntax to run all combinations of argument groups similar to GNU Parallel.
Prevents output interleaving and is very fast.
See the demos for example usage.
```
$ rust-parallel --help
Execute commands in parallel

By Aaron Riekenberg aaron.riekenberg@gmail.com

https://github.com/aaronriekenberg/rust-parallel
https://crates.io/crates/rust-parallel

Usage: rust-parallel [OPTIONS] [COMMAND_AND_INITIAL_ARGUMENTS]...

Arguments:
  [COMMAND_AND_INITIAL_ARGUMENTS]...
          Optional command and initial arguments to run for each input line

Options:
  -c, --commands-from-args
          Run commands from arguments only.

          In this mode the ::: separator is used to delimit groups of arguments.
          The cartesian product of arguments from all groups are run.

  -d, --discard-output

          Possible values:
          - stdout: Redirect stdout for commands to /dev/null
          - stderr: Redirect stderr for commands to /dev/null
          - all:    Redirect stdout and stderr for commands to /dev/null

  -i, --input-file

  -j, --jobs
          [default: 8]

  -0, --null-separator
          Use null separator for reading input files instead of newline

  -s, --shell
          Use shell mode for running commands.

          Each command line is passed to "<shell-path> -c" as a single argument.

      --channel-capacity <CHANNEL_CAPACITY>
          Input and output channel capacity, defaults to num cpus * 2

          [default: 16]

      --shell-path <SHELL_PATH>
          Path to shell to use for shell mode

          [default: /bin/bash]

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version
```
Recommended:
For manual installation/update:
1. Install Rust
2. Install the latest version of this app from crates.io:
$ cargo install rust-parallel
3. The same `cargo install rust-parallel` command will also update to the latest version after initial installation.
There are 2 major ways to use rust-parallel:
1. Command line arguments mode using `:::` syntax to separate argument groups similar to GNU Parallel.
2. Reading commands from stdin and/or input files similar to xargs.
Demos of command line arguments mode are first, as it is simpler to understand:
1. Commands from arguments mode
2. Commands from arguments mode bash function
3. Small demo of 5 echo commands
4. Debug logging
5. Specifying command and initial arguments on command line
6. Using awk to form complete commands
7. Using as part of a shell pipeline
8. Working on a set of files from find command
9. Reading multiple inputs
10. Calling a bash function
When `-c/--commands-from-args` is specified, the `:::` separator can be used to run the Cartesian product of command line arguments. This is similar to the `:::` behavior in GNU Parallel.

```
$ rust-parallel -c echo ::: A B ::: C D ::: E F G
B C F
A D E
A C G
A D F
A D G
A C F
B C E
A C E
B D F
B D E
B D G
B C G

$ rust-parallel -c echo hello ::: larry curly moe
hello curly
hello larry
hello moe

$ rust-parallel -c gzip -k ::: *.html
```
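To make the `:::` semantics concrete, here is a minimal standard-library sketch of how a command prefix and its argument groups could be expanded into the Cartesian product of commands. It is an illustration only, not the project's implementation; the function names `split_groups`, `cartesian_product`, and `expand_commands` are hypothetical.

```rust
/// Split arguments on a separator into a command prefix and argument groups.
/// (Hypothetical helper, for illustration only.)
fn split_groups(args: &[&str], sep: &str) -> (Vec<String>, Vec<Vec<String>>) {
    let mut parts = args.split(|a| *a == sep);
    // Everything before the first separator is the command and initial arguments.
    let prefix: Vec<String> = parts
        .next()
        .unwrap_or_default()
        .iter()
        .map(|s| s.to_string())
        .collect();
    // Each remaining part is one argument group.
    let groups: Vec<Vec<String>> = parts
        .map(|g| g.iter().map(|s| s.to_string()).collect())
        .collect();
    (prefix, groups)
}

/// Cartesian product of all groups, built up one group at a time.
fn cartesian_product(groups: &[Vec<String>]) -> Vec<Vec<String>> {
    let mut result: Vec<Vec<String>> = vec![Vec::new()];
    for group in groups {
        let mut next = Vec::with_capacity(result.len() * group.len());
        for partial in &result {
            for item in group {
                let mut combo = partial.clone();
                combo.push(item.clone());
                next.push(combo);
            }
        }
        result = next;
    }
    result
}

/// Build one full command (prefix plus one combination) per combination.
fn expand_commands(args: &[&str]) -> Vec<Vec<String>> {
    let (prefix, groups) = split_groups(args, ":::");
    cartesian_product(&groups)
        .into_iter()
        .map(|combo| prefix.iter().cloned().chain(combo).collect())
        .collect()
}

fn main() {
    let args = ["echo", ":::", "A", "B", ":::", "C", "D", ":::", "E", "F", "G"];
    for cmd in expand_commands(&args) {
        println!("{}", cmd.join(" "));
    }
}
```

The sketch lists combinations in a fixed order, whereas the demo output above appears in whatever order the parallel commands finish.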
Commands from arguments mode can be used to invoke a bash function.
```
$ logargs() {
  echo "logargs got $@"
}

$ export -f logargs

$ rust-parallel -c -s logargs ::: A B C ::: D E F
logargs got A F
logargs got A D
logargs got B E
logargs got C E
logargs got B D
logargs got B F
logargs got A E
logargs got C D
logargs got C F
```
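The help text describes shell mode as passing each command line to `"<shell-path> -c"` as a single argument. A rough sketch of that idea using only `std::process::Command` follows. The function name `run_in_shell` is hypothetical, and the sketch uses blocking standard-library calls rather than the asynchronous execution the project itself uses.

```rust
use std::process::{Command, Output};

/// Run one command line through a shell, passing the whole line as the
/// single argument after `-c`. (Hypothetical helper, for illustration only.)
fn run_in_shell(shell_path: &str, command_line: &str) -> std::io::Result<Output> {
    Command::new(shell_path)
        .arg("-c")
        .arg(command_line)
        .output()
}

fn main() -> std::io::Result<()> {
    // Assumption: /bin/sh is used here so the sketch runs on most Unix systems.
    // The help text lists /bin/bash as the default shell path for rust-parallel.
    let output = run_in_shell("/bin/sh", "echo logargs got A D")?;
    print!("{}", String::from_utf8_lossy(&output.stdout));
    Ok(())
}
```

Because the whole line is handed to the shell, anything the shell can resolve is usable as a command, including functions exported with `export -f` as in the demo above (which requires bash).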
Using command line arguments mode we can run 5 echo commands.

With `-j5` all commands run in parallel, with `-j1` commands run sequentially.

```
$ rust-parallel -j5 -c echo ::: hi there how are you
how
there
you
are
hi

$ rust-parallel -j1 -c echo ::: hi there how are you
hi
there
how
are
you
```
Equivalently, a file `test` is created with 5 echo commands and piped to stdin of `rust-parallel`.

One advantage of reading input from stdin or input files is that it can process a much larger number of inputs than command line arguments allow. This mode can also be used as part of a shell pipeline.
```
$ cat >./test <<EOL
echo hi
echo there
echo how
echo are
echo you
EOL

$ cat test | rust-parallel -j5
are
hi
there
how
you

$ cat test | rust-parallel -j1
hi
there
how
are
you
```

Debug logging.

Set environment variable `RUST_LOG=debug` to see debug output. This logs structured information about command line arguments and commands being run.

Enabling debug logging is recommended for all demos to understand in more detail what is happening.

```
$ RUST_LOG=debug rust-parallel -c echo ::: hi there how are you | grep command_line_args
2023-06-16T12:14:45.602832Z DEBUG try_main: rust_parallel::command_line_args: command_line_args = CommandLineArgs { commands_from_args: true, discard_output: None, input_file: [], jobs: 8, null_separator: false, shell: false, channel_capacity: 16, shell_path: "/bin/bash", command_and_initial_arguments: ["echo", ":::", "hi", "there", "how", "are", "you"] }

$ RUST_LOG=debug rust-parallel -c echo ::: hi there how are you | grep 'command_line_args:1'
2023-06-16T12:15:18.408524Z DEBUG Command::run{cmd_args=["echo", "there"] line=command_line_args:1}: rust_parallel::command: begin run
2023-06-16T12:15:18.410259Z DEBUG Command::run{cmd_args=["echo", "there"] line=command_line_args:1 child_pid=12523}: rust_parallel::command: spawned child process, awaiting output
2023-06-16T12:15:18.413080Z DEBUG Command::run{cmd_args=["echo", "there"] line=command_line_args:1 child_pid=12523}: rust_parallel::command: command exit status = exit status: 0
2023-06-16T12:15:18.413125Z DEBUG Command::run{cmd_args=["echo", "there"] line=command_line_args:1 child_pid=12523}: rust_parallel::command: end run
```

Specifying command and initial arguments on command line:

Here `md5 -s` will be prepended to each input line to form a command like `md5 -s aal`

```
$ head -100 /usr/share/dict/words | rust-parallel md5 -s
MD5 ("aal") = ff45e881572ca2c987460932660d320c
MD5 ("A") = 7fc56270e7a70fa81a5935b72eacbe29
MD5 ("aardvark") = 88571e5d5e13a4a60f82cea7802f6255
MD5 ("aalii") = 0a1ea2a8d75d02ae052f8222e36927a5
MD5 ("aam") = 35c2d90f7c06b623fe763d0a4e5b7ed9
MD5 ("aa") = 4124bc0a9335c27f086f24ba207a4912
MD5 ("a") = 0cc175b9c0f1b6a831c399e269772661
MD5 ("Aani") = e9b22dd6213c3d29648e8ad7a8642f2f
MD5 ("Aaron") = 1c0a11cc4ddc0dbd3fa4d77232a4e22e
MD5 ("aardwolf") = 66a4a1a2b442e8d218e8e99100069877
```

Using `awk` to form complete commands:

```
$ head -100 /usr/share/dict/words | awk '{printf "md5 -s %s\n", $1}' | rust-parallel
MD5 ("Abba") = 5fa1e1f6e07a6fea3f2bb098e90a8de2
MD5 ("abaxial") = ac3a53971d52d9ce3277eadf03f13a5e
MD5 ("abaze") = 0b08c52aa63d947b6a5601ee975bc3a4
MD5 ("abaxile") = 21f5fc27d7d34117596e41d8c001087e
MD5 ("abbacomes") = 76640eb0c929bc97d016731bfbe9a4f8
MD5 ("abbacy") = 08aeac72800adc98d2aba540b6195921
MD5 ("Abbadide") = 7add1d6f008790fa6783bc8798d8c803
MD5 ("abb") = ea01e5fd8e4d8832825acdd20eac5104
```

Using as part of a shell pipeline.

Stdout and stderr from each command run are copied to stdout/stderr of the rust-parallel process.

```
$ head -100 /usr/share/dict/words | rust-parallel md5 -s | grep -i abba
MD5 ("Abba") = 5fa1e1f6e07a6fea3f2bb098e90a8de2
MD5 ("abbacomes") = 76640eb0c929bc97d016731bfbe9a4f8
MD5 ("abbacy") = 08aeac72800adc98d2aba540b6195921
MD5 ("Abbadide") = 7add1d6f008790fa6783bc8798d8c803
```

Working on a set of files from `find` command.

The `-0` option works nicely with `find -print0` to handle filenames with newline or whitespace characters:

```
$ mkdir testdir

$ touch 'testdir/a b' 'testdir/b c' 'testdir/c d'

$ find testdir -type f -print0 | rust-parallel -0 gzip -f -k

$ ls testdir
'a b'  'a b.gz'  'b c'  'b c.gz'  'c d'  'c d.gz'
```

Reading multiple inputs.

By default `rust-parallel` reads input from stdin only. The `-i` option can be used 1 or more times to override this behavior. `-i -` means read from stdin, `-i ./test` means read from the file `./test`:

```
$ cat >./test <<EOL
foo
bar
baz
EOL

$ head -5 /usr/share/dict/words | rust-parallel -i - -i ./test echo
A
aalii
aa
a
aal
bar
foo
baz
```

Calling a bash function.

Use `-s` shell mode so that each input line is passed to `/bin/bash -c` as a single argument:

```
$ doit() {
  echo Doing it for $1
  sleep 2
  echo Done with $1
}

$ export -f doit

$ cat >./test <<EOL
doit 1
doit 2
doit 3
EOL

$ cat test | rust-parallel -s
Doing it for 1
Done with 1
Doing it for 3
Done with 3
Doing it for 2
Done with 2
```

Benchmarks:

See the wiki page for benchmarks.

Features:
* Uses only safe Rust (`#![forbid(unsafe_code)]`).
* Avoids `O(number of input lines)` memory usage. In support of this:
  * `tokio::sync::Semaphore` is used carefully to limit the number of commands that run concurrently. Tasks are not spawned for all input lines immediately, which limits memory usage.

Tech Stack:
* `multi_cartesian_product` to process `:::` command line inputs.
* `async` / `await` functions (aka coroutines)
* `CommandLineArgs` instance using `tokio::sync::OnceCell`.
* `tokio::process::Command`
* `tokio::sync::Semaphore` used to limit the number of commands that run concurrently.
* `tokio::sync::mpsc::channel` used to receive inputs from the input task, and to send command outputs to an output writer task. To await command completions, use the elegant property that when all `Senders` are dropped the channel is closed.
* `tracing::Instrument` is used to provide structured debug logs.
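
The two ideas above, limiting how many commands run at once and using channel closure to detect completion, can be sketched with the standard library alone. This is not the project's code: it swaps tokio tasks, `tokio::sync::Semaphore`, and `tokio::sync::mpsc` for OS threads, a fixed number of workers, and `std::sync::mpsc`, which has the same property that a channel closes once all of its senders are dropped. The function name `run_bounded` and the `"done: ..."` output are hypothetical stand-ins for running a real child process.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::sync::{Arc, Mutex};
use std::thread;

/// Process `inputs` with at most `jobs` concurrent workers.
/// (Hypothetical sketch; std threads stand in for tokio tasks.)
fn run_bounded(inputs: Vec<String>, jobs: usize, capacity: usize) -> Vec<String> {
    let jobs = jobs.max(1); // at least one worker, otherwise nothing drains the input

    // Bounded input channel: the input thread blocks once `capacity` lines
    // are queued, so pending work does not pile up in memory.
    let (input_tx, input_rx) = sync_channel::<String>(capacity);
    let input_rx = Arc::new(Mutex::new(input_rx));
    let (output_tx, output_rx) = channel::<String>();

    // Input thread: feeds lines, then drops its sender, closing the input channel.
    let producer = thread::spawn(move || {
        for line in inputs {
            if input_tx.send(line).is_err() {
                break;
            }
        }
    });

    // Exactly `jobs` workers, so at most `jobs` items are processed at once.
    let workers: Vec<_> = (0..jobs)
        .map(|_| {
            let input_rx = Arc::clone(&input_rx);
            let output_tx = output_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while receiving, not while working.
                let next = input_rx.lock().unwrap().recv();
                match next {
                    // Stand-in for spawning a child process and capturing output.
                    Ok(line) => {
                        let _ = output_tx.send(format!("done: {}", line));
                    }
                    // Input channel closed and drained: this worker is finished.
                    Err(_) => break,
                }
            })
        })
        .collect();

    // Drop the original sender so only the workers' clones keep the channel open.
    drop(output_tx);

    // This iteration ends once every worker has dropped its sender,
    // which is how completion of all work is awaited.
    let outputs: Vec<String> = output_rx.iter().collect();

    producer.join().unwrap();
    for worker in workers {
        worker.join().unwrap();
    }
    outputs
}

fn main() {
    let inputs: Vec<String> = ["hi", "there", "how", "are", "you"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    for line in run_bounded(inputs, 2, 4) {
        println!("{}", line);
    }
}
```

With one worker the outputs arrive in input order, mirroring the `-j1` demo. With more workers the order depends on scheduling.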