dply is a command-line tool for viewing, querying, and writing CSV and Parquet files, inspired by dplyr and powered by polars.
A dply pipeline consists of a number of functions to read, transform, or write data to disk.
The following pipeline reads a Parquet file[^1], computes the minimum, mean, and
maximum fare for each payment type, saves the result to the fares.csv CSV file, and
shows the result:
$ dply -c 'parquet("nyctaxi.parquet") |
group_by(payment_type) |
summarize(
min_price = min(total_amount),
mean_price = mean(total_amount),
max_price = max(total_amount)
) |
arrange(payment_type) |
csv("fares.csv") |
show()'
shape: (5, 4)
┌──────────────┬───────────┬────────────┬───────────┐
│ payment_type ┆ min_price ┆ mean_price ┆ max_price │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 ┆ f64 │
╞══════════════╪═══════════╪════════════╪═══════════╡
│ Cash ┆ -61.85 ┆ 18.07 ┆ 86.55 │
│ Credit card ┆ 4.56 ┆ 22.969491 ┆ 324.72 │
│ Dispute ┆ -55.6 ┆ -0.145161 ┆ 54.05 │
│ No charge ┆ -16.3 ┆ 0.086667 ┆ 19.8 │
│ Unknown ┆ 9.96 ┆ 28.893333 ┆ 85.02 │
└──────────────┴───────────┴────────────┴───────────┘
[^1]: A 250-row Parquet file sampled from the NYC trip record data.
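To make the semantics of the `group_by`, `summarize`, and `arrange` steps concrete, here is a minimal sketch in plain Python of the same aggregation. It is not how dply or polars implement it. The `trips` rows and the `summarize_fares` helper are hypothetical and exist only for illustration; the values are invented and are not taken from nyctaxi.parquet.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows, invented for illustration only.
trips = [
    {"payment_type": "Cash", "total_amount": 10.0},
    {"payment_type": "Credit card", "total_amount": 30.0},
    {"payment_type": "Cash", "total_amount": 20.0},
    {"payment_type": "Credit card", "total_amount": 12.5},
]


def summarize_fares(rows):
    """Group rows by payment_type and compute min, mean, and max of total_amount."""
    groups = defaultdict(list)
    for row in rows:
        # Equivalent of group_by(payment_type): collect amounts per key.
        groups[row["payment_type"]].append(row["total_amount"])
    # Sorting by key plays the role of arrange(payment_type).
    return [
        {
            "payment_type": key,
            "min_price": min(values),
            "mean_price": mean(values),
            "max_price": max(values),
        }
        for key, values in sorted(groups.items())
    ]


for record in summarize_fares(trips):
    print(record)
```

Each output record corresponds to one row of the table that `show()` prints: one row per payment type, with one column per summary expression.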
dply supports the following functions:
More examples can be found in the tests folder.
Binaries generated by the release GitHub action for Linux, macOS (x86), and Windows are available on the releases page.
You can also install dply using Cargo:
cargo install dply
or by building it from this repository:
git clone https://github.com/vincev/dply-rs
cd dply-rs
cargo install --path .