qsv is a command-line program for indexing, slicing, analyzing, splitting, enriching,
validating & joining CSV files. Commands are simple, fast and composable:

NOTE: qsv is a fork of the popular xsv utility. It merges several PRs that have been pending since the xsv 0.13.0 release and adds features & commands for data wrangling. See FAQ for more details. (NEW and EXTENDED commands are marked accordingly.)

| Command | Description |
| --- | --- |
| apply | Apply series of string, date, currency & geocoding transformations to a CSV column. (NEW) |
| behead | Drop headers from a CSV. (NEW) |
| cat | Concatenate CSV files by row or by column. |
| count[^1] | Count the rows in a CSV file. (Instantaneous with an index.) |
| dedup[^2] | Remove redundant rows. (NEW) |
| enum | Add a new column enumerating rows by adding a column of incremental or uuid identifiers. Can also be used to copy a column or fill a new column with a constant value. (NEW) |
| exclude[^1] | Removes a set of CSV data from another set based on the specified columns. (NEW) |
| explode | Explode rows into multiple ones by splitting a column value based on the given separator. (NEW) |
| fill | Fill empty values. (NEW) |
| fixlengths | Force a CSV to have same-length records by either padding or truncating them. |
| flatten | A flattened view of CSV records. Useful for viewing one record at a time, e.g. `qsv slice -i 5 data.csv \| qsv flatten`. |
| fmt | Reformat a CSV with different delimiters, record terminators or quoting rules. (Supports ASCII delimited data.) (EXTENDED) |
| foreach | Loop over a CSV to execute bash commands. (*nix only) (NEW) |
| frequency[^1] | Build frequency tables of each column. (Uses parallelism to go faster if an index is present.) |
| headers | Show the headers of a CSV. Or show the intersection of all headers between many CSV files. |
| index | Create an index for a CSV. This is very quick & provides constant time indexing into the CSV file. |
| input | Read a CSV with exotic quoting/escaping rules. |
| join[^1] | Inner, outer, cross, anti & semi joins. Uses a simple hash index to make it fast. (EXTENDED) |
| jsonl | Convert newline-delimited JSON to CSV. (NEW) |
| lua | Execute a Lua script over CSV lines to transform, aggregate or filter them. (NEW) |
| partition | Partition a CSV based on a column value. |
| pseudo | Pseudonymise the value of the given column by replacing them with an incremental identifier. (NEW) |
| rename | Rename the columns of a CSV efficiently. (NEW) |
| replace | Replace CSV data using a regex. (NEW) |
| reverse[^2] | Reverse order of rows in a CSV. (NEW) |
| sample[^1] | Randomly draw rows from a CSV using reservoir sampling (i.e., use memory proportional to the size of the sample). (EXTENDED) |
| search | Run a regex over a CSV. Applies the regex to each field individually & shows only matching rows. (EXTENDED) |
| searchset | Run multiple regexes over a CSV in a single pass. Applies the regexes to each field individually & shows only matching rows. (NEW) |
| select[^1] | Select or re-order columns. (EXTENDED) |
| slice[^1] | Slice rows from any part of a CSV. When an index is present, this only has to parse the rows in the slice (instead of all rows leading up to the start of the slice). |
| sort | Sort CSV data. (EXTENDED) |
| split[^1] | Split one CSV file into many CSV files of N chunks. |
| stats[^1][^3] | Show basic types & statistics of each column in a CSV. (i.e., sum, min/max, min/max length, mean, stddev, variance, quartiles, IQR, lower/upper fences, skew, median, mode, cardinality & nullcount) (EXTENDED) |
| table[^2] | Show aligned output of a CSV using elastic tabstops. (EXTENDED) |
| transpose[^2] | Transpose rows/columns of a CSV. (NEW) |
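
As a rough illustration of how the commands above compose, the sketch below builds a tiny CSV and pipes it through a few of them. The file name, column names and values are made up for the example, and the script assumes qsv is already on your PATH (it skips the qsv calls otherwise).

```bash
#!/usr/bin/env bash
# Sketch only: scores.csv and its contents are hypothetical sample data.
set -eu

workdir="$(mktemp -d)"
csv="$workdir/scores.csv"

cat > "$csv" <<'EOF'
name,team,score
ana,red,12
ben,blue,7
cy,red,9
EOF

if command -v qsv >/dev/null 2>&1; then
  qsv index "$csv"                          # writes an index next to the file
  qsv count "$csv"                          # number of data rows
  qsv select name,score "$csv" | qsv table  # pick two columns, align output
  qsv slice -i 1 "$csv" | qsv flatten       # view one record, field by field
  qsv stats "$csv" | qsv table              # per-column summary statistics
else
  echo "qsv not found on PATH; skipping the qsv calls" >&2
fi
```

Each command reads CSV and writes CSV (or a text view of it), so the output of one can be piped into the next without intermediate files.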
Binaries for Windows, Linux and macOS are available from GitHub.
Alternatively, you can compile from source by installing Cargo (Rust's package manager) and then installing qsv using Cargo:
```bash
cargo install qsv
```
Compiling from this repository also works similarly:
```bash
git clone https://github.com/jqnatividad/qsv.git
cd qsv
cargo build --release
```
The compiled binary will end up in `./target/release/qsv`.
If you want more performance, set this environment variable BEFORE installing/compiling:
On Linux and macOS:
```bash
export CARGO_BUILD_RUSTFLAGS='-C target-cpu=native'
```
On Windows PowerShell:
```powershell
$env:CARGO_BUILD_RUSTFLAGS='-C target-cpu=native'
```
Do note though that the resulting binary will only run on machines with the
same architecture as the machine you installed/compiled from.
To find out your CPU architecture and other valid values for `target-cpu`:
```bash
rustc --print target-cpus
```
You can also get more performance by using the performance-oriented mimalloc
memory allocator. To do so, install/compile qsv with the `mimalloc` feature:
```bash
cargo install qsv --features=mimalloc
```
or
```bash
cargo build --features=mimalloc
```
Some very rough benchmarks of various qsv commands.
Dual-licensed under MIT or the UNLICENSE.
qsv was made possible by datHere - Data Infrastructure Engineering.
Standards-based, best-of-breed, open source solutions to make your Data Useful, Usable & Used.
This project is unrelated to Intel's Quick Sync Video.