qsv: Ultra-fast, data-wrangling CLI toolkit for CSVs

Ubuntu build status Windows build status macOS build status Security audit Crates.io Discussions Docs
qsv is a command line program for indexing, slicing, analyzing, splitting, enriching, validating & joining CSV files. Commands are simple, fast and composable:

  1. Simple tasks are easy.
  2. Performance trade offs are exposed in the CLI interface.

    3. Composition does not come at the expense of performance.

NOTE: qsv is a fork of the popular xsv utility, merging several pending PRs since xsv 0.13.0's release, along with additional features & commands for data-wrangling. See FAQ for more details. (NEW and EXTENDED commands are marked accordingly).

Available commands

| Command | Description | | --- | --- | | apply | Apply series of string, date, currency & geocoding transformations to a CSV column. (NEW) | | behead | Drop headers from a CSV. (NEW) | | cat | Concatenate CSV files by row or by column. | | count[^1] | Count the rows in a CSV file. (Instantaneous with an index.) | | dedup[^2] | Remove redundant rows. (NEW) | | enum | Add a new column enumerating rows by adding a column of incremental or uuid identifiers. Can also be used to copy a column or fill a new column with a constant value. (NEW) | | exclude[^1] | Removes a set of CSV data from another set based on the specified columns. (NEW) | | explode | Explode rows into multiple ones by splitting a column value based on the given separator. (NEW) | | fill | Fill empty values. (NEW) | | fixlengths | Force a CSV to have same-length records by either padding or truncating them. | | flatten | A flattened view of CSV records. Useful for viewing one record at a time.
e.g. qsv slice -i 5 data.csv \| qsv flatten. | | fmt | Reformat a CSV with different delimiters, record terminators or quoting rules. (Supports ASCII delimited data.) (EXTENDED) | | foreach | Loop over a CSV to execute bash commands. (*nix only) (NEW) | | frequency^1 | Build frequency tables of each column. (Uses parallelism to go faster if an index is present.) | | headers | Show the headers of a CSV. Or show the intersection of all headers between many CSV files. | | index | Create an index for a CSV. This is very quick & provides constant time indexing into the CSV file. | | input | Read a CSV with exotic quoting/escaping rules. | | join[^1] | Inner, outer, cross, anti & semi joins. Uses a simple hash index to make it fast. (EXTENDED) | | jsonl | Convert newline-delimited JSON to CSV. (NEW) | lua | Execute a Lua script over CSV lines to transform, aggregate or filter them. (NEW) | | partition | Partition a CSV based on a column value. | | pseudo | Pseudonymise the value of the given column by replacing them with an incremental identifier. (NEW) | | rename | Rename the columns of a CSV efficiently. (NEW) | | replace | Replace CSV data using a regex. (NEW) | | reverse[^2] | Reverse order of rows in a CSV. (NEW) | | sample[^1] | Randomly draw rows from a CSV using reservoir sampling (i.e., use memory proportional to the size of the sample). (EXTENDED) | | search | Run a regex over a CSV. Applies the regex to each field individually & shows only matching rows. (EXTENDED) | | searchset | Run multiple regexes over a CSV in a single pass. Applies the regexes to each field individually & shows only matching rows. (NEW) | | select[^1] | Select or re-order columns. (EXTENDED) | | slice^1 | Slice rows from any part of a CSV. When an index is present, this only has to parse the rows in the slice (instead of all rows leading up to the start of the slice). | | sort | Sort CSV data. (EXTENDED) | | split^1 | Split one CSV file into many CSV files of N chunks. | | stats^1[^3] | Show basic types & statistics of each column in a CSV. (i.e., sum, min/max, min/max length, mean, stddev, variance, quartiles, IQR, lower/upper fences, skew, median, mode, cardinality & nullcount) (EXTENDED) | | table[^2] | Show aligned output of a CSV using elastic tabstops. (EXTENDED) | | transpose[^2] | Transpose rows/columns of a CSV. (NEW) |

Installation

Binaries for Windows, Linux and macOS are available from Github.

Alternatively, you can compile from source by installing Cargo (Rust's package manager) and installing qsv using Cargo:

bash cargo install qsv

Compiling from this repository also works similarly:

bash git clone git://github.com/jqnatividad/qsv cd qsv cargo build --release

The compiled binary will end up in ./target/release/qsv.

Performance

If you want more performance, set this environment variable BEFORE installing/compiling:

On Linux and macOS: bash export CARGO_BUILD_RUSTFLAGS='-C target-cpu=native'

On Windows Powershell: powershell $env:CARGO_BUILD_RUSTFLAGS='-C target-cpu=native'

Do note though that the resulting binary will only run on machines with the same architecture as the machine you installed/compiled from.
To find out your CPU architecture and other valid values for target-cpu:

bash rustc --print target-cpus You can also get more performance by using the performance-oriented mimalloc memory allocator. To do so, install/compile qsv with the mimalloc feature.

bash cargo install qsv --features=mimalloc or bash cargo build --features=mimalloc

Benchmarks

Some very rough benchmarks of various qsv commands.

License

Dual-licensed under MIT or the UNLICENSE.

Sponsor

qsv was made possible by datHere - Data Infrastructure Engineering.
Standards-based, best-of-breed, open source solutions to make your Data Useful, Usable & Used.

Naming Collision

This project is unrelated to Intel's Quick Sync Video.