qsv: Ultra-fast, data-wrangling CLI toolkit for CSVs

qsv is a command line program for indexing, slicing, analyzing, splitting and joining CSV files. Commands should be simple, fast and composable:

  1. Simple tasks should be easy.
  2. Performance trade-offs should be exposed in the CLI.
  3. Composition should not come at the expense of performance.

:warning: NOTE: qsv is a fork of the popular xsv utility, merging several pending PRs since xsv 0.13.0's release, along with additional features & commands for data-wrangling (NEW/EXTENDED commands are marked accordingly).

Available commands

| Command | Description |
| --- | --- |
| apply | Apply a series of string transformations to a CSV column. (NEW) |
| behead | Drop headers from a CSV file. (NEW) |
| cat | Concatenate CSV files by row or by column. |
| count[^1] | Count the rows in a CSV file. (Instantaneous with an index.) |
| dedup[^2] | Remove redundant rows. (NEW) |
| enum | Add a new column enumerating rows with incremental or UUID identifiers. Can also be used to copy a column or fill a new column with a constant value. (NEW) |
| exclude[^1] | Remove a set of CSV data from another set based on the specified columns. (NEW) |
| explode | Explode rows into multiple ones by splitting a column value based on the given separator. (NEW) |
| fill | Fill empty values. (NEW) |
| fixlengths | Force a CSV file to have same-length records by either padding or truncating them. |
| flatten | Show a flattened view of CSV records. Useful for viewing one record at a time, e.g., `qsv slice -i 5 data.csv \| qsv flatten`. |
| fmt | Reformat CSV data with different delimiters, record terminators or quoting rules. (Supports ASCII delimited data.) (EXTENDED) |
| foreach | Loop over a CSV file to execute bash commands. (*nix only) (NEW) |
| frequency[^1] | Build frequency tables of each column in CSV data. (Uses parallelism to go faster if an index is present.) |
| headers | Show the headers of CSV data. Or show the intersection of all headers between many CSV files. |
| index | Create an index for a CSV file. This is very quick and provides constant time indexing into the CSV file. |
| input | Read CSV data with exotic quoting/escaping rules. |
| join[^1] | Inner, outer and cross joins. Uses a simple hash index to make it fast. (EXTENDED) |
| jsonl | Convert newline-delimited JSON to CSV. (NEW) |
| lua | Execute a Lua script over CSV lines to transform, aggregate or filter them. (NEW) |
| partition | Partition CSV data based on a column value. |
| pseudo | Pseudonymise the values of the given column by replacing them with incremental identifiers. (NEW) |
| rename | Rename the columns of CSV data efficiently. (NEW) |
| replace | Replace CSV data using a regex. (NEW) |
| reverse[^2] | Reverse the order of rows in CSV data. (NEW) |
| sample[^1] | Randomly draw rows from CSV data using reservoir sampling (i.e., use memory proportional to the size of the sample). (EXTENDED) |
| search | Run a regex over CSV data. Applies the regex to each field individually and shows only matching rows. (EXTENDED) |
| select[^1] | Select or re-order columns from CSV data. (EXTENDED) |
| slice[^1] | Slice rows from any part of a CSV file. When an index is present, this only has to parse the rows in the slice (instead of all rows leading up to the start of the slice). |
| sort | Sort CSV data. (EXTENDED) |
| split[^1] | Split one CSV file into many CSV files of N chunks. |
| stats[^1][^3] | Show basic types and statistics of each column in the CSV file (i.e., mean, standard deviation, variance, median, min/max, nullcount, etc.). (EXTENDED) |
| table[^2] | Show aligned output of any CSV data using elastic tabstops. (EXTENDED) |
| transpose[^2] | Transpose rows/columns of CSV data. (NEW) |
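To illustrate how the commands in the table compose, here is a minimal sketch that chains a few of them with pipes. The file name, column names and rows are made up for the demo, and the qsv calls are guarded so the script does nothing harmful when qsv is not on your `PATH`.

```shell
# Create a tiny, made-up CSV in a temporary directory so the sketch is self-contained.
workdir="$(mktemp -d)"
csv="$workdir/scores.csv"
printf '%s\n' 'name,team,score' 'alice,red,10' 'bob,blue,7' 'carol,red,12' > "$csv"

if command -v qsv >/dev/null 2>&1; then
  qsv index "$csv"                      # build an index; index-aware commands can then use it
  qsv count "$csv"                      # count data rows (the header row is not counted)
  qsv slice -i 1 "$csv" | qsv flatten   # show a single record (zero-based index 1), one field per line
  qsv select name,score "$csv" | qsv table   # keep two columns and align them for reading
else
  echo "qsv not found on PATH; skipping the demo" >&2
fi
```

Each stage reads CSV on stdin and writes CSV on stdout, so stages can be added or removed without changing the others.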

Installation

Binaries for Windows, Linux and macOS are available from GitHub.

Alternatively, you can compile from source: install Cargo (Rust's package manager), then use it to install qsv:

    cargo install qsv

Compiling from this repository works similarly:

    git clone https://github.com/jqnatividad/qsv.git
    cd qsv
    cargo build --release

The binary will end up in ./target/release/qsv.

If you want to squeeze more performance from your build, set this environment variable before compiling:

    export CARGO_BUILD_RUSTFLAGS='-C target-cpu=native'

Note, however, that the resulting binary will only run on machines with the same CPU architecture as the machine you compiled on. To find out your CPU architecture and other valid values for target-cpu:

    rustc --print target-cpus
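If the binary needs to run on a machine other than the build host, one option is to replace `native` with one of the names printed by the command above. The sketch below shows one way to parameterize that choice. `TARGET_CPU` is a hypothetical variable introduced only for this example (it is not something qsv or Cargo defines), and the build step runs only when the script is invoked from a Cargo project such as the qsv checkout.

```shell
# TARGET_CPU is a hypothetical variable used only in this sketch; it defaults to "native".
target_cpu="${TARGET_CPU:-native}"

# Cargo reads CARGO_BUILD_RUSTFLAGS and passes the flags to the compiler invocations it performs.
export CARGO_BUILD_RUSTFLAGS="-C target-cpu=${target_cpu}"
echo "Building with: ${CARGO_BUILD_RUSTFLAGS}"

# Only build when cargo is installed and the current directory is a Cargo project.
if command -v cargo >/dev/null 2>&1 && [ -f Cargo.toml ]; then
  cargo build --release
fi
```

For example, running the script with `TARGET_CPU` set to a name from the `rustc --print target-cpus` list builds for that CPU instead of the build host's.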

Benchmarks

Some very rough benchmarks of various qsv commands.

License

Dual-licensed under MIT or the UNLICENSE.