A command-line tool for converting Parquet to newline-delimited JSON.
It uses the excellent Apache Parquet Official Native Rust Implementation.
Install from crates.io and execute from the command line, e.g.:
```shell
$ cargo install parquet2json
$ parquet2json --help

USAGE:
    parquet2json [OPTIONS]

ARGS:

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -l, --limit
```
Credentials for S3 access are provided as per the standard AWS toolchain, i.e. via environment variables (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`), the AWS credentials file, or an IAM ECS container/instance profile.

The default AWS region must be set via an environment variable (`AWS_DEFAULT_REGION`) or in the AWS credentials file, and it must match the region the bucket is located in.
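A minimal sketch of the environment-variable route, assuming a POSIX shell. The key values are placeholders to replace with your own; the region is `us-east-1` because the public example bucket below is served from `s3.us-east-1.amazonaws.com`, as its HTTPS URL shows. The `command -v` guard is only there so the script degrades gracefully when the tool is not installed.

```shell
#!/bin/sh
# Supply AWS settings through environment variables before reading from S3.
# Placeholder values: replace them, or rely on the AWS credentials file or an
# IAM ECS container/instance profile instead.
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
# Must match the bucket's region; amazon-reviews-pds is in us-east-1.
export AWS_DEFAULT_REGION="us-east-1"

if command -v parquet2json >/dev/null 2>&1; then
  # Print only the first few rows of the S3 object.
  parquet2json s3://amazon-reviews-pds/parquet/product_category=Gift_Card/part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet | head -n 5
else
  echo "parquet2json not found; install it with: cargo install parquet2json" >&2
fi
```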
Use it to stream output to files and other tools such as `grep` and `jq`.
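Because the output is newline-delimited JSON, each row is a complete JSON object on its own line, so plain line-oriented tools work without a JSON parser. The sketch below uses a few hypothetical rows (the `id` and `level` fields are illustrative, mirroring the `jq` example) in place of real parquet2json output, and it assumes compact serialization with no space after the colon.

```shell
#!/bin/sh
# Hypothetical rows standing in for `parquet2json ./myfile.pq` output:
# newline-delimited JSON, one object per row.
sample_rows() {
  printf '%s\n' \
    '{"id":1,"level":3}' \
    '{"id":2,"level":1}' \
    '{"id":3,"level":3}'
}

# One row per line, so counting lines counts rows (tr strips padding some wc versions add).
total=$(sample_rows | wc -l | tr -d ' ')

# grep -c counts matching lines, i.e. rows; [,}] stops "level":30 from matching.
level3=$(sample_rows | grep -c '"level":3[,}]')

echo "rows: $total, level 3 rows: $level3"
```

Against a real file, replace `sample_rows` with `parquet2json ./myfile.pq`.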
```shell
$ parquet2json ./myfile.pq > output.ndjson
```
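Some consumers expect a single JSON array rather than newline-delimited JSON. A minimal sketch, assuming only POSIX `awk` and one JSON object per non-blank line, wraps a file such as `output.ndjson` into an array; the function name `ndjson_to_array` is hypothetical. If `jq` is installed, `jq -c -s .` does the same.

```shell
#!/bin/sh
# Hypothetical helper: read NDJSON on stdin, write one JSON array on stdout.
ndjson_to_array() {
  awk 'BEGIN { printf "[" }
       NF == 0 { next }                          # skip blank lines
       { if (n++) printf ","; printf "%s", $0 }  # comma before every row but the first
       END { print "]" }'
}

# Hypothetical rows; with real output: ndjson_to_array < output.ndjson > output.json
array=$(printf '%s\n' '{"id":1}' '{"id":2}' | ndjson_to_array)
echo "$array"
```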
```shell
$ parquet2json ./myfile.pq | jq 'select(.level==3) | .id'
```
```shell
$ parquet2json s3://amazon-reviews-pds/parquet/product_category=Gift_Card/part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet
```
```shell
$ parquet2json https://amazon-reviews-pds.s3.us-east-1.amazonaws.com/parquet/product_category%3DGift_Card/part-00000-495c48e6-96d6-4650-aa65-3c36a3516ddd.c000.snappy.parquet
```