Simple tool for exporting PostgreSQL tables into parquet, with support for more esoteric Postgres features than just int
and text
.
Download the binary from Github Actions artifacts (click on the latest run, scroll to the bottom, choose your system).
If you use Nix, this command will install the latest pg2parquet version. It compiles it from sources, so the installation will take some time.
nix shell github:exyi/pg2parquet
Then use the pg2parquet
in the new shell. Note that you might need to add --extra-experimental-features 'nix-command flakes'
argument to the nix invocation.
cargo install pg2parquet
Install Rust and Cargo. Clone the repo.
bash
cd cli
env RUSTFLAGS="-C target-cpu=native" cargo build --release
It should finish in few minutes (~10 CPU minutes). Take the target/release/pg2parquet
file, delete rest of the target directory (it takes quite a bit of disk space). You can optionally strip
the binary, but you'll get poor stack trace if it crashes.
pg2parquet export --host localhost.for.example --dbname my_database --output-file output.parquet -t the_table_to_export
Alternatively, you can export result of a SQL query
pg2parquet export --host localhost.for.example --dbname my_database --output-file output.parquet -q 'select column_a, column_b::text from another_table'
You can also use environment variables $PGPASSWORD
and $PGUSER
text
, char
, varchar
and friends, all kinds of int
s, bool
, floating point numbers, timestamp
, timestamptz
, date
, time
, uuid
numeric
will have fixed precision according to the --decimal-scale
and --decimal-precision
parametersmoney
is always a 64-bit decimal with 2 decimal placesjson
and jsonb
: by default serialized as a text field with the JSON. --json-handling
option allows setting parquet metadata that the column is JSON, but the feature is not widely supported, thus it's disabled by default.xml
: serialized as textmacaddr
and inet
: by default written out in text representation. It's possible to serialize macaddr as bytes or Int64 using --macaddr-handling
option.bit
and varbit
: represented as text of 0
and 1
--enum-handling int
to serialize them as integersstruct { lower: T, upper: T, lower_inclusive: bool, upper_inclusive: bool, is_empty: bool }
NULL
is serialized as empty array (Parquet does not support NULL array without nesting it into another struct)--query 'select weird_type_column::text from my_table'
--query
parameter to shape the resulting schemaCOPY (SELECT my_col as myCol, ... FROM 'export.parquet') TO 'export2.parquet' (FORMAT PARQUET);
> pg2parquet export --help
``` Exports a PostgreSQL table or query to a Parquet file
Usage: pg2parquet export [OPTIONS] --output-file
Options:
-o, --output-file
-q, --query
-t, --table