A command line tool to query an ODBC data source and write the result into a Parquet file.
The tool queries the ODBC data source for type information and maps each ODBC SQL type to a Parquet logical type as follows:
| ODBC SQL Type         | Parquet Logical Type   |
|-----------------------|------------------------|
| Decimal(p, s)         | Decimal(p,s)           |
| Numeric(p, s)         | Decimal(p,s)           |
| Bit                   | Boolean                |
| Double                | Double                 |
| Real                  | Float                  |
| Float                 | Float                  |
| Tiny Integer          | Int8                   |
| Small Integer         | Int16                  |
| Integer               | Int32                  |
| Big Int               | Int64                  |
| Date                  | Date                   |
| Timestamp(p: 0..3)    | Timestamp Milliseconds |
| Timestamp(p >= 4)     | Timestamp Microseconds |
| All others            | Utf8 Byte Array        |
`p` is short for precision. `s` is short for scale. Intervals are inclusive.
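The mapping in the table can be restated as a small lookup function. This is only an illustrative Python sketch of the table, not the tool's actual implementation; the function name `parquet_logical_type` and the lowercase type-name keys are assumptions made for the example.

```python
def parquet_logical_type(sql_type: str, precision: int = 0, scale: int = 0) -> str:
    """Return the Parquet logical type the table lists for an ODBC SQL type."""
    name = sql_type.strip().lower()

    # Decimal and Numeric both keep their precision and scale.
    if name in ("decimal", "numeric"):
        return f"Decimal({precision},{scale})"

    # Precision 0..3 (inclusive) maps to milliseconds, 4 and above to microseconds.
    if name == "timestamp":
        return "Timestamp Milliseconds" if precision <= 3 else "Timestamp Microseconds"

    fixed = {
        "bit": "Boolean",
        "double": "Double",
        "real": "Float",
        "float": "Float",
        "tiny integer": "Int8",
        "small integer": "Int16",
        "integer": "Int32",
        "big int": "Int64",
        "date": "Date",
    }
    # Every type not listed in the table is written as text.
    return fixed.get(name, "Utf8 Byte Array")


print(parquet_logical_type("Numeric", precision=10, scale=2))  # Decimal(10,2)
print(parquet_logical_type("Timestamp", precision=3))          # Timestamp Milliseconds
print(parquet_logical_type("Varchar"))                         # Utf8 Byte Array
```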
https://github.com/pacman82/odbc2parquet/releases/latest
Note: Download the 32-bit version if you want to connect to data sources using 32-bit drivers, and the 64-bit version if you want to connect via 64-bit drivers. A 32-bit build cannot use 64-bit drivers, and vice versa.
If you have a Rust toolchain installed, you can install this tool via cargo.

```shell
cargo install odbc2parquet
```
You can install cargo from https://rustup.rs/.
```bash
odbc2parquet query \
--connection-string "Driver={ODBC Driver 17 for SQL Server};Server=localhost;UID=SA;PWD=<YourStrong@Passw0rd>;" \
out.par \
"SELECT * FROM Birthdays"
```
```bash
odbc2parquet query \
--dsn my_db \
--password "<YourStrong@Passw0rd>" \
--user "SA" \
out.par1 \
"SELECT * FROM Birthdays"
```
```bash
odbc2parquet list-drivers
```
```bash
odbc2parquet list-data-sources
```
```shell
odbc2parquet query \
--connection-string "Driver={ODBC Driver 17 for SQL Server};Server=localhost;UID=SA;PWD=<YourStrong@Passw0rd>;" \
out.par \
"SELECT * FROM Birthdays WHERE year > ? and year < ?" \
1990 2010
```
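The `?` placeholders in the query are positional: the values given after the query text (`1990` and `2010` here) fill them from left to right. The sketch below illustrates the same binding idea in Python. It uses the standard library's in-memory `sqlite3` database purely as a stand-in for an ODBC data source; the `name` column and the sample rows are made up for the example.

```python
import sqlite3


def birthdays_between(conn, after, before):
    """Run the parameterized query, binding the two `?` placeholders in order."""
    query = "SELECT name, year FROM Birthdays WHERE year > ? AND year < ? ORDER BY year"
    # The first tuple element fills the first `?`, the second fills the second `?`.
    return conn.execute(query, (after, before)).fetchall()


# Hypothetical sample data in an in-memory database (not an ODBC connection).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Birthdays (name TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO Birthdays VALUES (?, ?)",
    [("Ada", 1985), ("Ben", 1995), ("Cleo", 2005), ("Dan", 2015)],
)

rows = birthdays_between(conn, 1990, 2010)
print(rows)  # [('Ben', 1995), ('Cleo', 2005)]
```

Passing values as parameters instead of pasting them into the SQL text avoids quoting mistakes and keeps the query text constant across runs.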
Use `odbc2parquet --help` to see all options.
Thanks to @samaguire there is a script for PowerShell users which helps you download a bunch of tables to a folder: https://github.com/samaguire/odbc2parquet-PSscripts
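For users without PowerShell, the same idea (one `odbc2parquet query` call per table) can be sketched in Python. This is not the linked script, only a minimal illustration that reuses the `query` subcommand and `--connection-string` option shown earlier. The helper names, the `.par` file naming, and the example table names are assumptions. Table names are interpolated directly into the SQL text, so they should only come from a trusted list.

```python
import subprocess
from pathlib import Path


def build_export_command(connection_string, table, out_dir="."):
    """Build the argument list for exporting one table to <out_dir>/<table>.par."""
    return [
        "odbc2parquet", "query",
        "--connection-string", connection_string,
        str(Path(out_dir) / f"{table}.par"),
        f"SELECT * FROM {table}",
    ]


def export_tables(connection_string, tables, out_dir=".", run=subprocess.run):
    """Export each table in turn; stops at the first failing command."""
    for table in tables:
        # check=True raises CalledProcessError if odbc2parquet exits with an error.
        run(build_export_command(connection_string, table, out_dir), check=True)


# Show the command that would be run for a single (hypothetical) table.
print(build_export_command("DSN=my_db;", "Birthdays", "exports"))
```

Calling `export_tables("DSN=my_db;", ["Birthdays", "Customers"], "exports")` would then invoke the tool once per table, assuming `odbc2parquet` is on the `PATH` and the `exports` folder exists.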