An Apache Parquet implementation in Rust

Usage

Add this to your Cargo.toml:

```toml
[dependencies]
parquet = "4.0.0"
```

and this to your crate root:

```rust
extern crate parquet;
```

Example usage of reading data:

```rust
use std::fs::File;
use std::path::Path;
use parquet::file::reader::{FileReader, SerializedFileReader};

// Open the Parquet file and wrap it in a reader.
let file = File::open(&Path::new("/path/to/file")).unwrap();
let reader = SerializedFileReader::new(file).unwrap();

// Iterate over all rows; `None` means no schema projection is applied.
let mut iter = reader.get_row_iter(None).unwrap();
while let Some(record) = iter.next() {
    println!("{}", record);
}
```

See the crate documentation for the available API.

Upgrading from versions prior to 4.0

If you are upgrading from version 3.0 or earlier of this crate, you will likely need to change your code to use [ConvertedType] rather than [LogicalType] to preserve its existing behaviour.

Version 2.4.0 of the Parquet format introduced a LogicalType to replace the existing ConvertedType. This crate used parquet::basic::LogicalType to map to the ConvertedType, but this has been renamed to parquet::basic::ConvertedType from version 4.0 of this crate.

The ConvertedType is deprecated in the format, but is still written to preserve backward compatibility. It is preferred that LogicalType is used, as it supports nanosecond precision timestamps without using the deprecated Int96 Parquet type.

Supported Parquet Version

To update the Parquet format to a newer version, check whether a matching version of the parquet-format crate is available, then update the parquet-format version in Cargo.toml.

Features

Requirements

Parquet requires LLVM. Our Windows CI image includes LLVM, but to build the libraries locally, Windows users will have to install LLVM. Follow this link for info.

Build

Run cargo build for a debug build, or cargo build --release to build in release mode. Some features take advantage of SSE4.2 instructions, which can be enabled by adding RUSTFLAGS="-C target-feature=+sse4.2" before the cargo build command.
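To confirm that the SSE4.2 flag took effect, a small check like the one below can help. It is a minimal sketch using only the standard library; the function names are hypothetical and are not part of this crate.

```rust
/// Returns true when this binary was compiled with SSE4.2 enabled,
/// for example via RUSTFLAGS="-C target-feature=+sse4.2".
fn compiled_with_sse42() -> bool {
    cfg!(target_feature = "sse4.2")
}

/// Returns true when the CPU running this binary supports SSE4.2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_supports_sse42() -> bool {
    is_x86_feature_detected!("sse4.2")
}

/// On non-x86 targets SSE4.2 does not exist, so always report false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_supports_sse42() -> bool {
    false
}
```

If `compiled_with_sse42()` returns false after setting RUSTFLAGS, the flag was probably not picked up by the build, for instance because it was set in a different shell session.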

Test

Run cargo test for unit tests. To also run tests related to the binaries, use cargo test --features cli.

Binaries

The following binaries are provided (use cargo install --features cli to install them):
- parquet-schema for printing Parquet file schema and metadata. Usage: parquet-schema <file-path>, where file-path is the path to a Parquet file. Use the -v/--verbose flag to print full metadata; when it is not specified, only the schema is printed.

If you see a Library not loaded error, make sure LD_LIBRARY_PATH is set properly: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(rustc --print sysroot)/lib

Benchmarks

Run cargo bench for benchmarks.

Docs

To build documentation, run cargo doc --no-deps. To compile and view in the browser, run cargo doc --no-deps --open.

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.