warc-parquet

🗄️ A simple tool for converting WARC files to Parquet files.

📦 Install

The binary may be installed via cargo:

    $ cargo install warc-parquet

🤸 Usage

Once installed, pass a WARC file to the program along with the output path that the Parquet file will be written to:

    $ wget --warc-file example 'https://example.com'
    $ warc-parquet --gzipped example.warc.gz example.snappy.parquet

⚠️ Note that any existing file at the Parquet output path WILL be overwritten.
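
Because the output path is overwritten without confirmation, it can help to guard the call in scripts. The following is a minimal sketch, not part of warc-parquet itself: `safe_convert` is a hypothetical helper name, and `WARC_PARQUET` is an assumed environment variable used here only to make the binary name overridable. It uses the same `--gzipped` invocation shown above.

```sh
#!/bin/sh
# Hypothetical wrapper (not shipped with warc-parquet): refuse to run
# when the output path already exists, so nothing is overwritten by accident.
# WARC_PARQUET is an assumed override for the binary name.
safe_convert() {
    input=$1
    output=$2
    if [ -e "$output" ]; then
        echo "refusing to overwrite existing file: $output" >&2
        return 1
    fi
    # Same invocation as in the usage example above.
    "${WARC_PARQUET:-warc-parquet}" --gzipped "$input" "$output"
}
```

For example, `safe_convert example.warc.gz example.snappy.parquet` runs the conversion only if `example.snappy.parquet` does not exist yet, and returns a non-zero status otherwise.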