🗄️ A simple tool for converting WARC files to Parquet files.
The binary may be installed via cargo:
```sh
$ cargo install warc-parquet
```
Once installed, pass a WARC file to the program along with the output path where the Parquet file will be written:
```sh
$ wget --warc-file example 'https://example.com'
$ warc-parquet --gzipped example.warc.gz example.snappy.parquet
```
⚠️ Note that the Parquet path WILL be overwritten.
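Because the output path is overwritten without confirmation, it may be worth guarding the call in scripts. The following is a minimal sketch, not part of the tool itself: `safe_convert` is a hypothetical wrapper name, and it assumes gzipped input and the same argument order as the example above.

```sh
# Hypothetical helper: refuse to run when the output path already exists.
# Assumes warc-parquet is on PATH and the input is a gzipped WARC file.
safe_convert() {
    input=$1
    output=$2

    # Bail out before warc-parquet gets a chance to overwrite anything.
    if [ -e "$output" ]; then
        echo "refusing to overwrite existing file: $output" >&2
        return 1
    fi

    warc-parquet --gzipped "$input" "$output"
}
```

It would be called in place of the bare command, for example `safe_convert example.warc.gz example.snappy.parquet`. The check and the write are not atomic, so this only protects against accidental reruns, not against concurrent writers.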