What this tool does

It splits the JSON data set available from PushShift into smaller JSON files.

At this time, the data can be split by the following keys:

When the data is split, one JSON file is created for each unique value of the chosen key. For example, splitting on subreddit creates one JSON file per subreddit.

Example Usage
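
A run that splits the data by subreddit might look like the following. This is a sketch based on the flags listed under Help. The input path is a placeholder, not a real file name, and `subreddit` is assumed to be the accepted key name based on the description above.

```shell
# Placeholder input path: point this at your downloaded PushShift data set.
INPUT_PATH="$HOME/data/pushshift-dump.json"
# Output directory referred to in this section.
OUTPUT_PREFIX="$HOME/tmp/data-by-sub"

# Flags as listed under Help; "subreddit" is assumed to be an accepted key.
cargo run -- \
    --input-path "$INPUT_PATH" \
    --output-prefix "$OUTPUT_PREFIX" \
    --split-on subreddit
```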

The files will be present in ~/tmp/data-by-sub after the above run is complete.

Help

```shell script
~/dev/rust/axe (master)
abhijat $ cargo run -- --help
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/axe --help`
axe 0.1.0
A utility to split a reddit dataset into individual JSON files

USAGE:
    axe --input-path <input-path> --output-prefix <output-prefix> --split-on <split-on>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -i, --input-path <input-path>          The path to the data set
    -o, --output-prefix <output-prefix>    The parent directory where output JSON files will be written
    -s, --split-on <split-on>              The attribute to split the data set on

```