headj

A utility that converts input JSON arrays into valid JSON that contains only a subset of the elements

crates.io docs.rs License Latest version All downloads Downloads of latest version

Description

A utility to take enormous JSON files & cut them down to size.

Sometimes one has a JSON file with a very, very large array in it, but you really would like to have a subset of the data. For example, you have a JSON file representing a DB dump of millions of records & you'd like to have a workable number of rows to test with, or code with, or just examine.

You could use an editor, but even if the editor can load a file that large, it's likely to be very unpleasant to use.

You could just use some kind of text processor (like the unix/linux head command) but there are two issues with that:

  1. If the file doesn't have newlines, that's not helpful.
  2. The text processor will know nothing about JSON, so it will mangle it.

You could write a script/program to do the work for you. I did that. Then I decided to package it up, so you don't have to.

headj is a command line utility, similar to the head command, for producing a subset of a JSON file that is itself valid JSON. It allows you to ingest JSON containing a huge JSON array I produce JSON with a manageable JSON array.

THIS IS ALPHA LEVEL SOFTWARE

It appears to work, but I'm working on it.

One very large caveat is: ~~It freely discards JSON that surrounds the array of interest. So, if you have a complex JSON object with a huge array in it, you will get a JSON file back that only includes the (reduced) array & whatever JSON structure it was found in. Everything else will be elided.~~ (This actually works correctly now)

For example:

Input

json { "a": 1, "b": [ 1, 2, 3, 4, 5 ], "c": true }

command: headj --key 'b' --count 3

output

json { "b": [ 1, 2, 3 ] }

Installation

With Cargo

shell cargo install headj

Usage

``` USAGE: headj [OPTIONS] [INPUT_FILE]

ARGS: The JSON file to read from. If none is specified, reads from Standard Input

OPTIONS: -c, --count Number of elements to copy to the output (default: 100) [default: 100] -d, --debug Activate extra debugging output -f, --format-output Nicely format the output JSON with indentation & newlines -h, --help Print help information -k, --key The JSON key of the array to copy from. If none specified, treat the input JSON as an array -n, --no-context Output only the target JSON array -o, --out-file File to write the JSON results to (default: Standard Output) -q, --quiet Don't print any status, diagnostic or error messages -s, --skip Number of elements to skip before copying (default: 0) [default: 0] -V, --version Print version information ```

Examples

```shell headj <<- JSON [1,2,3,4,5] JSON

Output: [1, 2, 3, 4, 5]

headj -c 1 <<- JSON [1,2,3,4,5] JSON

Output: [1]

headj -c 1 -s 2 <<- JSON [1,2,3,4,5] JSON

Output: [3]

headj -c 2 -s 2 <<- JSON [1,2,3,4,5] JSON

Output: [3, 4]

headj -k 'foo' <<- JSON {"foo":[1,2,3,4,5]} JSON

Output: {"foo": [1, 2, 3, 4, 5]}

headj -k 'foo' -n <<- JSON {"foo":[1,2,3,4,5]} JSON

Output: [1, 2, 3, 4, 5]

headj -c 2 -s 2 <<- JSON [1,2,3,4,5] JSON

Output: [3, 4]

headj -c 25 -s 2 <<- JSON [1,2,3,4,5] JSON

Output: [3, 4, 5]

headj -c 2 -s 2 <<- JSON [1,2,3,4,5] JSON

Output: [3, 4]

headj -c 2 -s 2 -f <<- JSON [1,2,3,4,5] JSON

Output: [\n 3,\n 4\n]

headj -k 'foo.bar' -c 2 -s 2 -n <<- JSON {"foo":{"bar":[1,2,3,4,5]}} JSON

Output: [3, 4]

headj -k 'foo.bar' -c 2 -s 2 <<- JSON {"foo":{ "bar":[1,2,3,4,5]} } JSON

Output: {"bar": {"foo": [3, 4]}}

headj -k 'foo' -c 2 -s 2 <<- JSON {"foo":[1,2,3,4,5]} JSON

Output: {"foo": [3, 4]}

headj -k 'foo' -c 2 -s 2 -n <<- JSON {"foo":[1,2,3,4,5]} JSON

Output: [3, 4]

```

Documentation

The TBD

License

MIT