A simple CLI utility that converts a GTF file to NDJSON for fast parsing and performs other operations on the resulting JSON records.
The GTF file format is fantastic when working with bedtools since, like BED,
it is a tab-delimited format with one genomic interval per line.
However, if you're interested in the annotations column, it can be a massive headache to parse, especially if you're operating on the full genome.
I wrote this tool to convert the GTF file format into streamable newline-delimited JSON (NDJSON).
This makes it convenient and fast to load with polars in Python, and lets you skip
all the annotation parsing.
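As a rough illustration of why NDJSON is easy to stream, here is a minimal sketch that reads records one line at a time using only the Python standard library. The two records and their field names (`seqname`, `feature`, `start`, `end`, `gene_id`) are made up for the example and are an assumption, not the documented output schema of this tool.

```python
import io
import json

# Two made-up records standing in for converted output. The field names
# are illustrative assumptions, not the tool's documented schema.
ndjson_text = (
    '{"seqname": "chr1", "feature": "gene", "start": 11869, "end": 14409, "gene_id": "G1"}\n'
    '{"seqname": "chr1", "feature": "exon", "start": 11869, "end": 12227, "gene_id": "G1"}\n'
)


def iter_records(handle):
    """Yield one dict per non-empty line of an NDJSON stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# Each line is a complete JSON object, so no attribute-string parsing is needed.
records = list(iter_records(io.StringIO(ndjson_text)))

# GTF coordinates are 1-based and inclusive, hence the +1.
exon_lengths = [r["end"] - r["start"] + 1 for r in records if r["feature"] == "exon"]
```

With polars installed, `polars.read_ndjson` can load a file of this shape directly into a DataFrame.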
You can install this with the Rust package manager, `cargo`:
``` bash
cargo install gtfjson
```
The executable for this tool is named `gj`.
To convert a GTF file to NDJSON, use the `convert` subcommand:
``` bash
gj convert -i
gj convert -i
```
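To make the idea of the conversion concrete, here is a minimal Python sketch of turning one GTF line into a flat JSON record. It is not the tool's implementation, and the output layout (attributes flattened into top-level keys) is an assumption made for illustration. The nine column names follow the standard GTF layout.

```python
import json


def gtf_line_to_record(line):
    """Turn one tab-separated GTF line into a flat dict (illustrative layout)."""
    fields = line.rstrip("\n").split("\t")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields
    record = {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
    }
    # The ninth column looks like: key "value"; key "value";
    # Splitting on ";" is a simplification that assumes no value contains one.
    for item in attributes.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        # Repeated keys overwrite earlier ones in this sketch.
        record[key] = value.strip().strip('"')
    return record


line = 'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_name "DDX11L1";'
record = gtf_line_to_record(line)
ndjson_line = json.dumps(record)  # one JSON object per output line
```

Doing this parsing once, ahead of time, is what lets downstream code read the attributes as ordinary JSON fields.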
We can also use `gj` to partition a GTF-JSON file in different ways.
It takes a variable from the attributes, creates a new file for each distinct value of that variable, and populates each file with the records matching that value.
For example, we can write the records of every gene to a separate file:
``` bash
gj partition -i
gj partition -i
gj partition -i
```
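Conceptually, partitioning groups records by the value of one attribute and writes each group to its own file. The Python sketch below shows that idea under stated assumptions: the function name `partition_ndjson`, the `gene_id` key, and the `<value>.ndjson` file naming are all hypothetical and may differ from what `gj partition` actually does.

```python
import json
import os
import tempfile


def partition_ndjson(lines, key, out_dir):
    """Write each record to <out_dir>/<value of key>.ndjson and return the paths."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        value = record.get(key)
        if value is None:
            continue  # skip records that lack the partition variable
        path = os.path.join(out_dir, f"{value}.ndjson")
        # Truncate the first time a value is seen, append afterwards.
        mode = "a" if path in seen else "w"
        with open(path, mode) as fh:
            fh.write(json.dumps(record) + "\n")
        seen.add(path)
    return sorted(seen)


# Made-up records for demonstration only.
demo = [
    '{"feature": "gene", "gene_id": "G1"}',
    '{"feature": "exon", "gene_id": "G1"}',
    '{"feature": "gene", "gene_id": "G2"}',
]

with tempfile.TemporaryDirectory() as tmp:
    paths = partition_ndjson(demo, "gene_id", tmp)
    names = [os.path.basename(p) for p in paths]
    with open(paths[0]) as fh:
        g1_count = sum(1 for _ in fh)
```

Reopening the file for every record keeps the number of open file handles at one, which matters when a full genome yields tens of thousands of genes, at the cost of some speed.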