A simple CLI utility to convert a GTF file to NDJSON for fast parsing, and to perform other operations on the resulting JSON files.
The GTF file format is fantastic when working with bedtools, since, like BED, it is a tab-delimited format of genomic intervals that bedtools reads natively.
However, if you're interested in the attributes (annotation) column, it can be a massive headache to parse, especially if you're operating on the full genome.
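To make that headache concrete, here is a minimal Python sketch (standard library only) of what parsing a single GTF line involves. It assumes the common `key "value";` attribute layout. The output keys (`seqname`, `attributes`, and so on) are illustrative and are not necessarily the schema `gj` emits. Repeated keys (several `tag` entries, for example) are collected into a list, and semicolons inside quoted values are not handled.

```python
# Minimal sketch: parse one GTF line into a dict.
# Assumption: attributes use the `key "value";` layout; output keys are
# illustrative, not necessarily what `gj convert` produces.

GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame"]


def parse_attributes(field: str) -> dict:
    """Parse the ninth GTF column, e.g. 'gene_id "G1"; tag "basic";'."""
    attrs: dict = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue  # the trailing ';' leaves an empty item
        key, _, value = item.partition(" ")
        value = value.strip().strip('"')
        if key in attrs:
            # Repeated keys (e.g. several `tag` entries) become a list.
            previous = attrs[key]
            if isinstance(previous, list):
                previous.append(value)
            else:
                attrs[key] = [previous, value]
        else:
            attrs[key] = value
    return attrs


def parse_gtf_line(line: str) -> dict:
    """Split a tab-delimited GTF line into fixed columns plus attributes."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    record["attributes"] = parse_attributes(fields[8])
    return record


if __name__ == "__main__":
    # Illustrative values only.
    line = ('chr1\tsrc\tgene\t100\t200\t.\t+\t.\t'
            'gene_id "G1"; gene_name "ABC1"; tag "basic"; tag "canonical";\n')
    print(parse_gtf_line(line)["attributes"])
```

Doing this string splitting for every line of a genome-wide annotation, every time the file is loaded, is the cost that converting once to JSON avoids.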
I wrote this tool to convert GTF files into streamable newline-delimited JSON (NDJSON). This makes it convenient to load the data with polars in Python incredibly fast, skipping all the annotation parsing.
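Because every NDJSON line is a complete JSON object, a consumer needs nothing but a JSON decoder. polars can read such files directly (`pl.read_ndjson`, or lazily with `pl.scan_ndjson`). The standard-library sketch below shows the same idea as a streaming reader. The nested `attributes` object and its `gene_id` key are assumptions for illustration, not the documented output layout of `gj convert`.

```python
import io
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded record per non-empty NDJSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def records_for_gene(lines: Iterable[str], gene_id: str) -> list:
    """Stream records, keeping those whose (assumed) attributes.gene_id matches."""
    return [r for r in iter_ndjson(lines)
            if r.get("attributes", {}).get("gene_id") == gene_id]


if __name__ == "__main__":
    # An open file handle works the same way, since iterating it yields lines.
    sample = io.StringIO(
        '{"feature": "gene", "attributes": {"gene_id": "G1"}}\n'
        '{"feature": "exon", "attributes": {"gene_id": "G2"}}\n'
    )
    print(records_for_gene(sample, "G1"))
```

No attribute-string parsing happens here: each line is decoded once, and fields are plain dictionary lookups.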
You can install this with the Rust package manager `cargo`:

```bash
cargo install gtfjson
```
The executable of this tool is `gj`.
To convert GTF files to NDJSON we can use the `convert` subcommand:

```bash
gj convert -i
gj convert -i
```
We can also use `gj` to partition a GTF-JSON file in different ways. The `partition` subcommand takes an attribute key, creates a new file for each distinct value of that attribute, and writes each record to the file matching its value.
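The partitioning idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how `gj partition` is implemented: it assumes each record keeps its attributes in a nested `attributes` object, names each output file `<value>.json` (a hypothetical naming scheme), and skips records that lack the key or whose value is a list. It groups everything in memory before writing, which keeps the sketch short; a genome-wide partition by gene would need more care with memory and open file handles.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Iterable


def partition_ndjson(lines: Iterable[str], key: str, out_dir: Path) -> dict:
    """Write each NDJSON record to <out_dir>/<value>.json, keyed by attributes[key].

    Returns a mapping of attribute value -> number of records written.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        value = json.loads(line).get("attributes", {}).get(key)
        # Assumption: only plain string values define a partition.
        if isinstance(value, str):
            groups[value].append(line)
    out_dir.mkdir(parents=True, exist_ok=True)
    for value, records in groups.items():
        # One NDJSON file per distinct value; the file name is hypothetical.
        (out_dir / f"{value}.json").write_text("\n".join(records) + "\n")
    return {value: len(records) for value, records in groups.items()}
```

For example, `partition_ndjson(open("genes.json"), "gene_id", Path("by_gene"))` (file and directory names hypothetical) would write one file per gene.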
For example, we can write the GTF records of every gene to a separate file:
```bash
gj partition -i
gj partition -i
gj partition -i
```