ONTime

License: MIT

Extract subsets of ONT (Nanopore) reads based on time

Motivation

Some collaborators wanted to know how long they needed to sequence on the Nanopore device before they had "sufficient" data (what counts as sufficient is obviously application-dependent).

Their plan was to do multiple runs of different lengths. Instead, I created ontime to easily grab reads from the first hour, first two hours, first three hours etc. and run those subsets through the analysis pipeline that was the intended application. This way they only needed to do one (longer) run.

Install

tl;dr: precompiled binary

```shell
curl -sSL ontime.mbh.sh | sh
# or with wget
wget -nv -O - ontime.mbh.sh | sh
```

You can also pass options to the script like so

```
$ curl -sSL ontime.mbh.sh | sh -s -- --help
install.sh [option]

Fetch and install the latest version of ontime, if ontime is already
installed it will be updated to the latest version.

Options
    -V, --verbose
            Enable verbose output for the installer

    -f, -y, --force, --yes
            Skip the confirmation prompt during installation

    -p, --platform
            Override the platform identified by the installer [default: apple-darwin]

    -b, --bin-dir
            Override the bin installation directory [default: /usr/local/bin]

    -a, --arch
            Override the architecture identified by the installer [default: x86_64]

    -B, --base-url
            Override the base URL used for downloading releases [default: https://github.com/mbhall88/ssubmit/releases]

    -h, --help
            Display this help message
```

Conda


```shell
$ conda install -c bioconda ontime
```

Cargo

```shell
$ cargo install ontime
```

Container

Docker images are hosted at quay.io.

singularity

Prerequisite: singularity

```shell
$ URI="docker://quay.io/mbhall88/ontime"
$ singularity exec "$URI" ontime --help
```

The above will use the latest version. If you want to specify a version then use a tag (or commit) like so.

```shell
$ VERSION="0.1.0"
$ URI="docker://quay.io/mbhall88/ontime:${VERSION}"
```

docker


Prerequisite: docker

```shell
$ docker pull quay.io/mbhall88/ontime
$ docker run quay.io/mbhall88/ontime ontime --help
```

You can find all the available tags on the quay.io repository.

Build from source

```shell
$ git clone https://github.com/mbhall88/ontime.git
$ cd ontime
$ cargo build --release
$ target/release/ontime -h
```

Examples

I want the reads that were sequenced in the first hour

```shell
$ ontime --to 1h in.fq
```

I want the reads that were sequenced after the first hour

```shell
$ ontime --from 1h in.fq
```

I want all reads except those sequenced in the last hour

```shell
$ ontime --to -1h in.fq
```

I want reads sequenced between the third and fourth hours

```shell
$ ontime --from 3h --to 4h in.fq
```

Check what the earliest and latest start times in the fastq are

```shell
$ ontime --show in.fq
Earliest: 2022-12-12T15:17:01.0Z
Latest  : 2022-12-13T01:16:27.0Z
```

I like to be specific, give me the reads that were sequenced *while I was eating dinner* (see note on time formats)

```shell
$ ontime --from 2022-12-12T20:45:00Z --to 2022-12-12T21:17:01.5Z in.fq
```

I want to save the output to a Gzip-compressed file

```shell
$ ontime --to 2h -o out.fq.gz in.fq
```
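Tying this back to the motivation, the examples above can be wrapped in a small loop to produce one subset per time cut-off from a single long run. The following is a minimal sketch and not part of ontime itself: the function name, the `ONTIME` override variable and the output file names are all hypothetical; only `--to` and `-o` come from ontime.

```shell
# Hypothetical helper, not shipped with ontime.
# ONTIME can point at a different binary (or a stub when testing).
ONTIME="${ONTIME:-ontime}"

# Write cumulative subsets: first 1h, first 2h, ... first N hours.
# $1 = input fastq, $2 = number of hours to go up to
cumulative_subsets() {
    input="$1"
    max_hours="$2"
    h=1
    while [ "$h" -le "$max_hours" ]; do
        # --to Nh keeps reads whose start time falls in the first N hours
        "$ONTIME" --to "${h}h" -o "first_${h}h.fq.gz" "$input"
        h=$((h + 1))
    done
}
```

For example, `cumulative_subsets in.fq 3` would write `first_1h.fq.gz`, `first_2h.fq.gz` and `first_3h.fq.gz`, each of which can then be run through the analysis pipeline.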

Usage

```
Usage: ontime [OPTIONS]

Arguments:
  Input fastq file

Options:
  -o, --output                Output file name [default: stdout]
  -O, --output-type           u: uncompressed; b: Bzip2; g: Gzip; l: Lzma
  -L, --compress-level <1-9>  Compression level to use if compressing output [default: 6]
  -f, --from                  Earliest start time; otherwise the earliest time is used
  -t, --to                    Latest start time; otherwise the latest time is used
  -s, --show                  Show the earliest and latest start times in the input and exit
  -h, --help                  Print help information (use --help for more detail)
  -V, --version               Print version information
```

Specifying a time range

The --from and --to options are used to restrict the timeframe you want reads from. These options accept two different formats: duration and timestamp.

Duration: The most human-friendly way to provide a range is with a duration. For example, 1h means 1 hour. Passing --from 1h says "I want reads that were generated 1 hour or more after sequencing started" - i.e. the earliest start time in the file plus 1 hour. Likewise, passing --to 2h says "I only want reads that were generated within the first two hours of sequencing" - i.e. up to the earliest start time plus 2 hours. Using --from and --to in combination gives you a range.

Negative durations are also allowed. A negative duration subtracts that duration from the latest start time in the file. So --to -1h will exclude reads that were sequenced in the last hour of the run. Negative ranges are also valid - i.e. --from -2h --to -1h will give you the reads sequenced in the penultimate hour of the run.
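To make the arithmetic concrete, here is a rough sketch of how a duration maps to an absolute cut-off as described above: positive durations are added to the earliest start time and negative ones are subtracted from the latest. It only mirrors the description and is not ontime's code. `resolve_cutoff` is a hypothetical helper, it takes the duration in seconds for simplicity, and it assumes GNU date (the -d flag behaves differently on macOS/BSD).

```shell
# Hypothetical helper illustrating the duration rules; assumes GNU date.
# $1 = earliest start time, $2 = latest start time (both UTC timestamps)
# $3 = duration in seconds (negative means "counted back from the end")
resolve_cutoff() {
    earliest="$1"
    latest="$2"
    offset_seconds="$3"
    if [ "$offset_seconds" -lt 0 ]; then
        # Negative duration: relative to the latest start time
        base=$(date -u -d "$latest" +%s)
    else
        # Positive duration: relative to the earliest start time
        base=$(date -u -d "$earliest" +%s)
    fi
    date -u -d "@$((base + offset_seconds))" +%Y-%m-%dT%H:%M:%SZ
}
```

With the start times from the --show example (earliest 2022-12-12T15:17:01Z, latest 2022-12-13T01:16:27Z), a duration of 1h (3600 seconds) resolves to 2022-12-12T16:17:01Z, and -1h resolves to 2022-12-13T00:16:27Z.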

Timestamp: If you want to provide date and time for your ranges, that is acceptable in --from/--to also. See the formatting guide for more information.

To make using timestamps a little easier, you can first run ontime --show <in.fq> to get the earliest and latest timestamps in the file.

Time format

The times that ontime extracts are the start_time=<time> section contained in the description of each fastq read. The format of this time has changed a few times, so if you come across a file which ontime cannot parse, please raise an issue so I can make it work.
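If you are curious what ontime is looking at, the sketch below pulls the start_time values out of the read headers with standard tools. It is only an approximation for eyeballing a file: the helper names are hypothetical, it assumes four-line fastq records, and it assumes every timestamp is in the same UTC format so that a plain text sort orders them correctly.

```shell
# Hypothetical helpers for inspecting a fastq by hand; not how ontime parses.

# Print the start_time value of every read header (every 4th line).
start_times() {
    awk 'NR % 4 == 1' "$1" | grep -o 'start_time=[^ ]*' | cut -d= -f2
}

# Print the earliest start time, then the latest.
show_range() {
    start_times "$1" | sort | sed -n '1p;$p'
}
```

Running `show_range in.fq` prints two lines, the earliest start time followed by the latest. ontime --show remains the reliable way to get these values.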

All times printed by ontime and accepted by the --from/--to options are UTC time. More recent versions of Guppy also have UTC offsets in their start_time; for simplicity's sake, these offsets are ignored by ontime. So, if you want to provide a timestamp to --from/--to based on a timeframe in your local time, please first convert it to UTC time.
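One way to do that conversion on the command line is sketched below. `to_utc` is a hypothetical helper and it assumes GNU date; on macOS/BSD the date flags differ, so treat this as a starting point.

```shell
# Hypothetical helper: print a timestamp that carries a UTC offset as UTC.
# Assumes GNU date (coreutils).
to_utc() {
    date -u -d "$1" +%Y-%m-%dT%H:%M:%SZ
}
```

For example, `to_utc 2022-12-13T06:45:00+10:00` prints 2022-12-12T20:45:00Z, which can then be passed along as `ontime --from "$(to_utc 2022-12-13T06:45:00+10:00)" in.fq`.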

In general, ontime accepts any timestamp that is RFC 3339-compliant.

The basic (recommended) format is <YEAR>-<MONTH>-<DAY>T<HOUR>:<MINUTE>:<SECONDS>Z - e.g. 2022-12-12T18:39:09Z. Feel free to get precise with subseconds though if you like...

Full usage

```
Extract subsets of ONT (Nanopore) reads based on time

Usage: ontime [OPTIONS]

Arguments:
  Input fastq file

Options:
  -o, --output
          Output file name [default: stdout]

  -O, --output-type
          u: uncompressed; b: Bzip2; g: Gzip; l: Lzma

          ontime will attempt to infer the output compression format automatically from the output extension. If writing to stdout, the default is uncompressed (u)

  -L, --compress-level <1-9>
          Compression level to use if compressing output

          [default: 6]

  -f, --from
          Earliest start time; otherwise the earliest time is used

          This can be a timestamp - e.g. 2022-11-20T18:00:00 - or a duration from the start - e.g. 2h30m (2 hours and 30 minutes from the start). See the docs for more examples

  -t, --to
          Latest start time; otherwise the latest time is used

          See --from (and docs) for examples

  -s, --show
          Show the earliest and latest start times in the input and exit

  -h, --help
          Print help information (use -h for a summary)

  -V, --version
          Print version information
```

Cite