protein-translate


Translate a nucleotide sequence (DNA or RNA) to protein.

Usage

Add this to your Cargo.toml:

```toml
[dependencies]
protein-translate = "0.2.0"
```

Example

```rust
use protein_translate::translate;

fn main() {
    let dna = b"GTGAGTCGTTGAGTCTGATTGCGTATC";
    let protein = translate(dna);
    // Stop codons (TGA here) are reported as '*'
    assert_eq!("VSR*V*LRI", &protein);

    // To shift the reading frame, slice off the leading bases
    let protein_frame2 = translate(&dna[1..]);
    assert_eq!("*VVESDCV", &protein_frame2);
}
```
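As the example shows, shifting the reading frame is just slicing, and a trailing incomplete codon produces no residue (26 bases give 8 amino acids). The helper below is a minimal sketch of that codon splitting. The name `codons` is hypothetical and not part of this crate's API.

```rust
/// Hypothetical helper: iterate over the complete codons of `seq`,
/// starting at reading-frame offset `frame` (0, 1 or 2).
/// A trailing partial codon (1 or 2 bases) is skipped.
fn codons<'a>(seq: &'a [u8], frame: usize) -> impl Iterator<Item = &'a [u8]> {
    // `get` avoids a panic if `frame` is past the end of the sequence.
    seq.get(frame..).unwrap_or_default().chunks_exact(3)
}
```

For the sequence used above, `codons(dna, 0)` yields 9 codons and `codons(dna, 1)` yields 8, the first of which is `TGA`.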

Benchmarks

The current algorithm is inspired by SeqAn's implementation, which uses array indexing. Here is how it performs against other lookup methods (tested on a 2012 MacBook Pro).

| Method | 10 bp* | 100 bp | 1,000 bp | 10,000 bp | 100,000 bp | 1 million bp |
| ------ | ---- | ----- | ------- | -------- | --------- | ------- |
| protein_translate | 91 ns | 0.29 μs | 2.28 μs | 23 μs | 215 μs | 2.25 ms |
| fnv hashmap | 111 ns | 0.37 μs | 3.58 μs | 37 μs | 366 μs | 3.86 ms |
| std hashmap | 160 ns | 1.03 μs | 9.65 μs | 100 μs | 943 μs | 9.40 ms |
| phf_map | 177 ns | 1.04 μs | 9.47 μs | 100 μs | 936 μs | 9.91 ms |
| match statement | 259 ns | 1.77 μs | 17.9 μs | 163 μs | 1941 μs | 19.1 ms |
| protein_translate (unchecked) | 90 ns | 0.26 μs | 2.02 μs | 20 μs | 197 μs | 1.92 ms |

*bp = "base pairs"
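To illustrate what "array indexing" means here, the following is a minimal sketch of the general technique, not this crate's actual source. Each base is mapped to a 2-bit index through a 256-entry table, and the three indices are combined into a position in a 64-entry amino acid table. The names `BASE_INDEX`, `AMINO_ACIDS` and `translate_sketch` are hypothetical, and emitting `X` for unrecognized bases is an assumption made for the sketch.

```rust
// Standard genetic code; codons are ordered with bases T, C, A, G,
// so TTT is entry 0 and GGG is entry 63.
const AMINO_ACIDS: &[u8; 64] =
    b"FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

// Build a byte -> base index table at compile time.
// 4 marks "not a recognized base" (an assumption of this sketch).
const fn build_base_index() -> [u8; 256] {
    let mut table = [4u8; 256];
    table[b'T' as usize] = 0;
    table[b'U' as usize] = 0; // RNA uses U in place of T
    table[b'C' as usize] = 1;
    table[b'A' as usize] = 2;
    table[b'G' as usize] = 3;
    table
}

const BASE_INDEX: [u8; 256] = build_base_index();

/// Hypothetical stand-in for a table-driven translate function.
fn translate_sketch(seq: &[u8]) -> String {
    let mut protein = String::with_capacity(seq.len() / 3);
    // A trailing partial codon is ignored by `chunks_exact`.
    for codon in seq.chunks_exact(3) {
        let a = BASE_INDEX[codon[0] as usize] as usize;
        let b = BASE_INDEX[codon[1] as usize] as usize;
        let c = BASE_INDEX[codon[2] as usize] as usize;
        if a > 3 || b > 3 || c > 3 {
            protein.push('X'); // unknown base somewhere in the codon
        } else {
            // Two array lookups replace any hashing or branching per codon.
            protein.push(AMINO_ACIDS[a * 16 + b * 4 + c] as char);
        }
    }
    protein
}
```

The lookup involves no hashing and only one branch per codon, which is one plausible reason a table-based approach compares well against the hash map and match-statement variants in the table above.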

To run the benchmarks yourself (nightly is required because of the phf_map macro):

cargo +nightly bench

Thoughts

Todo

Tests

To run the tests:

cargo test

You can also generate new test data (requires python3 and Biopython).

```bash
# Generate 500 random sequences and their peptides
python3 tests/generatetestdata.py 500
```