This crate provides a Rust interface to the Edlib C++ library by Martin Šošić. See Martinsos-edlib
The reference paper is:
Martin Šošić, Mile Šikić; Edlib: a C/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics 2017, btw753. https://doi.org/10.1093/bioinformatics/btw753
The crate offers 2 interfaces to edlib.
The first, accessed via module bindings, is directly the interface generated by the bindgen crate.
The second, accessed via module edlibrs, provides a more idiomatic Rust interface. It comes at the cost of cloning the information stored in the pointers startLocations and endLocations of the C struct EdlibAlignResult into a Rust struct EdlibAlignResultRs with Option fields.
As a consequence memory management is fully transferred to Rust.
Structures and functions have the same name as in edlib with just "Rs" appended to original names.
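The cloning step mentioned above can be pictured with a minimal sketch. It assumes a C result that exposes an `int*` array together with its length, as edlib's `EdlibAlignResult` does with `startLocations`, `endLocations` and `numLocations`. The helper name `clone_c_array` is hypothetical and is not part of this crate's API.

```rust
use std::os::raw::c_int;
use std::slice;

/// Hypothetical helper: copies `len` values from a C array into an owned Vec.
/// A null pointer or a non-positive length maps to `None`.
///
/// # Safety
/// `ptr` must be null or point to at least `len` readable `c_int` values.
unsafe fn clone_c_array(ptr: *const c_int, len: c_int) -> Option<Vec<c_int>> {
    if ptr.is_null() || len <= 0 {
        return None;
    }
    // Borrow the C memory as a slice, then copy it so that Rust owns the data.
    let owned = unsafe { slice::from_raw_parts(ptr, len as usize) }.to_vec();
    Some(owned)
}
```

Once the values are copied, the Rust side no longer depends on the C allocation, which can then be released on the C side (edlib provides `edlibFreeAlignResult` for that purpose). Whether the crate proceeds exactly this way is an assumption.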
For the edlibrs interface we have for example:
in normal mode:

```rust
use edlib_rs::edlibrs::*;
// ...
let query = "ACCTCTG";
let target = "ACTCTGAAA";
let align_res = edlibAlignRs(query.as_bytes(), target.as_bytes(), &EdlibAlignConfigRs::default());
assert_eq!(align_res.status, EDLIB_STATUS_OK);
assert_eq!(align_res.editDistance, 4);
```
in the infix mode:

```rust
use edlib_rs::edlibrs::*;
// ...
let query = "ACCTCTG";
let target = "TTTTTTTTTTTTTTTTTTTTTACTCTGAAA";
//
let mut config = EdlibAlignConfigRs::default();
config.mode = EdlibAlignModeRs::EDLIB_MODE_HW;
let align_res = edlibAlignRs(query.as_bytes(), target.as_bytes(), &config);
assert_eq!(align_res.editDistance, 1);
```
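To see where the distances 4 and 1 come from, here is a naive dynamic-programming sketch of the three alignment modes. It is not edlib's algorithm (edlib uses Myers's bit-vector method and is much faster), and the names `Mode` and `edit_distance` are hypothetical. It only illustrates the semantics: NW aligns the whole query to the whole target, SHW leaves the end of the target free, and HW leaves both the start and the end of the target free.

```rust
/// Hypothetical stand-in for the three edlib alignment modes.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Nw,  // global: whole query against whole target
    Shw, // prefix: unaligned end of the target is not penalized
    Hw,  // infix: unaligned start and end of the target are not penalized
}

/// Naive O(|query| * |target|) edit distance with unit costs.
fn edit_distance(query: &[u8], target: &[u8], mode: Mode) -> usize {
    let m = target.len();
    // Row for the empty query: skipping target characters is free only in HW.
    let mut prev: Vec<usize> = match mode {
        Mode::Hw => vec![0; m + 1],
        Mode::Nw | Mode::Shw => (0..=m).collect(),
    };
    for (i, &q) in query.iter().enumerate() {
        let mut cur = Vec::with_capacity(m + 1);
        // Aligning i + 1 query characters against an empty target segment.
        cur.push(i + 1);
        for (j, &t) in target.iter().enumerate() {
            let substitute = prev[j] + usize::from(q != t);
            let skip_query_char = prev[j + 1] + 1;
            let skip_target_char = cur[j] + 1;
            cur.push(substitute.min(skip_query_char).min(skip_target_char));
        }
        prev = cur;
    }
    match mode {
        // The alignment must end at the last target character.
        Mode::Nw => prev[m],
        // The alignment may end anywhere in the target.
        Mode::Shw | Mode::Hw => *prev.iter().min().unwrap(),
    }
}
```

With the sequences above, `edit_distance(b"ACCTCTG", b"ACTCTGAAA", Mode::Nw)` gives 4 (one deleted `C` plus three trailing `A`), while the HW call on the longer target gives 1, since only the extra `C` has to be paid for.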
The crate relies on the C++ edlib library being installed and compiled as described in edlib documentation.
Before running cargo build (or cargo install) the environment variable EDLIBDIR must be set to where the original C++ edlib directory was cloned. This is necessary for the build.rs step of Cargo to access the edlib library includes.
Also libstdc++ must be in your path.
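As an illustration of the EDLIBDIR requirement, the following sketch shows how a build script could derive the include directory from that variable. It assumes the usual layout of the edlib repository, where headers live in `edlib/include`. The function name `edlib_include_dir` is hypothetical and the crate's actual build.rs may differ.

```rust
use std::path::PathBuf;

/// Hypothetical helper: maps the value of EDLIBDIR to the edlib include directory.
/// Assumes headers are located in `<EDLIBDIR>/edlib/include`.
fn edlib_include_dir(edlibdir: Option<&str>) -> Result<PathBuf, String> {
    let root = edlibdir
        .filter(|dir| !dir.is_empty())
        .ok_or_else(|| String::from("EDLIBDIR must point to the cloned edlib directory"))?;
    Ok(PathBuf::from(root).join("edlib").join("include"))
}
```

In a build script this would typically be called as `edlib_include_dir(std::env::var("EDLIBDIR").ok().as_deref())`, failing the build with the error message when the variable is missing.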
The crate enables a logger to monitor the calls to the C interface. The log level is set by default in Cargo.toml to *info* for release mode and *trace* for debug mode, but can be changed by setting the variable RUST_LOG (see env_logger doc).
Some tests in module edlib.rs can serve as basic examples. Please note that cargo test must be run with the variable EDLIBDIR set. The examples directory also contains a small version of the edlib aligner, edaligner (see apps/aligner in the edlib installation directory), which runs on Fasta files containing only one sequence, such as those in the edlib directory *test_data*. Unlike the edlib version, given a query and a target sequence it runs the 3 modes (normal/NW, prefix/SHW and infix/HW) in one pass.
With

```
RUST_LOG=info ./target/release/examples/edaligner --dirdata "$edlibpath/test_data/Enterobacteria_Phage_1" --tf "Enterobacteria_phage_1.fasta" --qf "mutated_90_perc.fasta"
```

we get the following timings in release mode for Enterobacteria_phage_1.fasta as target sequence and mutated_90_perc.fasta as query sequence.

| mode | edlibrs time(s) | edlib time(s) | distance |
| :---: | :---: | :------: | :----: |
| NW | 0.106 | 0.106 | 9506 |
| SHW | 0.184 | 0.191 | 9502 |
| HW | 0.682 | 0.695 | 9502 |

We get the following timings in release mode for Enterobacteria_phage_1.fasta as target sequence and mutated_60_perc.fasta as query sequence.

| mode | edlibrs time(s) | edlib time(s) | distance |
| :---: | :---: | :------: | :----: |
| NW | 0.398 | 0.398 | 39829 |
| SHW | 0.670 | 0.684 | 39828 |
| HW | 1.182 | 1.206 | 39828 |

Apart from negligible variations in CPU time measurement, the two interfaces show the same computation times.
Licensed under either of
at your option.
This software was written on my own while working at CEA, CEA-LIST