lightmotif
A lightweight platform-accelerated library for biological motif scanning using position weight matrices.
Motif scanning with position weight matrices (also known as position-specific scoring matrices) is a robust method for identifying motifs of fixed length inside a biological sequence. Such matrices can be used to identify transcription factor binding sites in DNA, or protease cleavage sites in polypeptides. Position weight matrices are often viewed as sequence logos.
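To make the scoring idea concrete, here is a minimal scalar sketch of position weight matrix scanning in plain Rust. It does not use the `lightmotif` API: the helpers `index`, `build_pssm`, `scan` and `best_hit` are hypothetical names introduced for illustration, the motif instances are made up, and the background is assumed to be uniform (0.25 per nucleotide). Symbols outside `ACGT` are assumed to contribute a neutral score of 0.

```rust
/// Map a nucleotide to a column index (A, C, G, T). Hypothetical helper.
fn index(base: u8) -> Option<usize> {
    match base {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Build a log-odds matrix (one row per motif position) from aligned motif
/// instances, with a pseudocount and an assumed uniform background of 0.25.
fn build_pssm(motifs: &[&str], pseudocount: f32) -> Vec<[f32; 4]> {
    let width = motifs.first().map_or(0, |m| m.len());
    let mut counts = vec![[0.0f32; 4]; width];
    for m in motifs {
        for (i, b) in m.bytes().take(width).enumerate() {
            if let Some(j) = index(b) {
                counts[i][j] += 1.0;
            }
        }
    }
    counts
        .iter()
        .map(|row| {
            let total = row.iter().sum::<f32>() + 4.0 * pseudocount;
            let mut out = [0.0f32; 4];
            for j in 0..4 {
                let freq = (row[j] + pseudocount) / total;
                out[j] = (freq / 0.25).log2();
            }
            out
        })
        .collect()
}

/// Score every window of `seq` that has the same width as the matrix.
fn scan(pssm: &[[f32; 4]], seq: &str) -> Vec<f32> {
    let bytes = seq.as_bytes();
    if pssm.is_empty() || bytes.len() < pssm.len() {
        return Vec::new();
    }
    (0..=bytes.len() - pssm.len())
        .map(|start| {
            pssm.iter()
                .enumerate()
                // Unknown symbols are treated as neutral (assumption).
                .map(|(i, row)| index(bytes[start + i]).map_or(0.0, |j| row[j]))
                .sum::<f32>()
        })
        .collect()
}

/// Return the position and score of the highest-scoring window, if any.
fn best_hit(scores: &[f32]) -> Option<(usize, f32)> {
    scores
        .iter()
        .copied()
        .enumerate()
        .fold(None, |best, (i, s)| match best {
            Some((_, b)) if b >= s => best,
            _ => Some((i, s)),
        })
}

fn main() {
    // Made-up motif instances, for illustration only.
    let pssm = build_pssm(&["ACGT", "ACGA", "ACGT"], 0.1);
    let scores = scan(&pssm, "TTACGTTT");
    if let Some((pos, score)) = best_hit(&scores) {
        println!("best window starts at {pos} with score {score:.3}");
    }
}
```

This scalar loop visits one window at a time, which is the baseline that vectorized implementations aim to speed up by scoring several positions at once.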
The `lightmotif` library provides a Rust crate to run very efficient searches for a motif encoded in a position weight matrix. The position scanning combines several techniques to allow high-throughput processing of sequences:
`permute` instructions of AVX2.

```rust
use lightmotif::*;

// Create a count matrix from an iterable of motif sequences
let counts = CountMatrix::

// Create a PSSM with 0.1 pseudocounts and uniform background frequencies.
let pssm = counts.to_freq(0.1).to_scoring(None);

// Encode the target sequence into a striped matrix
let seq = "ATGTCCCAACAACGATACCCCGAGCCCATCGCCGTCATCGGCTCGGCATGCAGATTCCCAGGCG";
let encoded = EncodedSequence::

// Use a pipeline to compute scores for every position of the matrix
let scores = Pipeline::

// Scores can be extracted into a Vec
```

To use the AVX2 implementation, simply create a `Pipeline<_, __m256>` instead of a `Pipeline<_, f32>`. This is only supported when the library is compiled with the `avx2` target feature, but it can be easily configured with Rust's `#[cfg]` attribute.

⏱️ Benchmarks

Both benchmarks use the MX000001 motif from PRODORIC[4], and the complete genome of an Escherichia coli K12 strain. Benchmarks were run on an i7-10710U CPU running @1.10GHz, compiled with `--target-cpu=native`.

Score every position of the genome with the motif weight matrix:

```console
running 3 tests
test bench_avx2 ... bench: 13,053,752 ns/iter (+/- 45,411) = 355 MB/s
test bench_ssse3 ... bench: 37,203,277 ns/iter (+/- 2,416,572) = 124 MB/s
test bench_generic ... bench: 314,682,807 ns/iter (+/- 1,072,174) = 14 MB/s
```

Find the highest-scoring position for a motif in a 10kb sequence (compared to the PSSM algorithm implemented in `bio::pattern_matching::pssm`):

```console
test bench_avx2 ... bench: 46,390 ns/iter (+/- 115) = 215 MB/s
test bench_ssse3 ... bench: 97,691 ns/iter (+/- 2,720) = 102 MB/s
test bench_generic ... bench: 740,305 ns/iter (+/- 2,527) = 13 MB/s
test bench_bio ... bench: 1,575,504 ns/iter (+/- 2,799) = 6 MB/s
```

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the open-source MIT license.

This project was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

📚 References