🎼🧬 lightmotif

A lightweight platform-accelerated library for biological motif scanning using position weight matrices.

🗺️ Overview

Motif scanning with position weight matrices (also known as position-specific scoring matrices) is a robust method for identifying motifs of fixed length inside a biological sequence. Position weight matrices can be used to identify transcription factor binding sites in DNA, or protease cleavage sites in polypeptides. They are often viewed as sequence logos:

[Figure: sequence logo, MX000274.svg]
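To make the idea concrete, here is a minimal scalar sketch of motif scanning in plain Rust. It does not use lightmotif and is not how the library is implemented: the function names (`build_pssm`, `scan`, `best_position`), the log2-odds formula, and the way pseudocounts are added to every cell are assumptions chosen for illustration only.

```rust
/// Map a nucleotide to a column index (A, C, G, T); `None` for anything else.
fn index(base: u8) -> Option<usize> {
    match base {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Build a log2-odds scoring matrix from equal-length motif instances.
/// `pseudocount` is added to every cell (an assumption of this sketch);
/// `background` holds the expected frequency of each nucleotide.
fn build_pssm(instances: &[&str], pseudocount: f32, background: [f32; 4]) -> Vec<[f32; 4]> {
    assert!(!instances.is_empty(), "at least one motif instance is required");
    let width = instances[0].len();
    let mut counts = vec![[0.0f32; 4]; width];
    for inst in instances {
        assert_eq!(inst.len(), width, "motif instances must have equal length");
        for (pos, base) in inst.bytes().enumerate() {
            if let Some(i) = index(base) {
                counts[pos][i] += 1.0;
            }
        }
    }
    counts
        .iter()
        .map(|col| {
            let total = col.iter().sum::<f32>() + 4.0 * pseudocount;
            let mut scores = [0.0f32; 4];
            for (i, score) in scores.iter_mut().enumerate() {
                let freq = (col[i] + pseudocount) / total;
                *score = (freq / background[i]).log2();
            }
            scores
        })
        .collect()
}

/// Score every window of `seq` against the matrix, one score per start position.
/// Windows containing an unknown symbol get a score of negative infinity.
fn scan(seq: &str, pssm: &[[f32; 4]]) -> Vec<f32> {
    let bytes = seq.as_bytes();
    if pssm.is_empty() || bytes.len() < pssm.len() {
        return Vec::new();
    }
    (0..=bytes.len() - pssm.len())
        .map(|start| {
            pssm.iter()
                .zip(&bytes[start..])
                .map(|(col, &base)| index(base).map_or(f32::NEG_INFINITY, |i| col[i]))
                .sum::<f32>()
        })
        .collect()
}

/// Index of the highest-scoring window, if any.
fn best_position(scores: &[f32]) -> Option<usize> {
    scores
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}

fn main() {
    let instances = ["TATAAT", "TATAAT", "TACAAT"];
    let pssm = build_pssm(&instances, 0.1, [0.25; 4]);
    // The consensus TATAAT occurs at offset 3 of this sequence.
    let scores = scan("GGCTATAATGC", &pssm);
    if let Some(best) = best_position(&scores) {
        println!("best window starts at {best} with score {:.3}", scores[best]);
    }
}
```

This naive loop visits every position of every window once, which is the baseline that a vectorized implementation would aim to beat.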

The lightmotif library provides a Rust crate to run very efficient searches for a motif encoded in a position weight matrix. The position scanning combines several techniques to allow high-throughput processing of sequences.

💡 Example

```rust
use lightmotif::*;

// Create a count matrix from an iterable of motif sequences
let counts = CountMatrix::

// Create a PSSM with 0.1 pseudocounts and uniform background frequencies.
let pssm = counts.to_freq(0.1).to_scoring(None);

// Encode the target sequence into a striped matrix
let seq = "ATGTCCCAACAACGATACCCCGAGCCCATCGCCGTCATCGGCTCGGCATGCAGATTCCCAGGCG";
let encoded = EncodedSequence::<Dna>::encode(seq).unwrap();
let mut striped = encoded.to_striped::<32>();
striped.configure(&pssm);

// Use a pipeline to compute scores for every position of the matrix
let scores = Pipeline::<Dna, f32>::score(&striped, &pssm);

// Scores can be extracted into a Vec<f32>, or indexed directly.
let v = scores.to_vec();
assert_eq!(scores[0], -23.07094);
assert_eq!(v[0], -23.07094);
```

To use the AVX2 implementation, create a Pipeline<_, __m256> instead of a Pipeline<_, f32>. This is only supported when the library is compiled with the avx2 target feature, which can be detected at compile time with Rust's #[cfg] attribute.
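One possible way to make that choice at compile time is sketched below. The `Lanes` alias and the `backend` function are hypothetical names introduced for this example and are not part of lightmotif; the sketch only shows the #[cfg] mechanics, assuming an x86-64 target when AVX2 is enabled.

```rust
// `Lanes` is a hypothetical alias for this sketch: it resolves to `__m256`
// when the crate is compiled with AVX2 on x86-64, and to `f32` otherwise.
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
type Lanes = std::arch::x86_64::__m256;

#[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
type Lanes = f32;

/// Name of the backend selected by the same compile-time condition.
fn backend() -> &'static str {
    if cfg!(all(target_arch = "x86_64", target_feature = "avx2")) {
        "avx2"
    } else {
        "generic"
    }
}

fn main() {
    println!(
        "backend: {} ({} bytes per vector)",
        backend(),
        std::mem::size_of::<Lanes>()
    );
}
```

An alias selected this way could then be used as the second type parameter of the pipeline. Building with `RUSTFLAGS="-C target-feature=+avx2"` or `-C target-cpu=native` on a CPU with AVX2 enables the first branch.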

⏱️ Benchmarks

Both benchmarks use the MX000001 motif from PRODORIC[4], and the complete genome of an Escherichia coli K12 strain. Benchmarks were run on an i7-10710U CPU running at 1.10 GHz, compiled with --target-cpu=native.

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the open-source MIT license.

This project was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

📚 References