🎼🧬 lightmotif

A lightweight platform-accelerated library for biological motif scanning using position weight matrices.

🗺️ Overview

Motif scanning with position weight matrices (also known as position-specific scoring matrices) is a robust method for identifying motifs of fixed length inside a biological sequence. It can be used to identify transcription factor binding sites in DNA, or protease cleavage sites in polypeptides. Position weight matrices are often viewed as sequence logos:

MX000274.svg
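To make the scanning idea concrete, here is a minimal scalar sketch in plain Rust that does not use lightmotif at all. It builds log-odds scores from a few motif instances (assuming a uniform background of 0.25 per nucleotide and a fixed pseudocount), then scores every window of a target sequence. The names `symbol_index`, `build_pssm`, `scan` and `argmax` are hypothetical helpers written for this illustration, not part of the crate's API, and scoring unknown symbols as 0 is an arbitrary simplification.

```rust
/// Map a nucleotide to a column index (A, C, G, T). Hypothetical helper.
fn symbol_index(c: u8) -> Option<usize> {
    match c {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Build log2-odds scores from motif instances, assuming a uniform
/// background (0.25 per symbol). The motif length is taken from the
/// first instance; longer instances are truncated to that length.
fn build_pssm(instances: &[&str], pseudocount: f32) -> Vec<[f32; 4]> {
    let len = instances.first().map_or(0, |s| s.len());
    let mut counts = vec![[0.0f32; 4]; len];
    for inst in instances {
        for (i, c) in inst.bytes().take(len).enumerate() {
            if let Some(j) = symbol_index(c) {
                counts[i][j] += 1.0;
            }
        }
    }
    counts
        .iter()
        .map(|row| {
            // Normalize counts (plus pseudocounts) into frequencies,
            // then convert each frequency to a log-odds score.
            let total = row.iter().sum::<f32>() + 4.0 * pseudocount;
            let mut out = [0.0f32; 4];
            for j in 0..4 {
                let freq = (row[j] + pseudocount) / total;
                out[j] = (freq / 0.25).log2();
            }
            out
        })
        .collect()
}

/// Score every window of `seq` against the matrix.
/// Symbols outside ACGT contribute 0 in this sketch.
fn scan(pssm: &[[f32; 4]], seq: &str) -> Vec<f32> {
    let bytes = seq.as_bytes();
    if pssm.is_empty() || bytes.len() < pssm.len() {
        return Vec::new();
    }
    (0..=bytes.len() - pssm.len())
        .map(|start| {
            pssm.iter()
                .zip(&bytes[start..])
                .map(|(row, &c)| symbol_index(c).map_or(0.0, |j| row[j]))
                .sum::<f32>()
        })
        .collect()
}

/// Index of the highest-scoring window, if any.
fn argmax(scores: &[f32]) -> Option<usize> {
    scores
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}
```

Calling `scan(&build_pssm(&["ACGT", "ACGT"], 0.1), "TTACGTTT")` yields one score per window, and `argmax` on the result points at the window that matches the motif. This loop visits one position at a time, which is the kind of work a vectorized implementation would spread across SIMD lanes.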

The lightmotif library provides a Rust crate to run very efficient searches for a motif encoded in a position weight matrix. The position scanning combines several techniques to allow high-throughput processing of sequences:

Other crates from the ecosystem provide additional features if needed:

This is the Rust version; a Python package is available as well.

💡 Example

```rust
use lightmotif::*;
use lightmotif::abc::Nucleotide;
use typenum::U32;

// Create a count matrix from an iterable of motif sequences
let counts = CountMatrix::<Dna>::from_sequences(&[
    EncodedSequence::encode("GTTGACCTTATCAAC").unwrap(),
    EncodedSequence::encode("GTTGATCCAGTCAAC").unwrap(),
]).unwrap();

// Create a PSSM with 0.1 pseudocounts and uniform background frequencies.
let pssm = counts.to_freq(0.1).to_scoring(None);

// Create a pipeline to run tasks with platform acceleration
let pli = Pipeline::dispatch();

// Use the pipeline to encode the target sequence into a striped matrix
let seq = "ATGTCCCAACAACGATACCCCGAGCCCATCGCCGTCATCGGCTCGGCATGCAGATTCCCAGGCG";
let encoded = pli.encode(seq).unwrap();
let mut striped = pli.stripe(encoded);

// Use the pipeline to compute scores for every position of the matrix.
striped.configure(&pssm);
let scores = pli.score(&striped, &pssm);

// Scores can be extracted into a Vec<f32>, or indexed directly.
let v = scores.to_vec();
assert_eq!(scores[0], -23.07094);
assert_eq!(v[0], -23.07094);

// The highest scoring position can be searched with a pipeline as well.
let best = pli.argmax(&scores).unwrap();
assert_eq!(best, 18);
```

This example uses a dynamic dispatch pipeline, which selects the best available backend (AVX2, SSE2, NEON, or a generic implementation) depending on the local platform.
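For readers unfamiliar with runtime backend selection, the sketch below shows one common way to do it in Rust with the standard library's CPU feature detection macros. It is an illustration of the general technique only, not lightmotif's actual implementation: the `Backend` enum and `detect_backend` function are hypothetical names chosen to mirror the backends listed above.

```rust
/// Backends mirroring the ones named in the text. Hypothetical type.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Avx2,
    Sse2,
    Neon,
    Generic,
}

/// Pick the most capable backend supported by the CPU at runtime.
fn detect_backend() -> Backend {
    // On x86 targets, prefer AVX2 and fall back to SSE2.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
        if std::arch::is_x86_feature_detected!("sse2") {
            return Backend::Sse2;
        }
    }
    // On 64-bit Arm targets, check for NEON.
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Backend::Neon;
        }
    }
    // Portable scalar code is always available.
    Backend::Generic
}
```

The detection is cheap, so a caller would typically run it once and reuse the result, for instance by matching on the returned `Backend` to select a scoring routine. Because the checks happen at runtime, a single binary can run on machines with different instruction sets.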

⏱️ Benchmarks

Both benchmarks use the MX000001 motif from PRODORIC[5] and the complete genome of an Escherichia coli K12 strain. Benchmarks were run on an i7-10710U CPU running at 1.10 GHz, compiled with --target-cpu=native.

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the open-source MIT license.

This project was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

📚 References