Smafa attempts to align or cluster pre-aligned biological sequences, that is, sequences which are all the same length. Its main use is through SingleM, although it can also be used independently to search and cluster other pre-aligned sequences.
A statically linked executable is available at the releases page. If you are running x86-64 Linux, it should be possible to download and run smafa directly.
Smafa can be installed in the usual way Rust packages are installed. After installing Rust, smafa can be installed using cargo:
cargo install smafa
To run the aligner, first make a database with smafa makedb and then query that database with smafa query. To see how to use these modes, use e.g. smafa makedb -h.
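Because all sequences share the same length, comparing a query against the database does not require a full dynamic-programming alignment: position i of the query can be compared directly with position i of each database sequence. The Rust sketch below illustrates this idea. It is not smafa's implementation. It assumes that the distance is a simple count of mismatching positions (Hamming distance), that a gap character '-' counts as a mismatch against a base, and that only the single closest hit is reported. The names divergence and best_hit are hypothetical.

```rust
/// Hypothetical helper: count mismatching positions between two pre-aligned
/// sequences of equal length (Hamming distance). Gaps are treated as ordinary
/// characters, so '-' versus a base counts as a mismatch (an assumption).
fn divergence(a: &[u8], b: &[u8]) -> usize {
    assert_eq!(a.len(), b.len(), "sequences must be pre-aligned to the same length");
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}

/// Hypothetical helper: return (index, divergence) of the closest database
/// sequence, or None if the database is empty. Ties go to the earliest entry.
fn best_hit(query: &[u8], db: &[&[u8]]) -> Option<(usize, usize)> {
    db.iter()
        .enumerate()
        .map(|(i, s)| (i, divergence(query, s)))
        .min_by_key(|&(_, d)| d)
}

fn main() {
    // Toy "database" of pre-aligned sequences, all of length 9.
    let raw = ["ACGT-ACGT", "ACGTTACTA", "TTTT-ACGT"];
    let db: Vec<&[u8]> = raw.iter().map(|s| s.as_bytes()).collect();

    let query = b"ACGT-ACGA";
    if let Some((idx, d)) = best_hit(query, &db) {
        println!("best hit: {} (divergence {})", raw[idx], d);
    }
}
```

A linear scan like this is the simplest possible search; a real database-backed tool may index the sequences to avoid comparing the query against every entry.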
To run in clustering mode, use smafa cluster. Note that the clustering mode implements a greedy algorithm: sequences encountered earlier in the input file are taken as cluster representatives, unless they are sufficiently similar to, i.e. cluster with, a previously encountered sequence.
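The greedy scheme described above can be sketched in a few lines of Rust. This is an illustrative sketch, not smafa's code. It assumes that similarity is measured as the number of mismatching positions, that the threshold is a hypothetical parameter max_divergence, that each sequence is compared only against existing representatives, and that a sequence joins the first representative within the threshold (rather than the closest one). The function names are hypothetical.

```rust
/// Hypothetical helper: count mismatching positions between two
/// equal-length, pre-aligned sequences.
fn divergence(a: &[u8], b: &[u8]) -> usize {
    assert_eq!(a.len(), b.len(), "sequences must all be the same length");
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}

/// Greedy clustering sketch: for each input sequence, return the index of the
/// representative of the cluster it was assigned to. Input order matters,
/// because earlier sequences get the first chance to become representatives.
fn greedy_cluster(seqs: &[&[u8]], max_divergence: usize) -> Vec<usize> {
    let mut representatives: Vec<usize> = Vec::new();
    let mut assignment = Vec::with_capacity(seqs.len());
    for (i, seq) in seqs.iter().enumerate() {
        // Look for the first existing representative that is similar enough.
        let found = representatives
            .iter()
            .copied()
            .find(|&r| divergence(seqs[r], seq) <= max_divergence);
        match found {
            Some(r) => assignment.push(r),
            None => {
                // Nothing earlier is close enough: this sequence starts a new cluster.
                representatives.push(i);
                assignment.push(i);
            }
        }
    }
    assignment
}

fn main() {
    let raw = ["ACGTACGT", "ACGTACGA", "TTTTACGT", "TTTTACGA", "GGGGGGGG"];
    let seqs: Vec<&[u8]> = raw.iter().map(|s| s.as_bytes()).collect();
    let assignment = greedy_cluster(&seqs, 1);
    // Print each sequence next to its cluster representative.
    for (i, rep) in assignment.iter().enumerate() {
        println!("{}\t{}", raw[i], raw[*rep]);
    }
}
```

With a threshold of one mismatch, the five toy sequences fall into three clusters, represented by the first, third and fifth inputs. Reordering the input can change which sequences become representatives, which is the usual trade-off of greedy clustering: it needs only one pass over the input but does not guarantee an optimal partition.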
If you have any questions or comments, send a message to the SupportM mailing list or raise a GitHub issue.
To run the benchmarks, Rust nightly is required. Then run:
rustup run nightly cargo bench --features unstable
Woodcroft, B. Community diversity in metagenomes: one, many and thousands. Winter School in Mathematical and Computational Biology. http://bioinformatics.org.au/ws16/speaker-items/ben-woodcroft/#tab-f2e1404e449518dbab9
Smafa is written by Ben Woodcroft (@wwood) at the Australian Centre for Ecogenomics (UQ) and is licensed under GPL3 or later.