alevin-fry is a suite of tools for the rapid, accurate and memory-frugal processing of single-cell and single-nucleus sequencing data. It consumes RAD files generated by salmon alevin, and performs common operations like generating permit lists and estimating the number of distinct molecules from each gene within each cell. The focus in alevin-fry is on safety, accuracy and efficiency (in terms of both time and memory usage).
You can read the pre-print describing alevin-fry, "Alevin-fry unlocks rapid, accurate, and memory-frugal quantification of single-cell RNA-seq data", on bioRxiv.
Relationship to alevin: Alevin-fry has been designed as the successor to alevin. It subsumes the core features of alevin, while also providing important new capabilities and considerably improving the performance profile. We anticipate that new method development and feature additions will take place primarily within the alevin-fry codebase. Thus, we encourage users of alevin to migrate to alevin-fry when feasible. That being said, alevin is still actively maintained and supported, so if you are using it and are not ready to migrate, you can continue to ask questions and post issues in the salmon repository.
Alevin-fry is under active development. However, you can find the documentation on Read the Docs. We try to keep the documentation up to date with the latest developments in the software. Additionally, there is a series of tutorials on using alevin-fry to process different types of data that you can find here.
The generation of the reduced alignment data (RAD) files processed by alevin-fry is done by salmon. The latest version of salmon is available on GitHub, via bioconda, and on dockerhub.
The usefulaf repository contains scripts and functions that are useful in helping to prepare input for alevin-fry processing, importing alevin-fry output into downstream analysis environments, and even running common configurations of alevin-fry more simply.
Alevin-fry is available for both x86 linux and OSX platforms using bioconda.
With bioconda in the appropriate place in your channel list, you should simply be able to install via:
{bash}
$ conda install alevin-fry
If you want to use features or fixes that may only be available in the latest develop branch (or want to build for a different architecture), then you have to build from source. Luckily, cargo makes that easy; see below.
Alevin-fry is built and tested with the latest (major & minor) stable version of Rust. While it will likely compile fine with slightly older versions of Rust, this is not a guarantee and is not a support priority. Unlike with C++, Rust has a frequent and stable release cadence, is designed to be installed and updated from user space, and is easy to keep up to date with rustup. Thanks to cargo, building should be as easy as:
{bash}
$ cargo build --release
Subsequent commands below will assume that the executable is in your path. Temporarily, this can be done (in bash-like shells) using:
{bash}
$ export PATH=`pwd`/target/release/:$PATH
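If you re-run the export above several times in one session, the same directory is added to your PATH repeatedly. A minimal sketch of a guard against that follows; `prepend_to_path` is a hypothetical helper written for this example (it is not part of alevin-fry), and the final call assumes your current directory is the root of the alevin-fry source tree.

```bash
# Hypothetical helper (not part of alevin-fry): prepend a directory to PATH
# unless it is already listed there.
prepend_to_path() {
    case ":${PATH}:" in
        *":$1:"*) ;;                  # already present, nothing to do
        *) PATH="$1:${PATH}" ;;       # otherwise put it at the front
    esac
    export PATH
}

# Assumption: the current directory is the root of the alevin-fry checkout.
prepend_to_path "$(pwd)/target/release"
```

Afterwards, `command -v alevin-fry` should print the path of the freshly built executable if the build succeeded.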
In the manuscript describing alevin-fry, we primarily make use of an index that is built over spliced + intron sequence, which we refer to as a splici reference. To make the construction of the relevant reference sequence (and the 3-column TSV file you will need for Unspliced/Spliced/Ambiguous (USA) quantification) simple, we have written an R script that will process a genome and GTF file and produce the splici reference, which you can then index with salmon as normal.
First, check out the usefulaf repository and navigate to the R directory. Then, we'll run the build_splici_ref.R script.
$ ./build_splici_ref.R <path_to_genome_fasta> <path_to_gtf> <target_read_length> <output_dir>
where $ indicates your command prompt. In addition to these required positional arguments, there are a few optional arguments that you can find by running
$ ./build_splici_ref.R -h
After you have run this script, your output directory should contain 3 files:
<output_dir>/transcriptome_splici_fl<target_read_length-5>.fa
<output_dir>/transcriptome_splici_fl<target_read_length-5>_t2g.tsv
<output_dir>/transcriptome_splici_fl<target_read_length-5>_t2g_3col.tsv
The first file contains the splici reference sequence that you should index with salmon, and the third contains the 3-column transcript-to-gene mapping that you should pass to alevin-fry during the quant phase.
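The file names above embed the flank length, which is the target read length minus 5 (for example, a read length of 91 gives `fl86`). As a minimal sketch, the shell functions below derive the three expected file names and report any that are missing. `splici_files` and `check_splici_files` are hypothetical helpers written for this example; they are not part of usefulaf or alevin-fry.

```bash
# Hypothetical helper: print the three files that build_splici_ref.R is
# expected to write, given the output directory and the target read length.
splici_files() {
    local out_dir="$1"
    local read_len="$2"
    local flank=$(( read_len - 5 ))   # flank length = read length - 5
    local prefix="${out_dir}/transcriptome_splici_fl${flank}"
    printf '%s\n' "${prefix}.fa" "${prefix}_t2g.tsv" "${prefix}_t2g_3col.tsv"
}

# Hypothetical helper: return non-zero and name each expected file that
# does not exist.
check_splici_files() {
    splici_files "$1" "$2" | (
        missing=0
        while IFS= read -r f; do
            if [ ! -f "$f" ]; then
                echo "missing: $f" >&2
                missing=1
            fi
        done
        exit "$missing"
    )
}
```

For example, `check_splici_files <output_dir> 91` succeeds only if all three `fl86` files are present in the output directory.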
If you have any questions about preparing the splici reference, or otherwise about processing your data with alevin-fry, please feel free to open an issue here on GitHub!