bamrescue is a small command line utility to check Binary Sequence Alignment / Map (BAM) files for corruption and repair them.
A BAM file is a BGZF file (specification), and as such is composed of a series of concatenated RFC 1952-compliant gzip blocks (specification).
Each gzip block contains at most 64 KiB of data and ends with a CRC32 checksum of the uncompressed data, which is used to check data integrity. RFC 1952 also allows an optional CRC16 checksum of the gzip header.
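As an illustration of the block layout described above, here is a minimal sketch in Rust. It is not taken from bamrescue's source: the function names `bgzf_block_size`, `bgzf_trailer` and `crc32` are hypothetical, and the sketch assumes the `BC` extra subfield is the first (and only) one in the gzip header, which is how BGZF blocks are usually written. Decompressing the payload, which is needed before the CRC32 can actually be compared, is left out.

```rust
/// Length of the fixed part of a BGZF block header (assumes a single "BC" subfield).
const BGZF_HEADER_LEN: usize = 18;

/// Returns the total size in bytes of the BGZF block starting at `block[0]`,
/// or `None` if the bytes do not look like a BGZF block header.
fn bgzf_block_size(block: &[u8]) -> Option<usize> {
    if block.len() < BGZF_HEADER_LEN {
        return None;
    }
    // gzip magic (0x1f 0x8b), CM = 8 (deflate), FLG = 4 (extra field present)
    if !block.starts_with(&[0x1f, 0x8b, 0x08, 0x04]) {
        return None;
    }
    // Assumption: the "BC" subfield comes first, with a 2-byte payload.
    if block[12] != b'B' || block[13] != b'C' || block[14] != 2 || block[15] != 0 {
        return None;
    }
    // BSIZE is stored little-endian as "total block size minus one".
    let bsize = u16::from_le_bytes([block[16], block[17]]);
    Some(usize::from(bsize) + 1)
}

/// Returns the (CRC32, ISIZE) pair stored in the last 8 bytes of the block.
fn bgzf_trailer(block: &[u8]) -> Option<(u32, u32)> {
    let size = bgzf_block_size(block)?;
    if block.len() < size || size < BGZF_HEADER_LEN + 8 {
        return None;
    }
    let t = &block[size - 8..size];
    let crc = u32::from_le_bytes([t[0], t[1], t[2], t[3]]);
    let isize = u32::from_le_bytes([t[4], t[5], t[6], t[7]]);
    Some((crc, isize))
}

/// Bitwise CRC32 as used by gzip (reflected polynomial 0xEDB88320).
/// It must be computed over the *uncompressed* data of the block.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xffff_ffffu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xedb8_8320 & mask);
        }
    }
    !crc
}
```

A block would then be considered intact when `crc32` of its inflated payload equals the stored CRC32 and the payload length equals ISIZE. A table-driven CRC32 would be faster; the bitwise version is used here only to keep the sketch short.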
Additionally, since gzip blocks start with a gzip identifier (i.e. the two bytes 0x1f 0x8b), it is possible to skip over corrupted blocks (at most 64 KiB) to the next non-corrupted block with limited complexity and acceptable reliability.
This property is used to repair corrupted BAM files by keeping only their non-corrupted blocks, hopefully rescuing most reads.
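The resynchronization idea can be sketched as follows. This is a simplified illustration, not bamrescue's actual implementation: `looks_like_bgzf_header` and `find_next_block` are hypothetical names, and the check assumes the usual BGZF header layout (deflate, extra field flag set, `BC` subfield first). Because the two-byte gzip identifier alone can occur by chance inside compressed data, the sketch also checks the surrounding header bytes to reduce false positives.

```rust
/// Returns true if `bytes` starts with something that looks like a BGZF block header.
/// Assumption: the "BC" extra subfield is the first one in the gzip header.
fn looks_like_bgzf_header(bytes: &[u8]) -> bool {
    bytes.len() >= 18
        // gzip magic (0x1f 0x8b), CM = 8 (deflate), FLG = 4 (extra field present)
        && bytes.starts_with(&[0x1f, 0x8b, 0x08, 0x04])
        // "BC" subfield with a 2-byte payload (the block size)
        && bytes[12] == b'B'
        && bytes[13] == b'C'
        && bytes[14] == 2
        && bytes[15] == 0
}

/// Scans `data` from offset `from` and returns the offset of the next
/// plausible BGZF block header, or `None` if there is none.
fn find_next_block(data: &[u8], from: usize) -> Option<usize> {
    (from..data.len()).find(|&offset| looks_like_bgzf_header(&data[offset..]))
}
```

After a block fails its integrity check, a repair loop could call `find_next_block` starting one byte past the failed block's offset, then validate the candidate block before keeping it. A candidate that passes the header check may still be a false positive, so the checksum remains the final arbiter.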
Run cargo build --release in your working copy.
Copy the bamrescue binary wherever you want.
Run bamrescue <bamfile_to_check_or_repair> <output_bamfile>.
Contributions are welcome through GitHub pull requests.
Please report bugs and feature requests on GitHub issues.
bamrescue is copyright (C) 2017 Jérémie Roquet <jroquet@arkanosis.net> and licensed under the ISC license.