Alevin efficiently estimates accurate gene abundances from dscRNA-seq data

by: Avi Srivastava, Laraib Malik, Tom Smith, Ian Sudbery, Rob Patro

Format: Article
Published: BMC 2019-03-01

Description

Abstract: We introduce alevin, a fast end-to-end pipeline to process droplet-based single-cell RNA sequencing data, performing cell barcode detection, read mapping, unique molecular identifier (UMI) deduplication, gene count estimation, and cell barcode whitelisting. Alevin’s approach to UMI deduplication considers transcript-level constraints on the molecules from which UMIs may have arisen and accounts for both gene-unique reads and reads that multimap between genes. This addresses the inherent bias in existing tools, which discard gene-ambiguous reads, and improves the accuracy of gene abundance estimates. Alevin is also considerably faster than existing gene quantification approaches, typically by a factor of eight, while using less memory.
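
The bias the abstract attributes to discarding gene-ambiguous reads can be illustrated with a toy example. The sketch below is not alevin's algorithm. It is a minimal, assumed version of the simpler strategy the abstract contrasts against: count distinct UMIs per cell and gene, and drop any read that maps to more than one gene. The read tuples, barcode and UMI strings, gene names, and the function `naive_counts` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy input: (cell_barcode, umi, set of genes the read maps to).
reads = [
    ("AAAC", "TTGA", {"GeneA"}),
    ("AAAC", "TTGA", {"GeneA"}),           # duplicate of the first read (same UMI)
    ("AAAC", "CCGT", {"GeneA"}),
    ("AAAC", "GGAT", {"GeneA", "GeneB"}),  # gene-ambiguous read
    ("AAAC", "ACGT", {"GeneB"}),
    ("AAAC", "TACG", {"GeneA", "GeneB"}),  # gene-ambiguous read
]


def naive_counts(reads):
    """Count distinct UMIs per (cell, gene), discarding gene-ambiguous reads."""
    umis = defaultdict(set)   # (cell, gene) -> set of UMIs seen
    discarded = set()         # (cell, umi) pairs dropped as ambiguous
    for cell, umi, genes in reads:
        if len(genes) == 1:
            (gene,) = genes
            umis[(cell, gene)].add(umi)  # set membership collapses duplicates
        else:
            discarded.add((cell, umi))
    counts = {key: len(seen) for key, seen in umis.items()}
    return counts, len(discarded)


counts, n_discarded = naive_counts(reads)
print(counts)
print("discarded UMIs:", n_discarded)
```

In this toy input, two of the five distinct UMIs never contribute to any gene count, so genes that share sequence with other genes are systematically undercounted. According to the abstract, alevin avoids this loss by accounting for multimapping reads during deduplication.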