IMOS: improved meta-aligner and minimap2 on spark

Hadadian Nejad Yousefi, M ; Sharif University of Technology | 2019

  1. Type of Document: Article
  2. DOI: 10.1186/s12859-018-2592-5
  3. Publisher: BioMed Central Ltd , 2019
  4. Abstract:
  5. Background: Long reads provide valuable information regarding the sequence composition of genomes. Long reads are usually very noisy, which renders their alignment to the reference genome a daunting task. It may take days to process, on a single node, a dataset large enough to sequence a human genome. Hence, it is of primary importance to have an aligner that can operate on distributed clusters of computers with high performance in accuracy and speed. Results: In this paper, we present IMOS, an aligner for mapping noisy long reads to the reference genome. It can be used on a single node as well as on distributed nodes. In its single-node mode, IMOS is an Improved version of Meta-aligner (IM), enhancing both its accuracy and speed. IM is up to 6x faster than the original Meta-aligner. IMOS is also implemented to run IM and Minimap2 on Apache Spark for deployment on a cluster of nodes. Moreover, multi-node IMOS is faster than SparkBWA while executing both IM (1.5x) and Minimap2 (25x). Conclusion: In this paper, we proposed an architecture for mapping long reads to a reference. Due to its implementation, IMOS speed can increase almost linearly with respect to the number of nodes in a cluster. Also, it is a multi-platform application able to operate on Linux, Windows, and macOS. © 2019 The Author(s)
  6. Keywords:
  7. Long read ; PacBio ; Big data ; Computer operating systems ; Data handling ; Distributed computer systems ; Mapping ; Aligner ; Cluster of nodes ; Distributed clusters ; Distributed nodes ; Distributed processing ; Multi-platform applications ; Genes ; Algorithm ; Biology ; Chromosomal mapping ; DNA sequence ; Factual database ; Genetic database ; Human ; Human genome ; Algorithms ; Chromosome Mapping ; Computational Biology ; Databases, Factual ; Databases, Genetic ; Genome, Human ; Genomics ; Humans ; Sequence Alignment ; Sequence Analysis, DNA ; Software ; Workflow
  8. Source: BMC Bioinformatics ; Volume 20, Issue 1 , 2019 ; 14712105 (ISSN)
  9. URL: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2592-5