SVNN: an efficient PacBio-specific pipeline for structural variations calling using neural networks

Akbarinejad, S ; Sharif University of Technology | 2021

  1. Type of Document: Article
  2. DOI: 10.1186/s12859-021-04184-7
  3. Publisher: BioMed Central Ltd, 2021
  4. Abstract:
  5. Background: Once aligned, long reads can be a useful source of information for identifying the type and position of structural variations. However, due to the high sequencing error rate of long reads, long-read structural variation detection methods are far from precise in low-coverage cases. To be accurate, they need high-coverage data, which in turn results in an extremely time-consuming pipeline, especially in the alignment phase. Therefore, it is of utmost importance to have a structural variation calling pipeline that is both fast and precise for low-coverage data. Results: In this paper, we present SVNN, a fast yet accurate structural variation calling pipeline for PacBio long reads that takes raw reads as input and detects structural variants larger than 50 bp. Our pipeline utilizes state-of-the-art long-read aligners, namely NGMLR and Minimap2, and structural variation callers, specifically Sniffles and SVIM. We found that by using a neural network, we can extract features from Minimap2 output to detect a subset of reads that provide useful information for structural variation detection. By mapping only this subset with NGMLR, which is far slower than Minimap2 but better serves downstream structural variation detection, we can increase sensitivity efficiently. As a result of using multiple tools intelligently, SVNN achieves up to 20 percentage points of sensitivity improvement in comparison with state-of-the-art methods and is three times faster than a naive combination of state-of-the-art tools that achieves almost the same accuracy. Conclusion: Since the prohibitive cost of high-coverage data has impeded long-read applications, SVNN provides users with a much faster structural variation detection platform for PacBio reads, with high precision and sensitivity in low-coverage scenarios. © 2021, The Author(s)
  6. Keywords:
  7. Pipelines ; Sensitivity analysis ; Detection methods ; Multiple tools ; Percentage points ; Sensitivity improvements ; Sequencing errors ; State of the art ; State-of-the-art methods ; Structural variations ; Neural networks ; Pipeline ; Diagnostic test ; DNA sequence ; High throughput sequencing ; Diagnostic Tests, Routine ; High-Throughput Nucleotide Sequencing ; Neural Networks, Computer ; Sequence Analysis, DNA ; Software
  8. Source: BMC Bioinformatics ; Volume 22, Issue 1, 2021 ; 14712105 (ISSN)
  9. URL: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04184-7
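
The read-selection idea described in the abstract (score each read from its fast first-pass alignment, then send only the promising subset to the slower aligner) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `ReadFeatures` fields, the `toy_score` heuristic, and the 0.5 threshold are hypothetical and are not taken from the SVNN paper, whose actual features and trained network are not described in this record. The scoring function is a hand-written stand-in for a trained classifier.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class ReadFeatures:
    """Hypothetical per-read features derived from a fast first-pass alignment."""
    read_id: str
    mapping_quality: int      # mapping quality of the primary alignment
    clipped_fraction: float   # fraction of the read that is soft-clipped
    num_supplementary: int    # number of supplementary (split) alignments


def toy_score(f: ReadFeatures) -> float:
    """Stand-in for a trained classifier; a simple heuristic, NOT the SVNN network."""
    score = 0.0
    if f.clipped_fraction > 0.1:   # heavy clipping may hint at a breakpoint
        score += 0.5
    if f.num_supplementary > 0:    # split alignments may hint at a rearrangement
        score += 0.4
    if f.mapping_quality < 20:     # ambiguous placement
        score += 0.1
    return score


def select_reads(
    features: Iterable[ReadFeatures],
    score_fn: Callable[[ReadFeatures], float],
    threshold: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """Split read IDs into (realign with the slow aligner, keep fast alignment)."""
    selected: List[str] = []
    rest: List[str] = []
    for f in features:
        if score_fn(f) >= threshold:
            selected.append(f.read_id)
        else:
            rest.append(f.read_id)
    return selected, rest


reads = [
    ReadFeatures("r1", mapping_quality=60, clipped_fraction=0.0, num_supplementary=0),
    ReadFeatures("r2", mapping_quality=60, clipped_fraction=0.3, num_supplementary=1),
    ReadFeatures("r3", mapping_quality=60, clipped_fraction=0.05, num_supplementary=1),
]
to_realign, keep = select_reads(reads, toy_score)
print(to_realign, keep)
```

In a setup like this, only the IDs in `to_realign` would be passed to the slower aligner, which is where the time saving claimed in the abstract would come from; the choice of threshold trades sensitivity against the number of reads realigned.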