A novel pattern matching algorithm for genomic patterns related to protein motifs

Foroughmand Araabi, M. H.; Sharif University of Technology | 2020

  1. Type of Document: Article
  2. DOI: 10.1142/S0219720020500110
  3. Publisher: World Scientific Publishing Co. Pte Ltd, 2020
  4. Abstract:
  5. Patterns in proteins and genomic sequences have been widely analyzed, extracted, and collected in databases. Although protein patterns originate from genomic coding regions, very few works have dealt, directly or indirectly, with coding-region patterns induced from protein patterns. Results: In this paper, we define a new genomic pattern structure suitable for representing patterns induced from proteins. This structure, called the "Consecutive Positions Scoring Matrix (CPSSM)", serves as a replacement for protein patterns and profiles in the genomic context. CPSSMs can be identified, discovered, and searched for in genomes. We then present a novel dynamic-programming algorithm for matching the defined genomic pattern against genomic sequences. In addition, we modify the algorithm to support intronic gaps and very large sequences. We have implemented the algorithm and tested it on real data. The results on the Saccharomyces cerevisiae genome show 132% more true positives and no false negatives, and the results on the human genome show no false negatives and 10 times as many true positives as those in previous works. Conclusion: CPSSM and the provided methods could be used for open reading frame detection and gene finding. The application, with source code, is available to run and download at http://app.foroughmand.ir/cpssm/. © 2020 World Scientific Publishing Europe Ltd
  6. Keywords:
  7. Bioinformatics algorithm ; Bioinformatics service ; Dynamic programming ; Genomic sequence ; Pattern matching ; Algorithm ; Article ; Bioinformatics ; False negative result ; Human ; Human genome ; Intron ; Nonhuman ; Open reading frame ; Protein analysis ; Protein motif ; Saccharomyces cerevisiae
  8. Source: Journal of Bioinformatics and Computational Biology ; Volume 18, Issue 1, 2020
  9. URL: https://www.worldscientific.com/doi/abs/10.1142/S0219720020500110
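
The abstract describes searching a genome for patterns induced from protein patterns. This record does not define the CPSSM or its dynamic-programming matcher, so the following is only a minimal sketch of the underlying idea, not the authors' method: a naive sliding-window scan that translates each codon of a DNA window and scores it against a protein position-specific scoring matrix. The names `scan` and `MISMATCH`, the list-of-dicts matrix format, and the example scores are all assumptions made for illustration. Intronic gaps and the reverse strand are not handled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

MISMATCH = -1.0  # hypothetical score for residues absent from a matrix column


def scan(dna, pssm, threshold):
    """Return (start, score) for each DNA window scoring at least `threshold`.

    `pssm` is a list of dicts (one per protein position) mapping an amino
    acid letter to its score; this format is assumed for the sketch.
    """
    width = 3 * len(pssm)
    hits = []
    for start in range(len(dna) - width + 1):
        score = 0.0
        for k, column in enumerate(pssm):
            codon = dna[start + 3 * k : start + 3 * k + 3]
            amino = CODON_TABLE.get(codon)  # None if the codon has non-TCAG letters
            score += column.get(amino, MISMATCH)
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy protein pattern: M, then K or R, then W.
    toy_pssm = [{"M": 2.0}, {"K": 1.0, "R": 1.0}, {"W": 2.0}]
    print(scan("CCATGAAATGGTT", toy_pssm, threshold=4.0))
```

Because every start offset is tried, all three forward reading frames are covered. The cost is proportional to the sequence length times the pattern length, which is the kind of baseline a dynamic-programming formulation would aim to improve on or extend with gaps.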