
Machine Learning Approaches for the Prediction of Pathogenicity in Genome Variations

Sahebi, Alireza | 2023

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 56243 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Sharifi Zarchi, Ali; Asgari, Ehsannedin
  7. Abstract: Genomic variants of unknown effect are one of the challenges in identifying genetic diseases. Wet-lab tests for determining the pathogenicity of a variant can be time-consuming and costly. A rapid and cost-effective alternative is a machine-learning-based variant effect predictor, which determines whether a mutation is pathogenic or not. The objective of this research is to predict the pathogenicity of genome variations. The proposed model uses only the protein sequence as its input and has no access to other protein features. The data used to build the model consist of mutations of known significance, obtained from public variation databases. To build the predictor, we employ distributed representations of proteins extracted from the ProtBert and ESM2 protein language models, as well as from the AlphaFold2 protein structure predictor. Extracting embeddings from AlphaFold2 is time-consuming because it relies on multiple sequence alignments, which require queries against large sequence databases. We introduce new approaches that not only speed up running AlphaFold2 on the mutated sequences but also improve the representational capacity for predicting the effect of variants. Additionally, we evaluate and optimize several types of neural network classifiers, including fully connected, convolutional, and multi-head attention networks, and report the best-performing model.
  8. Keywords: Machine Learning ; Deep Learning ; Genome Variations Classification ; Biological Sequence Processing ; Variant Pathogenicity Prediction ; Genetic Mutation
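
The abstract describes a sequence-only pipeline: take a protein sequence, apply the variant, embed both sequences, and pass the result to a classifier. A minimal sketch of the feature-construction step follows. It is an illustration under stated assumptions, not the thesis's implementation: the variant notation (`A4T`), the helper names, the example sequence, and the choice of "mutant minus reference" as the feature vector are all hypothetical, and `toy_embed` is a trivial stand-in for a real language-model embedding such as ProtBert or ESM2.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def apply_missense(sequence, variant):
    """Apply a single substitution written as <ref><1-based position><alt>, e.g. 'A4T'.

    The notation is an assumption made for this sketch.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"unrecognized variant notation: {variant!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {sequence[pos - 1]}")
    return sequence[:pos - 1] + alt + sequence[pos:]


def toy_embed(sequence):
    """Hypothetical placeholder embedding: amino-acid frequencies.

    This equals a mean-pooled one-hot encoding. A real pipeline would
    replace it with mean-pooled hidden states from a protein language model.
    """
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]


def variant_features(sequence, variant):
    """Feature vector for a variant: mutant embedding minus reference embedding.

    Using the difference is an assumption of this sketch; concatenating the
    two embeddings would be another reasonable choice.
    """
    mutated = apply_missense(sequence, variant)
    ref_vec = toy_embed(sequence)
    alt_vec = toy_embed(mutated)
    return [alt - ref for ref, alt in zip(ref_vec, alt_vec)]


if __name__ == "__main__":
    features = variant_features("MKTAYIAK", "A4T")
    print(len(features))
```

The resulting vector would then be the input to a classifier such as the fully connected, convolutional, or attention-based networks the abstract mentions.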
