Providing a Tool for Predicting the Tertiary Structure of Proteins using Neural Networks and Coding the Sequence of Proteins based on GPU

Fereidoon, Mohammad Amin; Koohi, Somayyeh

Fereidoon, Mohammad Amin | 2024

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 57564 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Koohi, Somayyeh
Abstract:
For more than five decades, accurately determining the three-dimensional structure of a protein from its amino acid sequence alone has remained a prominent open research problem. Computational structure prediction methods are essential because determining each protein's structure with traditional laboratory procedures is time-consuming and requires expensive equipment. Deep learning algorithms, which learn the relationship between sequences with known structures and sequences with unknown structures, offer a faster and less expensive alternative to experiments. Despite recent advances, there is still room to improve the accuracy and efficiency of current methods. AlphaFold, developed by DeepMind, predicts protein structures from amino acid sequences with accuracy comparable to experimental results. The AlphaFold prediction process consists of two distinct steps: constructing multiple sequence alignments (MSAs) on central processing units (CPUs) and running model inference on graphics processing units (GPUs). In the first step, AlphaFold uses CPUs exclusively and can take several hours to generate the MSA for a single protein, mainly because of the large databases involved and input/output limitations. As a result, the GPUs are not used efficiently during this step, which lowers their utilization and restricts large-scale structure prediction. This thesis presents a model that reduces the computational time required to predict the tertiary structure of a set of proteins while preserving the accuracy of the predictions. In this thesis, the MMseqs2 tool is used to generate the MSAs.
Furthermore, improved JAX compilation greatly reduces the execution time of the inference step and maximizes GPU utilization.
Keywords:
Protein Structure ; Deep Learning ; AlphaFold Model ; Multiple Sequence Alignment ; Graphics Processing Unit (GPU) ; Central Processing Unit (CPU)
