Providing a Tool for Predicting the Tertiary Structure of Proteins using Neural Networks and Coding the Sequence of Proteins based on GPU
Fereidoon, Mohammad Amin | 2024
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 57564 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Koohi, Somayyeh
- Abstract:
- For more than five decades, accurately determining the three-dimensional structure of a protein from its amino acid sequence alone has remained a prominent open research problem. Computational structure prediction methods are essential because determining each protein's structure with traditional laboratory procedures is time-consuming and requires expensive equipment. Deep learning algorithms, which learn the relationship between sequences with known structures and sequences with unknown structures, are a faster and less expensive alternative to experiments. Despite recent advances, there is still room to improve the accuracy and efficiency of current methods. AlphaFold, developed by DeepMind, predicts protein structures from amino acid sequences with accuracy comparable to experimental results. The AlphaFold prediction process consists of two distinct steps: building multiple sequence alignments (MSAs) on central processing units (CPUs), and model inference on graphics processing units (GPUs). In the first step, AlphaFold uses only CPUs and can take several hours to generate the MSA for a single protein, mainly because of the large databases and input/output bottlenecks. As a result, the GPUs are underused during this step, which lowers their utilization and limits large-scale structure prediction. This thesis presents a model that reduces the computation time needed to predict the tertiary structure of a set of proteins while preserving the accuracy of the predictions. The MMseqs2 tool is used to generate the MSAs. Furthermore, improved JAX compilation greatly reduces the execution time of the inference step, maximizing GPU utilization.
- Keywords:
- Protein Structure ; Deep Learning ; AlphaFold Model ; Multiple Sequence Alignment ; Graphics Processing Unit (GPU) ; Central Processing Unit
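The abstract does not say how JAX compilation was improved. As a point of reference, JAX's `jit` compiles a function separately for each distinct input shape, so a set of proteins with many different sequence lengths can trigger many compilations. One common way to limit this, assumed here purely for illustration and not confirmed as the thesis's method, is to pad sequences up to a small number of fixed "bucket" lengths. The sketch below uses plain Python to count how many distinct shapes would be compiled with and without padding; the helper names, the bucket size of 128, and the sequence lengths are all hypothetical.

```python
from typing import Iterable, Optional


def bucket_length(seq_len: int, bucket_size: int = 128) -> int:
    """Round seq_len up to the next multiple of bucket_size (hypothetical policy)."""
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    return -(-seq_len // bucket_size) * bucket_size  # ceiling division


def count_compilations(lengths: Iterable[int],
                       bucket_size: Optional[int] = None) -> int:
    """Count distinct input lengths, i.e. how many times a
    shape-specialized compiler would have to compile the model."""
    if bucket_size is None:
        return len(set(lengths))
    return len({bucket_length(n, bucket_size) for n in lengths})


# Hypothetical sequence lengths for a set of proteins (not taken from the thesis).
lengths = [97, 110, 128, 131, 250, 256, 300]

unpadded = count_compilations(lengths)                # one compilation per distinct length
padded = count_compilations(lengths, bucket_size=128)  # one compilation per bucket

print(f"compilations without padding: {unpadded}")
print(f"compilations with padding:    {padded}")
```

The trade-off in such a scheme is that padded positions cost extra computation per protein, so the bucket size balances wasted work against the number of compilations.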
Contents
- Abstract
- List of Tables
- List of Figures
- List of Charts
- Chapter 1 Introduction
- Chapter 2 Previous Work
- 2-1 Template-based modeling
- 2-2 Template-free modeling
- 2-3 Protein structure prediction based on contact maps
- 2-4 Protein structure prediction based on distance prediction
- 2-5 End-to-end protein structure prediction
- 2-6 Protein structure prediction based on protein language models
- 2-7 Structure prediction for multi-domain proteins
- 2-8 Summary of previous work
- Chapter 3 Proposed Method
- Chapter 4 Implementation and Results
- 4-1 Evaluation metrics used
- 4-2 Datasets used for evaluation
- 4-3 Comparison of the time to build the feature.pkl file with AlphaFold and with MMseqs2
- 4-4 Comparison of inference run time with and without the JAX compilation optimization
- 4-5 Comparison of the accuracy of the predicted models against the reference structure
- Chapter 5 Summary and Conclusion
- References