Feature Extraction for Protein Sequences Based on NMR Spectra and Its Application in the Protein Interaction Prediction

Teimoori, Bahareh | 2017

  1. Type of Document: M.Sc. Thesis
  2. Language: English
  3. Document No: 49524 (55)
  4. University: Sharif University of Technology, International Campus, Kish Island
  5. Department: Science and Engineering
  6. Advisor(s): Hajsadeghy, Khosro; Kavousi, Kaveh
  7. Abstract:
  8. Nuclear magnetic resonance (NMR) is a spectroscopic method used to investigate the characteristics of molecules containing hydrogen and carbon chains. In this thesis, we used NMR spectra extracted from 19 types of amino acids to investigate feature generation for protein sequences. We processed the NMR spectra based on the hydrogen and carbon atoms in the structure of the amino acids and, after preprocessing, extracted features for each amino acid from the spectra. We then clustered the amino acids with the Fuzzy Clustering Method (FCM) and generated feature vectors by extracting a special descriptor for the amino acids in each protein sequence. In addition to NMR, we used amino acid features obtained from physicochemical properties such as hydrophobicity, polarity, and PSSM, among others. The best features were selected by means of a Genetic Algorithm (GA). Subsequently, we used a Support Vector Machine (SVM) with an RBF kernel as the classifier for protein fold classification and protein-protein interaction prediction. In protein fold prediction, NMR features alone gave an accuracy of 31% with hydrogen NMR data and about 33% with carbon NMR data. This accuracy is not high, but our data set contains some proteins whose fold can be correctly predicted from NMR data alone, which suggests that NMR can be useful for fold prediction in specific kinds of proteins.
    Finally, we used the Sugeno fuzzy integral to fuse the results of the different feature subspaces. Fusing the other feature subspaces with the hydrogen NMR data improved the accuracy to 36.55%, and fusing them with the carbon NMR data improved it to 35.50%. Adding the most informative subspace, the PSSM-based features, improved the accuracy to 54.04%. In protein-protein interaction prediction with 10-fold cross-validation, NMR gave a high accuracy of about 85% for both hydrogen and carbon NMR data. We can therefore claim that NMR data are informative and applicable to protein-protein interaction prediction. We conclude that, for protein fold prediction, NMR data are mainly useful for the specific proteins that can be correctly classified from NMR data alone, whereas for protein-protein interaction prediction NMR data can be used independently with high accuracy.
  9. Keywords:
  10. Nuclear Magnetic Resonance ; Support Vector Machine (SVM) ; Fuzzy Integral ; Protein-Protein Interaction ; Forecasting ; Protein Folding
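
The fusion step mentioned in the abstract can be illustrated with a minimal sketch of a Sugeno fuzzy integral. The abstract does not say which fuzzy measure the thesis used; this sketch assumes the common Sugeno λ-measure, and the function names, the `densities` (a per-subspace reliability in (0, 1)) and the example scores are hypothetical, not taken from the thesis.

```python
def solve_lambda(densities, iterations=200):
    """Find lambda > -1 with prod(1 + lambda*g_i) = 1 + lambda (assumed lambda-measure)."""
    if len(densities) < 2 or not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("need at least two densities, each strictly between 0 and 1")
    total = sum(densities)
    if abs(total - 1.0) < 1e-9:
        return 0.0  # densities already sum to 1: the measure is additive

    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    if total > 1.0:
        # Root lies in (-1, 0); f is positive left of the root, negative right of it.
        lo, hi = -1.0, 0.0
        for _ in range(iterations):
            mid = (lo + hi) / 2.0
            if f(mid) > 0.0:
                lo = mid
            else:
                hi = mid
    else:
        # Root lies in (0, inf); f is negative left of the root, positive right of it.
        lo, hi = 0.0, 1.0
        while f(hi) <= 0.0:
            hi *= 2.0
        for _ in range(iterations):
            mid = (lo + hi) / 2.0
            if f(mid) < 0.0:
                lo = mid
            else:
                hi = mid
    return (lo + hi) / 2.0


def sugeno_integral(scores, densities):
    """Fuse per-subspace confidence scores (each in [0, 1]) for one class."""
    lam = solve_lambda(densities)
    # Visit sources from the most to the least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    measure = 0.0  # fuzzy measure of the sources visited so far
    fused = 0.0
    for i in order:
        measure = densities[i] + measure + lam * densities[i] * measure
        fused = max(fused, min(scores[i], measure))
    return fused
```

In a fusion setting of this kind, one would typically compute `sugeno_integral` once per candidate class, using each subspace classifier's confidence for that class as `scores`, and then predict the class with the largest fused value.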
