
TripletProt: Deep Representation Learning of Proteins Based On Siamese Networks

Nourani, E ; Sharif University of Technology | 2022

  1. Type of Document: Article
  2. DOI: 10.1109/TCBB.2021.3108718
  3. Publisher: Institute of Electrical and Electronics Engineers Inc., 2022
  4. Abstract: Pretrained representations have recently gained attention in various machine learning applications. Nonetheless, the high computational cost of training these models has motivated alternative approaches to representation learning. Herein we introduce TripletProt, a new approach to protein representation learning based on Siamese neural networks. Representations of biological entities that capture their essential features can alleviate many of the challenges associated with supervised learning in bioinformatics. The most important distinction of the proposed method is that it relies on the protein-protein interaction (PPI) network. Because the generated representations are considerably shorter than those of other approaches, their computational cost in any downstream application is significantly lower than that of comparable methods. TripletProt offers great potential for protein informatics tasks and can be widely applied to similar problems. We evaluate TripletProt comprehensively on protein functional annotation tasks, including sub-cellular localization (14 categories) and Gene Ontology prediction (more than 2000 classes), both of which are challenging multi-class, multi-label classification problems. We compare TripletProt with state-of-the-art approaches, including a recurrent language model-based approach (UniRep) and a method based on both the PPI network and protein sequence (DeepGO). Relying solely on the PPI network, TripletProt showed an overall improvement in F1 score on these functional annotation tasks. Availability: The source code and datasets are available at https://github.com/EsmaeilNourani/TripletProt. © 2004-2012 IEEE
  5. Keywords: Protein representation learning ; siamese networks ; triplet loss ; Bioinformatics ; Classification (of information) ; Deep learning ; Gene Ontology ; Biological entities ; Computational costs ; Functional annotation ; Machine learning applications ; Neural-networks ; New approaches ; Protein-protein interaction networks ; Siamese network ; Triplet loss ; Proteins
  6. Source: IEEE/ACM Transactions on Computational Biology and Bioinformatics ; Volume 19, Issue 6, 2022, Pages 3744-3753 ; ISSN 1545-5963
  7. URL: https://pubmed.ncbi.nlm.nih.gov/34460382
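The record names Siamese networks and triplet loss but gives no formulas. The following is a minimal sketch, not the TripletProt implementation, of the standard hinge-style triplet loss. It pairs that loss with one assumed way of drawing triplets from a PPI network: the positive is an interaction partner of the anchor protein, and the negative is a protein with no recorded interaction with it. The toy graph, the embeddings, and the helper names (`squared_distance`, `triplet_loss`, `sample_triplet`) are hypothetical. In an actual Siamese setup, one encoder with shared weights would produce all three vectors, and training would minimize this loss over many triplets.

```python
import random
from typing import Dict, List, Set, Tuple

Vector = List[float]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two equal-length embeddings."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Standard hinge-style triplet loss.

    It is zero once the negative is at least `margin` farther (in squared
    distance) from the anchor than the positive is; otherwise it grows
    linearly with the violation.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def sample_triplet(ppi: Dict[str, Set[str]], anchor: str,
                   rng: random.Random) -> Tuple[str, str, str]:
    """Hypothetical PPI-based sampler (an assumption, not the paper's scheme).

    positive: a protein that interacts with the anchor;
    negative: a protein that is neither the anchor nor one of its partners.
    """
    positives = sorted(ppi[anchor])
    negatives = sorted(p for p in ppi if p != anchor and p not in ppi[anchor])
    if not positives or not negatives:
        raise ValueError(f"cannot form a triplet for {anchor!r}")
    return anchor, rng.choice(positives), rng.choice(negatives)


if __name__ == "__main__":
    # Toy, made-up interaction graph (undirected, stored as adjacency sets).
    ppi = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2"}, "P4": set()}
    # Made-up 2-D embeddings standing in for an encoder's output.
    emb = {"P1": [0.0, 0.0], "P2": [0.1, 0.0], "P3": [1.0, 1.0], "P4": [2.0, 0.0]}
    rng = random.Random(0)
    a, p, n = sample_triplet(ppi, "P1", rng)
    print(a, p, n, triplet_loss(emb[a], emb[p], emb[n]))
```

The squared Euclidean distance and the margin of 1.0 are illustrative defaults only. The paper's actual distance, margin, and triplet-selection strategy should be taken from the linked repository.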