Search for: computational-biology
Article Heliyon ; Volume 4, Issue 7 , 2018 ; 24058440 (ISSN) ; Shirali Hossein Zade, R ; Takalloo, Z ; Mahdevar, G ; Emamjomeh, A ; Sajedi, R. H ; Zahiri, J ; Sharif University of Technology
Elsevier Ltd 2018
Various cold-adapted organisms produce antifreeze proteins (AFPs), which prevent the freezing of cell fluids by inhibiting the growth of ice crystals. AFPs continue to be identified in organisms living at extremely low temperatures. AFPs have several important applications: increasing the freeze tolerance of plants, preserving tissue under frozen conditions, and producing cold-hardy plants through transgenic technology. Substantial differences in the sequence and structure of AFPs pose a challenge for researchers trying to identify these proteins. In this paper, we propose a novel method to identify AFPs using a support vector machine (SVM) by incorporating 4 types of...
Article Bioinformatics ; Volume 36, Issue 12 , 15 June , 2020 , Pages 3662-3668 ; Parvinnia, E ; Sharifi Zarchi, A ; Sharif University of Technology
Oxford University Press 2020
Motivation: Multiple sequence alignment (MSA) is an important and challenging problem in computational biology. Most existing methods can only produce short multiple alignments in an acceptable time. When researchers need to align genome-size sequences, the process requires a huge amount of processing space and time. Accordingly, a method that can align genome-size sequences rapidly and precisely would be of great value, especially for the analysis of very long alignments. Herein, we propose an efficient method, called FAME, which vertically divides sequences at the places where they have common areas; the resulting parts are then arranged in consecutive order. Then...
M.Sc. Thesis Sharif University of Technology ; Motahhari, Abolfazl
Today, researchers widely use computer simulations to understand biological processes. The use of computational methods has increased recently because of the high costs and errors of experimental methods. Integrins are membrane proteins that mechanically attach cells to the extracellular matrix (ECM) and drive behaviors such as cell migration. Integrins also have biochemical functions: they transduce environmental signals and trigger chemical pathways. As a result, integrins regulate cell shape, motility, etc. Investigating integrin behavior in a real environment is very difficult due to the presence of many other proteins that interfere with the...
Article Neural Computing and Applications ; Volume 23, Issue 3-4 , 2013 , Pages 1205- ; 09410643 (ISSN) ; Saeidi Ramiyani, S ; Sharif University of Technology
PFP-WGAN: Protein function prediction by discovering gene ontology term correlations with generative adversarial networks, Article PLoS ONE ; Volume 16, Issue 2 , 2021 ; 19326203 (ISSN) ; Soleymani, M ; Rabiee, H. R ; Kaazempur Mofrad, M. R ; Sharif University of Technology
Public Library of Science 2021
Understanding the functionality of proteins has emerged as a critical problem in recent years due to the significant roles of these macromolecules in biological mechanisms. However, in-laboratory techniques for protein function prediction are not as efficient as the methods developed for protein sequencing. While more than 70 million protein sequences are available today, the functionality of only around one percent of them is known. These facts have encouraged researchers to develop computational methods to infer protein functionalities from their sequences. Gene Ontology is the most well-known database for protein functions; it has a hierarchical structure, where deeper terms are...
M.Sc. Thesis Sharif University of Technology ; Ghodsi, Mohammad
In recent years, game theory has spread from economics and the social sciences to other fields such as computational biology and bioinformatics. Several evolutionary and cooperative games have been defined to predict the behavior of nonrational agents in interaction situations arising in computational biology. One of the main applications is the Shapley value, which allocates a fair value to each player in games where each coalition has a profit. In many cases, however, computing the Shapley value is #P-complete. Thus, the goal is to compute the Shapley value exactly or to approximate it in each game. Another option is the Banzhaf index. One of the essential games on genes is the Pathway game....
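As an illustration of the Shapley value mentioned in this abstract, the following minimal Python sketch averages each player's marginal contribution over all orderings of the players. The three-gene game (`toy_value`) and the function names are hypothetical and are not taken from the thesis; the brute-force enumeration is exponential in the number of players, which is consistent with the hardness noted above and is practical only for very small games.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical game: profit 1 only if gene 'a' is joined by 'b' or 'c'."""
    return 1 if "a" in coalition and ({"b", "c"} & coalition) else 0


phi = shapley_values(["a", "b", "c"], toy_value)
for gene, share in phi.items():
    print(gene, share)
```

In this toy game the shares sum to the value of the full coalition, which is the efficiency property of the Shapley value.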
Article 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, 23 June 2019 through 26 June 2019 ; 2019 , Pages 709-720 ; 07378017 (ISSN); 9781450367059 (ISBN) ; Ghodsi, M ; Hajiaghayi, M ; Seddighin, S ; Sharif University of Technology
Association for Computing Machinery 2019
Edit distance is one of the most fundamental problems in computer science. Tree edit distance is a natural generalization of edit distance to ordered rooted trees. Such a generalization extends the applications of edit distance to areas such as computational biology, structured data analysis (e.g., XML), image analysis, and compiler optimization. Perhaps the most notable application of tree edit distance is in the analysis of RNA molecules in computational biology, where the secondary structure of RNA is typically represented as a rooted tree. The best-known solution for tree edit distance runs in cubic time. Recently, Bringmann et al. showed that an O(n^2.99) algorithm for weighted tree edit...
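For readers unfamiliar with the base problem, the sketch below computes the classic edit distance between two strings with unit costs for insertion, deletion, and substitution. It is the standard quadratic-time dynamic program for strings, not the tree edit distance algorithm discussed in the article; the function name is chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost string edit distance, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))
print(edit_distance("GAUC", "GUC"))
```

Tree edit distance generalizes this recurrence from character sequences to ordered rooted trees, where the edit operations act on nodes rather than characters.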
Article Nature Communications ; Volume 10, Issue 1 , 2019 ; 20411723 (ISSN) ; Abbasi, H. S ; Singh, S ; Carpenter, A. E ; Sharif University of Technology
Nature Publishing Group 2019
Single-cell resolution technologies warrant computational methods that capture cell heterogeneity while allowing efficient comparisons of populations. Here, we summarize cell populations by adding features’ dispersion and covariances to population averages, in the context of image-based profiling. We find that data fusion is critical for these metrics to improve results over the prior alternatives, providing at least ~20% better performance in predicting a compound’s mechanism of action (MoA) and a gene’s pathway. © 2019, The Author(s)
Article Journal of Translational Medicine ; Volume 17, Issue 1 , 2019 ; 14795876 (ISSN) ; Khorsand, B ; Yousefi, A. A ; Kargar, M. J ; Shirali Hossein Zade, R ; Mahdevar, G ; Sharif University of Technology
BioMed Central Ltd 2019
Background: Angiogenesis inhibition research is a cutting-edge area in angiogenesis-dependent disease therapy, especially in cancer therapy. Recently, studies on anti-angiogenic peptides have provided promising results in the field of cancer treatment. Methods: A non-redundant dataset of 135 anti-angiogenic peptides (positive instances) and 135 non-anti-angiogenic peptides (negative instances) was used in this study. In addition, 20% of each class was selected to construct an independent test dataset (see Additional files 1, 2). We proposed an effective machine-learning-based R package (AntAngioCOOL) to predict anti-angiogenic peptides. We have examined more than 200 different classifiers to build...
Article IEEE/ACM Transactions on Computational Biology and Bioinformatics ; Volume 16, Issue 2 , 2019 , Pages 636-649 ; 15455963 (ISSN) ; Janghorbani, S ; Motahari, A ; Fatemizadeh, E ; Sharif University of Technology
Institute of Electrical and Electronics Engineers Inc 2019
Association mapping of genetic diseases has attracted extensive research interest in recent years. However, most of the methodologies introduced so far suffer from spurious inference of the associated sites due to population inhomogeneities. In this paper, we introduce a statistical framework that compensates for this shortcoming by equipping current methodologies with a state-of-the-art clustering algorithm widely used in population genetics applications. The proposed framework jointly infers the disease-associated factors and the hidden population structures. In this regard, a Markov chain Monte Carlo (MCMC) procedure has been employed to assess the posterior probability...
Article BMC Bioinformatics ; Volume 20, Issue 1 , 2019 ; 14712105 (ISSN) ; Goudarzi, M ; Motahari, A ; Sharif University of Technology
BioMed Central Ltd 2019
Background: Long reads provide valuable information regarding the sequence composition of genomes. Long reads are usually very noisy, which renders their alignment to the reference genome a daunting task. On a single node, it may take days to process a dataset large enough to sequence a human genome. Hence, it is of primary importance to have an aligner that can operate on distributed clusters of computers with high accuracy and speed. Results: In this paper, we present IMOS, an aligner for mapping noisy long reads to the reference genome. It can be used on a single node as well as on distributed nodes. In its single-node mode, IMOS is an Improved version of Meta-aligner (IM)...
Article Journal of Biophotonics ; Volume 13, Issue 1 , August , 2020 ; Koohi, S ; Kavehvash, Z ; Mashaghi, A ; Sharif University of Technology
Wiley-VCH Verlag 2020
Nowadays, the accelerated expansion of genetic data challenges the speed of current DNA sequence alignment algorithms due to their electrical implementations. The essential need for an efficient and accurate method for DNA variant discovery demands new approaches for parallel processing in real time. Fortunately, photonics, as an emerging technology in data computing, offers optical correlation as a fast similarity measurement algorithm, while the complexity of existing local alignment algorithms severely limits their applicability. Hence, in this paper, employing optical correlation for global alignment, we present an optical processing approach for local DNA sequence alignment to benefit both...
The performances of the chi-square test and complexity measures for signal recognition in biological sequences, Article Journal of Theoretical Biology ; Volume 251, Issue 2 , 2008 , Pages 380-387 ; 00225193 (ISSN) ; Kargar, M ; Sheari, A ; Poormohammadi, H ; Sadeghi, M ; Pezeshk, H ; Eslahchi, C ; Sharif University of Technology
With large amounts of experimental data, modern molecular biology needs appropriate methods to deal with biological sequences. In this work, we apply a statistical method (Pearson's chi-square test) to recognize the signals that appear in the whole genome of Escherichia coli. To show the effectiveness of the method, we compare Pearson's chi-square test with linguistic complexity on the complete genome of E. coli. The results suggest that Pearson's chi-square test is an efficient method for distinguishing genes (coding regions) from pseudogenes (noncoding regions). On the other hand, the performance of linguistic complexity is much lower than that of the chi-square test. We also use the...
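The abstract does not say how the test statistic is built, so the following is only a sketch under an assumption: it computes Pearson's chi-square goodness-of-fit statistic for the nucleotide counts of a sequence window against a uniform expected composition. The function name and the uniform null model are illustrative choices, not the authors' procedure.

```python
from collections import Counter


def chi_square_uniform(seq: str, alphabet: str = "ACGT") -> float:
    """Pearson chi-square statistic of symbol counts vs. a uniform expectation."""
    counts = Counter(seq)
    expected = len(seq) / len(alphabet)  # assumed null model: equal frequencies
    return sum((counts[s] - expected) ** 2 / expected for s in alphabet)


print(chi_square_uniform("ACGT" * 5))    # perfectly balanced window
print(chi_square_uniform("AAAAAAGCAT"))  # window skewed toward A
```

A larger statistic means the window's composition departs further from the assumed null model; turning it into a p-value would require the chi-square distribution with `len(alphabet) - 1` degrees of freedom.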
Article BMC Bioinformatics ; Volume 14, Issue SUPPL13 , 2013 ; 14712105 (ISSN) ; Rabiee, H. R ; Anooshahpour, M ; Sharif University of Technology
Background: The abundance of gene expression microarray data has led to the development of machine learning algorithms for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces the fuzzy support vector machine, a learning algorithm that combines fuzzy classifiers and kernel machines for microarray classification. Results: Experimental results on public leukemia, prostate, and colon cancer datasets show that fuzzy support vector machine applied in combination with filter or wrapper feature selection...
Article Nature Methods ; Volume 13, Issue 4 , 2016 , Pages 310-322 ; 15487091 (ISSN) ; Heiser, L. M ; Cokelaer, T ; Linger, M ; Nesser, N. K ; Carlin, D. E ; Zhang, Y ; Sokolov, A ; Paull, E. O ; Wong, C. K ; Graim, K ; Bivol, A ; Wang, H ; Zhu, F ; Afsari, B ; Danilova, L. V ; Favorov, A. V ; Lee, W. S ; Taylor, D ; Hu, C. W ; Long, B. L ; Noren, D. P ; Bisberg, A. J ; Mills, G. B ; Gray, J. W ; Kellen, M ; Norman, T ; Friend, S ; Qutub, A. A ; Fertig, E. J ; Guan, Y ; Song, M ; Stuart, J. M ; Spellman, P. T ; Koeppl, H ; Stolovitzky, G ; Saez Rodriguez, J ; Mukherjee, S ; Afsari, B ; Al-Ouran, R ; Anton, B ; Arodz, T ; Askari Sichani, O ; Bagheri, N ; Berlow, N ; Bisberg, A. J ; Bivol, A ; Bohler, A ; Bonet, J ; Bonneau, R ; Budak, G ; Bunescu, R ; Caglar, M ; Cai, B ; Cai, C ; Carlin, D. E ; Carlon, A ; Chen, L ; Ciaccio, M. F ; Cokelaer, T ; Cooper, G ; Coort, S ; Creighton, C. J ; Daneshmand, S. M. H ; De La Fuente, A ; Di Camillo, B ; Danilova, L. V ; Dutta-Moscato, J ; Emmett, K ; Evelo, C ; Fassia, M. K. H ; Favorov, A. V ; Fertig, E. J ; Finkle, J. D ; Finotello, F ; Friend, S ; Gao, X ; Gao, J ; Garcia Garcia, J ; Ghosh, S ; Giaretta, A ; Graim, K ; Gray, J. W ; Großeholz, R ; Guan, Y ; Guinney, J ; Hafemeister, C ; Hahn, O ; Haider, S ; Hase, T ; Heiser, L. M ; Hill, S. M ; Hodgson, J ; Hoff, B ; Hsu, C. H ; Hu, C. W ; Hu, Y ; Huang, X ; Jalili, M ; Jiang, X ; Kacprowski, T ; Kaderali, L ; Kang, M ; Kannan, V ; Kellen, M ; Kikuchi, K ; Kim, D. C ; Kitano, H ; Knapp, B ; Komatsoulis, G ; Koeppl, H ; Krämer, A ; Kursa, M. B ; Kutmon, M ; Lee, W. S ; Li, Y ; Liang, X ; Liu, Z ; Liu, Y ; Long, B. L ; Lu, S ; Lu, X ; Manfrini, M ; Matos, M. R. A ; Meerzaman, D ; Mills, G. B ; Min, W ; Mukherjee, S ; Müller, C. L ; Neapolitan, R. E ; Nesser, N. K ; Noren, D. P ; Norman, T ; Oliva, B ; Opiyo, S. O ; Pal, R ; Palinkas, A ; Paull, E. O ; Planas Iglesias, J ; Poglayen, D ; Qutub, A. A ; Saez Rodriguez, J ; Sambo, F ; Sanavia, T ; Sharifi-Zarchi, A ; Slawek, J ; Sokolov, A ; Song, M ; Spellman, P. T ; Streck, A ; Stolovitzky, G ; Strunz, S ; Stuart, J. M ; Taylor, D ; Tegnér, J ; Thobe, K ; Toffolo, G. M ; Trifoglio, E ; Unger, M ; Wan, Q ; Wang, H ; Welch, L ; Wong, C. K ; Wu, J. J ; Xue, A. Y ; Yamanaka, R ; Yan, C ; Zairis, S ; Zengerling, M ; Zenil, H ; Zhang, S ; Zhang, Y ; Zhu, F ; Zi, Z ; Sharif University of Technology
Nature Publishing Group 2016
It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was...
A tale of two symmetrical tails: Structural and functional characteristics of palindromes in proteins, Article BMC Bioinformatics ; Volume 9 , 2008 ; 14712105 (ISSN) ; Kargar, M ; Katanforoush, A ; Arab, S ; Sadeghi, M ; Pezeshk, H ; Eslahchi, C ; Marashi, S. A ; Sharif University of Technology
Background: It has been previously shown that palindromic sequences are frequently observed in proteins. However, our knowledge about their evolutionary origin and their possible importance is incomplete. Results: In this work, we tried to revisit this relatively neglected phenomenon. Several questions are addressed in this work. (1) It is known that there is a large chance of finding a palindrome in low complexity sequences (i.e. sequences with extreme amino acid usage bias). What is the role of sequence complexity in the evolution of palindromic sequences in proteins? (2) Do palindromes coincide with conserved protein sequences? If yes, what are the functions of these conserved segments?...
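To make the notion of a palindromic protein segment concrete, here is a minimal sketch that scans an amino acid sequence for maximal palindromic substrings by expanding around each possible center. The function name, the minimum length of 4, and the example peptide are hypothetical; the article's own detection criteria are not given in the truncated abstract.

```python
def find_palindromes(seq: str, min_len: int = 4):
    """Return (start, substring) for each maximal palindrome of length >= min_len."""
    hits = []
    n = len(seq)
    for center in range(2 * n - 1):  # odd- and even-length centers
        left = center // 2
        right = left + center % 2
        # Expand outward while the flanking residues match.
        while left >= 0 and right < n and seq[left] == seq[right]:
            left -= 1
            right += 1
        start, end = left + 1, right  # seq[start:end] is maximal for this center
        if end - start >= min_len:
            hits.append((start, seq[start:end]))
    return hits


print(find_palindromes("MKAVVAKL"))
```

The scan takes quadratic time in the worst case, which is adequate for single protein sequences; low-complexity sequences such as poly-alanine runs would produce many hits, in line with the abstract's remark about amino acid usage bias.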
Genome annotation and comparative genomic analysis of Bacillus subtilis MJ01, a new bio-degradation strain isolated from oil-contaminated soil, Article Functional and Integrative Genomics ; Volume 18, Issue 5 , 2018 , Pages 533-543 ; 1438793X (ISSN) ; Niazi, A ; Deihimi, T ; Taghavi, S. M ; Ayatollahi, S ; Ebrahimie, E ; Sharif University of Technology
Springer Verlag 2018
One of the main challenges in eliminating oil contamination from polluted environments is improving biodegradation by highly efficient microorganisms. Bacillus subtilis MJ01 has been evaluated as a new resource for producing biosurfactant compounds. This bacterium, which produces surfactin, is able to enhance bio-accessibility to oil hydrocarbons in contaminated soils. The genome of B. subtilis MJ01 was sequenced and assembled using PacBio RS sequencing technology. One large contig with a length of 4,108,293 bp and no gaps was assembled. Genome annotation and gene prediction showed that the MJ01 genome is very similar to B. subtilis spizizenii TU-B-10 (95% similarity). The comparison...