Analysis of Gene Expression Data in Bioinformatics Data Sets Using Machine Learning Approaches
, M.Sc. Thesis Sharif University of Technology ; Beigy, Hamid (Supervisor)
Abstract
Because robust and accurate classification of tumors is necessary for successful cancer treatment, classification of DNA microarray data has been widely used in the diagnosis of cancers and some other biological diseases. The main challenge in classifying microarray data is the extreme asymmetry between the number of features (usually thousands or even tens of thousands of genes) and the number of tissue samples (a few hundred). Because of this curse of dimensionality, a class prediction model may be very successful on one type of dataset but fail to perform well on others. Overfitting is another problem that prevents conventional learning...
Protein Function Prediction using Protein Interaction Networks
, M.Sc. Thesis Sharif University of Technology ; Fatemizadeh, Emadoddin (Supervisor)
Abstract
Predicting protein function accurately is an important issue in the post-genomic era. To achieve this goal, several approaches have been proposed to deduce the function of unclassified proteins through sequence similarity, co-expression profiles, and other information. Among these methods, the Global Optimization Method is an interesting and powerful tool that assigns functions to unclassified proteins based on their positions in a physical interaction network. To boost both the accuracy and speed of the Global Optimization Method, a new prediction method, Accurate Global Optimization Method (AGOM), is presented in this thesis, which employs optimal repetition method enhanced with frequency of...
Gene Selection and Reduction in DNA Microarrays to Improve Classification Accuracy of Cancerous Samples
, M.Sc. Thesis Sharif University of Technology ; Rabiee, Hamid Reza (Supervisor)
Abstract
DNA microarray is the state-of-the-art technology for analyzing gene expression data. It has made it possible to measure the expression levels of thousands of genes simultaneously. Microarray classification has been widely used in the effective diagnosis of cancers and some other biological diseases. The most challenging issue is the intense asymmetry between the number of genes and the number of tissue samples, which can wreck classification performance. This dissertation focuses on gene selection and reduction solutions and presents a novel classification scheme that uses both gene selection and dimension reduction in its different stages. We have improved one of the recently proposed topology...
Using Transductive Learning Classification in Bioinformatics
, M.Sc. Thesis Sharif University of Technology ; Beigy, Hamid (Supervisor)
Abstract
Classification is one of the most important problems in machine learning. Reliable and successful classification is essential for diagnosing patients for further treatment. In many applications, such as bioinformatics, unlabeled data are abundant and readily available, whereas labeled data are much more difficult and expensive to obtain. This dissertation presents a novel transductive approach for developing robust microarray data classification. The transduction problem is to estimate the value of the classification function at the given points in the working set. This contrasts with the standard inductive learning problem of estimating the classification function at all possible values and...
Inferring Signaling Pathways from RNAi Data Using Machine Learning
, M.Sc. Thesis Sharif University of Technology ; Beigy, Hamid (Supervisor)
Abstract
One of the standing problems in molecular biology and bioinformatics is uncovering signaling pathways. Discovering the causes of many cancer-like diseases and developing better treatments for them requires a better understanding of the behavior of cellular processes, and understanding signaling pathways can help to explain these processes. Because the number of possible signaling pathways grows rapidly with the number of components, the problem appears to be inherently complex. One of the recent methods for generating data on such networks is the RNA interference (RNAi) technique. In this thesis we use data produced by this method. We propose two methods to infer signaling...
Biological Network Alignment using Multi-Core Processors
, M.Sc. Thesis Sharif University of Technology ; Ghodsi, Mohammad (Supervisor)
Abstract
Interactions among proteins, and the networks that result from such interactions, have a central role in biology. Aligning these networks yields useful information, such as conserved complexes and evolutionary relationships. The information provided by global alignment of these networks is more meaningful than that provided by local alignment. In the global alignment problem, time complexity is one of the most important challenges. Today, multi-core processors are used to solve many time-consuming bioinformatics problems. In this thesis, after reviewing previous approaches to global alignment of biological networks, we present two novel algorithms for this problem. The first one is designed for...
Bayesian Filtering Approach to Improve Gene Regulatory Networks Inference Using Gene Expression Time Series
, M.Sc. Thesis Sharif University of Technology ; Fatemizadeh, Emadoddin (Supervisor) ; Arab, Shahriar (Co-Advisor)
Abstract
Modeling gene regulation in different species is one of the main aims of bioinformatics. Given the limitations of the available data and the perspectives that must be taken into account when modeling such networks, the methods proposed so far have not succeeded in yielding a comprehensive model. In one recent study, the gene regulation process is treated as a nonlinear dynamic stochastic process and described by state-space equations; Extended Kalman Filtering is then used to estimate the unknown parameters. In this thesis, first of all, gene complexes are taken into consideration instead of genes and afterwards, Extended Kalman...
Human Genome Sequence Analysis Using Statistical and Machine Learning Methods
, M.Sc. Thesis Sharif University of Technology ; Manzuri Shalmani, Mohammad Taghi (Supervisor)
Abstract
During recent decades, dramatic advances in genetics and molecular biology have provided scientists with enormous amounts of molecular genomic information on different living organisms, from DNA sequences to complex 3D structures of proteins. This information is raw data whose analysis can provide a better understanding of genome mechanisms and support discriminating healthy and tumor cells, predicting disease type, designing drugs based on genome information, and many more applications. Here, one important issue is the unavoidable need for computer science and statistics to analyze these data: given the vast amount of data, they provide intelligent methods that yield the most accurate...
Learning and Associating Phenotypic Behavior of Organisms using Biological data
, M.Sc. Thesis Sharif University of Technology ; Beigy, Hamid (Supervisor) ; Motahari, Abolfazl (Supervisor)
Abstract
Datasets extracted from gene expression microarrays contain information about the phenotypic behavior of organisms. Turning this information into knowledge, i.e., finding genes associated with a given phenotype, is a daunting task. This is due to the high dimensionality of the data, as the number of features on a gene expression microarray is usually very large. Moreover, a phenotype may change the expression pattern of a set of genes rather than changing each gene’s expression independently. To tackle the second problem, other sources of information, such as Protein-Protein Interaction (PPI) networks, must be integrated. In this thesis, the PPI network extracted from the STRING database...
An Efficient Model For Considering the Effects Of Drug On Cancer Cells
, M.Sc. Thesis Sharif University of Technology ; Habibi, Jafar (Supervisor)
Abstract
Advances in technology and shortcomings of conventional medicine have led to the emergence of a new approach called precision medicine. Unlike traditional medicine, in this approach medical experts choose the best treatment for each patient based on his or her genetic characteristics. Predicting drug response in cancer cell lines is one of the most vital challenges in this area. Various approaches have been proposed for constructing predictive models, but they have neglected the substantial distinctions between resistant and sensitive cell lines. Here, we propose a new approach for constructing the predictive model. In our approach, we utilize the distinctions between sensitive and resistant cell lines and also...
Distributed Processing of Next Generation Sequencing Data Set
, M.Sc. Thesis Sharif University of Technology ; Goudarzi, Maziar (Supervisor) ; Motahari, Abolfazl (Supervisor)
Abstract
DNA analysis plays a significant role in fields such as pharmacy, agriculture, genealogy, and forensics. Next generation sequencing datasets cover a gene several times because of the large number of reads. Therefore, the initial data volume is several times the amount of memory required to store the DNA strand. First, the DNA sequence of a sample must be assembled from the primary data, and then the differences must be found by comparing the sample DNA sequence with the reference DNA sequence. By finding these differences, one can extract the characteristics of the tested species. The extracted properties are valuable to genetics researchers. For example, they can produce drugs that are...
Modeling of Genetic Mutations Associated with Protein Pathway Common in Alzheimer, Parkinson and Macular Degeneration Diseases
, M.Sc. Thesis Sharif University of Technology ; Jahed, Mehran (Supervisor) ; Hossein Khalaj, Babak (Supervisor) ; Shahpasand, Kourosh (Co-Supervisor)
Abstract
Extensive studies have been performed on the genetic variations involved in common neurodegenerative diseases such as Alzheimer's, macular degeneration, and Parkinson's. In most cases, no specific gene has been identified that points to a distinct pathogenic pathway. This study therefore mainly aims to find genes common to the aforementioned diseases by determining a specific pathogenic protein pathway. In this study, we reached a deep understanding of the function of the nervous system and of the causative agents of the diseases by applying information from genome datasets in bioinformatics analysis. The utilized database comprises the classification of...
DNA Classification Using Optical Processing based on Alignment-free Methods
, M.Sc. Thesis Sharif University of Technology ; Koohi, Somayyeh (Supervisor)
Abstract
In this research, an optical processing method for DNA classification is presented in order to overcome the problems of previous methods. As improvements in the operational capacity of the sequencing process have increased the number of genomes, comparing sequences against a complete database of genomes has become a serious challenge for computational methods. Most current classification programs suffer from slow classification speeds, large memory requirements, or both. To achieve high speed and accuracy at the same time, we suggest using optical processing methods. The performance of electronic processing-based computing, especially in the case of large data processing, is usually...
Prognostic Biomarker Selection for Breast Cancer using Bioinformatics and Deep Learning
, M.Sc. Thesis Sharif University of Technology ; Sharifi Zarchi, Ali (Supervisor)
Abstract
Triple Negative Breast Cancer (TNBC) is an invasive subtype of breast cancer. Finding prognostic biomarkers helps in choosing the appropriate treatment procedure for patients with this cancer. In recent years, the role of microRNAs (miRNAs) in various biological processes, including cancer, has been identified, and their accessibility and stability have made these molecules ideal biomarkers. In the first phase of this study, with the aim of overcoming the limitations of previous studies, a new bioinformatics protocol has been proposed to investigate the prognostic miRNAs of triple negative breast cancer. First, using survival analysis, 56 prognostic miRNAs which had a significant...
Fast Alignment-free Protein Comparison Approach based on FPGA Implementation
, M.Sc. Thesis Sharif University of Technology ; Koohi, Somayyeh (Supervisor)
Abstract
Proteins, as the functional units of the cell, play a vital role in its biological function. With the advent of advanced sequencing techniques in recent years and the consequent exponential growth in the number of protein sequences extracted from diverse biological samples, their analysis, comparison, and classification have become considerably challenging. Existing methods for comparing proteins fall into two categories: alignment-based and alignment-free. Although alignment-based methods are among the most accurate methods, they face inherent limitations such as poor analysis of protein groups with low sequence similarity, time complexity, computational complexity, and memory...
Exploring Pivot Genes and Clinical Prognosis Using Combined Bioinformatics Approaches in the Colon Cancer
, M.Sc. Thesis Sharif University of Technology ; Foroughmand Araabi, Mohammad Hadi (Supervisor)
Abstract
Colorectal cancer (CRC) is one of the most common causes of cancer death worldwide. Identifying pivot genes in colorectal cancer can play an important role by providing biomarkers for prediction and early diagnosis, reducing the number of deaths caused by this disease. In this study, which aims to discover pivot genes in colorectal cancer, six microarray datasets were selected from the GEO database, including 277 tumor tissue samples and 325 normal colon tissue samples. After data processing, differentially expressed genes and CRC-related genes were screened, and 285 genes shared between them were identified for subsequent analysis. Based on these 285 shared genes, the protein-protein interaction...
Exploration of Existing Patterns in Copy Number Variations of Genetic Diseases and Disorders
, Ph.D. Dissertation Sharif University of Technology ; Rabiee, Hamid Reza (Supervisor)
Abstract
One of the main sources of genetic variation is structural variation, including the widespread Copy Number Variations (CNVs). CNVs are of two types, gain of a copy of genetic material (duplication) and loss of part of a genetic sequence (deletion), and typically range from one kilobase pair (Kbp) to several megabase pairs (Mbp) in size. Most copy number variations occur in healthy people; however, these variants can also contribute to numerous diseases through several genetic mechanisms (e.g., changing gene dosage through insertions, duplications, or deletions). The study of CNVs can provide greater insight into the etiology of disease phenotypes. Nowadays, with the huge amount of investment...
Fundamental Bounds for Clustering of Bernoulli Mixture Models
, M.Sc. Thesis Sharif University of Technology ; Motahari, Abolfazl (Supervisor)
Abstract
A random vector with binary components that are independent of each other is referred to as a Bernoulli random vector. A Bernoulli Mixture Model (BMM) is a combination of a finite number of Bernoulli models, where each sample is generated randomly according to one of these models. The important challenge is to estimate the parameters of a Bernoulli Mixture Model or to cluster samples based on their source models. This problem has applications in bioinformatics, image recognition, text classification, social networks, and more. For example, in bioinformatics, it pertains to clustering ethnic groups based on genetic data. Many studies have introduced algorithms for solving this problem without...
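As an illustration of the Bernoulli Mixture Model described in the abstract above, the following minimal Python sketch draws binary vectors from a two-component mixture. It only mirrors the definition (each sample is generated by one randomly chosen Bernoulli model with independent components); the function name `sample_bmm`, the parameter names, and the numeric values are illustrative assumptions and are not taken from the thesis.

```python
import random


def sample_bmm(weights, probs, n, seed=0):
    """Draw n binary vectors from a Bernoulli mixture (illustrative sketch).

    weights[k] is the mixing weight of component k.
    probs[k][j] is the probability that coordinate j equals 1 under component k.
    Returns the samples and the (normally hidden) component label of each one.
    """
    rng = random.Random(seed)
    samples, labels = [], []
    for _ in range(n):
        # Pick the source model for this sample according to the mixing weights.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Generate each coordinate independently from that model.
        x = [1 if rng.random() < p else 0 for p in probs[k]]
        samples.append(x)
        labels.append(k)
    return samples, labels


# Hypothetical parameters: two components over three binary features.
weights = [0.5, 0.5]
probs = [[0.9, 0.9, 0.1], [0.1, 0.1, 0.9]]
samples, labels = sample_bmm(weights, probs, n=5)
```

In the clustering problem the abstract refers to, only `samples` would be observed; recovering `labels` (or the parameters `weights` and `probs`) is the estimation task.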