Analysis of Gene Expression Data in Bioinformatics Data Sets Using Machine Learning Approaches

Bagherian, Misagh; Beigy, Hamid

Bagherian, Misagh | 2009

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 39186 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Beigy, Hamid
Abstract:
Because robust and accurate tumor classification is essential for successful cancer treatment, classification of DNA microarray data has been widely used to diagnose cancer and some other diseases. The main challenge in classifying microarray data, however, is the extreme imbalance between the number of features (usually thousands or even tens of thousands of genes) and the number of tissue samples (a few hundred). Because of this curse of dimensionality, a class prediction model may classify one dataset very well yet perform poorly on others. Overfitting is another problem that prevents conventional learning methods from achieving accurate and robust classification.

Ensemble learning combines multiple learned models under the assumption that “two (or more) heads are better than one”: the decisions of several hypotheses are combined to produce more accurate and less risky results. This thesis presents a novel ensemble learning approach for robust microarray data classification. Unlike widely used ensemble approaches that train their members on bootstrapped training data, we keep the original training data unchanged. The main idea of our model is to build an ensemble of base classifiers, each of which uses several top-ranked and bottom-ranked genes to predict the class of a sample. To combine the base classifiers' outputs, we do not simply average them or take a majority vote; instead, we use stacked generalization to learn how the base classifiers behave on different samples.

The performance of our model has been evaluated on four publicly available cancer microarray datasets from the Kent Ridge bio-medical repository. We compare it with published results of other microarray classification schemes, with an emphasis on traditional and modern ensemble approaches. Experimental results show that the proposed classifier not only greatly outperforms existing conventional machine learning methods but is also notably more accurate and robust than some traditional ensemble-based classifiers, and that its performance is comparable to that of some recently introduced state-of-the-art ensemble-based methods.
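The combination scheme described in the abstract can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the thesis's implementation. The abstract does not say how genes are ranked, what the base classifiers are, or which meta-learner performs the stacking, so the sketch assumes: a signal-to-noise (SNR) ranking in which "top-ranked" genes have the most positive scores and "bottom-ranked" genes the most negative; weighted-voting base classifiers that differ only in how many genes k they use; and a logistic-regression meta-learner trained on out-of-fold base outputs. All function names and the toy data are made up for this example.

```python
import math
import random
from statistics import mean, pstdev


def snr_scores(X, y):
    """Assumed ranking: SNR per gene, (mean1 - mean0) / (sd1 + sd0).

    X is a list of samples (each a list of expression values); y holds 0/1 labels.
    """
    scores = []
    for g in range(len(X[0])):
        a = [x[g] for x, c in zip(X, y) if c == 1]
        b = [x[g] for x, c in zip(X, y) if c == 0]
        scores.append((mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9))
    return scores


def fit_base(X, y, k):
    """Hypothetical base classifier: weighted vote over k top- and k bottom-ranked genes."""
    s = snr_scores(X, y)
    order = sorted(range(len(s)), key=lambda g: s[g])  # ascending SNR
    genes = order[-k:] + order[:k]  # k most positive SNR + k most negative SNR
    model = []
    for g in genes:
        mid = (mean([x[g] for x, c in zip(X, y) if c == 1])
               + mean([x[g] for x, c in zip(X, y) if c == 0])) / 2
        model.append((g, s[g], mid))  # (gene index, vote weight, decision midpoint)
    return model


def predict_base(model, x):
    """Real-valued vote: positive leans to class 1, negative to class 0."""
    return sum(w * (x[g] - mid) for g, w, mid in model)


def out_of_fold_outputs(X, y, ks, n_folds):
    """Base outputs for each training sample, produced by models that never saw it."""
    feats = [None] * len(X)
    for f in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != f]
        models = [fit_base([X[i] for i in train], [y[i] for i in train], k) for k in ks]
        for i in range(f, len(X), n_folds):  # held-out samples of fold f
            feats[i] = [predict_base(m, X[i]) for m in models]
    return feats


def fit_meta(F, y, lr=0.1, epochs=200):
    """Assumed meta-learner: logistic regression trained by stochastic gradient descent."""
    w = [0.0] * (len(F[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        for f, t in zip(F, y):
            z = sum(wj * fj for wj, fj in zip(w, f)) + w[-1]
            z = max(-30.0, min(30.0, z))  # clamp to keep math.exp in range
            err = 1.0 / (1.0 + math.exp(-z)) - t
            for j, fj in enumerate(f):
                w[j] -= lr * err * fj
            w[-1] -= lr * err
    return w


def fit_stacked(X, y, ks=(1, 2, 3), n_folds=3):
    meta = fit_meta(out_of_fold_outputs(X, y, ks, n_folds), y)
    bases = [fit_base(X, y, k) for k in ks]  # refit on the full, unresampled training set
    return bases, meta


def predict_stacked(model, x):
    bases, meta = model
    f = [predict_base(m, x) for m in bases]
    return 1 if sum(wj * fj for wj, fj in zip(meta, f)) + meta[-1] > 0 else 0


# Synthetic toy data for illustration only: 12 samples x 8 "genes".
rng = random.Random(0)
X, y = [], []
for i in range(12):
    label = i % 2
    x = [rng.random() for _ in range(8)]  # background noise in [0, 1)
    x[0] += 5.0 * label        # gene 0 is up-regulated in class 1
    x[1] += 5.0 * (1 - label)  # gene 1 is up-regulated in class 0
    X.append(x)
    y.append(label)

model = fit_stacked(X, y)
print([predict_stacked(model, x) for x in X])
```

Because every base classifier in this sketch is trained on the same unresampled training set, the ensemble's diversity comes only from the different gene subsets, in line with the abstract's contrast with bootstrap-based ensembles. Cross-validation is used only to give the meta-learner base outputs on samples the base models did not see, which is the usual way to train a stacking combiner without it simply trusting overfit base predictions.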
Keywords:
Machine Learning ; Ensemble Learning ; Bioinformatics ; Gene Expression Data ; DNA Microarrays ; Stacked Generalization ; Robust Classification
