Analysis of Gene Expression Data in Bioinformatics Data Sets Using Machine Learning Approaches
Bagherian, Misagh | 2009
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 39186 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Beigy, Hamid
- Abstract:
- Robust and accurate classification of tumors is necessary for successful treatment of cancer, and classification of DNA microarray data has been widely used to diagnose cancers and some other diseases. The main challenge in classifying microarray data is the extreme asymmetry between the number of features (usually thousands or even tens of thousands of genes) and the number of tissue samples (a few hundred). Because of this curse of dimensionality, a class prediction model can be very successful on one dataset yet perform poorly on others. Overfitting is another problem that prevents conventional learning methods from achieving accurate and robust classification. Ensemble learning combines multiple learned models under the assumption that “two (or more) heads are better than one”: the decisions of multiple hypotheses are combined to produce more accurate and less risky results. This dissertation presents a novel ensemble learning approach for robust classification of microarray data. Unlike widely used ensemble approaches that train on bootstrapped data, we keep the original training data unchanged. The main idea of our model is to build an ensemble of base classifiers that use several top-ranked and bottom-ranked genes to predict the class of each sample. To combine the outputs of the base classifiers, we do not simply apply majority voting. Instead, using stacked generalization, we learn how the base classifiers’ outputs behave across different samples. The performance of our model is evaluated on four publicly available cancer microarray datasets from the Kent Ridge bio-medical repository. We compare it with published results of other microarray classification schemes, with an emphasis on traditional and modern ensemble approaches.
Experimental results show that the proposed classifier not only greatly outperforms conventional machine learning methods, but is also notably more accurate and robust than some traditional ensemble-based classifiers, and that its performance is comparable to that of some recently introduced, high-performing ensemble-based methods.
- Keywords:
- Machine Learning ; Ensemble Learning ; Bioinformatics ; Gene Expression Data ; DNA Microarrays ; Stacked Generalization ; Robust Classification
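
As a rough illustration of the approach outlined in the abstract, the following Python sketch ranks genes, builds base classifiers on the top-ranked and bottom-ranked genes, and combines their outputs by stacked generalization instead of voting. It is not the thesis implementation. The signal-to-noise ranking, the nearest-centroid base classifiers, the logistic-regression meta-learner, the cross-validated meta-features, the synthetic data, and all function names are assumptions made for this example.

```python
import math
import random
from statistics import mean, pstdev

def snr_ranking(X, y):
    """Gene indices sorted by signal-to-noise ratio (assumed ranking criterion).
    Genes most over-expressed in class 1 come first, in class 0 last."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 1]
        b = [row[g] for row, lab in zip(X, y) if lab == 0]
        scores.append((mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9))
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)

def fit_base(X, y, genes):
    """Nearest-centroid base classifier restricted to the given genes."""
    c1 = [mean(row[g] for row, lab in zip(X, y) if lab == 1) for g in genes]
    c0 = [mean(row[g] for row, lab in zip(X, y) if lab == 0) for g in genes]
    def score(row):
        d0 = sum((row[g] - c) ** 2 for g, c in zip(genes, c0))
        d1 = sum((row[g] - c) ** 2 for g, c in zip(genes, c1))
        return (d0 - d1) / len(genes)  # positive: closer to the class-1 centroid
    return score

def fit_ensemble(X, y, sizes):
    """One base classifier per size k, using the k top and k bottom genes."""
    ranked = snr_ranking(X, y)
    return [fit_base(X, y, ranked[:k] + ranked[-k:]) for k in sizes]

def meta_score(w, z):
    return w[0] + sum(wi * zi for wi, zi in zip(w[1:], z))

def fit_meta(Z, y, epochs=300, lr=0.01):
    """Logistic regression on base-classifier outputs (stochastic gradient ascent)."""
    w = [0.0] * (len(Z[0]) + 1)
    for _ in range(epochs):
        for z, lab in zip(Z, y):
            t = max(-30.0, min(30.0, meta_score(w, z)))  # clamp to avoid overflow
            err = lab - 1.0 / (1.0 + math.exp(-t))
            w[0] += lr * err
            for i, zi in enumerate(z):
                w[i + 1] += lr * err * zi
    return w

def fit_stacked(X, y, sizes=(1, 2, 4), folds=5):
    """Stacked generalization: the meta-learner is trained on out-of-fold
    base-classifier outputs; the training samples are never resampled."""
    n = len(X)
    Z = [None] * n
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        bases = fit_ensemble([X[i] for i in train], [y[i] for i in train], sizes)
        for i in range(f, n, folds):
            Z[i] = [b(X[i]) for b in bases]
    w = fit_meta(Z, y)
    bases = fit_ensemble(X, y, sizes)  # refit base classifiers on all data
    return lambda row: int(meta_score(w, [b(row) for b in bases]) > 0)

def make_data(n, n_genes, rng):
    """Synthetic stand-in for microarray data: genes 0-2 are up-regulated and
    genes 3-5 down-regulated in class 1; all other genes are pure noise."""
    X, y = [], []
    for i in range(n):
        lab = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(3):
            row[g] += 2.0 * lab
        for g in range(3, 6):
            row[g] -= 2.0 * lab
        X.append(row)
        y.append(lab)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(40, 200, rng)
X_test, y_test = make_data(40, 200, rng)
predict = fit_stacked(X_train, y_train)
accuracy = sum(predict(r) == t for r, t in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The sketch keeps the many-genes, few-samples shape of the problem (200 genes, 40 samples) and lets the meta-learner weight each base classifier according to how it behaves on samples it was not trained on, which is the role the abstract assigns to stacked generalization.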