Semi-supervised Breast Cancer Subtype Clustering Using Microarray Datasets

Vasei, Hamed | 2016

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 48963 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Motahhari, Abolfazl
  7. Abstract: Gene expression microarrays can be used for precision medicine and targeted therapies. The data they generate are high-dimensional, which makes statistical inference of any parameter a daunting task. This thesis shows that, despite the high dimensionality of microarray datasets, inference can be robust in the sense that a random selection of features leads to the same conclusion, as long as the number of selected features is chosen appropriately. Stratifying patients with breast cancer based on their gene expression levels shows that patient subtypes are almost independent of the feature selection strategy. Moreover, using less noisy datasets from RNAseq platforms does not change the subtypes substantially. This is an important result: it indicates that microarray and RNAseq platforms have the same power and lead to the same clinical course of action. Subtypes obtained from unsupervised clustering also have biological meaning. For example, two robust and stable subtypes are found. The first class is the well-known triple-negative case, ER-/PR-/HER2-. The second class can be identified by ER-/PR-/HER2+. This is an important finding, as it reveals that these three features can identify the subclasses of interest. Features are then selected in a supervised manner: biological labels are used to choose a set of genes that are highly correlated with the labels. To our surprise, the stratification is still robust
  8. Keywords: Breast Cancer ; Microarray ; Clustering ; Gene Expression Data ; Cancer Subtypes ; Statistical Inference ; Computational Genomics
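
The robustness claim in the abstract (random feature subsets lead to the same patient stratification when enough features are selected) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic rather than real microarray measurements, k-means and the adjusted Rand index stand in for whatever clustering method and agreement measure the thesis actually uses, and the function names (`make_synthetic_expression`, `cluster_on_random_genes`, `subset_agreement`) are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def make_synthetic_expression(n_per_group=40, n_genes=2000, n_informative=200, seed=0):
    """Toy stand-in for an expression matrix (patients x genes) with two groups."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2 * n_per_group, n_genes))
    # Shift a subset of genes in the second group so the groups differ.
    X[n_per_group:, :n_informative] += 2.0
    truth = np.repeat([0, 1], n_per_group)
    return X, truth


def cluster_on_random_genes(X, n_selected, n_clusters, rng):
    """Cluster patients using only a random subset of gene columns."""
    genes = rng.choice(X.shape[1], size=n_selected, replace=False)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return model.fit_predict(X[:, genes])


def subset_agreement(X, n_selected, n_clusters=2, n_repeats=5, seed=1):
    """Mean pairwise adjusted Rand index between clusterings
    obtained from independent random gene subsets of the same size."""
    rng = np.random.default_rng(seed)
    runs = [cluster_on_random_genes(X, n_selected, n_clusters, rng)
            for _ in range(n_repeats)]
    scores = [adjusted_rand_score(runs[i], runs[j])
              for i in range(n_repeats)
              for j in range(i + 1, n_repeats)]
    return float(np.mean(scores))


X, truth = make_synthetic_expression()
for n_selected in (10, 100, 500):
    # Agreement close to 1 means different random subsets give the same clusters.
    print(n_selected, round(subset_agreement(X, n_selected), 3))
```

On this toy data, agreement between random subsets should rise as more genes are selected, which is the qualitative behaviour the abstract describes. The numbers themselves say nothing about real breast cancer datasets.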
