Genome-Wide Association Study via Machine Learning Techniques

Najafi, Amir | 2015

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 47053 (05)
  4. University: Sharif University of Technology
  5. Department: Electrical Engineering
  6. Advisor(s): Fatemizadeh, Emad; Motahari, Abolfazl
  7. Abstract:
  8. The development of DNA sequencing technologies in recent years has magnified the need for computational tools in genomic data processing and has attracted intensive research interest in this area. Within this area, a Genome-Wide Association Study (GWAS) seeks to discover causal relationships between the genetic sequences of living organisms and the macroscopic phenotypes present in their physiological structure. The phenotypes chosen for genomic association studies are mostly vulnerability or immunity to common genetic diseases. Conventional GWAS methods consist of statistical hypothesis testing algorithms in case/control settings, most of which are based on single-locus analysis and the assumption of population uniformity. More recent approaches employ machine learning frameworks to assess the statistical information underlying genetic epistasis (gene-gene interactions) and to detect hidden genetic sub-populations. In this thesis, a two-stage approach is used to improve the quality and time complexity of results for a class of current GWAS solutions. In the first stage, a generative evolutionary model is proposed to investigate the propagation mechanism of causal haplo-blocks in genetic populations. This model builds upon the celebrated Wright-Fisher process and the Infinite Sites model. Based on statistical observations from simulations of the evolutionary model, a machine learning module is designed in the second stage to handle the simulated data effectively. The aim of this framework is to model multigenic diseases and to cluster genetically varying subgroups. As a result, the proposed module, called Piecewise Linear Support Vector Machine, is capable of detecting population layers and discovering causal factors in simulated genotype data. On GWAS data, Piecewise Linear SVM is faster and more accurate than rival methods such as conventional SVM, Genetic Algorithm, and single-locus hypothesis testing.
On other datasets, the proposed module is much faster than similar frameworks such as Dirichlet Process Mixtures in simultaneous clustering and classification of data samples. Testing the method on real-world data, including genotypes of a Parkinson's disease case/control group, shows that it effectively recovers previously reported SNPs; in addition, new disease-causing candidates are discovered and reported.
  9. Keywords:
  10. Machine Learning; Support Vector Machine (SVM); Genome-Wide Association Study; Probabilistic Evolutionary Models
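
The abstract names single-locus hypothesis testing in a case/control setting as the conventional GWAS baseline. As a rough illustration only, the sketch below shows one common form of such a test, a Pearson chi-square test on a 2x2 table of allele counts for a single SNP. It is not taken from the thesis: the function name allelic_chi_square and the allele counts in the usage lines are hypothetical, and the thesis may use a different test statistic.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    case_counts and control_counts are (minor_allele_count, major_allele_count)
    pairs for one SNP. Returns (chi2_statistic, p_value).
    This is an illustrative baseline, not the method proposed in the thesis.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row_case, row_ctrl = a + b, c + d
    col_minor, col_major = a + c, b + d

    # A monomorphic SNP or an empty group carries no evidence of association.
    if 0 in (row_case, row_ctrl, col_minor, col_major):
        return 0.0, 1.0

    # Closed form of the Pearson statistic for a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / (row_case * row_ctrl * col_minor * col_major)

    # Survival function of a chi-square variable with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 minor / 40 major alleles in cases, 40 / 60 in controls.
chi2, p = allelic_chi_square((60, 40), (40, 60))
print(f"chi2 = {chi2:.3f}, p = {p:.5f}")
```

A test of this kind scores each SNP independently, which is why, as the abstract notes, it cannot by itself capture gene-gene interactions or hidden sub-populations.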