
Motif Finding in DNA Sequences by Using Machine Learning Approach

Haghir Ebrahimabadi, Mohammad | 2017

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 50434 (05)
  4. University: Sharif University of Technology
  5. Department: Electrical Engineering
  6. Advisor(s): Fatemizadeh, Emadeddin
  7. Abstract: Motifs are patterns that can be extracted from specific subsequences in the promoter regions of related genes. Transcription factor proteins bind to these subsequences and play a significant role in regulating gene expression.
    Motif discovery is a challenging problem in molecular biology that has attracted researchers' attention for years. Different kinds of data and computational methods have been applied to it, but there is still room for improvement. In this study, our goal was to develop a method able to identify all transcription factor binding site (TFBS) signals, both known and unknown, in an input set of sequences. As part of our algorithm, we developed a specialized clustering method that outperforms existing clustering methods such as DNACLUST and CD-HIT-EST at clustering short sequences. A scoring system was needed to determine how close a cluster is to being a real motif. Multiple features are computed from the contents of each cluster to determine its score. These features comprise a set of divergence measures together with positional and occurrence information. The feature scores are combined so that a trade-off among them determines each cluster's status. Optionally, the final results can be compared against motif databases such as Jolma2013 and UniProbe using the Tomtom motif comparison tool. The algorithm was evaluated on three datasets from the ABS database
  8. Keywords: Motif; Clustering; DNA Sequencing; Machine Learning; Transcription Factor Binding Sites (TFBS)
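
The abstract lists divergence measures among the features used to score a cluster but does not say which ones. As a minimal sketch of one common choice, the Python below scores a cluster of equal-length sequences by the summed Kullback-Leibler divergence of each column's base frequencies from a background distribution. The function names, the uniform background, and the pseudocount are assumptions made for illustration; this is not the thesis's actual scoring system.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_frequencies(cluster, pseudocount=1.0):
    """Return per-column base frequencies for a cluster of equal-length sequences.

    Hypothetical helper: the pseudocount keeps every frequency above zero
    so the logarithm in the divergence is always defined.
    """
    length = len(cluster[0])
    if any(len(seq) != length for seq in cluster):
        raise ValueError("all sequences in a cluster must have equal length")
    total = len(cluster) + pseudocount * len(ALPHABET)
    columns = []
    for i in range(length):
        counts = Counter(seq[i] for seq in cluster)
        columns.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return columns


def kl_divergence_score(cluster, background=None, pseudocount=1.0):
    """Sum over columns of KL(column frequencies || background), in bits.

    Assumption: a uniform background (0.25 per base) when none is given.
    Higher scores mean the columns are more conserved than the background.
    """
    if background is None:
        background = {b: 0.25 for b in ALPHABET}
    score = 0.0
    for column in position_frequencies(cluster, pseudocount):
        score += sum(p * math.log2(p / background[b]) for b, p in column.items())
    return score


if __name__ == "__main__":
    conserved = ["TATAAT", "TATAAT", "TATAAT", "TATAAT"]
    shuffled = ["ACGT", "CGTA", "GTAC", "TACG"]
    print(kl_divergence_score(conserved))
    print(kl_divergence_score(shuffled))
```

A cluster whose columns are strongly conserved scores well above zero, while a cluster whose columns match the background scores near zero. On its own such a score says nothing about position or occurrence, which the abstract treats as separate features.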
