    Breaking Lander-Waterman's coverage bound

    , Article PLOS ONE ; Volume 11, Issue 11 , 2016 ; 19326203 (ISSN) Nashta Ali, D ; Motahari, S. A ; Hosseinkhalaj, B ; Sharif University of Technology
    Public Library of Science 
    Lander-Waterman's coverage bound establishes the total number of reads required to cover the whole genome of size G bases. In fact, their bound is a direct consequence of the well-known solution to the coupon collector's problem, which shows that for such a genome, the total number of bases to be sequenced should be O(G ln G). Although the result leads to a tight bound, it is based on a tacit assumption that the set of reads is first collected through a sequencing process and then processed through a computation process, i.e., there are two different machines: one for sequencing and one for processing. In this paper, we present a significant improvement compared to Lander-Waterman's... 
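
The O(G ln G) figure quoted in this abstract can be illustrated with a minimal sketch. It assumes the textbook coupon collector expectation (n times the n-th harmonic number) and the usual Poisson approximation in which a base stays uncovered with probability exp(-c) at coverage c; the function names are hypothetical and are not taken from the paper.

```python
import math


def coupon_collector_expected_draws(n: int) -> float:
    """Expected number of uniform draws needed to see all n coupons: n * H_n."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def uncovered_fraction(num_reads: int, read_len: int, genome_len: int) -> float:
    """Poisson approximation: a base is missed with probability exp(-c), c = N*L/G."""
    coverage = num_reads * read_len / genome_len
    return math.exp(-coverage)


def bases_for_full_coverage(genome_len: int) -> float:
    """Total bases N*L at which the expected number of uncovered bases,
    G * exp(-c), drops to one: c = ln G, hence N*L = G * ln G."""
    return genome_len * math.log(genome_len)


if __name__ == "__main__":
    g = 1_000_000  # illustrative genome size in bases (an assumption)
    print(f"G ln G          = {bases_for_full_coverage(g):.3e}")
    print(f"G * H_G (exact) = {coupon_collector_expected_draws(g):.3e}")
```

Both quantities grow as G ln G; the exact coupon collector expectation differs from G ln G only by a term linear in G.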

    Privacy in DNA Sequencing

    , M.Sc. Thesis Sharif University of Technology Gholami, Ali (Author) ; Maddah-ali, Mohammad Ali (Supervisor) ; Motahari, Abolfazl (Co-Supervisor)
    A DNA sequence is lifelong private information for each individual: it can reveal personal traits, health status, and medical risks; it can be abused by entities such as insurance companies; it can be used for identity theft, etc. Unfortunately, due to cost, regulations, or other restrictions, we may not be able to complete DNA sequencing in-house and have to outsource it to unreliable companies in foreign countries. This would compromise DNA privacy from the beginning, which raises the question of how we can guarantee DNA privacy in the process of sequencing. Here we propose a solution for private DNA sequencing by exploiting the fact that the process... 

    Draft genome of Dugesia japonica provides insights into conserved regulatory elements of the brain restriction gene nou-darake in planarians

    , Article Zoological Letters ; Volume 4, Issue 1 , 2018 ; 2056306X (ISSN) An, Y ; Kawaguchi, A ; Zhao, C ; Toyoda, A ; Sharifi Zarchi, A ; Mousavi, S. A ; Bagherzadeh, R ; Inoue, T ; Ogino, H ; Fujiyama, A ; Chitsaz, H ; Baharvand, H ; Agata, K ; Sharif University of Technology
    BioMed Central Ltd  2018
    Background: Planarians are non-parasitic Platyhelminthes (flatworms) famous for their regeneration ability and for having a well-organized brain. Dugesia japonica is a typical planarian species that is widely distributed in East Asia. Extensive cellular and molecular experimental methods have been developed to identify the functions of thousands of genes in this species, making this planarian a good experimental model for regeneration biology and neurobiology. However, no genome-level information is available for D. japonica, and few gene regulatory networks have been identified thus far. Results: To obtain whole-genome information on this species and to study its gene regulatory... 

    A new multiple DNA and protein sequences alignment method based on evolutionary algorithms

    , Article Journal of Knowledge and Health in Basic Medical Sciences ; Volume 16, Issue 1 , 2021 , Pages 13-20 ; 1735577X (ISSN) Etminan, N ; Parvinnia, E ; Sharifi Zarchi, A ; Sharif University of Technology
    Shahroud University of Medical Sciences  2021
    Introduction: The study of life and the detection of gene functions are important issues in biological science. Multiple sequence alignment methods measure the similarity of DNA sequences. Nonetheless, as the size of genome sequences increases, we encounter memory shortages and increased run time. Therefore, a fast method with suitable accuracy for genome alignment has a significant impact on the analysis of long sequences. Methods: We introduce a new method that first divides each sequence into short sequences and then uses evolutionary algorithms to align the sequences. Results: The proposed method has been evaluated on seven datasets with different number of... 

    Neural network-based approaches, solving haplotype reconstruction in MEC and MEC/GI models

    , Article Neural Computing and Applications ; Volume 22, Issue 7-8 , 2013 , Pages 1397-1405 ; 09410643 (ISSN) Moeinzadeh, M. H ; Asgarian, E ; Sharifian-R. S ; Sharif University of Technology
    Single nucleotide polymorphism (SNP) in human genomes is considered to be highly associated with complex genetic diseases. As a consequence, obtaining all SNPs from human populations is one of the primary goals of recent studies on human genomics. The two sequences of SNPs in diploid human organisms are called haplotypes. In this paper, the problem of haplotype reconstruction from SNP fragments with and without genotype information is studied. Minimum error correction (MEC) is an important model for this problem but is only effective when the error rate of the fragments is low. MEC/GI, as an extension to the MEC model, employs the related genotype information besides the SNP fragments and,... 
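
The MEC objective mentioned in this abstract can be sketched as follows. This is a minimal illustration of the standard MEC score only, not the neural network-based approach of the paper. It assumes all sites are heterozygous (so the second haplotype is the complement of the first), encodes alleles as "0"/"1" and uncovered sites as "-", and uses hypothetical function names.

```python
def mismatches(fragment: str, haplotype: str) -> int:
    """Count positions where a fragment disagrees with a haplotype.
    Positions marked '-' are not covered by the fragment and are skipped."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != "-" and f != h)


def complement(haplotype: str) -> str:
    """Second haplotype under the all-heterozygous assumption."""
    return "".join("1" if allele == "0" else "0" for allele in haplotype)


def mec_score(fragments: list[str], haplotype: str) -> int:
    """Minimum error correction score: each fragment is assigned to the closer
    of the two haplotypes, and the corrections needed are summed."""
    other = complement(haplotype)
    return sum(
        min(mismatches(frag, haplotype), mismatches(frag, other))
        for frag in fragments
    )


if __name__ == "__main__":
    # Toy SNP fragments over three sites (illustrative data, not from the paper).
    reads = ["01-", "-01", "10-", "011"]
    print(mec_score(reads, "010"))
```

A haplotype reconstruction method under the MEC model searches for the haplotype that minimizes this score; the sketch only evaluates the score for a given candidate.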

    CANCERSIGN: a user-friendly and robust tool for identification and classification of mutational signatures and patterns in cancer genomes

    , Article Scientific Reports ; Volume 10, Issue 1 , 2020 Bayati, M ; Rabiee, H. R ; Mehrbod, M ; Vafaee, F ; Ebrahimi, D ; Forrest, A. R. R ; Alinejad Rokny, H ; Sharif University of Technology
    Nature Research  2020
    Analysis of cancer mutational signatures has been instrumental in the identification of responsible endogenous and exogenous molecular processes in cancer. The quantitative approach used to deconvolute mutational signatures is becoming an integral part of cancer research. Therefore, development of a stand-alone tool with a user-friendly interface for analysis of cancer mutational signatures is necessary. In this manuscript we introduce CANCERSIGN, which enables users to identify 3-mer and 5-mer mutational signatures within whole genome, whole exome or pooled samples. Additionally, this tool enables users to perform clustering on tumor samples based on the proportion of mutational signatures in... 

    Meta-aligner: long-read alignment based on genome statistics

    , Article BMC Bioinformatics ; Volume 18, Issue 1 , 2017 ; 14712105 (ISSN) Nashta Ali, D ; Aliyari, A ; Ahmadian Moghadam, A ; Edrisi, M. A ; Motahari, S. A ; Khalaj, B. H ; Sharif University of Technology
    Background: Current development of sequencing technologies is towards generating longer and noisier reads. Evidently, accurate alignment of these reads plays an important role in any downstream analysis. Similarly, reducing the overall cost of sequencing is related to the time consumption of the aligner. The tradeoff between accuracy and speed is the main challenge in designing long-read aligners. Results: We propose Meta-aligner, which aligns long and very long reads to the reference genome very efficiently and accurately. Meta-aligner incorporates available short/long aligners as subcomponents and uses statistics from the reference genome to increase the performance. Meta-aligner estimates... 

    Statistical association mapping of population-structured genetic data

    , Article IEEE/ACM Transactions on Computational Biology and Bioinformatics ; 2017 ; 15455963 (ISSN) Najafi, A ; Janghorbani, S ; Motahari, S. A ; Fatemizadeh, E ; Sharif University of Technology
    Association mapping of genetic diseases has attracted extensive research interest in recent years. However, most of the methodologies introduced so far suffer from spurious inference of the associated sites due to population inhomogeneities. In this paper, we introduce a statistical framework to compensate for this shortcoming by equipping the current methodologies with a state-of-the-art clustering algorithm widely used in population genetics applications. The proposed framework jointly infers the disease-associated factors and the hidden population structures. In this regard, a Markov chain Monte Carlo (MCMC) procedure has been employed to assess the posterior probability... 

    Comparative Analysis of Haplotype Assembly Algorithms to Identify and Propose Optimal Methods

    , M.Sc. Thesis Sharif University of Technology Bagher, Melina (Author) ; Jahed, Mehran (Supervisor) ; Hossein Khalaj, Babak (Supervisor)
    Humans, as a diploid species, have two nucleotide sequences of homologous chromosomes in their genomes, where one set is inherited from the mother and the other comes from the father. The Single Individual Haplotype assembly problem (SIH) refers to the reconstruction of these two distinct nucleotide sequences of a chromosome from the sequencing reads. It is currently considered one of the most important issues in computational genomics and plays an essential role in solving various genetic and medical problems. Nowadays, direct experimental methods are not favored because of their high cost and labor intensity, and they are limited to certain regions of the genome; therefore,... 

    Genome annotation and comparative genomic analysis of Bacillus subtilis MJ01, a new bio-degradation strain isolated from oil-contaminated soil

    , Article Functional and Integrative Genomics ; Volume 18, Issue 5 , 2018 , Pages 533-543 ; 1438793X (ISSN) Rahimi, T ; Niazi, A ; Deihimi, T ; Taghavi, S. M ; Ayatollahi, S ; Ebrahimie, E ; Sharif University of Technology
    Springer Verlag  2018
    One of the main challenges in eliminating oil contamination from polluted environments is improving biodegradation by highly efficient microorganisms. Bacillus subtilis MJ01 has been evaluated as a new resource for producing biosurfactant compounds. This bacterium, which produces surfactin, is able to enhance bio-accessibility to oil hydrocarbons in contaminated soils. The genome of B. subtilis MJ01 was sequenced and assembled using PacBio RS sequencing technology. A single large contig with a length of 4,108,293 bp and no gaps was assembled. Genome annotation and gene prediction showed that the MJ01 genome is very similar to B. subtilis spizizenii TU-B-10 (95% similarity). The comparison... 

    Algorithms of Genome-Wide Association Studies

    , M.Sc. Thesis Sharif University of Technology Valishirin, Hossein (Author) ; Foroughmand Aarabi, Mohammad Hadi (Supervisor)
    The field of Genome-Wide Association Studies (GWAS) plays a vital role in understanding the genetic basis of complex traits and diseases. In this thesis, the focus is on investigating the effectiveness of two approaches that combine Differential Evolution (DE) with Random Forest (RF) and with Support Vector Machine (SVM) for feature selection in the context of GWAS. The Arabidopsis thaliana dataset is used as the experimental dataset for comparative analysis. The main goal is to achieve more efficient feature selection while maintaining competitive accuracy compared to RF and SVM without using DE. This research includes conducting experiments using DE with RF and DE with SVM followed by a comprehensive... 

    A novel pattern matching algorithm for genomic patterns related to protein motifs

    , Article Journal of Bioinformatics and Computational Biology ; Volume 18, Issue 1 , 2020 Foroughmand Araabi, M. H ; Goliaei, S ; Goliaei, B ; Sharif University of Technology
    World Scientific Publishing Co. Pte Ltd  2020
    Patterns in proteins and genomic sequences are extensively analyzed, extracted and collected in databases. Although protein patterns originate from genomic coding regions, very few studies have directly or indirectly dealt with coding region patterns induced from protein patterns. Results: In this paper, we have defined a new genomic pattern structure suitable for representing induced patterns from proteins. The provided pattern structure, which is called "Consecutive Positions Scoring Matrix (CPSSM)", is a replacement for protein patterns and profiles in the genomic context. CPSSMs can be identified, discovered, and searched in genomes. Then, we have presented a novel pattern matching algorithm... 

    Haploblock Detection Based on Reads and Population Structure

    , M.Sc. Thesis Sharif University of Technology Akbari, Elahe (Author) ; Motahari, Abolfazl (Supervisor)
    Humans are a diploid species that inherit one set of chromosomes from the mother and one set from the father. The process of separating the nucleotide content of the maternal and paternal chromosomes for an individual or a population is called phasing the genome of the individual or the population. The placement of any two variants relative to each other in diploid species is possible in two forms: cis (placement of both variants on one chromosome), and trans (placement of variants on different chromosomes). Each of these conditions leads to different phenotypes. Thus, understanding how variants are placed relative to each other is a crucial problem in human biology which is... 
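
The cis/trans distinction described in this abstract can be made concrete with a short sketch. It assumes two already-phased haplotypes encoded as strings of "0" (reference allele) and "1" (variant allele); the function name and encoding are hypothetical and are not taken from the thesis.

```python
from typing import Optional


def phase_relation(hap_a: str, hap_b: str, i: int, j: int) -> Optional[str]:
    """Classify two heterozygous variant sites as 'cis' or 'trans'.

    Returns None when either site is not heterozygous, because the
    cis/trans question only arises when each variant sits on exactly
    one of the two chromosome copies.
    """
    for site in (i, j):
        alleles = {hap_a[site], hap_b[site]}
        if alleles != {"0", "1"}:
            return None
    # Which chromosome copy carries the variant allele at each site?
    carrier_i = "a" if hap_a[i] == "1" else "b"
    carrier_j = "a" if hap_a[j] == "1" else "b"
    return "cis" if carrier_i == carrier_j else "trans"


if __name__ == "__main__":
    maternal = "1100"  # toy haplotypes (illustrative data)
    paternal = "0010"
    print(phase_relation(maternal, paternal, 0, 1))  # both variants on one copy
    print(phase_relation(maternal, paternal, 0, 2))  # variants on different copies
```

Unphased genotype data cannot distinguish the two cases, since both yield the same heterozygous genotype at each site; that is why phasing is needed.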

    Inferring Relation between World and Iranian Populations from Microarray Data

    , M.Sc. Thesis Sharif University of Technology Saberi, Sasan (Author) ; Hossein Khalaj, Babak (Supervisor) ; Motahhari, Abolfazl (Supervisor)
    One of the branches of genetic studies is population genetics. Each population has its own characteristics due to its evolutionary history, cultural characteristics and geography, which distinguish it from other populations. Scientific and technological advances in recent decades have led to the production of new-generation sequencing machines and the creation of large genetic datasets. These data contain important genetic information and answers to many questions about the origin of humans, the history of populations and their evolutionary process. A deeper understanding of the human genome and the distance between populations can help to better understand biological mechanisms and deal... 

    IMOS: improved meta-aligner and minimap2 on spark

    , Article BMC Bioinformatics ; Volume 20, Issue 1 , 2019 ; 14712105 (ISSN) Hadadian Nejad Yousefi, M ; Goudarzi, M ; Motahari, A ; Sharif University of Technology
    BioMed Central Ltd  2019
    Background: Long reads provide valuable information regarding the sequence composition of genomes. Long reads are usually very noisy, which renders their alignment to the reference genome a daunting task. It may take days to process datasets large enough to sequence a human genome on a single node. Hence, it is of primary importance to have an aligner which can operate on distributed clusters of computers with high performance in accuracy and speed. Results: In this paper, we present IMOS, an aligner for mapping noisy long reads to the reference genome. It can be used on a single node as well as on distributed nodes. In its single-node mode, IMOS is an Improved version of Meta-aligner (IM)... 

    Genome-Wide Association Study via Machine Learning Techniques

    , M.Sc. Thesis Sharif University of Technology Najafi, Amir (Author) ; Fatemizadeh, Emad (Supervisor) ; Motahari, Abolfazl (Co-Advisor)
    The development of DNA sequencing technologies in recent years has magnified the need for computational tools in genomic data processing, and thus has attracted intensive research interest to this area. Among them, Genome-Wide Association Study (GWAS) refers to the discovery of causal relationships between the genetic sequences of living organisms and the macroscopic phenotypes present in their physiological structure. Chosen phenotypes for genomic association studies are mostly vulnerability or immunity to common genetic diseases. Conventional methods in GWAS consist of statistical hypothesis testing algorithms in case/control approaches, most of which are based upon single-locus analysis and... 

    Engineering of Behavioral Model of Startups in Utilizing the Necessities and Challenges

    , M.Sc. Thesis Sharif University of Technology Shahedipour, Saeed (Author) ; Fatahi Valilai, Omid (Supervisor)
    Startups mostly originate from an idea and present themselves as a new plan for optimization and development in societies. Their novelty precludes their owners from predicting their future. On the other hand, turning an idea into a product needs a solution. If the solution matches the structural behavior of the society, it will survive. Simulation of the structural behavior of society through genetic science creates a suitable tool for the investigation of genetic logic. It accumulates the responses of the stakeholders of society on that subject according to the standard index in the format of information and database completely and meticulously. It depicts the... 

    Detecting Large-scale Evolutionary Events and Multiple Alignment of Whole Genomes

    , M.Sc. Thesis Sharif University of Technology Afshinfard, Amir Hossein (Author) ; Motahari, Abolfazl (Supervisor) ; Rabiee, Hamid Reza (Co-Advisor)
    Recent advances in genome sequencing technologies have provided a wide variety of completely sequenced genomes. This opened up the opportunity to study genomic sequences using pairwise or multiple alignment of whole genomes. The aim is to explore the similarities and differences between genomes for further comparative studies. This task is challenging because genomes of different species have undergone not only small mutations but also many large-scale evolutionary events such as insertion, deletion and inversion. There has been a lot of research on developing whole-genome alignment algorithms. Having an optimal trade-off between sensitivity, accuracy and computational expense is very... 

    DNA Classification Using Optical Processing based on Alignment-free Methods

    , M.Sc. Thesis Sharif University of Technology Kalhor, Reza (Author) ; Koohi, Somayyeh (Supervisor)
    In this research, an optical processing method for DNA classification is presented in order to overcome the problems of previous methods. With improvements in the operational capacity of the sequencing process, which have increased the number of genomes, comparing sequences with a complete database of genomes is a serious challenge for computational methods. Most current classification programs suffer from slow classification speeds, large memory requirements, or both. To achieve high speed and accuracy at the same time, we suggest using optical processing methods. The performance of electronic processing-based computing, especially in the case of large data processing, is usually...