Search for: gene-expression-data
Total 27 records

    Semi-supervised Breast Cancer Subtype Clustering Using Microarray Datasets

    , M.Sc. Thesis Sharif University of Technology Vasei, Hamed (Author) ; Motahhari, Abolfazl (Supervisor)
    Abstract
    Gene expression microarrays can be used for precision medicine and targeted therapies. The data generated by microarrays are high-dimensional, which makes statistical inference of any parameter a daunting task. In this thesis, it is shown that despite the high dimensionality of the datasets produced by microarrays, the inference can be robust in the sense that random selection of features leads to the same conclusion as long as the number of selected features is chosen appropriately. Stratifying patients with breast cancer based on their gene expression levels shows that patient subtypes are almost independent of the feature selection strategy. Moreover, using less noisy datasets coming from RNAseq...

    Computational Deconvolution of Bulk Tissue Transcriptomic Data

    , M.Sc. Thesis Sharif University of Technology Hashemi, Tahoura Sadat (Author) ; Motahari, Abolfazl (Supervisor)
    Abstract
    Bulk tissue RNA-seq data has been widely used for investigating the transcriptome and analyzing it for different purposes. A single bulk sample of a heterogeneous population includes different cell types, each in different proportions. Bulk tissue RNA-seq measures the average expression level of genes across these cell types and does not account for cross-subject variation in cell-type compositions. Furthermore, biological signals might be masked by taking the average of gene expressions. For these reasons, bulk RNA-seq is not sufficient for studying complex tissues. Knowing these cell-type compositions is important, because studying cell-specific changes in the transcriptome might be...

    Identification of Driver Genes in Glioblastoma Based on Single-Cell Gene Expression Data Utilizing the Concept of Pseudotime and Phylogenetic Analysis

    , M.Sc. Thesis Sharif University of Technology Mirza Abolhassani, Fatemeh (Author) ; Foroughmand Aarabi, Mohammad Hadi (Supervisor) ; Kavousi, Kaveh (Co-Supervisor) ; Zare Mirakabad, Fatemeh (Co-Supervisor)
    Abstract
    Genetic heterogeneity within a tumor, which occurs during cancer evolution, is one of the reasons for treatment failure and increased chances of drug resistance. Cancer cells initially derive from a mutated progenitor cell, resulting in shared mutated genes. Throughout the course of tumor formation and progression, the occurrence of new mutations is possible, leading to the generation of cancer cells with various mutated genes. An appropriate approach is to identify the sequence of mutations that have occurred in the tumor, which can be inferred from single-cell sequencing data. Single-cell data provides valuable information about branching events in the evolution of a cancerous tumor. In...

    Detection and Estimation of Key Parameters in Traffic Models Using Data Mining Tools

    , M.Sc. Thesis Sharif University of Technology Moadab, Amir Hossein (Author) ; Khedmati, Majid (Supervisor)
    Abstract
    Nowadays, investigating the factors affecting traffic models from different aspects such as metropolitan planning according to the present conditions can help high-level decision-makers and also, at the micro-level, help the travelers to make appropriate decisions for scheduling affairs, route selection, and vehicle type selection. Given the importance of this topic, a framework will be presented in this study that will evaluate the impact of some identified factors such as travel distance, climate, and urban events, and then all these factors will be presented in mathematical formulas. In the end, based on the model, the travel time will be predicted. In this framework, gene expression... 

    Single-Cell RNA-seq Dropout Imputation and Noise Reduction by Machine Learning

    , M.Sc. Thesis Sharif University of Technology Moinfar, Amir Ali (Author) ; Soleymani Baghshah, Mahdih (Supervisor) ; Sharifi Zarchi, Ali (Supervisor) ; Goodarzi, Hani (Co-Supervisor)
    Abstract
    Single-cell RNA sequencing (scRNA-seq) technologies have empowered us to study gene expression at single-cell resolution. These technologies are based on barcoding of single cells and sequencing of the transcriptome using next-generation sequencing technologies. Achieving this single-cell resolution is especially important when the target population is complex or heterogeneous, which is the case for most biological samples, including tissue samples and tumor biopsies. Single-cell technologies suffer from high amounts of noise and missing values, generally known as dropouts. This complexity can affect a number of key downstream analyses such as differential expression analysis,...

    Motif Finding Application Using Edit Distance Approach

    , M.Sc. Thesis Sharif University of Technology Mohammadi, Farzin (Author) ; Koohi, Somayyeh (Supervisor)
    Abstract
    The motif finding problem in biology is a search for repeated patterns to reveal information about gene expression, one of the most complex subsystems in genomics. ChIP-seq technology has enabled researchers to investigate the location of protein-DNA interactions, but analyzing the downstream results of such experiments to find actual regulatory signals in the genome is challenging. For many years, motif finding applications have relied on models with limiting assumptions in exchange for lower computational complexity. Results: The AKAGI program is built upon upgraded methods and new general models to investigate statistical and experimental evidence for accurately finding significant patterns among biological...

    Analysis and Design of Single-cell RNA Sequencing Data Normalization Algorithms

    , M.Sc. Thesis Sharif University of Technology Mohseni, Sepideh (Author) ; Hossein Khalaj, Babak (Supervisor)
    Abstract
    Single-cell RNA sequencing (scRNA-seq) data provides more information about gene expression at the cellular level. However, because of the noise and sparsity present in scRNA-seq data, its analysis faces obstacles. A global normalization approach cannot correctly resolve missing data that arise from technical variability, so this approach introduces bias and leads to incorrect conclusions about cell type. In this study we review some models for scRNA-seq data imputation, explain a new method for filtering genes and clustering data, and use a matrix completion algorithm for data imputation.

    Drug Synergy Prediction on Diverse Cancer Cell-Lines Using Deep Learning

    , M.Sc. Thesis Sharif University of Technology Labbaf, Farzaneh (Author) ; Hossein Khalaj, Babak (Supervisor)
    Abstract
    Despite significant progress in cancer treatment, drug resistance remains a major challenge. Synergistic drug combinations offer a promising approach to overcome drug resistance and reduce side effects. Still, despite high-throughput testing technologies, existing drug combination databases suffer from biases and a lack of diversity in tested cancer cell lines, which challenges the prediction of drug response on novel cell targets. To address this critical need, we designed a two-level deep learning method that uses large-scale gene expression datasets to estimate the score and synergy of drug compounds on a wide variety of cancer cell lines. Our model includes an auto-encoder that trains on...

    Modeling of Genetic Mutations Associated with Protein Pathway Common in Alzheimer, Parkinson and Macular Degeneration Diseases

    , M.Sc. Thesis Sharif University of Technology Ghahremani, Amin (Author) ; Jahed, Mehran (Supervisor) ; Hossein Khalaj, Babak (Supervisor) ; Shahpasand, Kourosh (Co-Supervisor)
    Abstract
    Extensive studies have been performed on the genetic variations involved in common neurodegenerative diseases such as Alzheimer's, macular degeneration, and Parkinson's. In most cases, no specific gene has been identified that points to a distinct pathogenic pathway; therefore, this study mainly aims to find genes common to the aforementioned diseases by determining a specific pathogenic protein pathway. In this study, we reached a deep understanding of the function of the nervous system and the discovery of causative agents of the diseases by applying information from genome datasets in bioinformatics analysis. The utilized database comprises the classification of...

    Bayesian Filtering Approach to Improve Gene Regulatory Networks Inference Using Gene Expression Time Series

    , M.Sc. Thesis Sharif University of Technology Fouladi, Ramouna (Author) ; Fatemizadeh, Emadoddin (Supervisor) ; Arab, Shahriar (Co-Advisor)
    Abstract
    Gene regulatory modeling in different species is one of the main aims of bioinformatics. Given the limitations of the available data and the perspectives that should be taken into account when modeling such networks, the methods proposed so far have not succeeded in yielding a comprehensive model. In one recent study, the gene regulation process is considered as a nonlinear dynamic stochastic process and described by state space equations. Afterwards, in order to estimate the unknown parameters, Extended Kalman Filtering is used. In this thesis, first of all, gene complexes are taken into consideration instead of genes and afterwards, Extended Kalman...

    Isoform Function Prediction Using Deep Neural Network

    , M.Sc. Thesis Sharif University of Technology Ghazanfari, Sara (Author) ; Motahari, Abolfazl (Supervisor) ; Soleymani, Mahdieh (Supervisor)
    Abstract
    Isoforms are mRNAs that are produced from the same gene site through the phenomenon called alternative splicing. Studies have shown that more than 95% of multiexon genes in humans undergo alternative splicing. Although there are few changes in the mRNA sequence, they may have a systematic effect on cell function and regulation. It is widely reported that isoforms of a gene have distinct or even contrasting functions. Most studies have shown that alternative splicing plays a significant role in human health and disease. Despite the wide range of gene function studies, there is little information about isoforms’ functionalities. Recently, some computational methods based on Multiple Instance...

    Human Genome Sequence Analysis Using Statistical and Machine Learning Methods

    , M.Sc. Thesis Sharif University of Technology Alaei, Shervin (Author) ; Manzuri Shalmani, Mohammad Taghi (Supervisor)
    Abstract
    During recent decades, dramatic advances in genetics and molecular biology have provided scientists with enormous amounts of molecular genomic information on different living organisms, from DNA sequences to complex 3D structures of proteins. This information is raw data whose analysis can provide a better understanding of genome mechanisms, discriminating healthy and tumor cells, predicting disease type, making drugs based on genome information, and many more applications. Here, one important issue is the inevitable use of computer science and statistics to analyze these data and, given the vast amount of data, to provide intelligent methods that yield the most accurate...

    Prediction of DNA/RNA Sequence Binding Site to Protein with the Ability to Implement on GPU

    , M.Sc. Thesis Sharif University of Technology Fatemeh Tabatabaei (Author) ; Koohi, Sommaye (Supervisor)
    Abstract
    Given the importance of DNA/RNA binding proteins in different cellular processes, finding their binding sites plays a crucial role in many applications, such as drug/vaccine design, protein design, and cancer control. Many studies target this issue and try to improve the prediction accuracy with three strategies: complex neural-network structures, various types of inputs, and ML methods to extract input features. But due to the growing volume of sequences, these methods face serious processing challenges. So, this paper presents KDeep, based on CNN-LSTM and the primary form of DNA/RNA sequences as input. As the key feature improving the prediction accuracy, we propose a new encoding...

    Identifying Cancer-related Genes Via Network Feature Learning and Multi-Omics Data Integration

    , M.Sc. Thesis Sharif University of Technology Safari, Monireh (Author) ; Rabiee, Hamid Reza (Supervisor)
    Abstract
    The highly developed biological data collection methods enable scientists to capture protein-protein interaction (PPI) in the human body, which could be analyzed as biological networks such as protein-protein interaction networks. These networks reveal essential information about the biological process in human cells and can be used to identify genes associated with cancers. Effectively identifying disease-related genes would contribute to improving the treatment and diagnosis of various diseases. Current methods for identifying disease-related genes mainly focus on the hypothesis of guilt-by-association and do not consider the global information in the PPI network. Besides, most methods pay... 

    Analyzing Cancer Cell Identity and Appropriative Subnetworks using Machine Learning

    , M.Sc. Thesis Sharif University of Technology Saberi, Ali (Author) ; Rabiee, Hamid Reza (Supervisor) ; Sharifi Zarchi, Ali (Supervisor)
    Abstract
    Cancer has long threatened human health, and researchers have been grappling with the phenomenon for many years. In the annals of this struggle, cancer victims have outnumbered survivors to such an extent that, until recently, suffering from cancer was perceived to be equivalent to death. Permanent defeat against cancer stems from the incomplete recognition of the phenomenon. In recent years, with the advent of technologies to extract information from the heart of cells and at the genome and transcriptome levels, man has been able to acquire a deeper understanding of cancer, its behavior and operation. Now that cancer is regarded as a genetic disease,...

    Modelling Cell's State in Different Cell Types

    , M.Sc. Thesis Sharif University of Technology Saberi, Amir Hossein (Author) ; Hossein Khalaj, Babak (Supervisor) ; Motahari, Abolfazl (Co-Supervisor)
    Abstract
    The existence of heterogeneity in vital tissues of complex multicellular organisms like mammals, and in fatal tissues like cancer, on the one hand, and limited access to the biological properties of their components on the other hand, turn the study of these tissue traits into one of the most interesting fields in bioinformatics. One of the hottest subjects in this field is the recognition of functional components of these tissues by using bulk data extracted from the whole tissue. Almost every method that aims to achieve such a purpose, particularly using gene expression data, assumes that all of the cell types which constitute the studied tissue have a deterministic expression profile. In this thesis we...

    Analysis of Genes Regulating Beta Cells Cell Cycle

    , M.Sc. Thesis Sharif University of Technology Saraei, Tannaz (Author) ; Motahari, Abolfazl (Supervisor)
    Abstract
    Diabetes mellitus is a group of disorders where the level of blood sugar remains high for a long period of time. This increase may be due to either reduced insulin secretion from the pancreatic gland, or insulin resistance, or both. Another key reason is the destruction of beta cells due to a functional defect in the body’s immune system. Current treatments include controlling diet, insulin injection and pancreatic transplantation, all of which are temporary. For this reason, finding genetic factors participating in the progression of the disease and adapting treatments to these factors are under intensive study. In this thesis, available information resources including genomic, biological...

    Exploration of Existing Patterns in Copy Number Variations of Genetic Diseases and Disorders

    , Ph.D. Dissertation Sharif University of Technology Rahaie, Zahra (Author) ; Rabiee, Hamid Reza (Supervisor)
    Abstract
    One of the main sources of genetic variation is structural variation, including the widespread Copy Number Variations (CNVs). CNVs are of two types, copy of genetic material (duplication) and loss of part of a genetic sequence (deletion), and typically range from one kilobase pair (Kbp) to several megabase pairs (Mbp) in size. Most copy number variations occur in healthy people; however, these variants can also contribute to numerous diseases through several genetic mechanisms (e.g. changing gene dosage through insertions, duplications or deletions). The CNV study can provide greater insight into the etiology of disease phenotypes. Nowadays, with the huge amount of investment...

    Analysis of DNA Methylation in Single-cell Resolution Using Algorithmic Methods and Deep Neural Networks

    , M.Sc. Thesis Sharif University of Technology Rasti Ghamsari, Ozra (Author) ; Sharifi Zarchi, Ali (Supervisor)
    Abstract
    DNA methylation is one of the most important epigenetic variations, and it causes significant variations in gene expression in mammals. Our current knowledge about DNA methylation is based on measurements from bulk samples, which causes ambiguity in intracellular differences and in the analysis of rare cell samples. For this reason, the ability to measure DNA methylation in single cells has the potential to play an important role in understanding many biological processes including embryonic development, disease progression including cancer, aging, chromosome instability, X chromosome inactivation, cell differentiation and gene regulation. Recent technological advances have enabled...

    Identifying Core Genes in Estimation of Missing Gene Expressions

    , M.Sc. Thesis Sharif University of Technology Darvish Shafighi, Shadi (Author) ; Motahari, Abolfazl (Supervisor)
    Abstract
    Characterizing cellular states in response to various disease conditions is an important issue which is addressed by different methods such as large-scale gene expression profiling. One of the most important challenges facing bioinformaticians is the loss of data, because expression profiling is still very expensive. It is understood that profiling a group of selected genes could be enough for understanding the entire gene expression profile. In this research, we propose a fast method for estimation of the missing values in low-rank matrices. We consider the highly correlated expression profiles as a low-rank matrix. Then, we use this new method in a proposed algorithm which will select...