MaxHiC: A robust background correction model to identify biologically relevant chromatin interactions in Hi-C and capture Hi-C experiments

Alinejad-Rokny, H ; Sharif University of Technology | 2022

  1. Type of Document: Article
  2. DOI: 10.1371/journal.pcbi.1010241
  3. Publisher: Public Library of Science, 2022
  4. Abstract:
  5. Hi-C is a genome-wide chromosome conformation capture technology that detects interactions between pairs of genomic regions and exploits higher-order chromatin structures. Conceptually, Hi-C data counts interaction frequencies between every position in the genome and every other position. Biologically functional interactions are expected to occur more frequently than transient background and artefactual interactions. To identify biologically relevant interactions, several background models that take biases such as distance, GC content and mappability into account have been proposed. Here we introduce MaxHiC, a background correction tool that deals with these complex biases and robustly identifies statistically significant interactions in both Hi-C and capture Hi-C experiments. MaxHiC uses a negative binomial distribution model and a maximum likelihood technique to correct biases in both Hi-C and capture Hi-C libraries. We systematically benchmark MaxHiC against major Hi-C background correction tools, including Hi-C significant interaction callers (SIC) and Hi-C loop callers, using published Hi-C, capture Hi-C, and Micro-C datasets. Our results demonstrate that 1) interacting regions identified by MaxHiC have significantly greater levels of overlap with known regulatory features (e.g., active chromatin histone marks, CTCF binding sites, DNase sensitivity) and with disease-associated genome-wide association SNPs than those identified by currently existing models, 2) the pairs of interacting regions are more likely to be linked by eQTL pairs, and 3) they are more likely to link known regulatory features, including known functional enhancer-promoter pairs validated by CRISPRi, than those identified by any of the existing methods. We also demonstrate that interactions between different genomic region types have distinct distance distributions that are revealed only by MaxHiC. MaxHiC is publicly available as a Python package for the analysis of Hi-C, capture Hi-C and Micro-C data. © 2022 Alinejad-Rokny et al.
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
  6. Keywords:
  7. Bioinformatics ; C (programming language) ; Maximum likelihood ; Python ; Background correction ; Background modelling ; Chromatin structure ; Correction models ; Functional interaction ; GC contents ; Genomic regions ; High-order ; Higher-order ; Interaction frequencies ; Binding sites ; Deoxyribonuclease ; Histone ; Transcription factor CTCF ; Binding site ; Binomial distribution ; Biological model ; Biotechnology ; Clustered regularly interspaced short palindromic repeat ; Data analysis ; Enhancer region ; Experimental study ; Expression quantitative trait locus ; Gene mapping ; Genetic database ; Genetic regulation ; Genome-wide association study ; Maximum likelihood method ; Molecular interaction ; Promoter region ; Protein binding ; Single nucleotide polymorphism ; Statistical bias ; Statistical significance ; Validation process ; Chromatin ; Genetics ; Genome ; Genomics ; Procedures
  8. Source: PLoS Computational Biology ; Volume 18, Issue 6, 2022 ; 1553734X (ISSN)
  9. URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010241
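
The abstract describes a negative binomial background model fitted by maximum likelihood. The sketch below illustrates that general idea only and is not MaxHiC's implementation or API: it assumes a single global mean for all region pairs (the published model additionally accounts for biases such as genomic distance), and the function names `fit_negative_binomial` and `interaction_p_values` are hypothetical. The input counts are synthetic.

```python
import numpy as np
from scipy import optimize, stats


def fit_negative_binomial(counts):
    """Estimate the mean (mu) and dispersion (r) of a negative binomial
    distribution from observed counts by maximum likelihood.

    Assumption: every count shares one background mean; this is a
    simplification for illustration, not the MaxHiC model.
    """
    counts = np.asarray(counts)

    def neg_log_likelihood(params):
        # Optimise on the log scale so that mu and r stay positive.
        mu, r = np.exp(params)
        p = r / (r + mu)  # SciPy's (n, p) parameterisation, mean = mu
        return -stats.nbinom.logpmf(counts, r, p).sum()

    start = np.log([counts.mean() + 1e-9, 1.0])
    result = optimize.minimize(neg_log_likelihood, start, method="Nelder-Mead")
    mu, r = np.exp(result.x)
    return mu, r


def interaction_p_values(counts, mu, r):
    """Upper-tail p-value P(X >= count) under the fitted background."""
    p = r / (r + mu)
    return stats.nbinom.sf(np.asarray(counts) - 1, r, p)


# Synthetic background counts (mean 10, dispersion 5), for demonstration only.
rng = np.random.default_rng(0)
background = rng.negative_binomial(n=5, p=5 / (5 + 10), size=5000)

mu_hat, r_hat = fit_negative_binomial(background)
p_values = interaction_p_values([10, 60], mu_hat, r_hat)
```

Here a count close to the fitted mean receives a large p-value, while a count far in the upper tail receives a very small one; in practice such p-values would also need multiple-testing correction before any interaction is called significant.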