Abstract

Gene regulation networks (GRNs) are the basis of virtually all biological processes. To gain a global understanding of the GRNs encoded in a genome, we first need to identify all the cis-regulatory elements (CREs) recognized by transcription factors (TFs). In higher eukaryotes, CREs rarely work alone; instead, they regulate genes by forming combinatorial patterns called cis-regulatory modules (CRMs). Thus, finding CREs as well as CRMs is the key to understanding GRNs in eukaryotes. However, identifying CREs and CRMs is highly challenging because they are short and degenerate and reside in long intergenic or intronic sequences. The recent wide adoption of chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) has produced numerous datasets for locating the CREs of TFs, providing an unprecedented opportunity to decipher the CREs and CRMs in a genome. In this dissertation, we have developed a graph-theory-based algorithm, DePCRM, for genome-wide de novo prediction of CRMs and CREs by integrating a large number of ChIP datasets. Using this algorithm, we have predicted 1,108,018 and 5,186,520 CREs, and 115,932 and 807,365 CRMs, in the Drosophila melanogaster and human genomes, respectively, using all the ChIP-seq datasets available to us for the two organisms. We found that our predicted CRMs recovered more than 80% of known CRMs, and that both the putative CREs and CRMs were more conserved than randomly selected sequences in both genomes. Furthermore, trait-linked SNPs and DNase I hypersensitive regions are highly enriched in our predicted CRMs in the human genome. Thus, we have provided the most comprehensive maps to date of CREs and CRMs in the two genomes. Using the much larger number of human ChIP datasets, we also analyzed the saturation trends of predicted CRE motifs and their combinatorial patterns using an increasing number of randomly selected datasets, datasets from different cell types, and datasets for different TFs. We found that the saturation trends became notable with only a few datasets in each scenario. The results suggest ways to generate ChIP datasets more cost-effectively in the future. Finally, we analyzed the conservation and variation of the cis-regulatory systems between the two species. We found that although a large portion of CRMs are conserved in their motif composition between the two species, their target genes have changed significantly. Thus, the majority of the GRNs have been rewired during the evolution from D. melanogaster to humans.
