Automated target gene assignment from ChIP-on-chip data

Target Assignment Overview

Genomic tiling arrays provide an unbiased way to identify new regulatory regions, regardless of their distance from the genes they regulate. While this is a major advantage over promoter arrays, it raises a new challenge for ChIP-on-chip studies: how can transcription factor-bound regions be accurately matched to their correct target genes? This has not been an issue in previous studies in yeast, where active enhancer regions usually lie within 1 kb 5’ of the target gene. Metazoan enhancers, by contrast, have been found at large distances from their target genes, including within introns of neighboring loci or 3’ of the regulated gene. Simply assuming that an enhancer regulates the nearest gene will therefore often select the wrong target, especially in gene-dense regions.

We have used different sources of metadata to systematically link ChIP-enriched regions to their target genes (see figure). Each gene in the vicinity of a ChIP-bound region receives a cumulative score based on: 1) the distance between the gene and the bound region, 2) the change in the gene's expression in loss-of-function mutant embryos for that transcription factor, and 3) supporting information, for example about the gene’s expression pattern (BDGP in-situ database, FlyBase, literature). To obtain an accurate view of each transcription factor's target genes, genes were therefore not assigned based on proximity alone.
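The idea of a cumulative score can be illustrated with a small sketch. This is not the published algorithm: the dataclasses, the distance-decay formula, the weights (`w_dist`, `w_expr`, `w_support`), the `scale` and `window` parameters, and the use of a simple boolean for supporting evidence are all hypothetical choices made for illustration. Distance is measured to the gene body and strand is ignored, which is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gene:
    name: str
    start: int
    end: int
    # Hypothetical: log2 expression change in loss-of-function mutant embryos.
    log2_fold_change: float = 0.0
    # Hypothetical: e.g. a matching in-situ pattern or a literature report.
    has_supporting_evidence: bool = False


@dataclass
class Region:
    start: int
    end: int


def distance(region: Region, gene: Gene) -> int:
    """Gap in bp between a bound region and a gene body; 0 if they overlap."""
    if region.end < gene.start:
        return gene.start - region.end
    if gene.end < region.start:
        return region.start - gene.end
    return 0


def score_gene(region: Region, gene: Gene, w_dist: float = 1.0,
               w_expr: float = 1.0, w_support: float = 1.0,
               scale: int = 10_000) -> float:
    """Cumulative score: distance term + expression term + supporting evidence."""
    # Distance term is 1 for overlapping genes and decays toward 0 with distance.
    dist_term = 1.0 / (1.0 + distance(region, gene) / scale)
    # Larger expression changes in the mutant add more support.
    expr_term = abs(gene.log2_fold_change)
    support_term = 1.0 if gene.has_supporting_evidence else 0.0
    return w_dist * dist_term + w_expr * expr_term + w_support * support_term


def assign_target(region: Region, genes: List[Gene],
                  window: int = 50_000) -> Optional[Gene]:
    """Return the highest-scoring gene within `window` bp, or None."""
    candidates = [g for g in genes if distance(region, g) <= window]
    if not candidates:
        return None
    return max(candidates, key=lambda g: score_gene(region, g))
```

With these toy inputs, the nearest gene is not chosen when a slightly more distant gene is supported by expression and annotation data:

```python
bound = Region(10_000, 10_500)
near = Gene("geneA", 11_000, 12_000)  # 500 bp away, no other support
far = Gene("geneB", 20_000, 25_000, log2_fold_change=-2.0,
           has_supporting_evidence=True)  # 9.5 kb away
assign_target(bound, [near, far]).name  # "geneB"
```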

The gene assignment algorithm was developed in collaboration with Lars Jensen and Peer Bork, EMBL Heidelberg.

We estimated the accuracy of our automated gene assignment on the Mef2 ChIP data set, using a reference set of enhancers characterized in single-gene studies (available from FlyReg and REDfly). 85% of the genes with known enhancers were correctly assigned by this method. The enhancer regions that were assigned to the wrong gene all lay in a genomic region of very high gene density, the Enhancer of split region, where the Mef2-bound regions were assigned to a different member of this gene cluster.
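The accuracy estimate is the fraction of reference enhancers whose predicted target matches the known target. A minimal sketch, assuming predictions and the reference set are both available as dictionaries mapping a region ID to a gene name (a hypothetical format; the region and gene names below are placeholders, not real data). Reference enhancers with no prediction are counted as misassigned.

```python
from typing import Dict, List, Tuple


def assignment_accuracy(predicted: Dict[str, str],
                        reference: Dict[str, str]) -> Tuple[float, List[str]]:
    """Return (fraction correctly assigned, sorted IDs of misassigned regions)."""
    if not reference:
        raise ValueError("reference set is empty")
    # A region is misassigned if it has no prediction or the wrong gene.
    mismatches = sorted(r for r, gene in reference.items()
                        if predicted.get(r) != gene)
    accuracy = 1 - len(mismatches) / len(reference)
    return accuracy, mismatches
```

Returning the mismatched IDs alongside the score makes it easy to check whether the errors cluster in one locus, as they did in the gene-dense cluster described above.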