Integrative analysis of genomic, functional and protein interaction data predicts long-range enhancer-target gene interactions
Christian Rödelsperger1,2,3, Gao Guo3, Mateusz Kolanczyk2, Angelika Pletschacher3, Sebastian Köhler1,3, Sebastian Bauer3, Marcel H. Schulz2,4 and Peter N. Robinson1,2,3,*
1Berlin-Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, 2Max Planck Institute for Molecular Genetics, 3Institute for Medical Genetics, Charité-Universitätsmedizin, Berlin and 4International Max Planck Research School for Computational Biology and Scientific Computing, Berlin, Germany
*To whom correspondence should be addressed. Tel: +49 30 450566042; Fax: +49 30 450569915; Email: firstname.lastname@example.org
Received June 25, 2010.
Revision received October 14, 2010.
Accepted October 14, 2010.
Nucl. Acids Res. (2011) 39 (7): 2492-2502.
Multicellular organismal development is controlled by a complex network of transcription factors, promoters and enhancers. Although reliable computational and experimental methods exist for enhancer detection, prediction of their target genes remains a major challenge. On the basis of available literature and ChIP-seq and ChIP-chip data for the enhanceosome factor p300 and the transcriptional regulator Gli3, we found that genomic proximity and conserved synteny predict target genes with a relatively low recall of 12–27% within 2 Mb intervals centered at the enhancers. Here, we show that functional similarities between enhancer binding proteins and their transcriptional targets and proximity in the protein–protein interactome improve prediction of target genes. We used all four features to train random forest classifiers that predict target genes with a recall of 58% in 2 Mb intervals that may contain dozens of genes, representing a better than two-fold improvement over the performance of prediction based on single features alone. Genome-wide ChIP data are still relatively poorly understood, and it remains difficult to assign biological significance to binding events. Our study represents a first step in integrating various genomic features in order to elucidate the genomic network of long-range regulatory interactions.
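To make the recall figures above concrete, the following is a minimal sketch of a proximity-only baseline: for each enhancer, pick the nearest gene inside a 2 Mb interval centered at the enhancer, then measure what fraction of known enhancer-target pairs is recovered. It is not the authors' pipeline. The gene names, coordinates, enhancer identifiers, and the helper names `nearest_gene` and `recall` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

WINDOW = 2_000_000  # 2 Mb interval centered at the enhancer


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int  # transcription start site (made-up coordinate)


# Toy annotation, invented for this example.
GENES = [
    Gene("A", "chr1", 1_000_000),
    Gene("B", "chr1", 1_400_000),
    Gene("C", "chr1", 3_500_000),
    Gene("D", "chr2", 1_000_000),
]


def nearest_gene(chrom, pos, genes, window=WINDOW):
    """Return the name of the closest gene within the window, or None."""
    half = window // 2
    candidates = [
        g for g in genes if g.chrom == chrom and abs(g.tss - pos) <= half
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda g: abs(g.tss - pos)).name


def recall(predicted, true_pairs):
    """Fraction of known (enhancer, target) pairs that were predicted."""
    if not true_pairs:
        return 0.0
    hits = sum(1 for enh, gene in true_pairs if predicted.get(enh) == gene)
    return hits / len(true_pairs)


# Hypothetical enhancers and their assumed true targets. Enhancer e2 skips
# its nearest gene B and regulates A, so proximity alone misses it.
ENHANCERS = {"e1": ("chr1", 1_100_000), "e2": ("chr1", 1_350_000)}
TRUE_PAIRS = [("e1", "A"), ("e2", "A")]

PREDICTED = {
    enh: nearest_gene(chrom, pos, GENES)
    for enh, (chrom, pos) in ENHANCERS.items()
}
BASELINE_RECALL = recall(PREDICTED, TRUE_PAIRS)  # 0.5 on this toy data
```

In this toy setting the baseline recovers one of two true pairs. A classifier that combines several features, as described above, would replace `nearest_gene` with a scoring function over all candidate genes in the window.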