The Drosophila transcription factor (TF) Zfh1 has roles distinct from those of the cell lineage-determining TFs in almost all mesoderm-derived tissues. Here, we link Zfh1 to the well-characterized mesodermal transcriptional network. We identify five enhancers that integrate upstream regulatory inputs from mesodermal TFs and direct zfh1 expression in the mesoderm. Most downstream Zfh1 target genes are co-bound by mesodermal TFs, suggesting that Zfh1 and mesodermal TFs act on the same sets of co-regulated genes during the development of certain mesodermal tissues. Furthermore, we demonstrate that Zfh1 is critical for the expression of the hemocyte marker gene peroxidasin and helps restrict the activity of a hemocyte-specific enhancer of serpent to the head mesoderm from which hemocytes derive, suggesting a potential role of Zfh1 in hemocyte development.
Non-coding RNA expression, function and variation during Drosophila embryogenesis
Schor IE, Bussotti G, Males M, Forneris M, Viales RR, Enright AJ, Furlong EEM (2018)
Current Biology 2018 Oct. doi: 10.1016/j.cub.2018.09.026
Long non-coding RNAs (lncRNAs) can often function in the regulation of gene expression during development; however, their generality as essential regulators in developmental processes and organismal phenotypes remains unclear. Here, we performed a tailored investigation of lncRNA expression and function during Drosophila embryogenesis, interrogating multiple stages, tissue specificity, nuclear localization, and genetic backgrounds. Our results almost double the number of annotated lncRNAs expressed at these embryonic stages. lncRNA levels are generally positively correlated with those of their neighboring genes, with little evidence of transcriptional interference. Using fluorescent in situ hybridization, we report the spatiotemporal expression of 15 new lncRNAs, revealing very dynamic tissue-specific patterns. Despite this, deletion of selected lncRNA genes caused no obvious developmental defects or effects on viability under standard and stressed conditions. However, two lncRNA deletions resulted in modest expression changes of a small number of genes, suggesting that they fine-tune expression of non-essential genes. Several lncRNAs have strain-specific expression, indicating that they are not fixed within the population. This intra-species variation across genetic backgrounds may thereby be a useful tool to distinguish rapidly evolving lncRNAs with as yet non-essential roles.
All coordinates are relative to the dm6 assembly
We provide bed files for the novel lncRNAs:
Also provided are the gene expression counts:
Raw data are available at ArrayExpress and ENA:
Developmental enhancers and chromosome topology.
Furlong EEM‡, Levine M‡ (2018)
Science 2018 Sep. doi: 10.1126/science.aau0320
‡ Co-corresponding authors
Developmental enhancers mediate on/off patterns of gene expression in specific cell types at particular stages during metazoan embryogenesis. They typically integrate multiple signals and regulatory determinants to achieve precise spatiotemporal expression. Such enhancers can map quite far—one megabase or more—from the genes they regulate. How remote enhancers relay regulatory information to their target promoters is one of the central mysteries of genome organization and function. A variety of contrasting mechanisms have been proposed over the years, including enhancer tracking, linking, looping, and mobilization to transcription factories. We argue that extreme versions of these mechanisms cannot account for the transcriptional dynamics and precision seen in living cells, tissues, and embryos. We describe emerging evidence for dynamic three-dimensional hubs that combine different elements of the classical models.
The Insulator Protein CTCF Is Required for Correct Hox Gene Expression, but Not for Embryonic Development in Drosophila.
Gambetta MC‡, Furlong EEM‡ (2018)
Genetics 2018 Jul 18. doi: 10.1534/genetics.118.301350
‡ Co-corresponding authors
Insulator binding proteins (IBPs) play an important role in regulating gene expression by binding to specific DNA sites to facilitate appropriate gene regulation. There are several IBPs in Drosophila, each defined by their ability to insulate target gene promoters in transgenic assays from the activating or silencing effects of neighboring regulatory elements. Of these, only CCCTC-binding factor (CTCF) has an obvious ortholog in mammals. CTCF is essential for mammalian cell viability and is an important regulator of genome architecture. In flies, CTCF is both maternally deposited and zygotically expressed. Flies lacking zygotic CTCF die as young adults with homeotic defects, suggesting that specific Hox genes are misexpressed in inappropriate body segments. The lack of any major embryonic defects was assumed to be due to the maternal supply of CTCF protein, as maternally contributed factors are often sufficient to progress through much of embryogenesis. Here, we definitively determined the requirement of CTCF for developmental progression in Drosophila. We generated animals that completely lack both maternal and zygotic CTCF and found that, contrary to expectation, these mutants progress through embryogenesis and larval life. They develop to pharate adults, which fail to eclose from their pupal case. These mutants show exacerbated homeotic defects compared to zygotic mutants, misexpressing the Hox gene Abdominal-B outside of its normal expression domain early in development. Our results indicate that loss of Drosophila CTCF is not accompanied by widespread effects on gene expression, which may be due to redundant functions with other IBPs. Rather, CTCF is required for correct Hox gene expression patterns and for the viability of adult Drosophila.
A Versatile, Low-Cost, Multiway Microfluidic Sorter for Droplets, Cells, and Embryos.
Utharala R, Tseng Q, Furlong EEM, Merten CA (2018)
Anal. Chem. 2018 May 15; 90(10):5982-5988. doi: 10.1021/acs.analchem.7b04689
Partitioning and sorting particles, including molecules, cells and organisms, is an essential prerequisite for a diverse range of applications. Here, we describe a very economical microfluidic platform (built from parts costing about U.S. $6800 for a stand-alone system, or U.S. $3700 when mounted on an existing fluorescence microscope connected to a computer) to sort droplets, cells and embryos based on imaging data. Valves operated by a Braille display are used to open and close microfluidic channels, enabling sorting at rates of >2 Hz. Furthermore, we show microfluidic 8-way sorting for the first time, facilitating the simultaneous separation and collection of objects with diverse characteristics/phenotypes. Due to the high flexibility in the size of objects that can be sorted, the low cost, and the many possibilities enabled by imaging technology, we believe that our approach nicely complements existing FACS and μFACS technology.
The cis-regulatory dynamics of embryonic development at single-cell resolution
Cusanovich DA*, Reddington JP*, Garfield DA*, Daza RM, Aghamirzaie D, Marco-Ferreres R, Pliner HA, Christiansen L, Qiu X, Steemers FJ, Trapnell C, Shendure J‡, Furlong EE‡ (2018)
Nature Mar 14 [Epub ahead of print]. doi: 10.1038/nature25981
* These authors contributed equally to this work.
‡ Co-corresponding authors
Understanding how gene regulatory networks control the progressive restriction of cell fates is a long-standing challenge. Recent advances in measuring gene expression in single cells are providing new insights into lineage commitment. However, the regulatory events underlying these changes remain unclear. Here we investigate the dynamics of chromatin regulatory landscapes during embryogenesis at single-cell resolution. Using single-cell combinatorial indexing assay for transposase-accessible chromatin with sequencing (sci-ATAC-seq), we profiled chromatin accessibility in over 20,000 single nuclei from fixed Drosophila melanogaster embryos spanning three landmark embryonic stages: 2-4 h after egg laying (predominantly stage 5 blastoderm nuclei), when each embryo comprises around 6,000 multipotent cells; 6-8 h after egg laying (predominantly stage 10-11), to capture a midpoint in embryonic development when major lineages in the mesoderm and ectoderm are specified; and 10-12 h after egg laying (predominantly stage 13), when each of the embryo’s more than 20,000 cells is undergoing terminal differentiation. Our results show that there is spatial heterogeneity in the accessibility of the regulatory genome before gastrulation, a feature that aligns with future cell fate, and that nuclei can be temporally ordered along developmental trajectories. During mid-embryogenesis, tissue granularity emerges such that individual cell types can be inferred by their chromatin accessibility while maintaining a signature of their germ layer of origin. Analysis of the data reveals overlapping usage of regulatory elements between cells of the endoderm and non-myogenic mesoderm, suggesting a common developmental program that is reminiscent of the mesendoderm lineage in other species. We identify 30,075 distal regulatory elements that exhibit tissue-specific accessibility.
We validated the germ-layer specificity of a subset of these predicted enhancers in transgenic embryos, achieving an accuracy of 90%. Overall, our results demonstrate the power of shotgun single-cell profiling of embryos to resolve dynamic changes in the chromatin landscape during development, and to uncover the cis-regulatory programs of metazoan germ layers and cell types.
All coordinates are relative to the dm3 assembly
A companion shiny application has been released with this publication to browse the data
Site-by-cell matrices and vignettes to facilitate further exploration of the data
Summary of summits of accessibility (see the legend to Supplementary Table 1 in the paper for an explanation of the columns)
Raw data is available at ArrayExpress and GEO:
- Accession E-MTAB-5999 — DNase-Seq in Drosophila melanogaster embryonic tissues 6-8h after egg laying
- Accession GSE1081 — sci-ATAC-Seq in Drosophila melanogaster embryonic tissues at 2-4h, 6-8h and 10-12h after egg laying
The degree of enhancer or promoter activity is reflected by the levels and directionality of eRNA transcription
Mikhaylichenko O*, Bondarenko V*, Harnett D, Schor IE, Males M, Viales RR, Furlong EE (2018)
Genes Dev. 32(1):42-57. doi: 10.1101/gad.308619.117
* These authors contributed equally to this work.
See also genesdev.org/enhancer-transcription_what-where-when-and-why and nature.org/gene-expression_developmental-enhancers-in-action
Gene expression is regulated by promoters, which initiate transcription, and enhancers, which control their temporal and spatial activity. However, the discovery that mammalian enhancers also initiate transcription calls into question the inherent differences between enhancers and promoters. Here, we investigate the transcriptional properties of enhancers during Drosophila embryogenesis using characterized developmental enhancers. We show that while the timing of enhancer transcription is generally correlated with enhancer activity, the levels and directionality of transcription are highly varied among active enhancers. To assess how this impacts function, we developed a dual transgenic assay to simultaneously measure enhancer and promoter activities from a single element in the same embryo. Extensive transgenic analysis revealed a relationship between the direction of endogenous transcription and the ability to function as an enhancer or promoter in vivo, although enhancer RNA (eRNA) production and activity are not always strictly coupled. Some enhancers (mainly bidirectional) can act as weak promoters, producing overlapping spatio-temporal expression. Conversely, bidirectional promoters often act as strong enhancers, while unidirectional promoters generally cannot. The balance between enhancer and promoter activity is generally reflected in the levels and directionality of eRNA transcription and is likely an inherent sequence property of the elements themselves.
All coordinates are relative to the dm3 assembly
We provide the whole-embryo, strand-specific PRO-cap signal at 3-4h and 6-8h after egg laying (AEL) and the strand-specific mesoderm CAGE-seq signal at 6-8h AEL.
Raw data is available at ArrayExpress:
- Accession E-MTAB-6154 — PRO-cap-Seq at two embryonic time points during embryogenesis in Drosophila melanogaster
- Accession E-MTAB-6159 — CAGE-Seq during Drosophila melanogaster embryogenesis at 6-8h after egg laying in mesoderm.
Opbp is a new architectural/insulator protein required for ribosomal gene expression.
Zolotarev N, Maksimenko O, Kyrchanova O, Sokolinskaya E, Osadchiy I, Girardot C, Bonchuk A, Ciglar L, Furlong EE‡, Georgiev P‡ (2017)
Nucleic Acids Res. 6, Aug. 2017. doi: 10.1093/nar/gkx840
‡ Co-corresponding authors
A special class of poorly characterized architectural proteins is required for chromatin topology and enhancer-promoter interactions. Here, we identify Opbp as a new Drosophila architectural protein, interacting with CP190 both in vivo and in vitro. Opbp binds to a very restricted set of genomic regions through a rare sequence-specific motif. These sites are co-bound by CP190 in vivo and are generally located at bidirectional promoters of ribosomal protein genes. We show that Opbp is essential for viability, and loss of Opbp function, or destruction of its motif, leads to reduced ribosomal protein gene expression, indicating a functional role in promoter activation. As is characteristic of architectural/insulator proteins, the Opbp motif is sufficient for distance-dependent reporter gene activation and enhancer-blocking activity, suggesting an Opbp-mediated enhancer-promoter interaction. Rather than having a constitutive role, Opbp represents a new type of architectural protein with a very restricted, yet essential, function in the regulation of housekeeping gene expression.
All coordinates are relative to the dm3 assembly
We provide the ChIP-seq signal of Opbp architectural/insulator protein occupancy in whole embryos at 0-12h after egg laying.
Uncoupling evolutionary changes in DNA sequence, transcription factor occupancy and enhancer activity
Khoueiry P, Girardot C, Ciglar L, Peng PC, Gustafson EH, Sinha S, Furlong EE (2017)
eLife 6, Aug. 2017. doi: 10.7554/eLife.28440
Sequence variation within enhancers plays a major role in both evolution and disease, yet its functional impact on transcription factor (TF) occupancy and enhancer activity remains poorly understood. Here, we assayed the binding of five essential TFs over multiple stages of embryogenesis in two distant Drosophila species (with 1.4 substitutions per neutral site), identifying thousands of orthologous enhancers with conserved or diverged combinatorial occupancy. We used these binding signatures to dissect two properties of developmental enhancers: (1) potential TF cooperativity, using signatures of co-associations and co-divergence in TF occupancy. This revealed conserved combinatorial binding despite sequence divergence, suggesting protein-protein interactions sustain conserved collective occupancy. (2) Enhancer in-vivo activity, revealing orthologous enhancers with conserved activity despite divergence in TF occupancy. Taken together, we identify enhancers with diverged motifs yet conserved occupancy and others with diverged occupancy yet conserved activity, emphasising the need to functionally measure the effect of divergence on enhancer activity.
All coordinates are relative to the drovir3 assembly unless specified otherwise
We provide the ChIP-seq signal and peak data for five essential developmental transcription factors (TFs) (Twist [Twi], Myocyte Enhancer Factor-2 [Mef2], Tinman [Tin], Bagpipe [Bap] and Biniou [Bin]) at multiple stages of Drosophila virilis embryogenesis (spanning 2 to 17h after egg laying).
In addition, we provide the coordinates of the D. virilis mesodermal ChIP-CRMs derived from this ChIP-seq data (peaks within 400 bp of each other were clustered into CRMs, and each CRM boundary was then extended by 100 bp). Coordinates are provided for both drovir3 and dm3. The dm3 coordinates are a liftover of the drovir3 coordinates to the dm3 assembly.
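The clustering rule above can be sketched as a simple interval merge. This is a minimal illustration under stated assumptions, not the lab's actual pipeline: it assumes BED-style 0-based, half-open coordinates, reads "within 400 bp" as the gap between the end of one peak and the start of the next, keeps isolated single peaks as CRMs, and does not clip padded ends to chromosome length.

```python
from typing import List, Tuple

# (chrom, start, end) with BED-style 0-based, half-open coordinates
Interval = Tuple[str, int, int]

MAX_GAP = 400  # peaks separated by at most this many bp are clustered together (assumed gap definition)
PAD = 100      # bp added to each side of a clustered region


def call_crms(peaks: List[Interval], max_gap: int = MAX_GAP, pad: int = PAD) -> List[Interval]:
    """Merge nearby peaks into candidate CRMs, then pad each CRM by `pad` bp."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            # Close enough to the previous cluster on the same chromosome: extend it.
            _, cl_start, cl_end = merged[-1]
            merged[-1] = (chrom, cl_start, max(cl_end, end))
        else:
            # Start a new cluster (a single isolated peak also becomes a CRM).
            merged.append((chrom, start, end))
    # Extend both boundaries; starts are clamped at 0, ends are not clipped.
    return [(chrom, max(0, start - pad), end + pad) for chrom, start, end in merged]


if __name__ == "__main__":
    example = [("chr2L", 1000, 1200), ("chr2L", 1500, 1700), ("chrX", 50, 80)]
    print(call_crms(example))
```

Because clusters are only split when the gap exceeds 400 bp, padding each side by 100 bp cannot make two neighbouring CRMs overlap.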
The ChIP-seq data download links:
- ChIP-seq signal, i.e. the reads-per-million (RPM) normalized, input-subtracted signal for each TF and its related time points, as bigwig files.
- ChIP-seq peaks, i.e. the IDR 1% optimal MACS2 peak set for each TF and its related time points, as bed files.
The mesodermal TF-CRMs coordinates download links:
The raw data will be available soon at ArrayExpress:
- Accession E-MTAB-3798 — ChIP-seq for 5 essential TF (Twi, Mef2, Tin, Bap and Bin) spanning D. virilis’ development
Chromosome topology guides the Drosophila Dosage Compensation Complex for target gene activation
Schauer T, Ghavi-Helm Y, Sexton T, Albig C, Regnard C, Cavalli G, Furlong EE, Becker PB (2017)
EMBO Rep e201744292, Aug. 2017. doi: 10.15252/embr.201744292
X chromosome dosage compensation in Drosophila requires chromosome-wide coordination of gene activation. The male-specific lethal dosage compensation complex (DCC) identifies and binds to X-chromosomal high-affinity sites (HAS) from which it boosts transcription. A sub-class of HAS, PionX sites, represent first contacts on the X. Here, we explored the chromosomal interactions of representative PionX sites by high-resolution 4C and determined the global chromosome conformation by Hi-C in sex-sorted embryos. Male and female X chromosomes display similar nuclear architecture, concordant with clustered, constitutively active genes. PionX sites, like HAS, are evenly distributed in the active compartment and engage in short- and long-range interactions beyond compartment boundaries. Long-range, inter-domain interactions between DCC binding sites are stronger in males, suggesting that the complex refines chromatin organization. By de novo induction of DCC in female cells, we monitored the extent of activation surrounding PionX sites. This revealed a remarkable range of DCC action not only in linear proximity, but also at megabase distance if close in space, suggesting that DCC profits from pre-existing chromosome folding to activate genes.
Correlation Does Not Imply Causation: Histone Methyltransferases, but Not Histone Methylation, SET the Stage for Enhancer Activation
Pollex T, Furlong EE (2017)
Mol Cell 66(4):439-441. doi: 10.1016/j.molcel.2017.05.005
Although H3K4me1 is a pervasive “mark” of enhancers, its functional requirement for enhancer activity remains unclear. In this issue of Molecular Cell, Dorighi et al. (2017) show that in some contexts, the methyltransferase complex, rather than the H3K4me1 mark, is required for gene expression.
Dual functionality of cis-regulatory elements as developmental enhancers and Polycomb response elements
Erceg J*, Pakozdi T*, Marco-Ferreres R*, Ghavi-Helm Y, Girardot C, Bracken AP, Furlong EE (2017)
Genes Dev. 31(6):590-602. doi: 10.1101/gad.292870.116
* These authors contributed equally to this work.
See also genesdev.org/multitasking-by-polycomb-response-elements
Developmental gene expression is tightly regulated through enhancer elements, which initiate dynamic spatio-temporal expression, and Polycomb response elements (PREs), which maintain stable gene silencing. These two cis-regulatory functions are thought to operate through distinct dedicated elements. By examining the occupancy of the Drosophila pleiohomeotic repressive complex (PhoRC) during embryogenesis, we revealed extensive co-occupancy at developmental enhancers. Using an established in vivo assay for PRE activity, we demonstrated that a subset of characterized developmental enhancers can function as PREs, silencing transcription in a Polycomb-dependent manner. Conversely, some classic Drosophila PREs can function as developmental enhancers in vivo, activating spatio-temporal expression. This study therefore uncovers elements with dual function: activating transcription in some cells (enhancers) while stably maintaining transcriptional silencing in others (PREs). Given that enhancers initiate spatio-temporal gene expression, reuse of the same elements by the Polycomb group (PcG) system may help fine-tune gene expression and ensure the timely maintenance of cell identities.
We provide the BiTS ChIP-seq signal as bigwig files for both components of the Drosophila pleiohomeotic repressive complex (PhoRC): Pho and dSfmbt. We also provide the H3K27me3 histone mark signal, which is the same signal that was made available along with Bonn et al. 2012. All signal files are provided for both the dm3 and dm6 build versions.
The raw data is available at EBI (ArrayExpress and ENA):
Dynamix: dynamic visualization by automatic selection of informative tracks from hundreds of genomic datasets
Monfort M, Furlong EE, Girardot C (2017)
Bioinformatics 2017 Mar. 11. doi: 10.1093/bioinformatics/btx141 [Epub ahead of print]
Motivation:
Visualization of genomic data is fundamental for gaining insights into genome function. Yet, co-visualization of a large number of datasets remains a challenge in all popular genome browsers and the development of new visualization methods is needed to improve the usability and user experience of genome browsers.
Results:
We present Dynamix, a JBrowse plugin that enables the parallel inspection of hundreds of genomic datasets. Dynamix takes advantage of a priori knowledge to automatically display data tracks with signal within a genomic region of interest. As the user navigates through the genome, Dynamix automatically updates data tracks and minimizes the manual operations otherwise needed to adjust the data visible on screen. Dynamix also introduces a new carousel view that optimizes screen utilization by enabling users to independently scroll through groups of tracks.
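As a rough illustration of the track-selection idea, the sketch below assumes each track comes with precomputed intervals where it has signal and keeps only the tracks whose intervals overlap the region currently on screen. It is not Dynamix's code (Dynamix is a JavaScript JBrowse plugin); the `Track` class, its fields and the track names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (chrom, start, end), 0-based half-open
Region = Tuple[str, int, int]


@dataclass
class Track:
    """Hypothetical track record: a name plus precomputed intervals with signal."""
    name: str
    signal_intervals: List[Region] = field(default_factory=list)


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def informative_tracks(tracks: List[Track], view: Region) -> List[str]:
    """Names of tracks with signal in the current view, in their original order."""
    return [t.name for t in tracks if any(overlaps(iv, view) for iv in t.signal_intervals)]


if __name__ == "__main__":
    tracks = [
        Track("track_A", [("chr2L", 100, 200)]),
        Track("track_B", [("chr3R", 100, 200)]),
        Track("track_C", [("chr2L", 190, 300)]),
    ]
    # In a browser this would be re-evaluated on every navigation event.
    print(informative_tracks(tracks, ("chr2L", 150, 195)))
```

A linear scan is enough to show the principle; with hundreds of tracks and many intervals per track, an interval index per chromosome would be the natural next step.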
Availability and Implementation:
Dynamix is hosted at http://furlonglab.embl.de/Dynamix.
Contact: charles.girardot@embl.de
Promoter shape varies across populations and impacts promoter evolution and expression noise
Schor IE, Degner JF, Harnett D, Cannavo E, Casale FP, Shim H, Garfield D, Birney E, Stephens M, Stegle O‡, Furlong EE‡ (2017)
Nature Genetics 2017 49(4):550-558. doi: 10.1038/ng.3791
‡ Co-corresponding authors
Metazoan promoters initiate transcription either at precise positions (narrow promoters) or dispersed regions (broad promoters), a feature called promoter shape. Although highly conserved, the functional properties of promoters with different shapes and the genetic determinants of their evolution remain unclear. Here, we used natural genetic variation across a panel of 81 Drosophila lines to measure changes in transcriptional start site (TSS) usage, identifying thousands of genetic variants affecting transcript levels (strength) or the distribution of TSS within a promoter (shape). Our results identify promoter shape as a molecular trait, evolvable independently of promoter strength. Broad promoters typically harbor shape-associated variants, with signatures of adaptive selection. Single-cell measurements reveal that variants modulating promoter shape often increase expression noise, while heteroallelic interactions with other promoter variants alleviate these effects. These results uncover new functional properties of natural promoters, and suggest the minimization of expression noise as an important factor influencing promoter evolution.
All coordinates are relative to the dm3 assembly
We provide the Supplementary Tables and accompanying media produced for Schor et al. 2017. Please refer to our Supplementary Tables description material.
Importantly, the Furlong lab hosts an interactive HTML table containing relevant plots and information about all significant associations found with the joint model. Detailed information for each region is accessible by clicking on the relevant location hyperlink in the Window column. (Access the detailed description of the table.)
Additionally, we provide tables (tab-delimited files) for each window with the raw p-values for all variants. We included all windows with a significant association for the multi-trait 3PC model (all time-points together, first 3 principal components as phenotype), i.e. all those included in Supplementary Table 2. We provide two compressed files, which contain all tables:
- 3PC: p-values resulting from the multi-trait 3PC model (all time-points together, first 3 principal components as phenotype)
- Mean: p-values resulting from the multi-trait Mean model for the same windows as with the 3PC model (all time-points together, mean CAGE signal as phenotype)
Besides the chromosome and position of all tested variants, both tables contain the p-values for the common effect and the time-specific terms.
Supplementary Tables:
Raw data is available at ArrayExpress:
Genetic variants regulating expression levels and isoform diversity during embryogenesis
Cannavò E*, Koelling N*, Harnett D, Garfield D, Casale FP, Ciglar L, Gustafson HE, Viales RR, Marco-Ferreres R, Degner JF, Zhao B, Stegle O, Birney E‡, Furlong EE‡ (2016)
Nature 541(7637):402-406. doi: 10.1038/nature20802
* These authors contributed equally to this work.
‡ Co-corresponding authors
Embryonic development is driven by tightly regulated patterns of gene expression, despite extensive genetic variation among individuals. Studies of expression quantitative trait loci (eQTL) indicate that genetic variation frequently alters gene expression in cell-culture models and differentiated tissues. However, the extent and types of genetic variation impacting embryonic gene expression, and their interactions with developmental programs, remain largely unknown. Here we assessed the effect of genetic variation on transcriptional (expression levels) and post-transcriptional (3′ RNA processing) regulation across multiple stages of metazoan development, using 80 inbred Drosophila wild isolates, identifying thousands of developmental-stage-specific and shared QTL. Given the small blocks of linkage disequilibrium in Drosophila, we obtain near base-pair resolution, resolving causal mutations in developmental enhancers, validated transcription-factor-binding sites and RNA motifs. This fine-grain mapping uncovered extensive allelic interactions within enhancers that have opposite effects, thereby buffering their impact on enhancer activity. QTL affecting 3′ RNA processing identify new functional motifs leading to transcript isoform diversity and changes in the lengths of 3′ untranslated regions. These results highlight how developmental stage influences the effects of genetic variation and uncover multiple mechanisms that regulate and buffer expression variation during embryogenesis.
We provide the matrix of 3′ Tag-Seq read counts for all 10,536 measured genes in all 254 samples (including replicates). We also provide the data used to perform our QTL testing with LIMIX.
- 3′ TagSeq raw read counts genes all 254 samples. This is the matrix of 3′ Tag-Seq read counts for all 10,536 measured genes in all 254 samples (including replicates), covering 80 DGRP lines and three stages of embryogenesis. It gives the per-gene read counts calculated as the sum of summit read counts in all 3′ Tag-Seq regions uniquely associated with the gene (Flybase annotation v5.47)
- Data for QTL testing with LIMIX. These are the input and output files from the gene-based (gene eQTL) and peak-based (3iQTL) QTL testing in LIMIX (HDF5 format). They include genotype and phenotype matrices as well as p-values and betas for all tested variants and all tests. See the readme for details.
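The per-gene aggregation described for the 3′ Tag-Seq count matrix can be sketched for a single sample as follows. The region identifiers, the region-to-gene mapping and the summit counts are hypothetical inputs, and skipping regions assigned to more than one gene (or to none) is our reading of "uniquely associated with the gene", not code from the study.

```python
from collections import defaultdict
from typing import Dict, List


def gene_counts(region_genes: Dict[str, List[str]], summit_counts: Dict[str, int]) -> Dict[str, int]:
    """Per-gene counts for one sample: sum of summit read counts over regions
    assigned to exactly one gene. Ambiguous or unassigned regions are skipped."""
    totals: Dict[str, int] = defaultdict(int)
    for region, genes in region_genes.items():
        unique_genes = set(genes)
        if len(unique_genes) != 1:
            continue  # region shared by several genes, or not assigned at all
        (gene,) = unique_genes
        totals[gene] += summit_counts.get(region, 0)
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical regions r1..r4; r3 overlaps two genes and is therefore ignored.
    region_genes = {"r1": ["geneA"], "r2": ["geneA"], "r3": ["geneA", "geneB"], "r4": ["geneB"]}
    summit_counts = {"r1": 10, "r2": 5, "r3": 100, "r4": 7}
    print(gene_counts(region_genes, summit_counts))
```

Repeating this per sample and stacking the results as columns gives a genes-by-samples matrix of the kind described above.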
Raw data is available at ArrayExpress:
- Accession E-MTAB-4722 — 3′-Tag-Seq at three embryonic time points during embryogenesis in 82 Drosophila melanogaster lines from the Drosophila Genetic Reference Panel (DGRP)
- Accession E-MTAB-4723 — RNA-seq of coding RNA from 22 strains of Drosophila melanogaster from the DGRP at embryonic time point 10-12 hours after egg laying
- Accession E-MTAB-4694 — RNA-seq of coding RNA in F1 crosses of two pairs of DGRP lines at three embryonic time points during embryogenesis
Identification and in silico modeling of enhancers reveals new features of the cardiac differentiation network
Seyres D, Ghavi-Helm Y*, Junion G*, Taghli-Lamallem O, Guichard C, Röder L, Girardot C, Furlong EE‡ and Perrin L‡ (2016)
Development 143(23):4533-4542. doi: 10.1242/dev.140822
* These authors contributed equally to this work.
‡ Co-corresponding authors
Developmental patterning and tissue formation are regulated through complex gene regulatory networks (GRNs) driven through the action of transcription factors (TFs) converging on enhancer elements. Here, as a point of entry to dissect the poorly defined GRN underlying cardiomyocyte differentiation, we apply an integrated approach to identify active enhancers and TFs involved in Drosophila heart development. The Drosophila heart consists of 104 cardiomyocytes, representing less than 0.5% of all cells in the embryo. By modifying BiTS-ChIP for rare cells, we examined H3K4me3 and H3K27ac chromatin landscapes to identify active promoters and enhancers specifically in cardiomyocytes. These in vivo data were complemented by a machine learning approach and extensive in vivo validation in transgenic embryos, which identified many new heart enhancers and their associated TF motifs. Our results implicate many new TFs in late stages of heart development, including Bagpipe, an Nkx3.2 ortholog, which we show is essential for differentiated heart function.
All coordinates are relative to the dm3 assembly
We provide the BiTS ChIP-seq signal as bigwig files, in three formats (50bp resolution (as in the publication screenshots), recomputed at a higher 1bp resolution, and the optimal peak set from the IDR analysis) for two chromatin modifications (H3K27ac, H3K4me3), in BiTS-purified cardiomyocyte nuclei from Drosophila embryos at 10-13h after egg laying (AEL). We also provide RNA-seq signal (two independent replicates) of FACS-purified cardiomyocytes from Drosophila embryos at 10-13h AEL:
Raw data is available on the Sequence Read Archive and at ArrayExpress:
Chromatin Immunoprecipitation for Analyzing Transcription Factor Binding and Histone Modifications in Drosophila
Ghavi-Helm Y, Zhao B, Furlong EE (2016)
Methods Mol. Biol. 1478:263-277. doi: 10.1007/978-1-4939-6371-3_16
Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is an invaluable technique to assess transcription factor binding and histone modifications in a genome-wide manner, an essential step towards understanding the mechanisms that govern embryonic development. Here, we provide a detailed protocol for all steps involved in generating a ChIP-seq library, starting from embryo collection, fixation, chromatin preparation, immunoprecipitation, and finally library preparation. The protocol is optimized for Drosophila embryos, but can be easily adapted for any model organism. The resulting library is suitable for sequencing on an Illumina HiSeq or MiSeq platform.
Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers.
Girardot C, Scholtalbers J, Sauer S, Su SY, Furlong EE (2016)
BMC Bioinformatics 17(1). doi: 10.1186/s12859-016-1284-2
The yield obtained from next generation sequencers has increased almost exponentially in recent years, making sample multiplexing common practice. While barcodes (known sequences of fixed length) primarily encode the sample identity of sequenced DNA fragments, barcodes made of random sequences (Unique Molecular Identifiers, or UMIs) are often used to distinguish PCR duplicates from genuine transcript abundance in, for example, single-cell RNA sequencing (scRNA-seq). In paired-end sequencing, different barcodes can be inserted at each fragment end either to increase the number of multiplexed samples in the library or to use one of the barcodes as a UMI. Alternatively, UMIs can be combined with the sample barcodes into composite barcodes, or with standard Illumina® indexing. Subsequent analysis must take read duplicates and sample identity into account by identifying UMIs. Existing tools do not support these complex barcoding configurations, and custom code development is frequently required.

Here, we present Je, a suite of tools that accommodates complex barcoding strategies, extracts UMIs and filters read duplicates taking UMIs into account. Using Je on publicly available scRNA-seq and iCLIP data containing UMIs, the number of unique reads increased by up to 36% compared to when UMIs are ignored. Je is implemented in Java and uses the Picard API. Code, executables and documentation are freely available at http://gbcs.embl.de/Je. Je can also be easily installed in Galaxy through the Galaxy toolshed.
Je is a suite to handle barcoded fastq files with (or without) Unique Molecule Identifiers (UMIs) and filter read duplicates using these UMIs
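To illustrate why UMI-aware filtering retains more unique reads than position-only filtering, here is a minimal Python sketch. It is not Je's implementation (Je is a Java tool built on the Picard API). It assumes single-end reads that have already been aligned and reduced to a hypothetical `AlignedRead` record of chromosome, 5' position, strand and extracted UMI. It also ignores sequencing errors within UMIs and the mate positions of paired-end reads.

```python
from typing import Iterable, List, NamedTuple


class AlignedRead(NamedTuple):
    """Hypothetical minimal read record: alignment coordinates plus the extracted UMI."""
    name: str
    chrom: str
    pos: int     # 5' alignment position
    strand: str  # "+" or "-"
    umi: str


def dedup(reads: Iterable[AlignedRead], use_umi: bool = True) -> List[AlignedRead]:
    """Keep the first read seen for each duplicate key.

    With use_umi=True the key is (chrom, pos, strand, UMI), so reads that share
    coordinates but carry different UMIs are kept as distinct molecules.
    With use_umi=False the key is (chrom, pos, strand) only.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand) + ((read.umi,) if use_umi else ())
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Toy example: r2 is a PCR duplicate of r1; r3 shares r1's position but has a
# different UMI, so UMI-aware filtering treats it as a separate molecule.
EXAMPLE_READS = [
    AlignedRead("r1", "chr2L", 100, "+", "ACGT"),
    AlignedRead("r2", "chr2L", 100, "+", "ACGT"),
    AlignedRead("r3", "chr2L", 100, "+", "TTGA"),
    AlignedRead("r4", "chr2L", 250, "-", "ACGT"),
]

if __name__ == "__main__":
    print([r.name for r in dedup(EXAMPLE_READS)])                 # ['r1', 'r3', 'r4']
    print([r.name for r in dedup(EXAMPLE_READS, use_umi=False)])  # ['r1', 'r4']
```

On this toy input, UMI-aware filtering keeps three reads, while position-only filtering keeps two. This mirrors, on a tiny scale, the gain in unique reads described above.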
Qualitative Dynamical Modelling Can Formally Explain Mesoderm Specification and Predict Novel Developmental Phenotypes.
Mbodj A, Gustafson EH, Ciglar L, Junion G, Gonzalez A, Girardot C, Perrin L, Furlong EE‡, Thieffry D‡ (2016)
PLoS Comput. Biol. 12(9) doi: 10.1371/journal.pcbi.1005073 | Europe PMC | doi
‡ Co-corresponding authors
Given the complexity of developmental networks, it is often difficult to predict the effect of genetic perturbations, even within coding genes. Regulatory factors generally have pleiotropic effects, exhibit partially redundant roles, and regulate highly interconnected pathways with ample cross-talk. Here, we delineate a logical model encompassing 48 components and 82 regulatory interactions involved in mesoderm specification during Drosophila development, thereby providing a formal integration of all available genetic information from the literature. The four main tissues derived from mesoderm correspond to alternative stable states of the model. We demonstrate that the model can predict known mutant phenotypes, and we use it to systematically predict the effects of over 300 new, often non-intuitive, loss- and gain-of-function mutations, and combinations thereof. We further validated several novel predictions experimentally, thereby demonstrating the robustness of the model. Logical modelling can thus contribute to formally explaining and predicting the regulatory outcomes underlying cell fate decisions.
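The idea that alternative cell fates correspond to stable states of a logical model can be illustrated with a small Python sketch. The three-node network below is a hypothetical toy and not the published 48-component mesoderm model. It has an input S and two factors, X and Y, that repress each other. A state is stable when applying every rule leaves it unchanged. A loss-of-function mutant is modelled by fixing a node to False.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Rules = Dict[str, Callable[[State], bool]]

# Hypothetical toy network (not the published mesoderm model):
# input S, and two factors X and Y that repress each other.
TOY_RULES: Rules = {
    "S": lambda s: s["S"],                 # input: keeps its current value
    "X": lambda s: s["S"] and not s["Y"],  # X requires S and is repressed by Y
    "Y": lambda s: not s["X"],             # Y is repressed by X
}


def stable_states(rules: Rules) -> List[Tuple[int, ...]]:
    """Enumerate all Boolean states and return those unchanged by every rule.

    Values in each returned tuple follow the node order given by sorted(rules).
    """
    names = sorted(rules)
    fixed = []
    for values in product([False, True], repeat=len(names)):
        state = dict(zip(names, values))
        if all(rules[n](state) == state[n] for n in names):
            fixed.append(tuple(int(v) for v in values))
    return fixed


def knockout(rules: Rules, node: str) -> Rules:
    """Model a loss-of-function mutant by forcing `node` to False."""
    mutant = dict(rules)
    mutant[node] = lambda s: False
    return mutant


if __name__ == "__main__":
    print(stable_states(TOY_RULES))                 # [(0, 0, 1), (1, 0, 1), (1, 1, 0)]
    print(stable_states(knockout(TOY_RULES, "X")))  # [(0, 0, 1), (1, 0, 1)]
```

With the input on (S = 1), the toy network has two alternative stable states, one with X on and one with Y on. Knocking out X removes the X-on state. Stable states do not depend on whether updates are synchronous or asynchronous. Brute-force enumeration, however, scales as 2^n, so it is only practical for toy networks. Dedicated tools such as GINsim use more efficient methods for models of realistic size.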
Next-generation sequencing-based detection of germline L1-mediated transductions
Tica J*, Lee E*, Untergasser A, Meiers S, Garfield DA, Gokcumen O, Furlong EE, Park PJ, Stütz AM*,‡, Korbel JO*,‡ (2016)
BMC Genomics 17(1) doi: 10.1186/s12864-016-2670-x | Europe PMC | doi
* These authors contributed equally to this work.
‡ Co-corresponding authors
While active LINE-1 (L1) elements possess the ability to mobilize flanking sequences to different genomic loci through a process termed transduction, influencing genomic content and structure, an approach for detecting polymorphic germline non-reference transductions in massively parallel sequencing data has been lacking. Here we present the computational approach TIGER (Transduction Inference in GERmline genomes), enabling the discovery of non-reference L1-mediated transductions by combining L1 discovery with detection of unique insertion sequences and detailed characterization of insertion sites. We employed TIGER to characterize polymorphic transductions in fifteen genomes from non-human primate species (chimpanzee, orangutan and rhesus macaque), as well as in a human genome. We achieved high accuracy, as confirmed by PCR and two single-molecule DNA sequencing techniques, and uncovered differences in the relative rates of transduction between primate species. By enabling detection of polymorphic transductions, TIGER makes this form of relevant structural variation amenable to population and personal genome analysis.
Shadow Enhancers Are Pervasive Features of Developmental Regulatory Networks
Cannavò E, Khoueiry P, Garfield DA, Geeleher P, Zichner T, Gustafson EH, Ciglar L, Korbel JO, Furlong EE (2016)
Curr. Biol. 26(1):38-51. doi: 10.1016/j.cub.2015.11.034 | Europe PMC | doi
Embryogenesis is remarkably robust to segregating mutations and environmental variation; under a range of conditions, embryos of a given species develop into stereotypically patterned organisms. Such robustness is thought to be conferred, in part, through elements within regulatory networks that perform similar, redundant tasks. Redundant enhancers (or “shadow” enhancers), for example, can confer precision and robustness to gene expression, at least at individual, well-studied loci. However, the extent to which enhancer redundancy exists and can thereby have a major impact on developmental robustness remains unknown. Here, we systematically assessed this, identifying over 1,000 predicted shadow enhancers during Drosophila mesoderm development. The activity of 23 elements, associated with five genes, was examined in transgenic embryos, while natural structural variation among individuals was used to assess their ability to buffer against genetic variation. Our results reveal three clear properties of enhancer redundancy within developmental systems. First, it is much more pervasive than previously anticipated, with 64% of loci examined having shadow enhancers. Their spatial redundancy is often partial in nature, while the non-overlapping function may explain why these enhancers are maintained within a population. Second, over 70% of loci do not follow the simple situation of having only two shadow enhancers; often there are three (rols), four (CadN and ade5), or five (Traf1), at least one of which can be deleted with no obvious phenotypic effects. Third, although shadow enhancers can buffer variation, patterns of segregating variation suggest that they play a more complex role in development than generally considered.
FourCSeq: Analysis of 4C sequencing data
Klein FA, Pakozdi T, Anders S, Ghavi-Helm Y, Furlong EE, Huber W (2015)
Bioinformatics 31(19):3085-3091. doi: 10.1093/bioinformatics/btv335 | Europe PMC | doi
Circularized Chromosome Conformation Capture (4C) is a powerful technique for studying the spatial interactions of a specific genomic region, called the ‘viewpoint’, with the rest of the genome, either in a single condition or when comparing different experimental conditions or cell types. Observed ligation frequencies typically show a strong, regular dependence on genomic distance from the viewpoint, on top of which specific interaction peaks are superimposed. Here, we address the computational task of finding these specific peaks and detecting changes between different biological conditions. We model the overall trend of decreasing interaction frequency with genomic distance by fitting a smooth, monotonically decreasing function to suitably transformed count data. Based on the fit, z-scores are calculated from the residuals, and high z-scores are interpreted as peaks providing evidence for specific interactions. To compare different conditions, we normalize fragment counts between samples and call differential contact frequencies using the statistical method DESeq2, adapted from RNA-Seq analysis. A full end-to-end analysis pipeline is implemented in the R package FourCSeq, available at www.bioconductor.org.
Contact: felix.klein@embl.de or whuber@embl.de. Supplementary data are available at Bioinformatics online.
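The peak-calling idea described above can be sketched in Python: fit a decreasing trend to transformed counts as a function of distance, then convert the residuals to z-scores. This is not FourCSeq's R code, and several choices here are assumptions. Isotonic regression (pool-adjacent-violators) stands in for FourCSeq's smooth monotone fit. A log2(count + 1) transform is used, and residuals are scaled by a robust MAD-based standard deviation.

```python
import numpy as np


def pava_decreasing(y: np.ndarray) -> np.ndarray:
    """Least-squares non-increasing fit via pool-adjacent-violators (equal weights)."""
    blocks = []  # each block holds [sum, count] of pooled values
    for v in y:
        blocks.append([float(v), 1])
        # Pool while the previous block's mean is below the current one (an increase).
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return np.concatenate([np.full(count, total / count) for total, count in blocks])


def interaction_zscores(distance, counts) -> np.ndarray:
    """z-scores of transformed fragment counts relative to a decreasing distance trend."""
    distance = np.asarray(distance, dtype=float)
    order = np.argsort(distance)                                # sort fragments by distance
    y = np.log2(np.asarray(counts, dtype=float)[order] + 1.0)  # assumed transform
    resid = y - pava_decreasing(y)
    scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # robust SD estimate
    if scale == 0:
        raise ValueError("residuals have zero spread; cannot compute z-scores")
    z = np.empty_like(resid)
    z[order] = resid / scale  # map back to the input order
    return z
```

Because a monotone fit depends only on the ordering of distances, using raw or log-transformed distance gives the same trend. One caveat of this stand-in: the isotonic fit is a step function, so a strong peak can pull the local trend upward and shrink its own z-score. A smooth fit such as FourCSeq's behaves differently.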
Conservation of transcription factor binding specificities across 600 million years of bilateria evolution
Nitta KR, Jolma A, Yin Y, Morgunova E, Kivioja T, Akhtar J, Hens K, Toivonen J, Deplancke B, Furlong EE, Taipale J (2015)
Elife 4 doi: 10.7554/elife.04837 | Europe PMC | doi
Divergent morphology of species has largely been ascribed to genetic differences in the tissue-specific expression of proteins, which could be achieved by divergence in cis-regulatory elements or by altering the binding specificity of transcription factors (TFs). The relative importance of the latter has been difficult to assess, as previous systematic analyses of TF binding specificity have been performed using different methods in different species. To address this, we determined the binding specificities of 242 Drosophila TFs, and compared them to human and mouse data. This analysis revealed that TF binding specificities are highly conserved between Drosophila and mammals, and that for orthologous TFs, the similarity extends even to the level of very subtle dinucleotide binding preferences. The few human TFs with divergent specificities function in cell types not found in fruit flies, suggesting that evolution of TF specificities contributes to emergence of novel types of differentiated cells.
Ultrasensitive proteome analysis using paramagnetic bead technology
Hughes CS, Foehr S, Garfield DA, Furlong EE, Steinmetz LM, Krijgsveld J (2014)
Mol. Syst. Biol. 10(10) doi: 10.15252/msb.20145625 | Europe PMC | doi
In order to obtain a systems-level understanding of a complex biological system, detailed proteome information is essential. Despite great progress in proteomics technologies, thorough interrogation of the proteome from quantity-limited biological samples is hampered by inefficiencies during processing. To address these challenges, here we introduce a novel protocol using paramagnetic beads, termed Single-Pot Solid-Phase-enhanced Sample Preparation (SP3). SP3 provides a rapid and unbiased means of proteomic sample preparation in a single tube that facilitates ultrasensitive analysis by outperforming existing protocols in terms of efficiency, scalability, speed, throughput, and flexibility. To illustrate these benefits, characterization of 1,000 HeLa cells and single Drosophila embryos is used to establish that SP3 provides an enhanced platform for profiling proteomes derived from sub-microgram amounts of material. These data present a first view of developmental stage-specific proteome dynamics in Drosophila at a single-embryo resolution, permitting characterization of inter-individual expression variation. Together, the findings of this work position SP3 as a superior protocol that facilitates exciting new directions in multiple areas of proteomics ranging from developmental biology to clinical applications.
Enhancer loops appear stable during development and are associated with paused polymerase
Ghavi-Helm Y, Klein FA, Pakozdi T, Ciglar L, Noordermeer D, Huber W, Furlong EE (2014)
Nature 512(7512):96-100. doi: 10.1038/nature13417 | Europe PMC | doi
See also phys.org/news/unexpected-stability-and-complexity-in-transcriptional-enhancers-interactions and f1000.com/article-recommendations/ghavihelm2014
Developmental enhancers initiate transcription and are fundamental to our understanding of developmental networks, evolution and disease. Despite their importance, the properties governing enhancer-promoter interactions and their dynamics during embryogenesis remain unclear. At the β-globin locus, enhancer-promoter interactions appear dynamic and cell-type specific, whereas at the HoxD locus they are stable and ubiquitous, being present in tissues where the target genes are not expressed. The extent to which preformed enhancer-promoter conformations exist at other, more typical, loci and how transcription is eventually triggered is unclear. Here we generated a high-resolution map of enhancer three-dimensional contacts during Drosophila embryogenesis, covering two developmental stages and tissue contexts, at unprecedented resolution. Although local regulatory interactions are common, long-range interactions are highly prevalent within the compact Drosophila genome. Each enhancer contacts multiple enhancers, and promoters with similar expression, suggesting a role in their co-regulation. Notably, most interactions appear unchanged between tissue contexts and across development, arising before gene activation, and are frequently associated with paused RNA polymerase. Our results indicate that the general topology governing enhancer contacts is conserved from flies to humans and suggest that transcription initiates from preformed enhancer-promoter loops through release of paused polymerase.
All coordinates are relative to the dm3 assembly.
A companion website has been released with Ghavi-Helm et al. 2014.
We provide the ChIP-seq signal data for download:
- ChIP-seq signal, i.e. log2 ratio, for each viewpoint, as bigwig files (319 files, ~3 GB).
Raw data is available at ArrayExpress:
Coordinated repression and activation of two transcriptional programs stabilizes cell fate during myogenesis
Ciglar L, Girardot C, Wilczyński B, Braun M, Furlong EE (2014)
Development 141:2633-2643. doi: 10.1242/dev.101956 | Europe PMC | doi
Molecular models of cell fate specification typically focus on the activation of specific lineage programs. However, the concurrent repression of unwanted transcriptional networks is also essential to stabilize certain cellular identities, as shown in a number of diverse systems and phyla. Here, we demonstrate that this dual requirement also holds true in the context of Drosophila myogenesis. By integrating genetics and genomics, we identified a new role for the pleiotropic transcriptional repressor Tramtrack69 in myoblast specification. Drosophila muscles are formed through the fusion of two discrete cell types: founder cells (FCs) and fusion-competent myoblasts (FCMs). When tramtrack69 is removed, FCMs appear to adopt an alternative muscle FC-like fate. Conversely, ectopic expression of this repressor phenocopies the muscle defects seen in loss-of-function mutants of lame duck, which encodes a transcription factor specific to FCMs. This occurs through Tramtrack69-mediated repression in FCMs, whereas Lame duck activates a largely distinct transcriptional program in the same cells. Lineage-specific factors are therefore not sufficient to maintain FCM identity. Instead, their identity appears more plastic, requiring the combination of instructive repressive and activating programs to stabilize cell fate.
All coordinates are relative to the dm3 assembly.
We provide the ChIP-chip signal, i.e. log2(IP/Mock), for each transcription factor and developmental time point used in this study. The signal is provided as bigwig files:
Raw data is available at ArrayExpress:
Subtle changes in motif positioning cause tissue-specific effects on robustness of an enhancer’s activity
Erceg J, Saunders TE, Girardot C, Devos DP, Hufnagel L, Furlong EE (2014)
PLoS Genet. 10(1):e1004060. doi: 10.1371/journal.pgen.1004060 | Europe PMC | doi
See also nature.com/research-highlights/synthetic-modeling-of-developmental-enhancers
Deciphering the specific contribution of individual motifs within cis-regulatory modules (CRMs) is crucial to understanding how gene expression is regulated and how this process is affected by sequence variation. But despite vast improvements in the ability to identify where transcription factors (TFs) bind throughout the genome, we are limited in our ability to relate information on motif occupancy to function from sequence alone. Here, we engineered 63 synthetic CRMs to systematically assess the relationship between variation in the content and spacing of motifs within CRMs and CRM activity during development, using Drosophila transgenic embryos. In over half the cases, very simple elements containing only one or two types of TF binding motifs were capable of driving specific spatio-temporal patterns during development. Different motif organizations provide different degrees of robustness to enhancer activity, ranging from binary on-off responses to more subtle effects including embryo-to-embryo and within-embryo variation. By quantifying the effects of subtle changes in motif organization, we were able to model biophysical rules that explain CRM behavior and may contribute to the spatial positioning of CRM activity in vivo. For the same enhancer, the effects of small differences in motif positions varied in developmentally related tissues, suggesting that gene expression may be more susceptible to sequence variation in one tissue compared to another. This result has important implications for human eQTL studies in which many associated mutations are found in cis-regulatory regions, though the mechanism for how they affect tissue-specific gene expression is often not understood.
A conserved role for Snail as a potentiator of active transcription
Rembold M*, Ciglar L*, Yáñez-Cuna JO, Zinzen RP, Girardot C, Jain A, Welte MA, Stark A, Leptin M‡, Furlong EE‡ (2014)
Genes Dev. 28:167-181. doi: 10.1101/gad.230953.113 | Europe PMC | doi
* These authors contributed equally to this work.
‡ Co-corresponding authors
The transcription factors of the Snail family are key regulators of epithelial-mesenchymal transitions, cell morphogenesis, and tumor metastasis. Since its discovery in Drosophila ∼25 years ago, Snail has been extensively studied for its role as a transcriptional repressor. Here we demonstrate that Drosophila Snail can positively modulate transcriptional activation. By combining information on in vivo occupancy with expression profiling of hand-selected, staged snail mutant embryos, we identified 106 genes that are potentially directly regulated by Snail during mesoderm development. In addition to the expected Snail-repressed genes, almost 50% of Snail targets showed an unanticipated activation. The majority of “Snail-activated” genes have enhancer elements cobound by Twist and are expressed in the mesoderm at the stages of Snail occupancy. Snail can potentiate Twist-mediated enhancer activation in vitro and is essential for enhancer activity in vivo. Using a machine learning approach, we show that differentially enriched motifs are sufficient to predict Snail’s regulatory response. In silico mutagenesis revealed a likely causative motif, which we demonstrate is essential for enhancer activation. Taken together, these data indicate that Snail can potentiate enhancer activation by collaborating with different activators, providing a new mechanism by which Snail regulates development.
All coordinates are relative to the dm3 assembly.
We provide the ChIP-chip signal, i.e. log2(IP/Mock), for each transcription factor and developmental time point used in this study. The signal is provided as bigwig files:
The Twist dataset is from Zinzen et al., 2009 – reprocessed with dm3.
We also provide the expression profiling result tables for both developmental time points used:
Raw data is available at ArrayExpress:
Logical modelling of Drosophila signalling pathways
Mbodj A, Junion G, Brun C, Furlong EE, Thieffry D (2013)
Mol Biosyst 9(9):2248-2258. doi: 10.1039/c3mb70187e | Europe PMC | doi
A limited number of signalling pathways are involved in the specification of cell fate during the development of all animals. Several of these pathways were originally identified in Drosophila. To clarify their roles, and possible cross-talk, we have built a logical model for the nine key signalling pathways recurrently used in metazoan development. In each case, we considered the associated ligands, receptors, signal transducers, modulators, and transcription factors reported in the literature. Implemented using the logical modelling software GINsim, the resulting models qualitatively recapitulate the main characteristics of each pathway, in wild type as well as in various mutant situations (e.g. loss-of-function or gain-of-function). These models constitute pluggable modules that can be used to assemble comprehensive models of complex developmental processes. Moreover, these models of Drosophila pathways could serve as scaffolds for more complicated models of orthologous mammalian pathways. Comprehensive model annotations and GINsim files are provided for each of the nine considered pathways.
Impact of genomic structural variation in Drosophila melanogaster based on population-scale sequencing
Zichner T, Garfield DA, Rausch T, Stütz AM, Cannavò E, Braun M, Furlong EE, Korbel JO (2013)
Genome Res. 23(3):568-579. doi: 10.1101/gr.142646.112 | Europe PMC | doi
Genomic structural variation (SV) is a major determinant for phenotypic variation. Although it has been extensively studied in humans, the nucleotide resolution structure of SVs within the widely used model organism Drosophila remains unknown. We report a highly accurate, densely validated map of unbalanced SVs comprising 8962 deletions and 916 tandem duplications in 39 lines derived from short-read DNA sequencing in a natural population (the “Drosophila melanogaster Genetic Reference Panel,” DGRP). Most SVs (>90%) were inferred at nucleotide resolution, and a large fraction was genotyped across all samples. Comprehensive analyses of SV formation mechanisms using the short-read data revealed an abundance of SVs formed by mobile element and nonhomologous end-joining-mediated rearrangements, and clustering of variants into SV hotspots. We further observed a strong depletion of SVs overlapping genes, which, along with population genetics analyses, suggests that these SVs are often deleterious. We inferred several gene fusion events also highlighting the potential role of SVs in the generation of novel protein products. Expression quantitative trait locus (eQTL) mapping revealed the functional impact of our high-resolution SV map, with quantifiable effects at >100 genic loci. Our map represents a resource for population-level studies of SVs in an important model organism.
Fragmentation of DNA in a sub-microliter microfluidic sonication device
Tseng Q, Lomonosov AM, Furlong EE, Merten CA (2012)
Lab Chip 12(22):4677-4682. doi: 10.1039/c2lc40595d | Europe PMC | doi
Fragmentation of DNA is an essential step for many biological applications including the preparation of next-generation sequencing (NGS) libraries. As sequencing technologies push the limits towards single-cell and single-molecule resolution, it is of great interest to reduce the scale of this upstream fragmentation step. Here we describe a miniaturized DNA shearing device capable of processing sub-microliter samples based on acoustic shearing within a microfluidic chip. A strong acoustic field was generated by a Langevin-type piezo transducer and coupled into the microfluidic channel via the flexural Lamb wave mode. Purified genomic DNA, as well as covalently cross-linked chromatin, were sheared into various fragment sizes ranging from ∼180 bp to 4 kb. With the use of standard PDMS soft lithography, our approach should facilitate the integration of additional microfluidic modules and ultimately allow miniaturized NGS workflows.
easyRNASeq: a bioconductor package for processing RNA-Seq data
Delhomme N, Padioleau I, Furlong EE, Steinmetz LM (2012)
Bioinformatics 28(19):2532-2533. doi: 10.1093/bioinformatics/bts477 | Europe PMC | doi
MOTIVATION: RNA sequencing is becoming a standard for expression profiling experiments and many tools have been developed in the past few years to analyze RNA-Seq data. Numerous ‘Bioconductor’ packages are available for next-generation sequencing data loading in R, e.g. ShortRead and Rsamtools, as well as for performing differential gene expression analyses, e.g. DESeq and edgeR. However, the processing tasks lying in between these require the precise interplay of many Bioconductor packages (e.g. Biostrings, IRanges), or external solutions must be sought.
RESULTS: We developed ‘easyRNASeq’, an R package that simplifies the processing of RNA sequencing data, hiding the complex interplay of the required packages behind a single functionality.
AVAILABILITY: The package is implemented in R (as of version 2.15) and is available from Bioconductor (as of version 2.10) at the URL: http://bioconductor.org/packages/release/bioc/html/easyRNASeq.html, where installation and usage instructions can be found.
CONTACT: delhomme@embl.de.
Count summarization and normalization for RNA-Seq data
Transcription factors: from enhancer binding to developmental control
Spitz F, Furlong EE (2012)
Nat. Rev. Genet. 13(9):613-626. doi: 10.1038/nrg3207 | Europe PMC | doi
Developmental progression is driven by specific spatiotemporal domains of gene expression, which give rise to stereotypically patterned embryos even in the presence of environmental and genetic variation. Views of how transcription factors regulate gene expression are changing owing to recent genome-wide studies of transcription factor binding and RNA expression. Such studies reveal patterns that, at first glance, seem to contrast with the robustness of the developmental processes they encode. Here, we review our current knowledge of transcription factor function from genomic and genetic studies and discuss how different strategies, including extensive cooperative regulation (both direct and indirect), progressive priming of regulatory elements, and the integration of activities from multiple enhancers, confer specificity and robustness to transcriptional regulation during development.
Cell type-specific chromatin immunoprecipitation from multicellular complex samples using BiTS-ChIP
Bonn S, Zinzen RP, Perez-Gonzalez A, Riddell A, Gavin AC, Furlong EE (2012)
Nat Protoc 7(5):978-994. doi: 10.1038/nprot.2012.049 | Europe PMC | doi
This protocol describes the batch isolation of tissue-specific chromatin for immunoprecipitation (BiTS-ChIP) for analysis of histone modifications, transcription factor binding, or polymerase occupancy within the context of a multicellular organism or tissue. Embryos expressing a cell type-specific nuclear marker are formaldehyde cross-linked and then subjected to dissociation. Fixed nuclei are isolated and sorted using FACS on the basis of the cell type-specific nuclear marker. Tissue-specific chromatin is extracted, sheared by sonication and used for ChIP-seq or other analyses. The key advantages of this method are the covalent cross-linking before embryo dissociation, which preserves the transcriptional context, and the use of FACS of nuclei, yielding very high purity. The protocol has been optimized for Drosophila, but with minor modifications should be applicable to any model system. The full protocol, including sorting, immunoprecipitation and generation of sequencing libraries, can be completed within 5 d.
Tissue-specific analysis of chromatin state identifies temporal signatures of enhancer activity during embryonic development
Bonn S*, Zinzen RP*, Girardot C*, Gustafson EH, Perez-Gonzalez A, Delhomme N, Ghavi-Helm Y, Wilczyński B, Riddell A, Furlong EE (2012)
Nat. Genet. 44(2):148-156. doi: 10.1038/ng.1064 | Europe PMC | doi
* These authors contributed equally to this work.
Chromatin modifications are associated with many aspects of gene expression, yet their role in cellular transitions during development remains elusive. Here, we use a new approach to obtain cell type-specific information on chromatin state and RNA polymerase II (Pol II) occupancy within the multicellular Drosophila melanogaster embryo. We directly assessed the relationship between chromatin modifications and the spatio-temporal activity of enhancers. Rather than having a unique chromatin state, active developmental enhancers show heterogeneous histone modifications and Pol II occupancy. Despite this complexity, combined chromatin signatures and Pol II presence are sufficient to predict enhancer activity de novo. Pol II recruitment is highly predictive of the timing of enhancer activity and seems dependent on the timing and location of transcription factor binding. Chromatin modifications typically demarcate large regulatory regions encompassing multiple enhancers, whereas local changes in nucleosome positioning and Pol II occupancy delineate single active enhancers. This cell type-specific view identifies dynamic enhancer usage, an essential step in deciphering developmental networks.
All coordinates are relative to the dm3 assembly.
We provide the (BiTS-)ChIP-seq signal files. Reads from both biological replicates were shifted by 100 bp and binned into 50bp windows. The signal is provided as wig files:
- Read Per Genomic Coverage, i.e. library size normalised but no background correction
- Read Per Genomic Coverage background subtracted, i.e. library size normalised and read counts from background subtracted; background is H3 for chromatin modifications and input otherwise
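The processing described above (shifting reads by 100 bp, binning into 50 bp windows, library-size normalisation and background subtraction) can be sketched in Python for a single chromosome. Several details are assumptions. Each read's 5' end is shifted 100 bp in its own direction, towards the fragment centre. Reads shifted off the chromosome are dropped. The exact "Read Per Genomic Coverage" formula is not given here, so counts per million mapped reads is used as a stand-in. All function names are hypothetical.

```python
import numpy as np


def binned_coverage(starts, strands, chrom_length, shift=100, bin_size=50):
    """Count shifted read positions per fixed-width bin on one chromosome."""
    starts = np.asarray(starts, dtype=np.int64)
    strands = np.asarray(strands)
    # Move each read's 5' end `shift` bp in its own direction (towards the fragment centre).
    shifted = np.where(strands == "+", starts + shift, starts - shift)
    # Discard reads that were shifted off the chromosome.
    shifted = shifted[(shifted >= 0) & (shifted < chrom_length)]
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    return np.bincount(shifted // bin_size, minlength=n_bins)


def library_normalised(counts, total_reads, per=1e6):
    """Scale bin counts to reads per `per` mapped reads (stand-in normalisation)."""
    return np.asarray(counts, dtype=float) * (per / total_reads)


def background_subtracted(ip_counts, ip_total, bg_counts, bg_total):
    """Normalised IP signal minus normalised background (H3 or input)."""
    return library_normalised(ip_counts, ip_total) - library_normalised(bg_counts, bg_total)
```

For example, reads starting at 10 (+), 120 (-) and 300 (+) on a 500 bp chromosome land at 110, 20 and 400 after shifting. They therefore fall into bins 2, 0 and 8 of the ten 50 bp bins.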
Aligned data (individual BAM files) is available at ENA:
A transcription factor collective defines cardiac cell fate and reflects lineage history
Junion G*, Spivakov M*, Girardot C, Braun M, Gustafson EH, Birney E, Furlong EE (2012)
Cell 148(3):473-486. doi: 10.1016/j.cell.2012.01.030 | Europe PMC | doi
* These authors contributed equally to this work.
Cell fate decisions are driven through the integration of inductive signals and tissue-specific transcription factors (TFs), although the details on how this information converges in cis remain unclear. Here, we demonstrate that the five genetic components essential for cardiac specification in Drosophila, including the effectors of Wg and Dpp signaling, act as a collective unit to cooperatively regulate heart enhancer activity, both in vivo and in vitro. Their combinatorial binding does not require any specific motif orientation or spacing, suggesting an alternative mode of enhancer function whereby cooperative activity occurs with extensive motif flexibility. A fraction of enhancers co-occupied by cardiogenic TFs had unexpected activity in the neighboring visceral mesoderm but could be rendered active in heart through single-site mutations. Given that cardiac and visceral cells are both derived from the dorsal mesoderm, this “dormant” TF binding signature may represent a molecular footprint of these cells’ developmental lineage.
All coordinates are relative to the dm3 assembly.
We provide the ChIP-chip signal, i.e. log2(IP/Mock) smoothed with a bandwidth of 5 probes, for each transcription factor and developmental time point used in this study. The signal is provided as bigwig files (14 files, ~550 MB):
The Tinman and Biniou datasets are from Zinzen et al., 2009 – reprocessed with dm3.
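The ChIP-chip signal described above (per-probe log2(IP/Mock), smoothed over 5 probes) can be sketched in Python. The smoothing kernel is an assumption, since only the bandwidth is stated. Here a centred running mean over 5 consecutive probes is used, which ignores probe spacing and assumes the probes are sorted by genomic position. At the array ends, only the probes that exist are averaged.

```python
import numpy as np


def log2_ratio(ip, mock):
    """Per-probe log2(IP/Mock) from positive array intensities."""
    return np.log2(np.asarray(ip, dtype=float) / np.asarray(mock, dtype=float))


def smooth(values, bandwidth=5):
    """Centred running mean over `bandwidth` consecutive probes.

    Near the ends of the array, the mean is taken over the probes that exist.
    """
    values = np.asarray(values, dtype=float)
    if bandwidth < 1 or bandwidth % 2 == 0:
        raise ValueError("bandwidth must be a positive odd number of probes")
    if values.size < bandwidth:
        raise ValueError("need at least `bandwidth` probes")
    kernel = np.ones(bandwidth)
    sums = np.convolve(values, kernel, mode="same")                # windowed sums
    n = np.convolve(np.ones_like(values), kernel, mode="same")     # probes per window
    return sums / n
```

For example, smoothing a single spike of 5 in the middle of five probes spreads it to 1.0 at the centre, 1.25 at its neighbours and about 1.67 at the two ends.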
We also provide the regions of strong ChIP enrichment for the five cardiogenic transcription factors (Suppl. Table 1), as reported by TileMap, as well as cis-regulatory modules (CRMs) defined by the binding of cardiogenic transcription factors (Suppl. Table 2):
Raw data is available at ArrayExpress:
Analyzing transcription factor occupancy during embryo development using ChIP-seq
Ghavi-Helm Y, Furlong EE (2012)
Methods Mol. Biol. 786:229-245. doi: 10.1007/978-1-61779-292-2_14 | Europe PMC | doi
Accurately assessing the binding of transcription factors to cis-regulatory elements in vivo is an essential step toward understanding the mechanisms that govern embryonic development. Genome-wide transcription factor location analysis has been facilitated by the development of high-density tiling arrays (ChIP-on-chip), and more recently by next-generation sequencing technologies, which are used to sequence the DNA fragments obtained from chromatin immunoprecipitation experiments (ChIP-seq). This chapter provides a detailed protocol of the different steps required to generate a successful ChIP-seq library, starting from embryo collection and fixation to chromatin preparation, immunoprecipitation, and finally library preparation. The protocol is optimized for Drosophila embryos, but can be adapted to any organism. The obtained library is suitable for sequencing on an Illumina GAIIx platform.
Analysis of variation at transcription factor binding sites in Drosophila and humans
Spivakov M, Akhtar J, Kheradpour P, Beal K, Girardot C, Koscielny G, Herrero J, Kellis M, Furlong EE, Birney E (2012)
Genome Biol. 13(9):R49. doi: 10.1186/gb-2012-13-9-r49 | Europe PMC | doi
BACKGROUND: Advances in sequencing technology have boosted population genomics and made it possible to map the positions of transcription factor binding sites (TFBSs) with high precision. Here we investigate TFBS variability by combining transcription factor binding maps generated by ENCODE, modENCODE, our previously published data and other sources with genomic variation data for human individuals and Drosophila isogenic lines.
RESULTS: We introduce a metric of TFBS variability that takes into account changes in motif match associated with mutation and makes it possible to investigate TFBS functional constraints instance-by-instance as well as in sets that share common biological properties. We also take advantage of the emerging per-individual transcription factor binding data to show evidence that TFBS mutations, particularly at evolutionarily conserved sites, can be efficiently buffered to ensure coherent levels of transcription factor binding.
CONCLUSIONS: Our analyses provide insights into the relationship between individual and interspecies variation and show evidence for the functional buffering of TFBS mutations in both humans and flies. In a broad perspective, these results demonstrate the potential of combining functional genomics and population genetics approaches for understanding gene regulation.
Predicting spatial and temporal gene expression using an integrative model of transcription factor occupancy and chromatin state
Wilczyński B, Liu YH, Yeo ZX, Furlong EE (2012)
PLoS Comput. Biol. 8(12):e1002798. doi: 10.1371/journal.pcbi.1002798
See also nature.com/research-highlights/gene-expression_predictions-across-space-and-time
Precise patterns of spatial and temporal gene expression are central to metazoan complexity and act as a driving force for embryonic development. While there has been substantial progress in dissecting and predicting cis-regulatory activity, our understanding of how information from multiple enhancer elements converges to regulate a gene’s expression remains elusive. This is in large part due to the number of different biological processes involved in mediating regulation as well as limited availability of experimental measurements for many of them. Here, we used a Bayesian approach to model diverse experimental regulatory data, leading to accurate predictions of both spatial and temporal aspects of gene expression. We integrated whole-embryo information on transcription factor recruitment to multiple cis-regulatory modules, insulator binding and histone modification status in the vicinity of individual gene loci, at a genome-wide scale during Drosophila development. The model uses Bayesian networks to represent the relation between transcription factor occupancy and enhancer activity in specific tissues and stages. All parameters are optimized in an Expectation Maximization procedure providing a model capable of predicting tissue- and stage-specific activity of new, previously unassayed genes. Performing the optimization with subsets of input data demonstrated that neither enhancer occupancy nor chromatin state alone can explain all gene expression patterns, but taken together they allow for accurate predictions of spatio-temporal activity. Model predictions were validated using the expression patterns of more than 600 genes recently made available by the BDGP consortium, demonstrating an average 15-fold enrichment of genes expressed in the predicted tissue over a naïve model. We further validated the model by experimentally testing the expression of 20 predicted target genes of unknown expression, resulting in an accuracy of 95% for temporal predictions and 50% for spatial predictions. While this is, to our knowledge, the first genome-wide approach to predict tissue-specific gene expression in metazoan development, our results suggest that integrative models of this type will become more prevalent in the future.
Molecular biology: A fly in the face of genomics
Furlong EE (2011)
Nature 471(7339):458-459. doi: 10.1038/471458a
The importance of being specified: cell fate decisions and their role in cell biology
Furlong EE (2010)
Mol. Biol. Cell 21(22):3797-3798. doi: 10.1091/mbc.e10-05-0436
Combinatorial binding leads to diverse regulatory responses: Lmd is a tissue-specific modulator of Mef2 activity
Cunha PM*, Sandmann T*, Gustafson EH, Ciglar L, Eichenlaub MP, Furlong EE (2010)
PLoS Genet. 6(7):e1001014. doi: 10.1371/journal.pgen.1001014
* These authors contributed equally to this work.
Understanding how complex patterns of temporal and spatial expression are regulated is central to deciphering genetic programs that drive development. Gene expression is initiated through the action of transcription factors and their cofactors converging on enhancer elements leading to a defined activity. Specific constellations of combinatorial occupancy are therefore often conceptualized as rigid binding codes that give rise to a common output of spatio-temporal expression. Here, we assessed this assumption using the regulatory input of two essential transcription factors within the Drosophila myogenic network. Mutations in either Myocyte enhancing factor 2 (Mef2) or the zinc-finger transcription factor lame duck (lmd) lead to very similar defects in myoblast fusion, yet the underlying molecular mechanism for this shared phenotype is not understood. Using a combination of ChIP-on-chip analysis and expression profiling of loss-of-function mutants, we obtained a global view of the regulatory input of both factors during development. The majority of Lmd-bound enhancers are co-bound by Mef2, representing a subset of Mef2’s transcriptional input during these stages of development. Systematic analyses of the regulatory contribution of both factors demonstrate diverse regulatory roles, despite their co-occupancy of shared enhancer elements. These results indicate that Lmd is a tissue-specific modulator of Mef2 activity, acting as both a transcriptional activator and repressor, which has important implications for myogenesis. More generally, this study demonstrates considerable flexibility in the regulatory output of two factors, leading to additive, cooperative, and repressive modes of co-regulation.
All coordinates are relative to the dm3 assembly.
We provide the Lameduck (Lmd) ChIP-on-chip enriched fragments and the expression profiling result table of lmd loss-of-function and wild-type embryos:
Dynamic CRM occupancy reflects a temporal map of developmental progression
Wilczyński B, Furlong EE (2010)
Mol. Syst. Biol. 6:383. doi: 10.1038/msb.2010.35
Development is driven by tightly coordinated spatio-temporal patterns of gene expression, which are initiated through the action of transcription factors (TFs) binding to cis-regulatory modules (CRMs). Although many studies have investigated how spatial patterns arise, precise temporal control of gene expression is less well understood. Here, we show that dynamic changes in the timing of CRM occupancy are a prevalent feature common to all TFs examined in a developmental ChIP time course to date. CRMs exhibit complex binding patterns that cannot be explained by the sequence motifs or expression of the TFs themselves. The temporal changes in TF binding are highly correlated with dynamic patterns of target gene expression, which in turn reflect transitions in cellular function during different stages of development. Thus, it is not only the timing of a TF’s expression, but also its temporal occupancy in refined time windows, which determines temporal gene expression. Systematic measurement of dynamic CRM occupancy may therefore serve as a powerful method to decode dynamic changes in gene expression driving developmental progression.
All coordinates are relative to the dm2 assembly.
We provide the ChIP data for CRMs bound by one or more transcription factors at two or more consecutive time points:
Model-based method for transcription factor target identification with limited data
Honkela A, Girardot C, Gustafson EH, Liu YH, Furlong EE, Lawrence ND, Rattray M (2010)
Proc. Natl. Acad. Sci. U.S.A. 107(17):7793-7798. doi: 10.1073/pnas.0914285107
We present a computational method for identifying potential targets of a transcription factor (TF) using wild-type gene expression time series data. For each putative target gene we fit a simple differential equation model of transcriptional regulation, and the model likelihood serves as a score to rank targets. The expression profile of the TF is modeled as a sample from a Gaussian process prior distribution that is integrated out using a nonparametric Bayesian procedure. This results in a parsimonious model with relatively few parameters that can be applied to short time series datasets without noticeable overfitting. We assess our method using genome-wide chromatin immunoprecipitation (ChIP-chip) and loss-of-function mutant expression data for two TFs, Twist and Mef2, controlling mesoderm development in Drosophila. Lists of top-ranked genes identified by our method are significantly enriched for genes close to bound regions identified in the ChIP-chip data and for genes that are differentially expressed in loss-of-function mutants. Targets of Twist display diverse expression profiles, and in this case a model-based approach performs significantly better than scoring based on correlation with TF expression. Our approach is found to be comparable or superior to ranking based on mutant differential expression scores. Also, we show how integrating complementary wild-type spatial expression data can further improve target ranking performance.
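The likelihood-based ranking described above can be illustrated with a toy example. This is a minimal sketch, not the published method: it assumes a generic linear first-order model, dm/dt = B + S·p(t) - D·m(t), for the mRNA level m of a candidate target driven by a TF profile p(t); it uses a fixed, user-supplied p(t) instead of integrating the TF profile out under a Gaussian process prior; and it scores a candidate by a Gaussian log-likelihood. The function names and the parameters `B`, `S`, `D` and `sigma` are hypothetical.

```python
import math
from typing import Callable, List

def simulate_target(p: Callable[[float], float], B: float, S: float, D: float,
                    m0: float, times: List[float], dt: float = 0.01) -> List[float]:
    """Integrate the assumed model dm/dt = B + S*p(t) - D*m(t) with forward Euler.

    `times` must be sorted and non-negative; returns m(t) at each requested time.
    """
    out: List[float] = []
    t, m = 0.0, m0
    for target in times:
        # Step forward until the next requested time point is reached.
        while t < target - 1e-12:
            step = min(dt, target - t)
            m += step * (B + S * p(t) - D * m)
            t += step
        out.append(m)
    return out

def gaussian_loglik(observed: List[float], predicted: List[float], sigma: float) -> float:
    """Log-likelihood of observed expression under i.i.d. Gaussian noise of s.d. sigma."""
    return sum(-0.5 * ((o - q) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
               for o, q in zip(observed, predicted))
```

A candidate gene whose time series yields a higher log-likelihood would rank as a more plausible target; in a real analysis the per-gene parameters would be fitted rather than supplied by hand.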
Challenges for modeling global gene regulatory networks during development: insights from Drosophila
Wilczyński B, Furlong EE (2010)
Dev. Biol. 340(2):161-169. doi: 10.1016/j.ydbio.2009.10.032
Development is regulated by dynamic patterns of gene expression, which are orchestrated through the action of complex gene regulatory networks (GRNs). Substantial progress has been made in modeling transcriptional regulation in recent years, ranging from qualitative “coarse-grain” models operating at the gene level to very “fine-grain” quantitative models operating at the biophysical “transcription factor-DNA level”. Recent advances in genome-wide studies have revealed an enormous increase in the size and complexity of GRNs. Even relatively simple developmental processes can involve hundreds of regulatory molecules, with extensive interconnectivity and cooperative regulation. This leads to an explosion in the number of regulatory functions, effectively impeding Boolean-based qualitative modeling approaches. At the same time, the lack of information on the biophysical properties for the majority of transcription factors within a global network restricts quantitative approaches. In this review, we explore the current challenges in moving from modeling medium-scale, well-characterized networks to more poorly characterized global networks. We suggest integrating coarse- and fine-grain approaches to model gene regulatory networks in cis. We focus on two very well-studied examples from Drosophila, which likely represent typical developmental regulatory modules across metazoans.
Conservation and divergence in developmental networks: a view from Drosophila myogenesis
Ciglar L, Furlong EE (2009)
Curr. Opin. Cell Biol. 21(6):754-760. doi: 10.1016/j.ceb.2009.10.001
Understanding developmental networks has recently been enhanced through the identification of a large number of conserved essential regulators. Interspecies comparisons of the transcriptional networks regulated by these factors are still at a rather early stage, with limited global data available. Here we use the accumulating phenotypic information from multiple species to provide initial insights into the wiring and rewiring of developmental networks, with particular emphasis on myogenesis, a highly conserved developmental process. This review highlights the most recent findings on the transcriptional program driving Drosophila myogenesis and compares this with vertebrates, revealing emerging themes that may be applicable to other developmental contexts.
Combinatorial binding predicts spatio-temporal cis-regulatory activity
Zinzen RP*, Girardot C*, Gagneur J*, Braun M, Furlong EE (2009)
Nature 462(7269):65-70. doi: 10.1038/nature08531
* These authors contributed equally to this work.
See also msb.embopress.org/news-and-views/learning-the-transcriptional-regulatory-code, nature.com/news-and-views/chips-and-regulatory-bits and onlinelibrary.wiley.com/deciphering-the-genome-regulatory-code_the-many-languages-of-dna
Development requires the establishment of precise patterns of gene expression, which are primarily controlled by transcription factors binding to cis-regulatory modules. Although transcription factor occupancy can now be identified at genome-wide scales, decoding this regulatory landscape remains a daunting challenge. Here we used a novel approach to predict spatio-temporal cis-regulatory activity based only on in vivo transcription factor binding and enhancer activity data. We generated a high-resolution atlas of cis-regulatory modules describing their temporal and combinatorial occupancy during Drosophila mesoderm development. The binding profiles of cis-regulatory modules with characterized expression were used to train support vector machines to predict five spatio-temporal expression patterns. In vivo transgenic reporter assays demonstrate the high accuracy of these predictions and reveal an unanticipated plasticity in transcription factor binding leading to similar expression. This data-driven approach does not require previous knowledge of transcription factor sequence affinity, function or expression, making it widely applicable.
All coordinates are relative to the dm2 assembly.
We provide the ChIP data produced for Zinzen et al. 2009:
A systematic analysis of Tinman function reveals Eya and JAK-STAT signaling as essential regulators of muscle development
Liu YH, Jakobsen JS, Valentin G, Amarantos I, Gilmour DT, Furlong EE (2009)
Dev. Cell 16(2):280-291. doi: 10.1016/j.devcel.2009.01.006
Nk-2 proteins are essential developmental regulators from flies to humans. In Drosophila, the family member tinman is the major regulator of cell fate within the dorsal mesoderm, including heart, visceral, and dorsal somatic muscle. To decipher Tinman’s direct regulatory role, we performed a time course of ChIP-on-chip experiments, revealing a more prominent role in somatic muscle specification than previously anticipated. Through the combination of transgenic enhancer-reporter assays, colocalization studies, and phenotypic analyses, we uncovered two additional factors within this myogenic network: by activating eyes absent, Tinman’s regulatory network extends beyond developmental stages and tissues where it is expressed; by regulating stat92E expression, Tinman modulates the transcriptional readout of JAK/STAT signaling. We show that this pathway is essential for somatic muscle development in Drosophila and for myotome morphogenesis in zebrafish. Taken together, these data uncover a conserved requirement for JAK/STAT signaling and an important component of the transcriptional network driving myogenesis.
All coordinates are relative to the dm2 assembly.
We provide the ChIP data for regions bound by the Tinman transcription factor:
cis-Regulatory networks during development: a view of Drosophila
Bonn S, Furlong EE (2008)
Curr. Opin. Genet. Dev. 18(6):513-520. doi: 10.1016/j.gde.2008.09.005
Understanding how regulatory networks initiate, maintain and synchronise transcriptional states remains a fundamental goal of developmental biology. Complex patterns of spatio-temporal gene expression are generated through the combined inputs of signalling and transcriptional networks converging on cis-regulatory modules (CRMs). Detailed studies in Drosophila, using transgenic reporter assays and mutagenesis analysis, have dissected the regulatory logic of a number of CRMs. These data have recently been complemented by genome-wide maps of transcription factor binding, revealing an unprecedented view of CRM occupancy and network complexity. The synthesis of data for three well-characterised Drosophila developmental networks reveals emerging themes at both a CRM and a cis-regulatory network level.
Dynamic regulation by polycomb group protein complexes controls pattern formation and the cell cycle in Drosophila
Oktaba K, Gutiérrez L, Gagneur J, Girardot C, Sengupta AK, Furlong EE, Müller J (2008)
Dev. Cell 15(6):877-889. doi: 10.1016/j.devcel.2008.10.005
Polycomb group (PcG) proteins form conserved regulatory complexes that modify chromatin to repress transcription. Here, we report genome-wide binding profiles of PhoRC, the Drosophila PcG protein complex containing the DNA-binding factor Pho/dYY1 and dSfmbt. PhoRC constitutively occupies short Polycomb response elements (PREs) of a large set of developmental regulator genes in both embryos and larvae. The majority of these PREs are co-occupied by the PcG complexes PRC1 and PRC2. Analysis of PcG mutants shows that the PcG system represses genes required for anteroposterior, dorsoventral, and proximodistal patterning of imaginal discs and that it also represses cell cycle regulator genes. Many of these genes are regulated in a dynamic manner, and our results suggest that the PcG system restricts signaling-mediated activation of target genes to appropriate cells. Analysis of cell cycle regulators indicates that the PcG system also dynamically modulates the expression levels of certain genes, providing a possible explanation for the tumor phenotype of PcG mutants.
A topographical map of spatiotemporal patterns of gene expression
Furlong EE (2008)
Dev. Cell 14(5):639-640. doi: 10.1016/j.devcel.2008.04.007
A recent study by Fowlkes et al. in Cell generated a 3D atlas of gene expression for the Drosophila blastoderm embryo using a new approach for image registration. This virtual embryo allows in silico multiplexing of in situ hybridizations and lays the groundwork for new insights into gene regulatory networks.
4DXpress: a database for cross-species expression pattern comparisons
Haudry Y, Berube H, Letunic I, Weeber PD, Gagneur J, Girardot C, Kapushesky M, Arendt D, Bork P, Brazma A, Furlong EE, Wittbrodt J, Henrich T (2008)
Nucleic Acids Res. 36(Database issue):D847-53. doi: 10.1093/nar/gkm797
In the major animal model species like mouse, fish or fly, detailed spatial information on gene expression over time can be acquired through whole mount in situ hybridization experiments. In these species, expression patterns of many genes have been studied and data has been integrated into dedicated model organism databases like ZFIN for zebrafish, MEPD for medaka, BDGP for Drosophila or GXD for mouse. However, a central repository that allows users to query and compare gene expression patterns across different species has not yet been established. Therefore, we have integrated expression patterns for zebrafish, Drosophila, medaka and mouse into a central public repository called 4DXpress (expression database in four dimensions). Users can query anatomy ontology-based expression annotations across species and quickly jump from one gene to the orthologues in other species. Genes are linked to public microarray data in ArrayExpress. We have mapped developmental stages between the species to be able to compare developmental time phases. We store the largest collection of gene expression patterns available to date in an individual resource, reflecting 16 505 annotated genes. 4DXpress will be an invaluable tool for developmental as well as for computational biologists interested in gene regulation and evolution. 4DXpress is available at http://ani.embl.de/4DXpress.
Enhanced function annotations for Drosophila serine proteases: a case study for systematic annotation of multi-member gene families
Shah PK, Tripathi LP, Jensen LJ, Gahnim M, Mason C, Furlong EE, Rodrigues V, White KP, Bork P, Sowdhamini R (2008)
Gene 407(1-2):199-215. doi: 10.1016/j.gene.2007.10.012
Systematically annotating the function of enzymes that belong to large protein families encoded in a single eukaryotic genome is a very challenging task. We carried out such an exercise to annotate function for the serine-protease family of the trypsin fold in Drosophila melanogaster, with an emphasis on annotating serine-protease homologues (SPHs) that may have lost their catalytic function. Our approach involves data mining and data integration to provide function annotations for 190 Drosophila gene products containing serine-protease-like domains, of which 35 are SPHs. This was accomplished by analysis of structure-function relationships, gene-expression profiles, large-scale protein–protein interaction data, literature mining and bioinformatic tools. We introduce functional residue clustering (FRC), a method that performs hierarchical clustering of sequences using properties of functionally important residues and utilizes the correlation coefficient as a quantitative similarity measure to transfer in vivo substrate specificities to proteases. We show that the efficiency of transfer of substrate-specificity information using this method is generally high. FRC was also applied to Drosophila proteases to assign putative competitive inhibitor relationships (CIRs). Microarray gene-expression data were utilized to uncover a large-scale and dual involvement of proteases in development and in immune response. We found specific recruitment of SPHs and proteases with CLIP domains in immune response, suggesting evolution of a new function for SPHs. We also suggest the existence of separate downstream protease cascades for immune response against bacterial/fungal infections and parasite/parasitoid infections. We verify the quality of our annotations using information from RNAi screens and other evidence types. Utilization of such multi-fold approaches results in a 10-fold increase of function annotation for Drosophila serine proteases and demonstrates the value of increasing annotations in multiple genomes.
Divergence in cis-regulatory networks: taking the ‘species’ out of cross-species analysis
Zinzen RP, Furlong EE (2008)
Genome Biol. 9(11):240. doi: 10.1186/gb-2008-9-11-240
Many essential transcription factors have conserved roles in regulating biological programs, yet their genomic occupancy can diverge significantly. A new study demonstrates that such variations are primarily due to cis-regulatory sequences, rather than differences between the regulators or nuclear environments.
Temporal ChIP-on-chip reveals Biniou as a universal regulator of the visceral muscle transcriptional network
Jakobsen JS, Braun M, Astorga J, Gustafson EH, Sandmann T, Karzynski M, Carlsson P, Furlong EE (2007)
Genes Dev. 21(19):2448-2460. doi: 10.1101/gad.437607
Smooth muscle plays a prominent role in many fundamental processes and diseases, yet our understanding of the transcriptional network regulating its development is very limited. The FoxF transcription factors are essential for visceral smooth muscle development in diverse species, although their direct regulatory role remains elusive. We present a transcriptional map of Biniou (a FoxF transcription factor) and Bagpipe (an Nkx factor) activity, as a first step to deciphering the developmental program regulating Drosophila visceral muscle development. A time course of chromatin immunoprecipitation followed by microarray analysis (ChIP-on-chip) experiments and expression profiling of mutant embryos reveal a dynamic map of in vivo bound enhancers and direct target genes. While Biniou is broadly expressed, it regulates enhancers driving temporally and spatially restricted expression. In vivo reporter assays indicate that the timing of Biniou binding is a key trigger for the time span of enhancer activity. Although bagpipe and biniou mutants phenocopy each other, their regulatory potential is quite different. This network architecture was not apparent from genetic studies, and highlights Biniou as a universal regulator in all visceral muscle, regardless of its developmental origin or subsequent function. The regulatory connection of a number of Biniou target genes is conserved in mice, suggesting an ancient wiring of this developmental program.
All coordinates are relative to the dm2 assembly.
We provide the ChIP-on-chip data for Bagpipe and Biniou and the expression profiling result table for Biniou:
CoCo: a web application to display, store and curate ChIP-on-chip data integrated with diverse types of gene expression data
Girardot C, Sklyar O, Grosz S, Huber W, Furlong EE (2007)
Bioinformatics 23(6):771-773. doi: 10.1093/bioinformatics/btl641
MOTIVATION: CoCo, ChIP-on-Chip online, is an open-source web application that supports the annotation and curation of regulatory regions and associated target genes discovered in ChIP-on-chip experiments. CoCo integrates ChIP-on-chip results with diverse types of gene expression data (expression profiling, in situ hybridization) and displays them within a genomic context. Regulatory relationships between the transcription factor-bound regions and putative target genes can be stored and expanded throughout different sessions. AVAILABILITY: http://furlonglab.embl.de/methods/tools/coco.
A core transcriptional network for early mesoderm development in Drosophila melanogaster
Sandmann T, Girardot C, Brehme M, Tongprasit W, Stolc V, Furlong EE (2007)
Genes Dev. 21(4):436-449. doi: 10.1101/gad.1509007
See also nature.com/research-highlight/chipping-away-at-developmental-networks
Embryogenesis is controlled by large gene-regulatory networks, which generate spatially and temporally refined patterns of gene expression. Here, we report the characteristics of the regulatory network orchestrating early mesodermal development in the fruitfly Drosophila, where the transcription factor Twist is both necessary and sufficient to drive development. Through the integration of chromatin immunoprecipitation followed by microarray analysis (ChIP-on-chip) experiments during discrete time periods with computational approaches, we identified >2000 Twist-bound cis-regulatory modules (CRMs) and almost 500 direct target genes. Unexpectedly, Twist regulates an almost complete cassette of genes required for cell proliferation in addition to genes essential for morphogenesis and cell migration. Twist targets almost 25% of all annotated Drosophila transcription factors, which may represent the entire set of regulators necessary for the early development of this system. By combining in vivo binding data from Twist, Mef2, Tinman, and Dorsal we have constructed an initial transcriptional network of early mesoderm development. The network topology reveals extensive combinatorial binding, feed-forward regulation, and complex logical outputs as prevalent features. In addition to binary activation and repression, we suggest that Twist binds to almost all mesodermal CRMs to provide the competence to integrate inputs from more specialized transcription factors.
All coordinates are relative to the dm2 assembly.
We provide the ChIP-on-chip data for Twist and the expression profiling result tables for Twist and Toll10B:
- ChIP-on-chip data:
- Expression profiling result tables:
We also provide the ChIP-on-chip signal as UCSC wig tracks:
Identification of tightly regulated groups of genes during Drosophila melanogaster embryogenesis
Hooper SD, Boué S, Krause R, Jensen LJ, Mason CE, Ghanim M, White KP, Furlong EE, Bork P (2007)
Mol. Syst. Biol. 3:72. doi: 10.1038/msb4100112
Time-series analysis of whole-genome expression data during Drosophila melanogaster development indicates that up to 86% of its genes change their relative transcript level during embryogenesis. By applying conservative filtering criteria and requiring ‘sharp’ transcript changes, we identified 1534 maternal genes, 792 transient zygotic genes, and 1053 genes whose transcript levels increase during embryogenesis. Each of these three categories is dominated by groups of genes where all transcript levels increase and/or decrease at similar times, suggesting a common mode of regulation. For example, 34% of the transiently expressed genes fall into three groups, with increased transcript levels between 2.5-12, 11-20, and 15-20 h of development, respectively. We highlight common and distinctive functional features of these expression groups and identify a coupling between downregulation of transcript levels and targeted protein degradation. By mapping the groups to the protein network, we also predict and experimentally confirm new functional associations.
Mes2, a MADF-containing transcription factor essential for Drosophila development
Zimmermann G, Furlong EE, Suyama K, Scott MP (2006)
Dev. Dyn. 235(12):3387-3395. doi: 10.1002/dvdy.20970
The development of the Drosophila mesoderm is initiated by the basic helix-loop-helix transcription factor twist. We identified a gene encoding a putative transcription factor, mes2, in a screen for essential mesoderm-expressed genes that function downstream of twist. Mes2 protein belongs to a family of 48 Drosophila proteins containing MADF domains. MADF domains exist in worms, flies, and fish. Mes2 is a nuclear protein first produced in trunk and head mesoderm during late gastrulation. At later embryonic stages, mes2 is expressed in glia of the central and peripheral nervous systems, and in tissues derived from the head mesoderm. We have identified a null mutation of mes2 that leads to developmental arrest in first instar larvae. Increased production of Mes2 in multiple embryonic and larval tissues almost always causes lethality. The ubiquitous or epidermal misexpression of mes2 in the embryo causes a dramatic loss of epidermal integrity resulting in the failure of dorsal closure. Our data show that the precise regulation of mes2 expression is critical for normal development in Drosophila and implicate Mes2 in the regulation of essential target genes.
Genomics and development: Taking developmental biology to new heights
Spitz F, Furlong EE (2006)
Dev. Cell 11(4):451-457. doi: 10.1016/j.devcel.2006.09.013
The 2006 Arolla meeting brought together scientists from around the globe to discuss how genomic scale analyses can enhance progress in understanding developmental biology.
A temporal map of transcription factor activity: mef2 directly regulates target genes at all stages of muscle development
Sandmann T, Jensen LJ, Jakobsen JS, Karzynski MM, Eichenlaub MP, Bork P, Furlong EE (2006)
Dev. Cell 10(6):797-807. doi: 10.1016/j.devcel.2006.04.009
Dissecting components of key transcriptional networks is essential for understanding complex developmental processes and phenotypes. Genetic studies have highlighted the role of members of the Mef2 family of transcription factors as essential regulators in myogenesis from flies to man. To understand how these transcription factors control diverse processes in muscle development, we have combined chromatin immunoprecipitation analysis with gene expression profiling to obtain a temporal map of Mef2 activity during Drosophila embryonic development. This global approach revealed three temporal patterns of Mef2 enhancer binding, providing a glimpse of dynamic enhancer use within the context of a developing embryo. Our results provide mechanistic insight into the regulation of Mef2’s activity at the level of DNA binding and suggest cooperativity with the bHLH protein Twist. The number and diversity of new direct target genes indicate a much broader role for Mef2, at all stages of myogenesis, than previously anticipated.
All coordinates are relative to the dm2 assembly.
We provide the ChIP-on-chip and expression data for Mef2.
Developmental control of nuclear size and shape by Kugelkern and Kurzkern
Brandt A, Papagiannouli F, Wagner N, Wilsch-Bräuninger M, Braun M, Furlong EE, Loserth S, Wenzl C, Pilot F, Vogt N, Lecuit T, Krohne G, Grosshans J (2006)
Curr. Biol. 16(6):543-552. doi: 10.1016/j.cub.2006.01.051 | Europe PMC | doi
BACKGROUND: The shape of a nucleus depends on the nuclear lamina, which is tightly associated with the inner nuclear membrane, and on the interaction with the cytoskeleton. However, the mechanisms connecting the differentiation state of a cell to the shape changes of its nucleus are not well understood. We investigated this question in early Drosophila embryos, where the nuclear shape changes from spherical to ellipsoidal together with a 2.5-fold increase in nuclear length during cellularization.
RESULTS: We identified two genes, kugelkern and kurzkern, required for nuclear elongation. In kugelkern- and kurzkern-depleted embryos, the nuclei reach only half the length of wild-type nuclei at the end of cellularization. The reduced nuclear size affects chromocenter formation, as marked by Heterochromatin protein 1, and the expression of a specific set of genes, including early zygotic genes. Kugelkern contains a putative coiled-coil domain in the N-terminal half of the protein, a nuclear localization signal (NLS), and a C-terminal CxxM motif. The C-terminal CxxM motif is required for targeting Kugelkern to the inner nuclear membrane, where it colocalizes with lamins. Expression of kugelkern in Drosophila embryos or Xenopus cells induces overproliferation of the nuclear membrane, in a manner that depends on the farnesylation motif.
CONCLUSIONS: Apart from the lamins, Kugelkern is the first nuclear protein found to contain a farnesylation site. Our findings suggest that Kugelkern is a rate-determining factor for the increase in nuclear size. We propose that association of farnesylated Kugelkern with the inner nuclear membrane induces expansion of the nuclear surface area, allowing nuclear growth.
ChIP-on-chip protocol for genome-wide analysis of transcription factor binding in Drosophila melanogaster embryos
Sandmann T, Jakobsen JS, Furlong EE (2006)
Nat Protoc 1(6):2839-2855. doi: 10.1038/nprot.2006.383 | Europe PMC | doi
This protocol describes a method to detect in vivo associations between proteins and DNA in developing Drosophila embryos. It combines formaldehyde crosslinking and immunoprecipitation of protein-bound sequences with genome-wide analysis using microarrays. After crosslinking, nuclei are enriched using differential centrifugation and the chromatin is sheared by sonication. Antibodies specifically recognizing wild-type protein or, alternatively, a genetically encoded epitope tag are used to enrich for specifically bound DNA sequences. After purification and polymerase chain reaction-based amplification, the samples are fluorescently labeled and hybridized to genomic tiling microarrays. This protocol has been successfully used to study different tissue-specific transcription factors, and is generally applicable to in vivo analysis of any DNA-binding proteins in Drosophila embryos. The full protocol, including the collection of embryos and the collection of raw microarray data, can be completed within 10 days.
A functional genomics approach to identify new regulators of Wnt signaling
Furlong EE (2005)
Dev. Cell 8(5):624-626. doi: 10.1016/j.devcel.2005.04.006 | Europe PMC | doi
A recent study used a genome-wide RNAi screen in Drosophila cells to identify 238 candidate regulators of the Wnt-signaling pathway, most of which had not previously been connected to Wnt signaling. Supporting in vivo studies are in progress. The fact that such an impressive number of potential modulators had eluded detection in genetic screens underscores the potential of applying new, high-throughput approaches to old problems.
Myofilin, a protein in the thick filaments of insect muscle
Qiu F, Brendel S, Cunha PM, Astola N, Song B, Furlong EE, Leonard KR, Bullard B (2005)
J. Cell. Sci. 118(Pt 7):1527-1536. doi: 10.1242/jcs.02281 | Europe PMC | doi
Thick filaments in striated muscle are myosin polymers with a length and diameter that depend on the fibre type. In invertebrates, the length of the thick filaments varies widely in different muscles, and additional proteins control filament assembly. Thick filaments in asynchronous insect flight muscle have an extremely regular structure, which is likely to be essential for the oscillatory contraction of these muscles. The factors controlling the assembly of thick filaments in insect flight muscle are not known. We previously identified a thick filament core protein, zeelin 1, in Lethocerus flight and non-flight muscles. This has been sequenced, and the corresponding proteins in Drosophila and Anopheles have been identified. The protein has been renamed myofilin. Zeelin 2, which is on the outside of Lethocerus flight muscle thick filaments, has been sequenced and, because of its similarity to Drosophila flightin, has been renamed flightin. In Drosophila flight muscle, myofilin has a molecular weight of 20 kDa and is one of five isoforms produced from a single gene. In situ hybridisation of Drosophila embryos showed that myofilin RNA is first expressed late in embryogenesis, at stage 15, a little later than myosin. Antibody to myofilin labelled the entire A-band, except for the H-zone, in cryosections of flight and non-flight muscle. The periodicity of myofilin in Drosophila flight muscle thick filaments was found to be 30 nm by measuring the spacing of gold particles in labelled cryosections; this is about twice the 14.5 nm spacing of myosin molecules. The molar ratio of myofilin to myosin in indirect flight muscle is 1:2, which is the same as that of flightin. We propose a model for the association of these proteins in thick filaments that is consistent with the periodicity and stoichiometry. Myofilin is probably needed for filament assembly in all muscles, and flightin for the stability of flight muscle thick filaments in adult flies.
Integrating transcriptional and signalling networks during muscle development
Furlong EE (2004)
Curr. Opin. Genet. Dev. 14(4):343-350. doi: 10.1016/j.gde.2004.06.011 | Europe PMC | doi
A fundamental aspect of developmental decisions is the ability of groups of cells to obtain the competence to respond to different signalling inputs. This information is often integrated with intrinsic transcriptional networks to produce diverse developmental outcomes. Studies in Drosophila are starting to reveal a detailed picture of the regulatory circuits controlling the subdivision of the dorsal mesoderm, which gives rise to diverse muscle types including cardioblasts, pericardial cells, body wall muscle and gut muscle. The combination of a common set of mesoderm-autonomous transcription factors (e.g. Tinman and Twist) and spatially restricted inductive signals (e.g. Dpp and Wg) subdivides the dorsal mesoderm into different competence domains. The integration of additional signalling inputs with localised repression within these competence domains results in diverse transcriptional responses within neighbouring cells, which in turn generates muscle diversity.
Creation of a minimal tiling path of genomic clones for Drosophila: provision of a common resource
Hollich V, Johnson E, Furlong EE, Beckmann B, Carlson J, Celniker SE, Hoheisel JD (2004)
BioTechniques 37(2):282-284. doi: 10.2144/3702A0282 | Europe PMC | doi
On the basis of shotgun subclone libraries used in the sequencing of the Drosophila melanogaster genome, a minimal tiling path of subclones across much of the genome was determined. About 320,000 shotgun clones for chromosomes X(12-20), 2R, 2L, 3R, and 4 were available from the Berkeley Drosophila Genome Project. The clone inserts have an average length of 3.4 kb and are amenable to standard PCR amplification. The resulting tiling path covers 86.2% of chromosome X(12-20), 86.2% of chromosomal arm 2R, 79.0% of 2L, 89.6% of 3R, and 80.5% of chromosome 4. In total, the 25,135 clones represent 76.7 Mb, equivalent to about 67% of the genome, and would be suitable for producing a microarray on a single slide.
Notch and Ras signaling pathway effector genes expressed in fusion competent and founder cells during Drosophila myogenesis
Artero R, Furlong EE, Beckett K, Scott MP, Baylies M (2003)
Development 130(25):6257-6272. doi: 10.1242/dev.00843 | Europe PMC | doi
Drosophila muscles originate from the fusion of two types of myoblasts, founder cells (FCs) and fusion-competent myoblasts (FCMs). To better understand muscle diversity and morphogenesis, we performed a large-scale gene expression analysis to identify genes differentially expressed in FCs and FCMs. We employed embryos derived from Toll10b mutants to obtain primarily muscle-forming mesoderm, and expressed activated forms of Ras or Notch to induce FC or FCM fate, respectively. The transcripts present in embryos of each genotype were compared by hybridization to cDNA microarrays. Among the 83 genes differentially expressed, we found genes known to be enriched in FCs or FCMs, such as heartless or hibris, previously characterized genes with unknown roles in muscle development, and predicted genes of unknown function. Our studies of newly identified genes revealed new patterns of gene expression restricted to one of the two types of myoblasts, and also striking muscle phenotypes. Whereas genes such as phyllopod play a crucial role during specification of particular muscles, others such as tartan are necessary for normal muscle morphogenesis.
Gene expression during the life cycle of Drosophila melanogaster
Arbeitman MN, Furlong EE, Imam F, Johnson E, Null BH, Baker BS, Krasnow MA, Scott MP, Davis RW, White KP (2002)
Science 297(5590):2270-2275. doi: 10.1126/science.1072152 | Europe PMC | doi
Molecular genetic studies of Drosophila melanogaster have led to profound advances in understanding the regulation of development. Here we report gene expression patterns for nearly one-third of all Drosophila genes during a complete time course of development. Mutations that eliminate eye or germline tissue were used to further analyze tissue-specific gene expression programs. These studies define major characteristics of the transcriptional programs that underlie the life cycle, compare development in males and females, and show that large-scale gene expression data collected from whole animals can be used to identify genes expressed in particular tissues and organs or genes involved in specific biological and biochemical processes.
Patterns of gene expression during Drosophila mesoderm development
Furlong EE, Andersen EC, Null B, White KP, Scott MP (2001)
Science 293(5535):1629-1633. doi: 10.1126/science.1062660 | Europe PMC | doi
The transcription factor Twist initiates Drosophila mesoderm development, resulting in the formation of heart, somatic muscle, and other cell types. Using a Drosophila embryo sorter, we isolated enough homozygous twist mutant embryos to perform DNA microarray experiments. Transcription profiles of twist loss-of-function embryos, embryos with ubiquitous twist expression, and wild-type embryos were compared at different developmental stages. The results implicate hundreds of genes, many with vertebrate homologs, in stage-specific processes in mesoderm development. One such gene, gleeful, related to the vertebrate Gli genes, is essential for somatic muscle development and sufficient to cause neural cells to express a muscle marker.
Automated sorting of live transgenic embryos
Furlong EE, Profitt D, Scott MP (2001)
Nat. Biotechnol. 19(2):153-156. doi: 10.1038/84422 | Europe PMC | doi
The vast selection of Drosophila mutants is an extraordinary resource for exploring molecular events underlying development and disease. We have designed and constructed an instrument that automatically separates Drosophila embryos of one genotype from a larger population of embryos, based on a fluorescent protein marker. This instrument can also sort embryos from other species, such as Caenorhabditis elegans. The machine sorts 15 living Drosophila embryos per second with more than 99% accuracy. Sorting living embryos will solve longstanding problems, including (1) the need for large quantities of RNA from homozygous mutant embryos to use in DNA microarray or gene-chip experiments, (2) the need for large amounts of protein extract from homozygous mutant embryos for biochemical studies, for example to determine whether a multiprotein complex forms or localizes correctly in vivo when one component is missing, and (3) the need for rapid genetic screening for gene expression changes in living embryos using a fluorescent protein reporter.