Model-based method for transcription factor target identification with limited data
Antti Honkela (a,1), Charles Girardot (b), E. Hilary Gustafson (b), Ya-Hsin Liu (b), Eileen E. M. Furlong (b), Neil D. Lawrence (c,1), and Magnus Rattray (c,1)
(a) Department of Information and Computer Science, Aalto University School of Science and Technology, Helsinki, Finland; (b) Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; and (c) School of Computer Science, University of Manchester, Manchester, United Kingdom
Edited by David Baker, University of Washington, Seattle, WA, and approved March 3, 2010 (received for review December 10, 2009)
Abstract
We present a computational method for identifying potential targets of a transcription factor (TF) using wild-type gene expression time series data. For each putative target gene we fit a simple differential equation model of transcriptional regulation, and the model likelihood serves as a score to rank targets. The expression profile of the TF is modeled as a sample from a Gaussian process prior distribution that is integrated out using a nonparametric Bayesian procedure. This results in a parsimonious model with relatively few parameters that can be applied to short time series datasets without noticeable overfitting. We assess our method using genome-wide chromatin immunoprecipitation (ChIP-chip) and loss-of-function mutant expression data for two TFs, Twist and Mef2, controlling mesoderm development in Drosophila. Lists of top-ranked genes identified by our method are significantly enriched for genes close to bound regions identified in the ChIP-chip data and for genes that are differentially expressed in loss-of-function mutants. Targets of Twist display diverse expression profiles, and in this case a model-based approach performs significantly better than scoring based on correlation with TF expression. Our approach is found to be comparable or superior to ranking based on mutant differential expression scores. Also, we show how integrating complementary wild-type spatial expression data can further improve target ranking performance.
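The ranking idea in the abstract above can be illustrated with a much simpler stand-in. The sketch below assumes a linear ODE, dx/dt = B + S f(t) - D x(t), driven by a fixed, observed TF profile f(t), and scores each gene by the best Gaussian log-likelihood found on a small parameter grid. This is an assumption made for illustration only: the paper treats the TF profile as a Gaussian process sample that is integrated out, which this sketch does not do. The function names, the parameter grid and the noise floor are all hypothetical.

```python
import numpy as np

def simulate(times, tf, basal, sens, decay, x0):
    """Forward-Euler solution of dx/dt = basal + sens*f(t) - decay*x, sampled at `times`."""
    grid = np.linspace(times[0], times[-1], 200 * (len(times) - 1) + 1)
    f = np.interp(grid, times, tf)  # piecewise-linear TF profile (an assumption)
    x = np.empty_like(grid)
    x[0] = x0
    dt = grid[1] - grid[0]
    for i in range(1, len(grid)):
        x[i] = x[i - 1] + dt * (basal + sens * f[i - 1] - decay * x[i - 1])
    return np.interp(times, grid, x)

def best_log_likelihood(times, tf, expr, grid_values=(0.0, 0.5, 1.0, 2.0)):
    """Best Gaussian log-likelihood over a coarse (hypothetical) parameter grid."""
    n = len(times)
    best = -np.inf
    for basal in grid_values:
        for sens in grid_values:
            for decay in grid_values:
                pred = simulate(times, tf, basal, sens, decay, expr[0])
                # Maximum-likelihood noise variance, floored to avoid log(0).
                sigma2 = max(float(np.mean((expr - pred) ** 2)), 1e-6)
                ll = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
                best = max(best, ll)
    return best

def rank_targets(times, tf, genes):
    """Return (gene, score) pairs sorted from most to least target-like."""
    scores = {name: best_log_likelihood(times, tf, np.asarray(expr, dtype=float))
              for name, expr in genes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A gene whose profile is well explained by the TF-driven ODE receives a higher score than one whose profile the model cannot reproduce, which is the sense in which the likelihood acts as a ranking score.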
Transcription regulation, Epigenetic and Next-generation sequencing related research and advances.
Tuesday, April 27, 2010
Saturday, April 24, 2010
CpG islands influence chromatin structure via the CpG-binding protein Cfp1
Letter
Nature 464, 1082-1086 (15 April 2010) | doi:10.1038/nature08924; Received 2 October 2009; Accepted 15 February 2010
John P. Thomson1,3, Peter J. Skene1,3, Jim Selfridge1, Thomas Clouaire1, Jacky Guy1, Shaun Webb1, Alastair R. W. Kerr1, Aimée Deaton1, Rob Andrews2, Keith D. James2, Daniel J. Turner2, Robert Illingworth1 & Adrian Bird1
1. Wellcome Trust Centre for Cell Biology, Michael Swann Building, University of Edinburgh, Mayfield Road, Edinburgh EH9 3JR, UK
2. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
3. These authors contributed equally to this work.
Correspondence and requests for materials should be addressed to Adrian Bird (Email: a.bird@ed.ac.uk).
Abstract
CpG islands (CGIs) are prominent in the mammalian genome owing to their GC-rich base composition and high density of CpG dinucleotides (refs 1, 2). Most human gene promoters are embedded within CGIs that lack DNA methylation and coincide with sites of histone H3 lysine 4 trimethylation (H3K4me3), irrespective of transcriptional activity (refs 3, 4). In spite of these intriguing correlations, the functional significance of non-methylated CGI sequences with respect to chromatin structure and transcription is unknown. By performing a search for proteins that are common to all CGIs, here we show high enrichment for Cfp1, which selectively binds to non-methylated CpGs in vitro (refs 5, 6). Chromatin immunoprecipitation of a mono-allelically methylated CGI confirmed that Cfp1 specifically associates with non-methylated CpG sites in vivo. High throughput sequencing of Cfp1-bound chromatin identified a notable concordance with non-methylated CGIs and sites of H3K4me3 in the mouse brain. Levels of H3K4me3 at CGIs were markedly reduced in Cfp1-depleted cells, consistent with the finding that Cfp1 associates with the H3K4 methyltransferase Setd1 (refs 7, 8). To test whether non-methylated CpG-dense sequences are sufficient to establish domains of H3K4me3, we analysed artificial CpG clusters that were integrated into the mouse genome. Despite the absence of promoters, the insertions recruited Cfp1 and created new peaks of H3K4me3. The data indicate that a primary function of non-methylated CGIs is to genetically influence the local chromatin modification state by interaction with Cfp1 and perhaps other CpG-binding proteins.
Tuesday, April 20, 2010
Epigenetic regulation of the human p53 gene promoter by the CTCF transcription factor in transformed cell lines
Original Article
Oncogene (2010) 29, 2217–2227; doi:10.1038/onc.2009.509; published online 25 January 2010
E Soto-Reyes1 and F Recillas-Targa1
1Instituto de Fisiología Celular, Departamento de Genética Molecular, Universidad Nacional Autónoma de México, México D.F., México
Correspondence: Dr F Recillas-Targa, Instituto de Fisiología Celular, Departamento de Genética Molecular, Universidad Nacional Autónoma de México, Apartado Postal 70-242, Mexico City, D.F., 04510, México. E-mail: frecilla@ifc.unam.mx
Received 18 May 2009; Revised 3 December 2009; Accepted 13 December 2009; Published online 25 January 2010.
Abstract
Epigenetic silencing of tumor suppressor gene promoters has become a more frequent phenomenon in cancer than previously anticipated. In this study we addressed the mechanisms involved in the protection of the p53 tumor suppressor gene against epigenetic silencing in human transformed cell lines. We characterized a binding site for the CCCTC-binding factor (CTCF) in the human p53 gene promoter that contributes to its transcriptional expression, and has the ability to maintain this regulatory element in a local open chromatin configuration. In the absence of CTCF we observe the incorporation of repressive histone marks, such as H3K9me3, H3K27me3 and H4K20me3, in different sub-domains of the upstream regulatory sequence. This evidence suggests that CTCF protects the p53 gene promoter against repressive histone marks. Notably, no apparent direct correlation between repression and DNA hypermethylation has been detected. Together, we present evidence supporting the relevant role of CTCF in the epigenetic regulation of tumor suppressor genes and cancer. We propose that CTCF is a strategic component responsible for the maintenance and segregation of epigenetic traits.
Targeting of mRNAs by multiple miRNAs: the next step
Review
Oncogene (2010) 29, 2161–2164; doi:10.1038/onc.2010.59; published online 1 March 2010
M E Peter1
1The Ben May Department of Cancer Research, The University of Chicago, Chicago, IL, USA
Correspondence: Professor ME Peter, The Ben May Department of Cancer Research, The University of Chicago, 924 E 57th Street, R112, Chicago, IL 60637, USA. E-mail: MPeter@uchicago.edu
Received 25 January 2010; Accepted 26 January 2010; Published online 1 March 2010.
Abstract
Micro(mi)RNAs are small noncoding RNAs that regulate expression of the majority of the genes in the genome at either the messenger RNA (mRNA) level (by degrading mRNA) or the protein level (by blocking translation). miRNAs are thought to be components of vast regulatory networks. Currently, the field is focused primarily on identifying novel targets of individual miRNAs. This focus is about to undergo a dramatic change. In a new paper by Wu et al. (2010) it is experimentally confirmed that multiple miRNAs target the same gene, suggesting that it is the combination of all these activities that determines the expression of miRNA target genes. This study ushers in a new era of miRNA research that focuses on networks more than on individual connections between miRNA and strongly predicted targets.
Biases in Illumina transcriptome sequencing caused by random hexamer priming
Kasper D. Hansen1,*, Steven E. Brenner2 and Sandrine Dudoit1,3
1Division of Biostatistics, School of Public Health, UC Berkeley, 101 Haviland Hall, Berkeley, CA 94720-7358, 2Department of Plant and Microbial Biology, UC Berkeley, 461 Koshland Hall, Berkeley, CA 94720-3102 and 3Department of Statistics, UC Berkeley, 367 Evans Hall, Berkeley, CA 94720-3860, USA
*To whom correspondence should be addressed. Email: khansen@stat.berkeley.edu
Received December 1, 2009. Revised March 16, 2010. Accepted March 17, 2010.
Generation of cDNA using random hexamer priming induces biases in the nucleotide composition at the beginning of transcriptome sequencing reads from the Illumina Genome Analyzer. The bias is independent of organism and laboratory and impacts the uniformity of the reads along the transcriptome. We provide a read count reweighting scheme, based on the nucleotide frequencies of the reads, that mitigates the impact of the bias.
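A minimal sketch of what a frequency-based reweighting could look like follows. It assumes that the weight for a read is the ratio between how often its leading k-mer occurs at interior read positions (taken here as unbiased) and how often it occurs at the read start. The k-mer length, the interior window and the function names are illustrative assumptions, not the authors' exact scheme.

```python
from collections import Counter

def kmer_weights(reads, k=2, interior_start=10):
    """Weight per leading k-mer: interior frequency divided by read-start frequency."""
    start_counts = Counter()
    interior_counts = Counter()
    for read in reads:
        if len(read) < k:
            continue  # too short to contribute a k-mer
        start_counts[read[:k]] += 1
        # Positions from `interior_start` onward are assumed free of priming bias.
        for pos in range(interior_start, len(read) - k + 1):
            interior_counts[read[pos:pos + k]] += 1
    n_start = sum(start_counts.values())
    n_interior = sum(interior_counts.values())
    weights = {}
    for kmer, count in start_counts.items():
        p_start = count / n_start
        p_interior = interior_counts[kmer] / n_interior if n_interior else 0.0
        weights[kmer] = p_interior / p_start
    return weights

def reweighted_count(reads, weights, k=2):
    """Sum of per-read weights, used in place of the raw read count."""
    return sum(weights.get(read[:k], 0.0) for read in reads if len(read) >= k)
```

Reads that begin with an over-represented k-mer receive a weight below 1 and reads that begin with an under-represented k-mer receive a weight above 1, so the reweighted counts are less sensitive to the composition bias at read starts.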
Small RNA class transition from siRNA/piRNA to miRNA during pre-implantation mouse development
Yusuke Ohnishi1,2, Yasushi Totoki3, Atsushi Toyoda4, Toshiaki Watanabe5,6, Yasuhiro Yamamoto5,6, Katsushi Tokunaga2, Yoshiyuki Sakaki4, Hiroyuki Sasaki5,6 and Hirohiko Hohjoh1,*
1Department of Molecular Genetics, National Institute of Neuroscience, NCNP, Tokyo 187-8502, 2Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo 113-0033, 3Cancer Genomics Project, National Cancer Center Research Institute, Tokyo 104-0045, 4Sequence Technology Team, RIKEN Genomic Sciences Center, Yokohama 230-0045, 5Division of Human Genetics, Department of Integrated Genetics, National Institute of Genetics, Research Organization of Information and Systems and 6Department of Genetics, School of Life Science, The Graduate University for Advanced Studies, Mishima 411-8540, Japan
*To whom correspondence should be addressed. Fax: +81 42 346 1755; Email: hohjohh@ncnp.go.jp
Received February 9, 2010. Revised March 17, 2010. Accepted March 18, 2010.
Recent studies showed that small interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs) in mammalian germ cells play important roles in retrotransposon silencing and gametogenesis. However, the subsequent contribution of those small RNAs to early mammalian development remains poorly understood. We investigated the expression profiles of small RNAs in mouse metaphase II oocytes, 8–16-cell stage embryos, blastocysts and the pluripotent inner cell mass (ICM) using high-throughput pyrosequencing. Here, we show that during pre-implantation development a major small RNA class changes from retrotransposon-derived small RNAs containing siRNAs and piRNAs to zygotically synthesized microRNAs (miRNAs). Some siRNAs and piRNAs are transiently upregulated and directed against specific retrotransposon classes. We also identified miRNA expression profiles characteristic of the ICM and trophectoderm (TE) cells. Taken together, our current study reveals a major reprogramming of functional small RNAs during early mouse development from oocyte to blastocyst.
Unconstrained mining of transcript data reveals increased alternative splicing complexity in the human transcriptome
I. G. Mollet1,*, Claudia Ben-Dov2, Daniel Felício-Silva1, A. R. Grosso1, Pedro Eleutério1, Ruben Alves1, Ray Staller3, Tito Santos Silva4 and Maria Carmo-Fonseca1
1Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028 Lisbon, Portugal, 2CRG-Centre de Regulació Genómica, Barcelona, Spain, 3Independent Senior Consultant on Statistics & Mathematics, Amsterdam, the Netherlands and 4Faculdade de Engenharia, Universidade Católica Portuguesa, Lisbon, Portugal
*To whom correspondence should be addressed. Fax: +46(0)40-391222; Email: ines.mollet@med.lu.se; inesmollet@yahoo.com
Received November 17, 2009. Revised February 22, 2010. Accepted March 9, 2010.
Mining massive amounts of transcript data for alternative splicing information is paramount to help understand how the maturation of RNA regulates gene expression. We developed an algorithm to cluster transcript data to annotated genes to detect unannotated splice variants. A higher number of alternatively spliced genes and isoforms were found compared to other alternative splicing databases. Comparison of human and mouse data revealed a marked increase, in human, of splice variants incorporating novel exons and retained introns. Previously unannotated exons were validated by tiling array expression data and shown to correspond preferentially to novel first exons. Retained introns were validated by tiling array and deep sequencing data. The majority of retained introns were shorter than 500 nt and had weak polypyrimidine tracts. A subset of retained introns matching small RNAs and displaying a high GC content suggests a possible coordination between splicing regulation and production of noncoding RNAs. Conservation of unannotated exons and retained introns was higher in horse, dog and cow than in rodents, and 64% of exon sequences were only found in primates. This analysis highlights previously bypassed alternative splice variants, which may be crucial to deciphering more complex pathways of gene regulation in human.
Large-scale detection and analysis of RNA editing in grape mtDNA by RNA deep-sequencing
Ernesto Picardi1, David S. Horner2, Matteo Chiara2, Riccardo Schiavon3, Giorgio Valle3 and Graziano Pesole1,4,*
1Dipartimento di Biochimica e Biologia Molecolare ‘E. Quagliariello’, Università degli Studi di Bari, 70126 Bari, 2Dipartimento di Scienze Biomolecolari e Biotecnologie, Università degli Studi di Milano, 20133 Milano, 3CRIBI, Università degli Studi di Padova, viale G. Colombo 3, 35121 Padova and 4Istituto Tecnologie Biomediche del Consiglio Nazionale delle Ricerche, via Amendola 122/D, 70125 Bari, Italy
*To whom correspondence should be addressed. Fax: +39 080 544 3317; Email: graziano.pesole@biologia.uniba.it
Received September 7, 2009. Revised March 9, 2010. Accepted March 9, 2010.
RNA editing is a widespread post-transcriptional molecular phenomenon that can increase proteomic diversity, by modifying the sequence of completely or partially non-functional primary transcripts, through a variety of mechanistically and evolutionarily unrelated pathways. Editing by base substitution has been investigated in both animals and plants. However, conventional strategies based on directed Sanger sequencing are time-consuming and effectively preclude genome-wide identification of RNA editing and assessment of partial and tissue-specific editing sites. In contrast, the high-throughput RNA-Seq approach allows the generation of a comprehensive landscape of RNA editing at the genome level. Short reads from Solexa/Illumina GA and ABI SOLiD platforms have been used to investigate the editing pattern in mitochondria of Vitis vinifera, providing significant support for 401 C-to-U conversions in coding regions and an additional 44 modifications in non-coding RNAs. Moreover, 76% of all C-to-U conversions in coding genes represent partial RNA editing events and 28% of them were shown to be significantly tissue specific. Solexa/Illumina and SOLiD platforms showed different characteristics with respect to the specific issue of large-scale editing analysis, and the combined approach presented here reduces the false positive rate of discovery of editing events.
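To make the notion of partial editing concrete, the sketch below computes an editing level at a single genomic C position from RNA-Seq base counts and labels the site. It assumes that an edited base appears as T in the cDNA reads (U in the RNA), and the depth and fraction thresholds are hypothetical values chosen for illustration. It does not include the statistical testing that the abstract refers to.

```python
def editing_level(base_counts):
    """Fraction of reads showing T at a reference C position, or None if uncovered."""
    c_reads = base_counts.get("C", 0)
    t_reads = base_counts.get("T", 0)  # U in the RNA is read as T in cDNA
    total = c_reads + t_reads
    return t_reads / total if total else None

def classify_site(base_counts, min_depth=10, low=0.1, high=0.9):
    """Label a site using illustrative thresholds (assumptions, not from the paper)."""
    depth = base_counts.get("C", 0) + base_counts.get("T", 0)
    if depth < min_depth:
        return "insufficient coverage"
    level = editing_level(base_counts)
    if level < low:
        return "unedited"
    if level > high:
        return "complete"
    return "partial"
```

For example, a site covered by 5 reads with C and 15 reads with T has an editing level of 0.75 and would be labelled partial under these thresholds.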
Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing
Genomics
Bin Wang1, Guangwu Guo2,3,4, Chao Wang1, Ying Lin1, Xiaoning Wang1, Mouming Zhao5, Yong Guo1, Minghui He1,4, Yong Zhang4,* and Li Pan1,*
1School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong 510006, 2Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, 3Graduate University of Chinese Academy of Sciences, Beijing 100049, 4Beijing Genomics Institute at Shenzhen, Shenzhen 518000 and 5College of Light Industry and Food Sciences, South China University of Technology, Guangzhou, Guangdong 510641, China
*To whom correspondence should be addressed. Fax: +86 20 39380698; Email: btlipan@scut.edu.cn
Correspondence may also be addressed to Yong Zhang. Tel/Fax: +86 755 25274032; Email: zhangy@genomics.org.cn
Received November 21, 2009. Revised March 16, 2010. Accepted March 26, 2010.
Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome.
First Time, Every Time: Nucleosomes at a Promoter Can Determine the Probability of Gene Activation
Warren P. Voth1 and David J. Stillman1
1 Department of Pathology, University of Utah Health Sciences Center, Salt Lake City, UT 84112, USA
Available online 19 April 2010.
Transcription factor binding sites are found in either nucleosome-free or nucleosome-embedded locations, thus in vivo relationships between nucleosome position and gene activation are not fully understood. In this issue of Developmental Cell, Bai et al. show that binding sites located in nucleosome depleted regions guarantee high reliability, not amplitude, of promoter firing.
Main Text
A nucleosome can block transcription factor binding, and global surveys show most promoters have upstream nucleosome depleted regions (NDRs), while others have nucleosomes that may block transcription factor binding. What are the effects of inhibitory nucleosomes on gene activation? In this issue of Developmental Cell, Bai et al. (2010) examine this question at a cell-cycle regulated promoter and show that a promoter NDR can guarantee that a gene is activated every cycle. Alternatively, nucleosome-occupied binding sites can create a bimodal pattern of activation that may require stricter conditions for activation.
The S. cerevisiae CLN2 gene encodes a G1 cyclin, and cyclin expression drives the G1/S transition (Skotheim et al., 2008). Bai et al. (2010) mapped positions of nucleosomes at the CLN2 promoter and found a nucleosome covering the TATA element and the transcription start site (Figure 1A). The NDR between nucleosomes −2 and −3 contains three binding sites for SBF, a G1-specific transcription factor. SBF is inactive in early G1, but phosphorylation of the Whi5 inhibitor by CDK1 ends this inhibition, allowing SBF to activate CLN2 ([Costanzo et al., 2004] and [de Bruin et al., 2004]). Studies with synchronized cells show that nucleosomes −1 and −2 are evicted transiently during the cell-cycle and that nucleosome eviction requires SBF and the FACT histone chaperone ([Bai et al., 2010] and [Takahata et al., 2009a]).
Friday, April 16, 2010
Alternative splicing and evolution: diversification, exon definition and function
Review
Nature Reviews Genetics 11, 345-355 (May 2010) | doi:10.1038/nrg2776
Hadas Keren1, Galit Lev-Maor1 & Gil Ast1
Abstract
Over the past decade, it has been shown that alternative splicing (AS) is a major mechanism for the enhancement of transcriptome and proteome diversity, particularly in mammals. Splicing can be found in species from bacteria to humans, but its prevalence and characteristics vary considerably. Evolutionary studies are helping to address questions that are fundamental to understanding this important process: how and when did AS evolve? Which AS events are functional? What are the evolutionary forces that shaped, and continue to shape, AS? And what determines whether an exon is spliced in a constitutive or alternative manner? In this Review, we summarize the current knowledge of AS and evolution and provide insights into some of these unresolved questions.
Friday, April 9, 2010
Molecular basis of S100 proteins interacting with the p53 homologs p63 and p73
Oncogene (2010) 29, 2024–2035; doi:10.1038/onc.2009.490; published online 8 February 2010
J van Dieck1, T Brandt1, D P Teufel1, D B Veprintsev1, A C Joerger1 and A R Fersht1
1MRC Centre for Protein Engineering, Hills Road, Cambridge, UK
Correspondence: Professor AR Fersht, MRC Centre for Protein Engineering, Cambridge University, Hills Road, Cambridge, Cambs CB2 0QH, UK. E-mail: arf25@cam.ac.uk
Received 6 August 2009; Revised 16 October 2009; Accepted 27 October 2009; Published online 8 February 2010.
Abstract
S100 proteins modulate p53 activity by interacting with its tetramerization (p53TET, residues 325–355) and transactivation (residues 1–57) domains. In this study, we characterized biophysically the binding of S100A1, S100A2, S100A4, S100A6 and S100B to homologous domains of p63 and p73 in vitro by fluorescence anisotropy, analytical ultracentrifugation and analytical gel filtration. We found that S100A1, S100A2, S100A4, S100A6 and S100B proteins bound different p63 and p73 tetramerization domain variants and naturally occurring isoforms with varying affinities in a calcium-dependent manner. Additional interactions were observed with peptides derived from the p63 and p73 N-terminal transactivation domains. Importantly, S100 proteins bound p63 and p73 with different affinities in their different oligomeric states, similarly to the differential modes of binding to p53. On the basis of our data, we hypothesize that S100 proteins regulate the oligomerization state of all three p53 family members and their isoforms, with a potential physiological relevance in developmental and disease-related processes. The regulation of the p53 family by S100 is complicated and depends on the target preference of each individual S100 protein, the concentration of the proteins and calcium, as well as the splicing variation of p63 or p73. Our results outlining the complexity of the interaction should be considered when studying the functional effects of S100 proteins in their biological context.
Keywords:
S100; p63; p73; tumor suppressor; protein–protein interaction
De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis
Valentina Boeva1,2,3,4, Didier Surdez1,2, Noëlle Guillon1,2, Franck Tirode1,2, Anthony P. Fejes5, Olivier Delattre1,2 and Emmanuel Barillot1,3,4,*
1Institut Curie, 26 rue d’Ulm, 2INSERM, U830, Genetics and Biology of Cancer, 3INSERM, U900, Bioinformatics, Biostatistics, Epidemiology and Computational Systems Biology of Cancer, Paris, F-75248, 4Mines ParisTech, Fontainebleau, F-77300, France and 5Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada
*To whom correspondence should be addressed. Tel: ; Fax: +33 1 56 24 69 11; Email: micsa@curie.fr
Received November 10, 2009. Revised February 23, 2010. Accepted March 15, 2010.
Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional information of mapped reads. In contrast, our algorithm MICSA (Motif Identification for ChIP-Seq Analysis) combines this source of positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs). We proved the greater accuracy of MICSA with respect to several other tools by running them on datasets for the TFs NRSF, GABP, STAT1 and CTCF. We also applied MICSA on a dataset for the oncogenic TF EWS-FLI1. We discovered >2000 binding sites and two functionally different binding motifs. We observed that EWS-FLI1 can activate gene transcription when (i) its binding site is located in close proximity to the gene transcription start site (up to ~150 kb), and (ii) it contains a microsatellite sequence. Furthermore, we observed that sites without microsatellites can also induce regulation of gene expression—positively as often as negatively—and at much larger distances (up to ~1 Mb).
Detection of splice junctions from paired-end RNA-seq data by SpliceMap
Kin Fai Au1, Hui Jiang1,2, Lan Lin3, Yi Xing3 and Wing Hung Wong1,*
1Department of Statistics, Stanford University, Stanford, CA 94305, 2Stanford Genome Technology Center, 855 California Ave, Palo Alto, CA 94304 and 3Department of Internal Medicine and Department of Biomedical Engineering, University of Iowa, Iowa City, IA, 52242, USA
*To whom correspondence should be addressed. Tel: ; Fax: +1 650 725 8977; Email: whwong@stanford.edu
Received December 7, 2009. Revised March 10, 2010. Accepted March 12, 2010.
Alternative splicing is a prevalent post-transcriptional process, which is not only important to normal cellular function but is also involved in human diseases. The newly developed second generation sequencing technique provides high-throughput data (RNA-seq data) to study alternative splicing events in different types of cells. Here, we present a computational method, SpliceMap, to detect splice junctions from RNA-seq data. This method does not depend on any existing annotation of gene structures and is capable of finding novel splice junctions with high sensitivity and specificity. It can handle long reads (50–100 nt) and can exploit paired-read information to improve mapping accuracy. Several parameters are included in the output to indicate the reliability of the predicted junction and help filter out false predictions. We applied SpliceMap to analyze 23 million paired 50-nt reads from human brain tissue. The results show at this depth of sequencing, RNA-seq can support reliable detection of splice junctions except for those that are present at very low level. Compared to current methods, SpliceMap can achieve 12% higher sensitivity without sacrificing specificity.
A Signal-Noise Model for Significance Analysis of ChIP-seq with Negative Control
Han Xu 1,3, Lusy Handoko 2, Xueliang Wei 4, Chaopeng Ye 2, Jianpeng Sheng 5, Chia-Lin Wei 2, Feng Lin 3,* and Wing-Kin Sung 1,4,*
1Computational & Mathematical Biology Group, Genome Institute of Singapore, 138672, Singapore; 2Genome Technology & Biology Group, Genome Institute of Singapore, 138672, Singapore; 3School of Computer Engineering, Nanyang Technological University, 637553, Singapore; 4School of Computing, National University of Singapore, 117543, Singapore; 5School of Biological Science, Nanyang Technological University, 637551, Singapore
*To whom correspondence should be addressed. Feng Lin, Wing-Kin Sung, E-mail: sungk@gis.a-star.edu.sg, asflin@ntu.edu.sg
Abstract
Motivation: ChIP-seq is becoming the main approach to the genome-wide study of protein-DNA interactions and histone modifications. Existing informatics tools perform well to extract strong ChIP-enriched sites. However, two questions remain to be answered: (a) to which extent is a ChIP-seq experiment able to reveal the weak ChIP-enriched sites? (b) are the weak sites biologically meaningful? To answer these questions, it is necessary to identify the weak ChIP signals from background noise.
Results: We propose a linear signal-noise model, in which a noise rate was introduced to represent the fraction of noise in a ChIP library. We developed an iterative algorithm to estimate the noise rate using a control library, and derived a library-swapping strategy for the FDR estimation. These approaches were integrated in a general-purpose framework, named CCAT (Control based ChIP-seq Analysis Tool), for the significance analysis of ChIP-seq. Applications to H3K4me3 and H3K36me3 datasets showed CCAT predicted significantly more ChIP-enriched sites than previous methods did. With the high sensitivity of CCAT prediction, we revealed distinct chromatin features associated to the strong and weak H3K4me3 sites.
Availability: http://cmb.gis.a-star.edu.sg/ChIPSeq/tools.htm
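To make the idea of a noise rate concrete, here is a minimal sketch of one way such a quantity could be estimated iteratively from a control library. It is not the CCAT implementation: the function name `estimate_noise_rate`, the fixed fold-change cutoff `fold`, and the per-bin read-count inputs are all assumptions made for illustration. The sketch treats each bin's ChIP count as noise (proportional to the control count) plus signal, estimates the noise fraction from bins currently labelled background, relabels bins, and repeats until the estimate stabilises.

```python
def estimate_noise_rate(chip, control, fold=2.0, max_iter=100, tol=1e-9):
    """Illustrative iterative estimate of the noise fraction in a ChIP library.

    chip, control: read counts per genomic bin (same length).
    fold: hypothetical cutoff; a bin is called enriched when its ChIP count
          exceeds `fold` times the noise expected from the control library.
    Returns (alpha, enriched_bin_indices).
    """
    if len(chip) != len(control):
        raise ValueError("chip and control must have the same number of bins")
    n_chip, n_ctrl = sum(chip), sum(control)
    if n_chip == 0 or n_ctrl == 0:
        raise ValueError("both libraries must contain reads")

    background = [True] * len(chip)  # start by treating every bin as noise
    alpha = 1.0
    for _ in range(max_iter):
        bg_chip = sum(c for c, is_bg in zip(chip, background) if is_bg)
        bg_ctrl = sum(k for k, is_bg in zip(control, background) if is_bg)
        if bg_ctrl == 0:
            break  # no control reads left in background bins; keep last estimate
        # Fraction of ChIP reads explained by noise, judged on background bins only.
        new_alpha = min(1.0, (bg_chip / n_chip) / (bg_ctrl / n_ctrl))
        # Relabel: a bin stays background if its count is within `fold` times
        # the expected noise, alpha * n_chip * control_i / n_ctrl.
        new_background = [
            c <= fold * new_alpha * n_chip * k / n_ctrl
            for c, k in zip(chip, control)
        ]
        converged = abs(new_alpha - alpha) < tol and new_background == background
        alpha, background = new_alpha, new_background
        if converged:
            break
    enriched = [i for i, is_bg in enumerate(background) if not is_bg]
    return alpha, enriched
```

On a toy input of ten bins with a uniform control of 10 reads per bin and two ChIP bins raised to 100 reads, the estimate settles at the fraction of ChIP reads attributable to background. A real analysis would replace the fixed cutoff with a statistical test and control the false discovery rate, for example by the library-swapping strategy the abstract mentions.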
Thursday, April 8, 2010
Global methylation profiling of lymphoblastoid cell lines reveals epigenetic contributions to autism spectrum disorders and a novel autism candidate
Published online before print April 7, 2010 as doi: 10.1096/fj.10-154484.
Global methylation profiling of lymphoblastoid cell lines reveals epigenetic contributions to autism spectrum disorders and a novel autism candidate gene, RORA, whose protein product is reduced in autistic brain
AnhThu Nguyen, Tibor A. Rauch, Gerd P. Pfeifer, and Valerie W. Hu
E-mail contact: bcmvwh@gwumc.edu
Autism is currently considered a multigene disorder with epigenetic influences. To investigate the contribution of DNA methylation to autism spectrum disorders, we have recently completed large-scale methylation profiling by CpG island microarray analysis of lymphoblastoid cell lines derived from monozygotic twins discordant for diagnosis of autism and their nonautistic siblings. Methylation profiling revealed many candidate genes differentially methylated between discordant MZ twins as well as between both twins and nonautistic siblings. Bioinformatics analysis of the differentially methylated genes demonstrated enrichment for high-level functions including gene transcription, nervous system development, cell death/survival, and other biological processes implicated in autism. The methylation status of 2 of these candidate genes, BCL-2 and retinoic acid-related orphan receptor alpha (RORA), was further confirmed by bisulfite sequencing and methylation-specific PCR, respectively. Immunohistochemical analyses of tissue arrays containing slices of the cerebellum and frontal cortex of autistic and age- and sex-matched control subjects revealed decreased expression of RORA and BCL-2 proteins in the autistic brain. Our data thus confirm the role of epigenetic regulation of gene expression via differential DNA methylation in idiopathic autism, and furthermore link molecular changes in a peripheral cell model with brain pathobiology in autism.—Nguyen, A., Rauch, T. A., Pfeifer, G. P., Hu, V. W. Global methylation profiling of lymphoblastoid cell lines reveals epigenetic contributions to autism spectrum disorders and a novel autism candidate gene, RORA, whose protein product is reduced in autistic brain.
Thursday, April 1, 2010
Epigenetic marks identify functional elements
Randall H Morse
Nature Genetics 42, 282–284 (2010) | doi:10.1038/ng0410-282
Enhancers and transcription factor binding sites that control cell-specific transcription in higher eukaryotes can be found up to hundreds of kilobases from the promoters that they control, making their identification challenging. A new study uses a model based on histone modifications and chromatin dynamics to predict functional elements involved in androgen receptor response.