Friday, May 28, 2010

Mouse project to find each gene's role

Published online 25 May 2010 | Nature 465, 410 (2010) | doi:10.1038/465410a

International Mouse Phenotyping Consortium launches with a massive funding commitment.

Alison Abbott
Making thousands of strains of knockout mice will build a powerful database. Image courtesy of MRC Harwell.

An ambitious effort to identify the function of every gene in the mouse genome looks set to provide scientists with the ultimate mouse model of human disease.

The International Mouse Phenotyping Consortium (IMPC) has yet to find all of the US$900 million it needs to complete its task over the next decade. But at a meeting on mouse models of human diseases in London, where the project was unveiled last week, scientists announced a commitment of $110 million from the US National Institutes of Health (NIH) in Bethesda, Maryland over the next five years.

"The project will explain the genes — it is going to be transformative for biology," says James Battey, director of the National Institute on Deafness and Other Communication Disorders. This is one of the nine NIH institutes that have agreed to set aside $11 million from their budgets to match the $11 million put up by Francis Collins from his own budget as NIH director.

The IMPC aims to take mice of identical genetic background and to create viable strains in which one of the 20,000 or so genes in the mouse genome is knocked out, or deactivated. The knockout strains will then be put through rigorous, systematic phenotypic screens, which will check for physical and behavioural differences. The information will be stored in a purpose-built, open-access database.

Scientists would, for example, be able to turn to the database to learn more about an unfamiliar gene signalled in a genome-wide association study in humans as being possibly relevant to a particular disease. Making the mutant animal and phenotyping it in a lab could take three years.

According to Battey, the NIH thinks that its investment in the large-scale programme will actually save money in the long run, because it already spends "hundreds of millions of dollars per year" supporting small-scale grants for individual scientists wanting to create and phenotype their own particular gene of interest.

But the launch of the visionary programme comes at a time of global financial crisis, and the consortium may have trouble finding additional funders. Some scientists had looked to the European Commission, which sponsored the meeting and has spent close to €250 million (US$305 million) over the past ten years to pioneer systematic phenotyping of mice, and to generate mutant mice. But Leszek Borysiewicz, chief executive of the UK Medical Research Council (MRC) in London, warned at the meeting that the commission's politicians would need a lot of convincing that mouse genomics was more deserving of funds than other scientific projects.

Scientists at the meeting, however, argue that the IMPC's goals will eventually be achieved, even if the timetable has to be stretched, simply because the work is so vital. The mouse genome was the first to be sequenced after the human because of its importance as a laboratory model. "But it soon became clear that it is impossible to predict function directly from sequence," says Paul Schofield, a geneticist at the University of Cambridge, UK, who helped to organize the meeting. "Also, there are black holes in the genomes where we simply don't know what the genes do; the mouse phenotype database would give us traction."

Mouse 'clinics' have sprung up around the world to screen mutant mice for crude phenotypes — such as heart defects — and to perform secondary screens to investigate the changes in more detail. But it is already clear that even this is not enough. Gene expression, and the resultant phenotype, are profoundly influenced by environment, and many of these mice are raised in a stressed environment for the purposes of experiments — for example, being fed high-fat diets or subjected to infection. Much more will be learnt by comparing phenotype screens carried out on mutant mice raised in a normal environment, say project scientists.

"It makes sense to coordinate the systematic phenotyping of the complete genome internationally, like we did the human and mouse genome," says geneticist Mark Moore, a consultant for the NIH, the MRC and the Wellcome Trust UK medical funding charity, who heads the IMPC. The consortium, which will involve clinical as well as basic research scientists, will do primary phenotypic screens on 4,000 knockout mice in its first five years. Any researcher may request a particular gene to be prioritized for knocking out in this first phase, although a proportion will also be selected at random.

The IMPC is already discussing with the scientific community exactly what types of phenotyping should be carried out, and what environmental challenges the mice should face in later stages of the project. It is also considering including a colony of ageing mice: "This would increase costs because the mice would have to be housed for so long, but age-related diseases are at the top of everyone's agendas," says Moore.
Source of drugs

"The IMPC sounds expensive but it is not compared with other genomic resources," adds Moore. "The database only needs to help industry to develop a handful of multibillion-dollar blockbuster drugs and it will have paid for itself."

"The initiative will save us time and money, which will help us provide drugs in return," agrees David Frendewey, an associate director of Regeneron Pharmaceuticals, headquartered in Tarrytown, New York. "No question, the mouse is where the action is." Yet turning drug companies' interest into hard currency won't be easy. "Industry will participate in kind, but maybe not with cash; we are no longer a bottomless pit of money," says Malcolm Skingle, director of academic liaison at GlaxoSmithKline in Stevenage, UK.

And genome engineer Francis Stewart of the Technical University of Dresden, Germany, says that focusing on the commercial benefits of the project misses the point. "Through the IMPC, we have a historic opportunity to systematically learn everything about a mammal, for the first time," he says. "Let's not lose the vision by side-tracking into applications issues just to please politicians."

Tuesday, May 11, 2010

Genomics goes beyond DNA sequence

A technology that simultaneously reads a DNA sequence and its crucial modifications makes its debut.

Alla Katsnelson

What makes two individuals different? Biologists now know that the genome sequence holds only a small part of the answer, and that key elements of development and disease are controlled by the epigenome — a set of chemical modifications, not encoded in DNA, that orchestrate how and when genes are expressed. But whereas faster, cheaper and more accurate sequencing technologies have developed rapidly, techniques to map the epigenome have lagged behind.
DNA polymerase (shown flanking the double helix) can reveal genomic and epigenomic detail. Image: LAGUNA DESIGN/SPL

Sequencing company Pacific Biosciences, based in Menlo Park, California, has now developed an integrated system that simultaneously reads a genome sequence and detects an important epigenetic marker called DNA methylation. "I think it's an important step forward, although I think it is a baby step," says Joseph Ecker, a plant geneticist at the Salk Institute for Biological Studies in La Jolla, California, who was not involved in the work.

DNA methylation — the addition of methyl groups to individual bases — is just one of many epigenetic markers of DNA and its associated proteins. Others include modification of the histone proteins that DNA winds around to form chromatin — the tightly packed cluster that makes up chromosomes — and the activation of small non-coding RNA molecules.

DNA methylation, which reduces gene expression, is linked to key developmental events, as well as many types of cancer. It is the best-studied epigenetic modification, mainly because tools have existed to study it, says Susan Clark, an epigeneticist at the Garvan Institute of Medical Research in Sydney, Australia.

The gold-standard method for detecting DNA methylation, which Clark's group developed more than 15 years ago, is bisulphite sequencing, in which unmethylated versions of the base cytosine are chemically converted into another base, uracil. Sequencing the converted DNA allows scientists to reconstruct a genome-wide methylation map. But the technique has several drawbacks. Not only is it expensive and time consuming, it also damages DNA, reducing the map's accuracy. And it doesn't detect methylation at adenine bases, which are very prevalent in organisms such as bacteria.
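The conversion logic described above can be illustrated with a toy simulation. This is a minimal sketch, assuming a single strand, perfect conversion and no sequencing errors; the function names and the example sequence are hypothetical and are not taken from the article or from any real pipeline. Unmethylated cytosines are converted to uracil, which is read as T after amplification, so any cytosine that still reads as C is inferred to be methylated.

```python
def bisulphite_convert(seq, methylated_positions):
    """Simulate bisulphite treatment and sequencing of one strand.

    Unmethylated C is read as T (C -> U -> T); methylated C stays C.
    Positions are 0-based. Perfect conversion is assumed.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, converted_read):
    """Return 0-based positions where a reference C still reads as C."""
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference, converted_read))
        if ref == "C" and obs == "C"
    ]


# Hypothetical example: cytosines at positions 1 and 5 are methylated.
reference = "ACGTCCGA"
read = bisulphite_convert(reference, methylated_positions=[1, 5])
calls = call_methylation(reference, read)
print(read, calls)
```

Because the sketch only tracks cytosines, it also shows why the method says nothing about methylation at adenine bases.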

Pacific Biosciences' approach for detecting DNA methylation, published this month in Nature Methods (ref. 1), builds on the company's sequencing technology. The system uses an enzyme called DNA polymerase to read a strand of DNA and build a complementary strand out of nucleotides labelled with fluorescent molecules. As each component is added to the growing strand, it produces a flash of light; the colour of the light corresponds to the identity of the base, and thus reveals the sequence of the template DNA.

Analysing the pulses of light, and the time between them, can also show whether methylation is affecting polymerase activity. This has now been exploited to detect methyladenine, methylcytosine and a poorly understood modification called 5-hydroxymethylcytosine. "We foresee with this technology that in the future there will be a unification of the fields of epigenomics and genomics," says Stephen Turner, the company's founder and chief technology officer.
Game changer?

Although the data are promising, obstacles remain. "There are distinct advantages, but we're not rushing out tomorrow to apply this because it's not prime time for human methylome mapping," says Ecker.

One problem is that although the technique is great at distinguishing adenine from methyladenine, it doesn't quite reach single-base resolution for cytosine and methylcytosine. It also lacks one of the key promised benefits of Pacific Biosciences' sequencing technology: its ability to read long sequences of DNA, up to 8,000–10,000 base pairs, which makes it easier to assemble the data into complete genomes. Instead, the reported methylation read-length is only about 1,000 base pairs.

Turner says that the company is working to solve these problems. It will ship the first sequencers that use fluorescent labelling this year, and plans to add the methylation mapping capability next year.

"What needs to be done now is to make it robust and accurate," says Clark, a steering-committee member of the International Human Epigenome Consortium, a bid launched in January to map the epigenome in multiple cell types (ref. 2). "There's a lot of troubleshooting that needs to be done to get it to be accurate enough to be able to compete with bisulphite sequencing."

Several companies are working on similar technologies. UK-based Oxford Nanopore Technologies published a report last year showing that it could detect methylated DNA at a single-molecule level (ref. 3). But that system and others are still at an earlier stage of development.

Some say that the promise for such a technique is huge. Bisulphite sequencing for a single human genome can cost up to US$100,000, says Robert Martienssen, a geneticist at Cold Spring Harbor Laboratory in New York. With the latest technique, the cost of a full-genome methylation map would drop to $100–1,000, he says. "That will change everything."

There is no shortage of epigenetic questions ripe for probing. One example is in tumour biology, where different cancer cells are likely to have different methylation patterns. Another is how cells in a single organism take on different functions despite having identical genomes. "This is exactly the technology you could use to look for epigenomic changes in specific cell types," says Martienssen, who is also on the International Human Epigenome Consortium's steering committee.

Ecker says researchers still haven't pinned down the significance of, say, having a methylation mark in one position and not another, and what's really needed is more studies that unify genomic and epigenomic information. "As you get more genomes to compare, then of course the differences take on some meaning," he says. "We're just lacking numbers at this point."

Tuesday, April 27, 2010

Model-based method for transcription factor target identification with limited data

Antti Honkela (a,1), Charles Girardot (b), E. Hilary Gustafson (b), Ya-Hsin Liu (b), Eileen E. M. Furlong (b), Neil D. Lawrence (c,1), and Magnus Rattray (c,1)

(a) Department of Information and Computer Science, Aalto University School of Science and Technology, Helsinki, Finland;
(b) Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; and
(c) School of Computer Science, University of Manchester, Manchester, United Kingdom


Edited by David Baker, University of Washington, Seattle, WA, and approved March 3, 2010 (received for review December 10, 2009)


We present a computational method for identifying potential targets of a transcription factor (TF) using wild-type gene expression time series data. For each putative target gene we fit a simple differential equation model of transcriptional regulation, and the model likelihood serves as a score to rank targets. The expression profile of the TF is modeled as a sample from a Gaussian process prior distribution that is integrated out using a nonparametric Bayesian procedure. This results in a parsimonious model with relatively few parameters that can be applied to short time series datasets without noticeable overfitting. We assess our method using genome-wide chromatin immunoprecipitation (ChIP-chip) and loss-of-function mutant expression data for two TFs, Twist and Mef2, which control mesoderm development in Drosophila. Lists of top-ranked genes identified by our method are significantly enriched for genes close to bound regions identified in the ChIP-chip data and for genes that are differentially expressed in loss-of-function mutants. Targets of Twist display diverse expression profiles, and in this case a model-based approach performs significantly better than scoring based on correlation with TF expression. Our approach is found to be comparable or superior to ranking based on mutant differential expression scores. Also, we show how integrating complementary wild-type spatial expression data can further improve target ranking performance.
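The abstract does not state the differential equation, so the following is only a sketch under stated assumptions: a linear model dx/dt = basal + sensitivity * f(t) - decay * x, where f(t) is the TF activity and x the target mRNA level, integrated with a simple Euler step. The parameter names are hypothetical, and the squared-error score is a plain stand-in for the Gaussian-process marginal likelihood that the authors describe; it is not their method.

```python
def simulate_target(tf_profile, times, basal, sensitivity, decay, x0):
    """Euler-integrate dx/dt = basal + sensitivity * f(t) - decay * x.

    tf_profile[i] is the assumed TF activity f at times[i].
    Returns the predicted target expression at every time point.
    """
    x = [x0]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        dxdt = basal + sensitivity * tf_profile[i - 1] - decay * x[-1]
        x.append(x[-1] + dt * dxdt)
    return x


def squared_error(predicted, observed):
    """Stand-in fit score (lower is better); not the paper's likelihood."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


# Hypothetical toy data: a TF pulse and one candidate target gene.
times = [0.0, 1.0, 2.0, 3.0]
tf_profile = [0.0, 1.0, 1.0, 0.0]
predicted = simulate_target(tf_profile, times, basal=0.0,
                            sensitivity=1.0, decay=0.5, x0=0.0)
observed = [0.0, 0.1, 0.9, 1.4]
print(predicted, squared_error(predicted, observed))
```

Ranking candidate targets would then amount to fitting the parameters per gene and sorting genes by the resulting score.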

Saturday, April 24, 2010

CpG islands influence chromatin structure via the CpG-binding protein Cfp1


Nature 464, 1082-1086 (15 April 2010) | doi:10.1038/nature08924; Received 2 October 2009; Accepted 15 February 2010

John P. Thomson1,3, Peter J. Skene1,3, Jim Selfridge1, Thomas Clouaire1, Jacky Guy1, Shaun Webb1, Alastair R. W. Kerr1, Aimée Deaton1, Rob Andrews2, Keith D. James2, Daniel J. Turner2, Robert Illingworth1 & Adrian Bird1

1. Wellcome Trust Centre for Cell Biology, Michael Swann Building, University of Edinburgh, Mayfield Road, Edinburgh EH9 3JR, UK
2. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
3. These authors contributed equally to this work.

Correspondence and requests for materials should be addressed to A.B.

CpG islands (CGIs) are prominent in the mammalian genome owing to their GC-rich base composition and high density of CpG dinucleotides (refs 1, 2). Most human gene promoters are embedded within CGIs that lack DNA methylation and coincide with sites of histone H3 lysine 4 trimethylation (H3K4me3), irrespective of transcriptional activity (refs 3, 4). In spite of these intriguing correlations, the functional significance of non-methylated CGI sequences with respect to chromatin structure and transcription is unknown. By performing a search for proteins that are common to all CGIs, here we show high enrichment for Cfp1, which selectively binds to non-methylated CpGs in vitro (refs 5, 6). Chromatin immunoprecipitation of a mono-allelically methylated CGI confirmed that Cfp1 specifically associates with non-methylated CpG sites in vivo. High throughput sequencing of Cfp1-bound chromatin identified a notable concordance with non-methylated CGIs and sites of H3K4me3 in the mouse brain. Levels of H3K4me3 at CGIs were markedly reduced in Cfp1-depleted cells, consistent with the finding that Cfp1 associates with the H3K4 methyltransferase Setd1 (refs 7, 8). To test whether non-methylated CpG-dense sequences are sufficient to establish domains of H3K4me3, we analysed artificial CpG clusters that were integrated into the mouse genome. Despite the absence of promoters, the insertions recruited Cfp1 and created new peaks of H3K4me3. The data indicate that a primary function of non-methylated CGIs is to genetically influence the local chromatin modification state by interaction with Cfp1 and perhaps other CpG-binding proteins.
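The two sequence properties named in the first sentence, GC-rich composition and high CpG density, can be quantified for any stretch of DNA. The sketch below is an illustration only and is not the authors' analysis: it computes GC content and the widely used observed/expected CpG ratio, (number of CpG * length) / (number of C * number of G). The function name and the example sequence are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA string.

    Observed/expected CpG = (#CpG * N) / (#C * #G), where N is the
    sequence length. Returns 0.0 for a ratio that cannot be computed.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


# Hypothetical CpG-dense fragment.
gc_content, obs_exp = cpg_stats("CGCGCGAT")
print(gc_content, obs_exp)
```

A ratio well above what base composition alone predicts is the kind of CpG clustering the abstract refers to; any threshold for calling an island would be an additional assumption.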

Tuesday, April 20, 2010

Epigenetic regulation of the human p53 gene promoter by the CTCF transcription factor in transformed cell lines

Original Article

Oncogene (2010) 29, 2217–2227; doi:10.1038/onc.2009.509; published online 25 January 2010

E Soto-Reyes1 and F Recillas-Targa1

1Instituto de Fisiología Celular, Departamento de Genética Molecular, Universidad Nacional Autónoma de México, México D.F., México

Correspondence: Dr F Recillas-Targa, Instituto de Fisiología Celular, Departamento de Genética Molecular, Universidad Nacional Autónoma de México, Apartado Postal 70-242, Mexico City, D.F., 04510, México.

Received 18 May 2009; Revised 3 December 2009; Accepted 13 December 2009; Published online 25 January 2010.

Epigenetic silencing of tumor suppressor gene promoters has become a more frequent phenomenon in cancer than previously anticipated. In this study we addressed the mechanisms involved in the protection of the p53 tumor suppressor gene against epigenetic silencing in human transformed cell lines. We characterized a binding site for the CCCTC-binding factor (CTCF) in the human p53 gene promoter that contributes to its transcriptional expression, and has the ability to maintain this regulatory element in a local open chromatin configuration. In the absence of CTCF we observe the incorporation of repressive histone marks, such as H3K9me3, H3K27me3 and H4K20me3, in different sub-domains of the upstream regulatory sequence. This evidence suggests that CTCF protects the p53 gene promoter against repressive histone marks. Notably, no apparent direct correlation between repression and DNA hypermethylation has been detected. Together, we present evidence supporting the relevant role of CTCF in the epigenetic regulation of tumor suppressor genes and cancer. We propose that CTCF is a strategic component responsible for the maintenance and segregation of epigenetic traits.

Targeting of mRNAs by multiple miRNAs: the next step


Oncogene (2010) 29, 2161–2164; doi:10.1038/onc.2010.59; published online 1 March 2010

M E Peter1

1The Ben May Department of Cancer Research, The University of Chicago, Chicago, IL, USA

Correspondence: Professor ME Peter, The Ben May Department of Cancer Research, The University of Chicago, 924 E 57th Street, R112, Chicago, IL 60637, USA.

Received 25 January 2010; Accepted 26 January 2010; Published online 1 March 2010.

Micro(mi)RNAs are small noncoding RNAs that regulate expression of the majority of the genes in the genome at either the messenger RNA (mRNA) level (by degrading mRNA) or the protein level (by blocking translation). miRNAs are thought to be components of vast regulatory networks. Currently, the field is focused primarily on identifying novel targets of individual miRNAs. This focus is about to undergo a dramatic change. In a new paper by Wu et al. (2010) it is experimentally confirmed that multiple miRNAs target the same gene, suggesting that it is the combination of all these activities that determines the expression of miRNA target genes. This study ushers in a new era of miRNA research that focuses on networks more than on individual connections between miRNA and strongly predicted targets.

Biases in Illumina transcriptome sequencing caused by random hexamer priming

Kasper D. Hansen1,*, Steven E. Brenner2 and Sandrine Dudoit1,3

1Division of Biostatistics, School of Public Health, UC Berkeley, 101 Haviland Hall, Berkeley, CA 94720-7358, 2Department of Plant and Microbial Biology, UC Berkeley, 461 Koshland Hall, Berkeley, CA 94720-3102 and 3Department of Statistics, UC Berkeley, 367 Evans Hall, Berkeley, CA 94720-3860, USA

*To whom correspondence should be addressed.

Received December 1, 2009. Revised March 16, 2010. Accepted March 17, 2010.

Generation of cDNA using random hexamer priming induces biases in the nucleotide composition at the beginning of transcriptome sequencing reads from the Illumina Genome Analyzer. The bias is independent of organism and laboratory and impacts the uniformity of the reads along the transcriptome. We provide a read count reweighting scheme, based on the nucleotide frequencies of the reads, that mitigates the impact of the bias.
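The abstract describes the reweighting scheme only in outline, so the following is a simplified sketch under stated assumptions rather than the authors' exact formula. It assumes that k-mer frequencies in the interior of reads are unaffected by priming, and it weights each read by the ratio of its starting k-mer's interior frequency to its start-of-read frequency. The k-mer length, the interior position and the toy reads are all hypothetical.

```python
from collections import Counter


def kmer_freqs(reads, pos, k):
    """Relative frequency of each k-mer found at offset `pos` in the reads."""
    counts = Counter(r[pos:pos + k] for r in reads if len(r) >= pos + k)
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def start_kmer_weights(reads, k=2, interior_pos=4):
    """Weight = interior frequency / start frequency for each start k-mer.

    Over-represented start k-mers get weights below 1,
    under-represented ones get weights above 1.
    """
    start = kmer_freqs(reads, 0, k)
    interior = kmer_freqs(reads, interior_pos, k)
    return {kmer: interior.get(kmer, 0.0) / f for kmer, f in start.items()}


def reweighted_count(reads, weights, k=2):
    """Sum of per-read weights; replaces the raw read count."""
    return sum(weights.get(r[:k], 1.0) for r in reads)


# Hypothetical toy reads in which "AC" is over-represented at the read start.
reads = ["ACGTAC", "ACGTGT", "ACTTGT", "GTACAC"]
weights = start_kmer_weights(reads)
total = reweighted_count(reads, weights)
print(weights, total)
```

In this toy set the three reads starting with "AC" are each down-weighted and the single "GT" read is up-weighted, while the reweighted total stays at four reads.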

Small RNA class transition from siRNA/piRNA to miRNA during pre-implantation mouse development

Yusuke Ohnishi1,2, Yasushi Totoki3, Atsushi Toyoda4, Toshiaki Watanabe5,6, Yasuhiro Yamamoto5,6, Katsushi Tokunaga2, Yoshiyuki Sakaki4, Hiroyuki Sasaki5,6 and Hirohiko Hohjoh1,*

1Department of Molecular Genetics, National Institute of Neuroscience, NCNP, Tokyo 187-8502, 2Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo 113-0033, 3Cancer Genomics Project, National Cancer Center Research Institute, Tokyo 104-0045, 4Sequence Technology Team, RIKEN Genomic Sciences Center, Yokohama 230-0045, 5Division of Human Genetics, Department of Integrated Genetics, National Institute of Genetics, Research Organization of Information and Systems and 6Department of Genetics, School of Life Science, The Graduate University for Advanced Studies, Mishima 411-8540, Japan

*To whom correspondence should be addressed. Fax: +81 42 346 1755.

Received February 9, 2010. Revised March 17, 2010. Accepted March 18, 2010.

Recent studies showed that small interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs) in mammalian germ cells play important roles in retrotransposon silencing and gametogenesis. However, the subsequent contribution of those small RNAs to early mammalian development remains poorly understood. We investigated the expression profiles of small RNAs in mouse metaphase II oocytes, 8–16-cell stage embryos, blastocysts and the pluripotent inner cell mass (ICM) using high-throughput pyrosequencing. Here, we show that during pre-implantation development a major small RNA class changes from retrotransposon-derived small RNAs containing siRNAs and piRNAs to zygotically synthesized microRNAs (miRNAs). Some siRNAs and piRNAs are transiently upregulated and directed against specific retrotransposon classes. We also identified miRNA expression profiles characteristic of the ICM and trophectoderm (TE) cells. Taken together, our current study reveals a major reprogramming of functional small RNAs during early mouse development from oocyte to blastocyst.

Unconstrained mining of transcript data reveals increased alternative splicing complexity in the human transcriptome

I. G. Mollet1,*, Claudia Ben-Dov2, Daniel Felício-Silva1, A. R. Grosso1, Pedro Eleutério1, Ruben Alves1, Ray Staller3, Tito Santos Silva4 and Maria Carmo-Fonseca1

1Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028 Lisbon, Portugal, 2CRG-Centre de Regulació Genómica, Barcelona, Spain, 3Independent Senior Consultant on Statistics & Mathematics, Amsterdam, the Netherlands and 4Faculdade de Engenharia, Universidade Católica Portuguesa, Lisbon, Portugal

*To whom correspondence should be addressed. Fax: +46(0)40-391222.

Received November 17, 2009. Revised February 22, 2010. Accepted March 9, 2010.

Mining massive amounts of transcript data for alternative splicing information is paramount to help understand how the maturation of RNA regulates gene expression. We developed an algorithm to cluster transcript data to annotated genes to detect unannotated splice variants. A higher number of alternatively spliced genes and isoforms were found compared to other alternative splicing databases. Comparison of human and mouse data revealed a marked increase, in human, of splice variants incorporating novel exons and retained introns. Previously unannotated exons were validated by tiling array expression data and shown to correspond preferentially to novel first exons. Retained introns were validated by tiling array and deep sequencing data. The majority of retained introns were shorter than 500 nt and had weak polypyrimidine tracts. A subset of retained introns matching small RNAs and displaying a high GC content suggests a possible coordination between splicing regulation and production of noncoding RNAs. Conservation of unannotated exons and retained introns was higher in horse, dog and cow than in rodents, and 64% of exon sequences were only found in primates. This analysis highlights previously bypassed alternative splice variants, which may be crucial to deciphering more complex pathways of gene regulation in human.

Large-scale detection and analysis of RNA editing in grape mtDNA by RNA deep-sequencing

Ernesto Picardi1, David S. Horner2, Matteo Chiara2, Riccardo Schiavon3, Giorgio Valle3 and Graziano Pesole1,4,*

1Dipartimento di Biochimica e Biologia Molecolare ‘E. Quagliariello’, Università degli Studi di Bari, 70126 Bari, 2Dipartimento di Scienze Biomolecolari e Biotecnologie, Università degli Studi di Milano, 20133 Milano, 3CRIBI, Università degli Studi di Padova, viale G. Colombo 3, 35121 Padova and 4Istituto Tecnologie Biomediche del Consiglio Nazionale delle Ricerche, via Amendola 122/D, 70125 Bari, Italy

*To whom correspondence should be addressed. Fax: +39 080 544 3317.

Received September 7, 2009. Revised March 9, 2010. Accepted March 9, 2010.

RNA editing is a widespread post-transcriptional molecular phenomenon that can increase proteomic diversity, by modifying the sequence of completely or partially non-functional primary transcripts, through a variety of mechanistically and evolutionarily unrelated pathways. Editing by base substitution has been investigated in both animals and plants. However, conventional strategies based on directed Sanger sequencing are time-consuming and effectively preclude genome wide identification of RNA editing and assessment of partial and tissue-specific editing sites. In contrast, the high-throughput RNA-Seq approach allows the generation of a comprehensive landscape of RNA editing at the genome level. Short reads from Solexa/Illumina GA and ABI SOLiD platforms have been used to investigate the editing pattern in mitochondria of Vitis vinifera providing significant support for 401 C-to-U conversions in coding regions and an additional 44 modifications in non-coding RNAs. Moreover, 76% of all C-to-U conversions in coding genes represent partial RNA editing events and 28% of them were shown to be significantly tissue specific. Solexa/Illumina and SOLiD platforms showed different characteristics with respect to the specific issue of large-scale editing analysis, and the combined approach presented here reduces the false positive rate of discovery of editing events.
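The distinction between partial and complete C-to-U editing can be made concrete with per-site read counts. This is a minimal sketch, not the authors' pipeline: it assumes that reads have already been aligned and tallied per base at a genomic C, that an edited U appears as T in cDNA reads, and it uses hypothetical cut-offs where the paper relies on statistical support. All names and numbers are illustrative.

```python
def editing_fraction(base_counts):
    """Fraction of C- or T-supporting reads at a genomic C that show T.

    An edited U is read as T in cDNA. Returns None without coverage.
    """
    c = base_counts.get("C", 0)
    t = base_counts.get("T", 0)
    covered = c + t
    return t / covered if covered else None


def classify_site(base_counts, min_fraction=0.1, full_fraction=0.9):
    """Label a site with hypothetical cut-offs (not taken from the paper)."""
    fraction = editing_fraction(base_counts)
    if fraction is None:
        return "no coverage"
    if fraction < min_fraction:
        return "unedited"
    if fraction < full_fraction:
        return "partial"
    return "full"


# Hypothetical counts for the same site in two tissues.
tissue_a = {"C": 30, "T": 10}
tissue_b = {"C": 1, "T": 99}
print(editing_fraction(tissue_a), classify_site(tissue_a))
print(editing_fraction(tissue_b), classify_site(tissue_b))
```

A site whose fraction differs markedly between tissues, as in this toy example, is the kind of candidate that a tissue-specificity test would then examine.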

Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing

Bin Wang1, Guangwu Guo2,3,4, Chao Wang1, Ying Lin1, Xiaoning Wang1, Mouming Zhao5, Yong Guo1, Minghui He1,4, Yong Zhang4,* and Li Pan1,*

1School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong 510006, 2Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, 3Graduate University of Chinese Academy of Sciences, Beijing 100049, 4Beijing Genomics Institute at Shenzhen, Shenzhen 518000 and 5College of Light Industry and Food Sciences, South China University of Technology, Guangzhou, Guangdong 510641, China

*To whom correspondence should be addressed. Fax: +86 20 39380698.

Correspondence may also be addressed to Yong Zhang. Tel/Fax: +86 755 25274032.

Received November 21, 2009. Revised March 16, 2010. Accepted March 26, 2010.

Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome.

First Time, Every Time: Nucleosomes at a Promoter Can Determine the Probability of Gene Activation

Warren P. Voth1 and David J. Stillman1

1 Department of Pathology, University of Utah Health Sciences Center, Salt Lake City, UT 84112, USA

Available online 19 April 2010.

Transcription factor binding sites are found in either nucleosome-free or nucleosome-embedded locations; thus, the in vivo relationships between nucleosome position and gene activation are not fully understood. In this issue of Developmental Cell, Bai et al. show that binding sites located in nucleosome-depleted regions guarantee high reliability, not amplitude, of promoter firing.

Main Text
A nucleosome can block transcription factor binding, and global surveys show most promoters have upstream nucleosome depleted regions (NDRs), while others have nucleosomes that may block transcription factor binding. What are the effects of inhibitory nucleosomes on gene activation? In this issue of Developmental Cell, Bai et al. (2010) examine this question at a cell-cycle regulated promoter and show that a promoter NDR can guarantee that a gene is activated every cycle. Alternatively, nucleosome-occupied binding sites can create a bimodal pattern of activation that may require stricter conditions for activation.

The S. cerevisiae CLN2 gene encodes a G1 cyclin, and cyclin expression drives the G1/S transition (Skotheim et al., 2008). Bai et al. (2010) mapped positions of nucleosomes at the CLN2 promoter and found a nucleosome covering the TATA element and the transcription start site (Figure 1A). The NDR between nucleosomes −2 and −3 contains three binding sites for SBF, a G1-specific transcription factor. SBF is inactive in early G1, but phosphorylation of the Whi5 inhibitor by CDK1 ends this inhibition, allowing SBF to activate CLN2 (Costanzo et al., 2004; de Bruin et al., 2004). Studies with synchronized cells show that nucleosomes −1 and −2 are evicted transiently during the cell cycle and that nucleosome eviction requires SBF and the FACT histone chaperone (Bai et al., 2010; Takahata et al., 2009a).

Friday, April 16, 2010

Alternative splicing and evolution: diversification, exon definition and function


Nature Reviews Genetics 11, 345-355 (May 2010) | doi:10.1038/nrg2776

Hadas Keren1, Galit Lev-Maor1 & Gil Ast1

Over the past decade, it has been shown that alternative splicing (AS) is a major mechanism for the enhancement of transcriptome and proteome diversity, particularly in mammals. Splicing can be found in species from bacteria to humans, but its prevalence and characteristics vary considerably. Evolutionary studies are helping to address questions that are fundamental to understanding this important process: how and when did AS evolve? Which AS events are functional? What are the evolutionary forces that shaped, and continue to shape, AS? And what determines whether an exon is spliced in a constitutive or alternative manner? In this Review, we summarize the current knowledge of AS and evolution and provide insights into some of these unresolved questions.

Friday, April 9, 2010

Molecular basis of S100 proteins interacting with the p53 homologs p63 and p73

Oncogene (2010) 29, 2024–2035; doi:10.1038/onc.2009.490; published online 8 February 2010

J van Dieck1, T Brandt1, D P Teufel1, D B Veprintsev1, A C Joerger1 and A R Fersht1

1MRC Centre for Protein Engineering, Hills Road, Cambridge, UK

Correspondence: Professor AR Fersht, MRC Centre for Protein Engineering, Cambridge University, Hills Road, Cambridge, Cambs CB2 0QH, UK.

Received 6 August 2009; Revised 16 October 2009; Accepted 27 October 2009; Published online 8 February 2010.

S100 proteins modulate p53 activity by interacting with its tetramerization (p53TET, residues 325–355) and transactivation (residues 1–57) domains. In this study, we characterized biophysically the binding of S100A1, S100A2, S100A4, S100A6 and S100B to homologous domains of p63 and p73 in vitro by fluorescence anisotropy, analytical ultracentrifugation and analytical gel filtration. We found that S100A1, S100A2, S100A4, S100A6 and S100B proteins bound different p63 and p73 tetramerization domain variants and naturally occurring isoforms with varying affinities in a calcium-dependent manner. Additional interactions were observed with peptides derived from the p63 and p73 N-terminal transactivation domains. Importantly, S100 proteins bound p63 and p73 with different affinities in their different oligomeric states, similarly to the differential modes of binding to p53. On the basis of our data, we hypothesize that S100 proteins regulate the oligomerization state of all three p53 family members and their isoforms, with a potential physiological relevance in developmental and disease-related processes. The regulation of the p53 family by S100 is complicated and depends on the target preference of each individual S100 protein, the concentration of the proteins and calcium, as well as the splicing variation of p63 or p73. Our results outlining the complexity of the interaction should be considered when studying the functional effects of S100 proteins in their biological context.

Keywords: S100; p63; p73; tumor suppressor; protein–protein interaction

De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis

Valentina Boeva1,2,3,4, Didier Surdez1,2, Noëlle Guillon1,2, Franck Tirode1,2, Anthony P. Fejes5, Olivier Delattre1,2 and Emmanuel Barillot1,3,4,*

1Institut Curie, 26 rue d’Ulm, 2INSERM, U830, Genetics and Biology of Cancer, 3INSERM, U900, Bioinformatics, Biostatistics, Epidemiology and Computational Systems Biology of Cancer, Paris, F-75248, 4Mines ParisTech, Fontainebleau, F-77300, France and 5Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, V5Z 4S6, Canada

*To whom correspondence should be addressed. Fax: +33 1 56 24 69 11

Received November 10, 2009. Revised February 23, 2010. Accepted March 15, 2010.

Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional information of mapped reads. In contrast, our algorithm MICSA (Motif Identification for ChIP-Seq Analysis) combines this source of positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs). We proved the greater accuracy of MICSA with respect to several other tools by running them on datasets for the TFs NRSF, GABP, STAT1 and CTCF. We also applied MICSA on a dataset for the oncogenic TF EWS-FLI1. We discovered >2000 binding sites and two functionally different binding motifs. We observed that EWS-FLI1 can activate gene transcription when (i) its binding site is located in close proximity to the gene transcription start site (up to ~150 kb), and (ii) it contains a microsatellite sequence. Furthermore, we observed that sites without microsatellites can also induce regulation of gene expression—positively as often as negatively—and at much larger distances (up to ~1 Mb).
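The idea of combining read-position evidence with motif occurrences can be illustrated with a minimal sketch. This is not the MICSA algorithm: the function name `filter_peaks`, the two read-count cutoffs and the motif string are all hypothetical, and the motif is supplied by hand here, whereas MICSA identifies motifs de novo. The sketch only shows the general principle that a weakly supported peak can be retained when it also carries the motif.

```python
import re

def filter_peaks(peaks, motif, strong_cutoff, weak_cutoff):
    """Keep peaks supported by reads alone, or by reads plus a motif hit.

    peaks: list of (name, sequence, read_count) tuples (hypothetical format).
    motif: regular expression standing in for a discovered binding motif.
    """
    pattern = re.compile(motif)
    kept = []
    for name, seq, reads in peaks:
        has_motif = pattern.search(seq) is not None
        # Strong peaks pass on positional evidence alone;
        # weak peaks need a motif occurrence as additional evidence.
        if reads >= strong_cutoff or (reads >= weak_cutoff and has_motif):
            kept.append(name)
    return kept

# Toy data: the motif "GGAA" is a placeholder, not a value from the paper.
peaks = [
    ("p1", "TTGGAAGGAA", 50),  # strong peak
    ("p2", "ACGTACGT", 8),     # weak peak without motif
    ("p3", "CCGGAAGT", 8),     # weak peak with motif
    ("p4", "GGAAGGAA", 2),     # motif present, but too few reads
]
selected = filter_peaks(peaks, "GGAA", strong_cutoff=20, weak_cutoff=5)
print(selected)
```

In this toy input, only the strong peak and the weak peak containing the motif are retained.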

Detection of splice junctions from paired-end RNA-seq data by SpliceMap

Kin Fai Au1, Hui Jiang1,2, Lan Lin3, Yi Xing3 and Wing Hung Wong1,*

1Department of Statistics, Stanford University, Stanford, CA 94305, 2Stanford Genome Technology Center, 855 California Ave, Palo Alto, CA 94304 and 3Department of Internal Medicine and Department of Biomedical Engineering, University of Iowa, Iowa City, IA, 52242, USA

*To whom correspondence should be addressed. Fax: +1 650 725 8977

Received December 7, 2009. Revised March 10, 2010. Accepted March 12, 2010.

Alternative splicing is a prevalent post-transcriptional process, which is not only important to normal cellular function but is also involved in human diseases. The newly developed second generation sequencing technique provides high-throughput data (RNA-seq data) to study alternative splicing events in different types of cells. Here, we present a computational method, SpliceMap, to detect splice junctions from RNA-seq data. This method does not depend on any existing annotation of gene structures and is capable of finding novel splice junctions with high sensitivity and specificity. It can handle long reads (50–100 nt) and can exploit paired-read information to improve mapping accuracy. Several parameters are included in the output to indicate the reliability of the predicted junction and help filter out false predictions. We applied SpliceMap to analyze 23 million paired 50-nt reads from human brain tissue. The results show that, at this depth of sequencing, RNA-seq can support reliable detection of splice junctions except for those that are present at very low levels. Compared to current methods, SpliceMap can achieve 12% higher sensitivity without sacrificing specificity.

A Signal-Noise Model for Significance Analysis of ChIP-seq with Negative Control

Han Xu 1,3, Lusy Handoko 2, Xueliang Wei 4, Chaopeng Ye 2, Jianpeng Sheng 5, Chia-Lin Wei 2, Feng Lin 3,* and Wing-Kin Sung 1,4,*

1Computational & Mathematical Biology Group, Genome Institute of Singapore, 138672, Singapore; 2Genome Technology & Biology Group, Genome Institute of Singapore, 138672, Singapore; 3School of Computer Engineering, Nanyang Technological University, 637553, Singapore; 4School of Computing, National University of Singapore, 117543, Singapore; 5School of Biological Science, Nanyang Technological University, 637551, Singapore

*To whom correspondence should be addressed: Feng Lin, Wing-Kin Sung.


Motivation: ChIP-seq is becoming the main approach to the genome-wide study of protein-DNA interactions and histone modifications. Existing informatics tools perform well at extracting strong ChIP-enriched sites. However, two questions remain to be answered: (a) to what extent is a ChIP-seq experiment able to reveal the weak ChIP-enriched sites? (b) are the weak sites biologically meaningful? To answer these questions, it is necessary to distinguish the weak ChIP signals from background noise.

Results: We propose a linear signal-noise model, in which a noise rate is introduced to represent the fraction of noise in a ChIP library. We developed an iterative algorithm to estimate the noise rate using a control library, and derived a library-swapping strategy for the FDR estimation. These approaches were integrated in a general-purpose framework, named CCAT (Control based ChIP-seq Analysis Tool), for the significance analysis of ChIP-seq. Applications to H3K4me3 and H3K36me3 datasets showed that CCAT predicted significantly more ChIP-enriched sites than previous methods did. With the high sensitivity of CCAT prediction, we revealed distinct chromatin features associated with the strong and weak H3K4me3 sites.
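A toy sketch may help make the two ideas in the abstract concrete: iteratively estimating a noise rate from a control library, and estimating FDR by swapping the roles of the two libraries. This is not the CCAT implementation. The per-bin count representation, the function names `estimate_noise_rate` and `swap_fdr`, the rule for flagging background bins and the threshold are all assumptions made for illustration.

```python
def estimate_noise_rate(chip, control, max_iter=50):
    """Estimate the fraction of ChIP reads that are noise (toy version).

    chip, control: read counts per genomic bin, same length, non-empty.
    Assumes noise in the ChIP library is distributed like the control.
    """
    total_chip, total_ctrl = sum(chip), sum(control)
    background = [True] * len(chip)  # start by treating every bin as noise
    alpha = 1.0
    for _ in range(max_iter):
        bg_chip = sum(c for c, b in zip(chip, background) if b)
        bg_ctrl = sum(k for k, b in zip(control, background) if b)
        if bg_ctrl == 0:
            break
        # In background bins, ChIP reads ~ alpha * total_chip * (bg_ctrl / total_ctrl)
        alpha = min(1.0, (bg_chip * total_ctrl) / (total_chip * bg_ctrl))
        scale = alpha * total_chip / total_ctrl
        # Bins whose ChIP count exceeds the scaled control are treated as signal
        updated = [c <= scale * k for c, k in zip(chip, control)]
        if updated == background:
            break
        background = updated
    return alpha

def swap_fdr(chip, control, alpha, threshold):
    """Library-swapping FDR: calls with libraries swapped / calls as given."""
    scale = alpha * sum(chip) / sum(control)
    called = sum(1 for c, k in zip(chip, control) if c - scale * k >= threshold)
    swapped = sum(1 for c, k in zip(chip, control) if scale * k - c >= threshold)
    return swapped / called if called else 0.0

# Toy libraries: two enriched bins on top of uniform noise.
chip = [30, 30, 5, 5, 5, 5, 5, 5, 5, 5]
control = [10] * 10
alpha = estimate_noise_rate(chip, control)
fdr = swap_fdr(chip, control, alpha, threshold=10)
print(alpha, fdr)
```

In this toy input, half of the ChIP reads are noise by construction, so the estimate settles once the two enriched bins are excluded from the background. Real count data are noisy, so a practical version would need a statistical criterion for flagging bins instead of the hard comparison used here.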


Thursday, April 8, 2010

Global methylation profiling of lymphoblastoid cell lines reveals epigenetic contributions to autism spectrum disorders and a novel autism candidate gene, RORA, whose protein product is reduced in autistic brain

Published online before print April 7, 2010 as doi: 10.1096/fj.10-154484.

AnhThu Nguyen, Tibor A. Rauch, Gerd P. Pfeifer, and Valerie W. Hu

Autism is currently considered a multigene disorder with epigenetic influences. To investigate the contribution of DNA methylation to autism spectrum disorders, we have recently completed large-scale methylation profiling by CpG island microarray analysis of lymphoblastoid cell lines derived from monozygotic twins discordant for diagnosis of autism and their nonautistic siblings. Methylation profiling revealed many candidate genes differentially methylated between discordant MZ twins as well as between both twins and nonautistic siblings. Bioinformatics analysis of the differentially methylated genes demonstrated enrichment for high-level functions including gene transcription, nervous system development, cell death/survival, and other biological processes implicated in autism. The methylation status of 2 of these candidate genes, BCL-2 and retinoic acid-related orphan receptor alpha (RORA), was further confirmed by bisulfite sequencing and methylation-specific PCR, respectively. Immunohistochemical analyses of tissue arrays containing slices of the cerebellum and frontal cortex of autistic and age- and sex-matched control subjects revealed decreased expression of RORA and BCL-2 proteins in the autistic brain. Our data thus confirm the role of epigenetic regulation of gene expression via differential DNA methylation in idiopathic autism, and furthermore link molecular changes in a peripheral cell model with brain pathobiology in autism.—Nguyen, A., Rauch, T. A., Pfeifer, G. P., Hu, V. W. Global methylation profiling of lymphoblastoid cell lines reveals epigenetic contributions to autism spectrum disorders and a novel autism candidate gene, RORA, whose protein product is reduced in autistic brain.

Thursday, April 1, 2010

Epigenetic marks identify functional elements

Randall H Morse

Journal name: Nature Genetics

Enhancers and transcription factor binding sites that control cell-specific transcription in higher eukaryotes can be found up to hundreds of kilobases from the promoters that they control, making their identification challenging. A new study uses a model based on histone modifications and chromatin dynamics to predict functional elements involved in androgen receptor response.

Saturday, March 27, 2010

Insulin-like signal needed to keep stem cells alive in adult brain

By Robert Sanders, Media Relations | 25 March 2010


BERKELEY — University of California, Berkeley, biologists have found a signal that keeps stem cells alive in the adult brain, providing a focus for scientists looking for ways to re-grow or re-seed stem cells in the brain to allow injured areas to repair themselves.

Image: fruit fly nerve cells. Mushroom bodies (red), which are the center of learning and memory in the brain, from two adult fruit flies. Normally, new neurons do not appear in the adult mushroom body. UC Berkeley biologists altered neural stem cells to allow them to persist for at least a month in the adult brain, and to give rise to newborn nerve cells (green) that send out axons to other areas of the mushroom body, just like normal neurons. (Sarah Siegrist/UC Berkeley)

The researchers discovered in fruit flies that keeping the insulin receptor revved up in the brain prevents the die-off of neural stem cells that occurs when most regions of the brain mature into their adult forms. Whether the same technique will work in humans is unknown, but the UC Berkeley team hopes to find out.

"This work doesn't point the way to taking an adult who has already lost stem cells and bringing them back mysteriously, but it suggests what mechanisms might be operating to get rid of them in the first place," said Iswar K. Hariharan, UC Berkeley professor of molecular and cell biology. "Plus, if you were able to introduce neural stem cells into an adult brain, this suggests what kinds of mechanisms you might need to have in place to keep them alive."

Hariharan noted that other researchers have gotten neural stem cells to persist by blocking genes that cause them to die. Yet this alone does not produce healthy, normal-looking neural stem cells that can make mature neurons. The UC Berkeley team's new finding shows that it also is necessary to provide an insulin-like signal. If stopping neural stem cell death is analogous to taking your foot off the brake, then providing an insulin-like signal is like stepping on the gas, he said. Both are essential.

Hariharan, post-doctoral researcher Sarah E. Siegrist and their colleagues published their findings today (Thursday, March 25) in the online version of the journal Current Biology. Their report will appear in the journal's April 13 print edition.

Most areas of the adult mammalian brain and fruit fly brain are devoid of neural stem cells, the only cells able to generate full-fledged neurons. Presumably, Hariharan said, the lack of neural stem cells is why the injured brain is unable to replace dead neurons.

In the new study, Siegrist showed that the stem cells present in the pupal stage of fruit flies are gone in the adult brain because they die off, rather than merely mature into neurons. The stem cells that persisted the longest were in the mushroom body, a region of the fly brain responsible for memory and learning that, in some ways, is like the hippocampus in humans.

In subsequent experiments, she attempted to prevent the death of neural stem cells in fruit flies by genetically blocking a process called programmed cell death (apoptosis). While this allowed the stem cells to survive longer, the cells were small and did not make many neurons. In fact, Siegrist said, they showed signs of impaired growth, suggestive of insulin withdrawal.

She then tried various genetic manipulations to mimic an insulin-type signal, this time using mutant fruit flies with their apoptosis genes also blocked. Amazingly, the neural stem cells persisted for at least a month and even generated many mature, apparently normal, nerve cells.

"These neural stem cells seem to behave properly, they express the proteins that you expect neural stem cells to express, they look like their normal counterparts, and most importantly, they spin off cells which become normal mature nerve cells that put out processes (axons) that, in some cases, seem to go where normal processes go," Siegrist said. "We don't know whether these cells function normally or whether they are electrically active. At least it is encouraging that we can get nerve cells made in a part of the (fruit fly) brain that normally cannot make nerve cells in the adult brain."

"Sarah had to do two manipulations together to keep these neural stem cells alive, and neither worked alone," Hariharan said. "One was to keep the insulin signal on, and one was to block programmed cell death. Each improved things a little bit, but when you did the two together, the neural stem cells survived for a month, at which time they were throwing off mature neurons or normal looking neurons that sent out processes."

Siegrist plans to continue her search through mutant fruit flies to find other genes that improve survival in the mushroom body and allow stem cells in other areas of the fly brain to persist. She also plans collaborations to explore similar mechanisms in mammals, to see if analogous manipulations could keep neural stem cells alive in the mammalian brain.

"In fruit flies, pathways downstream of the insulin receptor are important in keeping these neural stem cells alive," Siegrist said. "Mammals have the same genes downstream of their insulin receptors, so we may find the same response to insulin or insulin-like growth factors in mammals."

Other coauthors are former UC Berkeley undergraduate Najm Haque, now a technician in Hariharan's lab; Chun-Hong Chen of the National Health Research Institutes in Zhunan Town, Taiwan; and Bruce A. Hay of the California Institute of Technology (Caltech) in Pasadena, Calif.

The research is funded by the National Institutes of Health and the Damon Runyon Cancer Research Foundation.

Sunday, March 21, 2010

What counts is how genes are regulated

Once the human genome was sequenced in 2001, the hunt was on for the genes that make each of us unique. But scientists at the European Molecular Biology Laboratory (EMBL) in Heidelberg, Germany, and Yale and Stanford Universities in the USA, have found that we differ from each other mainly because of differences not in our genes, but in how they're regulated – turned on or off, for instance. In a study published today in Science, they are the first to compare entire human genomes and determine which changes in the stretches of DNA that lie between genes make gene regulation vary from one person to the next. Their findings herald a new way of thinking about ourselves and our diseases.

The technological advances of the past decade have been so great that scientists can now obtain the genetic sequences – or genomes – of several people in a fraction of the time and for a fraction of the cost it took to determine that first human genome. Moreover, these advances now enable researchers to understand how genes are regulated in humans.

A group of scientists led by Jan Korbel at EMBL and Michael Snyder, initially at Yale and now at Stanford, were the first to compare individually sequenced human genomes to look for what caused differences in gene regulation amongst ten different people. They focused on non-coding regions – stretches of DNA that lie between genes and, unlike genes, don't hold the instructions for producing proteins. These DNA sequences, which may vary from person to person, can act as anchors to which regulatory proteins, known as transcription factors, attach themselves to switch genes on or off.

Korbel, Snyder, and colleagues found that up to a quarter of all human genes are regulated differently in different people, more than there are genetic variations in genes themselves. The scientists found that many of these differences in how regulatory proteins act are due to changes in the DNA sequences they bind to. In some cases, such changes can be a difference in a single letter of the genetic code, while in others a large section of DNA may be altered. But surprisingly, they discovered even more variations could not be so easily explained. They reasoned that some of these seemingly inexplicable differences might arise if regulatory proteins didn't act alone, but interacted with each other.

"We developed a new approach which enabled us to identify cases where a protein's ability to turn a gene on or off can be affected by interactions with another protein anchored to a nearby area of the genome," Korbel explains. "With it, we can begin to understand where such interactions happen, without having to study every single regulatory protein out there."

The scientists found that even if different people have identical copies of a gene – for instance ORMDL3, a gene known to be involved in asthma in children – the way their cells regulate that gene can vary from person to person.

"Our findings may help change the way we think of ourselves, and of diseases", Snyder concludes: "as well as looking for disease genes, we could start looking at how genes are regulated, and how individual variations in gene regulation could affect patients' reactions."

Finally, Korbel, Snyder and colleagues compared the information on humans with that from a chimpanzee, and found that with respect to gene regulation there seems to be almost as much variation between humans as between us and our primate cousins – a small margin in which may lie important clues both to how we evolved and to what makes us humans different from one another.

In a study published online in Nature yesterday, researchers led by Snyder in the USA and Lars Steinmetz at EMBL in Heidelberg have found that similar differences in gene regulation also occur in an organism which is much farther from us in the evolutionary tree: baker's yeast.

Friday, March 19, 2010

Transcription Factors May Dictate Differences Between Individuals

March 18, 2010

Researchers are only beginning to understand how individual variation in gene regulation can have a lasting impact on one’s health and susceptibility to certain diseases. Now, an ambitious survey of the human genome has identified differences in the binding of master regulators called transcription factors to DNA that affect how genes are expressed in different people.

The study, which is published in the March 18, 2010, issue of Science, looked at two common transcription factors. HHMI medical research fellow Maya Kasowski and her colleagues in the laboratory of molecular biologist Michael Snyder at Yale University conducted the work with Jan Korbel at the European Molecular Biology Laboratory. Snyder has since joined the faculty at Stanford University.

Transcription factors account for as much as 10 percent of the coding genome in humans and other organisms. When activated, transcription factors switch on or off hundreds or thousands of genes, a cascade that programs cells to grow or divide. “The activity of transcription factors determines what a cell is doing at any given moment,” says Kasowski, who was a medical student at Yale when she received her HHMI medical research fellowship. She has since decided to pursue an M.D./Ph.D. degree.

Despite their large numbers and critical role, many aspects of transcription factor biology remain poorly understood. Until now, no one had looked at whether there was any variability in the targets of transcription factors from one person to the next. The current study found a “number of differences between individuals” in the binding sites of two transcription factors, Snyder says.

Transcription factors bind to the human genome within areas of the genome still viewed as a black box—vast stretches of DNA sequence between known genes. Gradually, biologists have found that much of this DNA performs a vital function—helping turn genes on and off in specific situations. Some of the regulatory regions, known as binding regions, serve as handholds for transcription factors.

“We know there are differences in gene expression between people,” Kasowski says. “Understanding the differences in how genes are regulated could help us understand human diversity. But identifying the regulatory DNA that controls expression is much more difficult than looking for differences in the regions of the genome that code for genes.”

For the current study, Kasowski, Snyder, and their colleagues examined two important transcription factors: RNA polymerase II and NFkappaB. RNA polymerase II, which is active in all cells, transcribes DNA into RNA. NFkappaB is activated by stress, plays a key role in immune responses to infections, and has been implicated in several diseases, including cancer.

The team mapped every binding region for these two factors inside the genomes of 10 individuals. To do so, they deployed a new technology that uses chemicals to freeze transcription factors as they bind to the genome. The scientists then sequenced the segment of DNA to which the transcription factor bound. After the team combined the data from all 10 individuals, they found around 19,000 binding regions for RNA polymerase II and another 15,500 binding regions for NFkappaB.

They discovered that the number of transcription factors binding at the different sites often varied near different genes, which in many cases influenced how much of the gene was expressed. Hence, variation in transcription factor binding can help explain why one person may make more of a certain gene product than another, Snyder says. Between any two individuals, the team found that 25 percent of the RNA polymerase II binding regions varied in time or frequency, as did 7.5 percent of the NFkappaB binding regions.

Closer examination of these variable binding regions showed that single-letter differences in the genome, called SNPs, accounted for some of the difference in transcription factor binding. That is, in some individuals, a single letter change at a certain binding region influenced the likelihood that the transcription factor would bind there. “We found that differences in DNA sequence contributed to differences in transcription factor binding,” Kasowski says. “The more SNPs we found in a particular binding region, the more variation in binding we saw.”

Other, larger differences in the genome, called structural variation, also accounted for a number of the differences in transcription factor binding. Structural variation happens when large segments of the genome are deleted, duplicated, or inverted. It varies widely among humans, and the role of such variability in human biology is not well understood.

But the new study shows that SNPs and structural variation can either increase or decrease transcription factor binding, and, hence, the amount of a protein that gets made from a particular gene. “We found that about one third of the differences in binding was caused by SNPs and structural variation,” Snyder says. “This is the first time anyone has shown that SNPs and structural variation affect a large number of regulatory elements that control gene expression. Normally, people look at differences in the genes themselves rather than in the regulatory regions, because they are difficult to identify.”

The study also reports differences in binding of RNA polymerase II and NFkappaB near genes implicated in many major diseases, including type 1 diabetes, lupus, chronic lymphatic leukemia, schizophrenia, asthma, Crohn’s disease, and rheumatoid arthritis. “Variation in the regulation of genes might eventually help account for some of the varying susceptibility to diseases we see in the population,” Kasowski says.

In addition to looking at humans, Kasowski, Snyder and their colleagues looked at transcription factor binding for a single chimpanzee. The study shows that 32 percent of RNA polymerase II binding regions differed between the humans in the study and the chimp. Snyder says that he included the chimp out of curiosity to see how transcription factor binding might account for differences between ourselves and our closest genetic cousin. But the 32 percent difference between chimps and humans was not that much larger than the 25 percent difference in RNA polymerase II binding found between two individuals.

Still, Snyder says that the study opens a new genomic frontier for biologists. “Only about two percent of our DNA codes for genes,” he says. “Studying the rest of the genome, including gene regulation and transcription factors, is the next wave in understanding human variation.”

Tuesday, March 16, 2010

SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read

Juan Falgueras1, Antonio J Lara2, Noé Fernández-Pozo3, Francisco R Cantón3, Guillermo Pérez-Trabado2,4 and M Gonzalo Claros2,3

1 Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain

2 Plataforma Andaluza de Bioinformática, Universidad de Málaga, 29071 Málaga, Spain

3 Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, 29071 Málaga, Spain

4 Departamento de Arquitectura de Computadores, Universidad de Málaga, Málaga, Spain

BMC Bioinformatics 2010, 11:38 | doi:10.1186/1471-2105-11-38
Published: 20 January 2010

High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing exacerbates this problem and necessitates customisable pre-processing algorithms.

SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming.

SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts.
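To make the kind of steps listed above concrete, here is a minimal Python sketch of three of them: trimming low-quality ends, clipping an adaptor, and rejecting low-complexity reads. It is not SeqTrim's code or its algorithms. The function names, the thresholds (`min_q`, `max_fraction`, `min_len`) and the adaptor string are all assumptions chosen for illustration, and real pre-processors use tolerant alignment rather than exact string matching.

```python
from collections import Counter


def trim_ends_by_quality(seq, quals, min_q=20):
    """Drop bases from both ends while their quality score is below min_q."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def clip_adaptor(seq, adaptor):
    """Cut the read at the first exact occurrence of a 3' adaptor, if any."""
    idx = seq.find(adaptor)
    return seq if idx == -1 else seq[:idx]


def is_low_complexity(seq, max_fraction=0.8):
    """Flag reads dominated by a single base (hypothetical threshold)."""
    if not seq:
        return True
    return max(Counter(seq).values()) / len(seq) > max_fraction


def preprocess(seq, quals, adaptor, min_len=20):
    """Return (cleaned_sequence_or_None, status) for one read."""
    seq, quals = trim_ends_by_quality(seq, quals)
    seq = clip_adaptor(seq, adaptor)
    if len(seq) < min_len:
        return None, "too_short"
    if is_low_complexity(seq):
        return None, "low_complexity"
    return seq, "ok"


if __name__ == "__main__":
    insert = "GATTACAGGCTTAACCGGTA"   # made-up insert sequence
    adaptor = "AGATCGGAAGAGC"         # illustrative adaptor sequence
    read = "NN" + insert + adaptor + "TT"
    quals = [5, 5] + [35] * 33 + [8, 8]
    print(preprocess(read, quals, adaptor))
```

Returning a status string alongside the sequence mirrors, in a very small way, the idea of letting the user see what happened to each read at each stage.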

Saturday, March 13, 2010

Small RNAs: An epigenetic silencing influence

SOURCE: Nature Reviews Genetics

New evidence has emerged that in moss, miRNAs can silence gene expression at the transcriptional level by interacting with DNA, leading to methylation.

MicroRNAs (miRNAs) regulate gene expression by base pairing with target RNAs, leading to their cleavage in plants or translational inhibition in animals. Now evidence has emerged that in moss, miRNAs can also silence gene expression at the transcriptional level by interacting with DNA, leading to methylation. This discovery broadens the regulatory influence of miRNAs, and the mechanism may also be applicable to other organisms.


Khraiwesh et al. examined the role of Dicer proteins in the moss Physcomitrella patens, which has four Dicers (DICER-LIKE 1a (DCL1a), DCL1b, DCL3 and DCL4). They chose P. patens because the dual functions of Dicers in miRNA biogenesis and target cleavage are separable in this species. Using targeted knockout-mutants of P. patens, the authors showed that DCL1a is required for miRNA biogenesis and DCL1b is required for miRNA-induced target RNA cleavage.

Intriguingly, DCL1b-null mutants had reduced levels of miRNA target transcripts despite the absence of miRNA-guided cleavage. How do miRNAs regulate their targets in DCL1b knockouts? The authors found that the genes that encode miRNA targets were methylated in DCL1b knockouts, but were not methylated in wild-type controls. They then showed that there was a reduced rate of transcription of miRNA target genes compared with unmethylated non-miRNA-target genes in DCL1b knockouts, indicating that the observed methylation leads to transcriptional silencing of genes that encode miRNA targets.

The authors suggested that in DCL1b-null mutants, miRNAs form stable duplexes with mRNAs within an RNA-induced transcriptional silencing complex. Consistent with this model, the miRNA targets primed cDNA synthesis without the addition of exogenous primers, supporting the existence of miRNA–mRNA duplexes. Do the levels of miRNAs, and therefore the levels of these duplexes, influence DNA methylation? Khraiwesh et al. created transgenic P. patens lines that expressed different levels of an artificial miRNA (amiRNA) and found that with increased expression of the amiRNA, there was increased silencing of the amiRNA target. Moreover, methylation and silencing were not restricted to DCL1b knockouts — they also occurred when high levels of amiRNA were expressed in wild-type P. patens lines.

As transcriptional silencing was also observed in non-transgenic P. patens, the authors investigated whether this pathway could also operate in wild-type P. patens under physiological conditions. Treatment of P. patens with the plant hormone abscisic acid (ABA) led to reduced levels of an miRNA target transcript (bHLH) that encodes a basic helix–loop–helix transcription factor, which in turn led to increased levels of the cognate miRNA (miR1026) and methylation of the bHLH gene. As ABA is a mediator of stress signalling, these results suggest that miRNAs might epigenetically regulate stress-responsive genes.

The physiological regulation of this epigenetic miRNA-induced silencing pathway and the conservation of miRNA pathway components among species suggest that this mechanism might be generally applicable — a topic for future investigation.

Meera Swami

Thursday, March 11, 2010

Discovery of cellular 'switch' may provide new means of triggering cell death, treating disease

A research team led by the University of Colorado at Boulder has discovered a previously unknown cellular "switch" that may provide researchers with a new means of triggering programmed cell death, findings with implications for treating cancer.

The new results are a big step forward in understanding programmed cell death, or apoptosis, a cell suicide process that involves a series of biochemical events leading to changes like cell body shrinkage, mitochondria destruction and chromosome fragmentation, said CU-Boulder Professor Ding Xue. But unlike traumatic cell death from injury, programmed cell death is a naturally occurring aspect of animal development that may help prevent human diseases like cancer and autoimmune disorders, said Xue, lead author on the new study.

In the new study, Xue and his team found that a well-known cellular molecule called caspase -- known as the "executioner enzyme" for apoptosis because of its primary role of cutting up and destroying cellular proteins -- has an entirely different effect on a particular enzyme called Dicer. The team found that when caspase cleaves Dicer, it does not kill it but instead changes its function, causing Dicer to break up chromosomes -- pieces of coiled DNA containing thousands of genes -- and kill the cells that house them.

"This finding was totally unexpected," said Xue of CU-Boulder's molecular, cellular and developmental biology department. "We believe that by understanding this mechanism, we may be able to develop a new way to trigger cell death in a controlled manner as a way to treat disease."

A paper on the subject appears in the March 12 issue of Science. Co-authors on the study included CU-Boulder postdoctoral researchers Akihisa Nakagawa and Yong Shi and Tokyo Women's Medical University researchers Eriko Kage-Nakadai and Shohei Mitani.

The normal function of Dicer is to snip strands of RNA into smaller pieces that attach to messenger RNA molecules -- which carry DNA's genetic messages from the nucleus of cells to make specific proteins in cell cytoplasm -- and silence their activity, said Xue. But when caspase comes in contact with Dicer, it takes away Dicer's ability to cleave RNA and it replaces it with the ability to snip up and destroy DNA-laden chromosomes.

Wednesday, March 10, 2010

The primary transcriptome of the major human pathogen Helicobacter pylori

Nature 464, 250-255 (11 March 2010) | doi:10.1038/nature08756; Received 6 August 2009; Accepted 14 December 2009; Published online 17 February 2010

Cynthia M. Sharma1, Steve Hoffmann2, Fabien Darfeuille3,4, Jérémy Reignier3,4, Sven Findeiß2, Alexandra Sittka1, Sandrine Chabas3,4, Kristin Reiche5, Jörg Hackermüller5, Richard Reinhardt6, Peter F. Stadler2,5,7,8,9 & Jörg Vogel1,10

1. Max Planck Institute for Infection Biology, RNA Biology Group, D-10117 Berlin, Germany
2. University of Leipzig, Department of Computer Science & Interdisciplinary Centre for Bioinformatics, D-04107 Leipzig, Germany
3. INSERM U869 and,
4. Université de Bordeaux, F-33076 Bordeaux Cedex, France
5. Fraunhofer Institute for Cell Therapy and Immunology, RNomics Group, D-04103 Leipzig, Germany
6. Max Planck Institute for Molecular Genetics, D-14195 Berlin, Germany
7. Max Planck Institute for the Mathematics in Sciences, D-04103 Leipzig, Germany
8. University of Vienna, Institute for Theoretical Chemistry, A-1090 Vienna, Austria
9. The Santa Fe Institute, Santa Fe, 87501 New Mexico, USA
10. University of Würzburg, Institute for Molecular Infection Biology, D-97080 Würzburg, Germany

Correspondence to: Jörg Vogel1,10. Correspondence and requests for materials should be addressed to J.V.

Genome sequencing of Helicobacter pylori has revealed the potential proteins and genetic diversity of this prevalent human pathogen, yet little is known about its transcriptional organization and noncoding RNA output. Massively parallel cDNA sequencing (RNA-seq) has been revolutionizing global transcriptomic analysis. Here, using a novel differential approach (dRNA-seq) selective for the 5′ end of primary transcripts, we present a genome-wide map of H. pylori transcriptional start sites and operons. We discovered hundreds of transcriptional start sites within operons, and opposite to annotated genes, indicating that complexity of gene expression from the small H. pylori genome is increased by uncoupling of polycistrons and by genome-wide antisense transcription. We also discovered an unexpected number of ~60 small RNAs including the ϵ-subdivision counterpart of the regulatory 6S RNA and associated RNA products, and potential regulators of cis- and trans-encoded target messenger RNAs. Our approach establishes a paradigm for mapping and annotating the primary transcriptomes of many living species.

Wednesday, March 3, 2010

Surprising mtDNA diversity

Posted by Jef Akst
[Entry posted at 3rd March 2010 06:00 PM GMT]

Mitochondrial genomes are not uniform across cells of the body as previously believed, but vary between different tissue types, according to a study published online today (March 3) in Nature.

The findings may affect forensics and the search for biomarkers, both of which utilize mitochondrial DNA.

"I was surprised," said molecular cell biologist Hans Spelbrink of the University of Tampere, Finland, who was not involved in the research. "Mostly the assumption is that from the start of life individuals are homoplasmic," meaning that within an individual, mitochondrial DNA (mtDNA) is the same. However, the results of this study demonstrate "that each individual is a mosaic of multiple [mt]DNA types in various frequencies in different tissues," he said.

Previous studies have documented some degree of heteroplasmy -- variation in mtDNA in an individual -- but these findings were limited and mostly restricted to people with mitochondrial disorders, "where one would expect" to find such variation, Spelbrink said. "This is the first time [mitochondrial variation] was properly documented" in normal individuals.

Using high-throughput sequencing technology, molecular geneticist Nickolas Papadopoulos of the Ludwig Center for Cancer Genetics and Therapeutics and the Johns Hopkins Kimmel Cancer Center in Baltimore and his colleagues analyzed the mitochondrial genomes of a variety of tissues in 2 different people and the lining of the colon in 10 others. In every individual, the researchers found at least 1 allele that differed between tissues, and one individual had as many as 14 heteroplasmies.
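As a rough illustration of what calling a heteroplasmy from sequencing data involves, the Python sketch below flags positions where a second allele exceeds a chosen fraction of reads, then lists the positions called in one tissue but not in all of them. It is not the method used in the study. The thresholds (`min_fraction`, `min_depth`), the positions and the read counts are invented for the example.

```python
def minor_allele_fraction(base_counts):
    """Fraction of reads carrying the second most common base at a position."""
    total = sum(base_counts.values())
    if total == 0:
        return 0.0
    ordered = sorted(base_counts.values(), reverse=True)
    second = ordered[1] if len(ordered) > 1 else 0
    return second / total


def heteroplasmic_sites(pileup, min_fraction=0.02, min_depth=100):
    """Positions with enough coverage and a minor allele above min_fraction."""
    sites = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        if sum(counts.values()) < min_depth:
            continue  # too few reads to call anything at this position
        if minor_allele_fraction(counts) >= min_fraction:
            sites.append(pos)
    return sites


def tissue_specific_sites(pileups_by_tissue, **thresholds):
    """For each tissue, the heteroplasmic positions not shared by all tissues."""
    calls = {t: set(heteroplasmic_sites(p, **thresholds))
             for t, p in pileups_by_tissue.items()}
    shared = set.intersection(*calls.values()) if calls else set()
    return {t: sorted(s - shared) for t, s in calls.items()}


if __name__ == "__main__":
    # Hypothetical read counts per base at three mtDNA positions.
    pileups = {
        "colon": {73: {"A": 950, "G": 50}, 152: {"T": 1000},
                  16519: {"T": 40, "C": 10}},
        "muscle": {73: {"A": 995, "G": 5}, 152: {"T": 900, "C": 100},
                   16519: {"T": 1000}},
    }
    print(tissue_specific_sites(pileups))
```

The depth filter matters here: a minor allele seen in 10 of 50 reads looks striking, but at low coverage it is hard to separate from sequencing error.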

"That was a surprise when we saw the results," Papadopoulos said. "There's more than one mitochondrial genome present in each one of us. In addition to that, there were variations from tissue to tissue [in the levels of heteroplasmy observed], which may have implications in embryogenesis." The findings may also affect more practical applications in forensic science and the development of biomarkers for certain diseases, which often utilize mtDNA because it is abundant and easy to amplify, he added.

"When you look for biomarkers, you want to establish what the normal tissue looks like" in order to have a reference with which to compare the disease state, Papadopoulos said. With the recognition that mitochondrial genomes are quite variable even in normal tissues, "now we have to keep in mind [that] some of the changes we see may not really be [disease-related] mutations."

Thus, to use mitochondrial mutations as potential biomarkers, future studies "will have to investigate a lot of carefully determine the normal control range," molecular biologist Ian Holt of the Mitochondrial Biology Unit of the Medical Research Council in Cambridge, UK, wrote in an email to The Scientist. "Also, there is a big question mark about how early this increase in mtDNA variation appears in the blood. If it's only apparent once the cancer is well established then it isn't much use as a biomarker."

With regard to forensics, the normal variation in mtDNA "complicates things a little bit," Papadopoulos said. Because the mtDNA in one tissue might vary from another tissue, caution must be used when comparing a hair sample, for example, to blood. "The positive side is that, in principle, you could even distinguish monozygous twins, if you can characterize their heteroplasmy pattern," added molecular evolutionary biologist Nicolas Galtier of the Université Montpellier 2 in an email.

It's unclear why mtDNA is so variable. One reason may be that mitochondrial DNA has a higher mutation rate than nuclear DNA, said pediatrician and clinical geneticist Richard Boles of the Keck School of Medicine of the University of Southern California. "It's really sitting in the heat of the furnace where it's likely to get damaged," Boles said, referring to the free radicals and other byproducts of the energy metabolism that takes place in the mitochondria. Alternatively, it could be that the mitochondria have less effective DNA repair mechanisms.

These findings are likely to spur future studies to further characterize the diversity in mitochondrial genomes and determine the mechanism underlying the variation, Boles said. "This is certainly going to raise a lot of eyebrows."

Read more: Surprising mtDNA diversity - The Scientist - Magazine of the Life Sciences

Monday, March 1, 2010

KDM7 is a dual demethylase for histone H3 Lys 9 and Lys 27 and functions in brain development

Published in Genes & Development

Yu-ichi Tsukada1,2,3, Tohru Ishitani4 and Keiichi I. Nakayama1,2,5

1. Division of Cell Biology, Medical Institute of Bioregulation, Kyushu University, Higashi-ku, Fukuoka 812-8582, Japan
2. CREST, Japan Science and Technology Agency (JST), Kawaguchi, Saitama 332-0012, Japan
3. PRESTO, Japan Science and Technology Agency (JST), Kawaguchi, Saitama 332-0012, Japan
4. Division of Cell Regulation Systems, Medical Institute of Bioregulation, Kyushu University, Higashi-ku, Fukuoka 812-8582, Japan


Methylation of histone H3 Lys 9 and Lys 27 (H3K9 and H3K27) is associated with transcriptional silencing. Here we show that KDM7, a JmjC domain-containing protein, catalyzes demethylation of both mono- and dimethylated H3K9 and H3K27. Inhibition of KDM7 orthologs in zebrafish resulted in developmental brain defects. KDM7 interacts with the follistatin gene locus, and KDM7 depletion in mammalian neuronal cells suppressed follistatin gene transcription in association with increased levels of dimethylated H3K9 and H3K27. Our findings identify KDM7 as a dual demethylase for H3K9 and H3K27 that functions as an eraser of silencing marks on chromatin during brain development.

Saturday, February 27, 2010

Scientists at the University of Essex have a greater understanding of how our genes are controlled following a major research project.

ScienceDaily (Feb. 26, 2010) — Scientists at the University of Essex have a greater understanding of how our genes are controlled following a major research project.

The findings of the study, which looked at how proteins work as teams to control genes in the cells, could also help to unravel the mechanisms of disease such as cancer.

The five-year research project, funded by the Medical Research Council, has been published in Molecular and Cellular Biology.

The research team, led by Dr Elena Klenova from the Department of Biological Sciences, looked at the protein called CTCF, which was previously identified as a key 'controller' of many of our genes, making them either active or inactive.

However, the scientists at Essex have discovered that other proteins were working with CTCF for fine tuning of the genes. This collaboration between CTCF and its neighbours at the molecular level provides the mechanism by which CTCF's function as a gene regulator is controlled.

Dr Dawn Farrar, the principal researcher on the project, said the discovery of the link between CTCF and other proteins was a 'fascinating example of molecular teamwork'.

Dr Klenova said: 'Understanding the factors responsible for the regulation of our genes, and how, why and when particular genes are switched on and off may give us a greater understanding of general biological systems. It also helps us to unravel the mechanisms of disease such as cancer. We believe that our published study has contributed to present knowledge of gene regulation.'

It is hoped scientists will be able to build on this research -- which was undertaken in collaboration with the Cancer Research UK Cambridge Research Institute and Karolinska Institute in Sweden -- to further understand the factors responsible for the regulation of our genes, and how this can lead to disease.

Friday, February 26, 2010

Prediction of alternative isoforms from exon expression levels in RNA-Seq experiments

Published in Bioinformatics Journal

Hugues Richard1,*, Marcel H. Schulz1,2, Marc Sultan3, Asja Nürnberger3, Sabine Schrinner3, Daniela Balzereit3, Emilie Dagand3, Axel Rasche3, Hans Lehrach3, Martin Vingron1, Stefan A. Haas1 and Marie-Laure Yaspo3

1Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 73, 2International Max Planck Research School for Computational Biology and Scientific Computing, and 3Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Ihnestr. 73, 14195 Berlin, Germany

*To whom correspondence should be addressed. Fax: +493084131152

Received July 26, 2009. Revised November 26, 2009. Accepted January 17, 2010.


Alternative splicing, polyadenylation of pre-messenger RNA molecules and differential promoter usage can produce a variety of transcript isoforms whose respective expression levels are regulated in time and space, thus contributing specific biological functions. However, the repertoire of mammalian alternative transcripts and their regulation are still poorly understood. Second-generation sequencing is now opening unprecedented routes to address the analysis of entire transcriptomes. Here, we developed methods that allow the prediction and quantification of alternative isoforms derived solely from exon expression levels in RNA-Seq data. These are based on an explicit statistical model and enable the prediction of alternative isoforms within or between conditions using any known gene annotation, as well as the relative quantification of known transcript structures. Applying these methods to a human RNA-Seq dataset, we validated a significant fraction of the predictions by RT-PCR. Data further showed that these predictions correlated well with information originating from junction reads. A direct comparison with exon arrays indicated improved performance of RNA-Seq over microarrays in the prediction of skipped exons. Altogether, the set of methods presented here comprehensively addresses multiple aspects of alternative isoform analysis. The software is available as an open-source R-package called Solas.
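The idea of inferring isoforms from exon expression levels alone can be illustrated with a deliberately naive Python sketch: compute read density per exon, compare each exon with the gene's median density, and flag exons that fall well below it as candidate skipped exons. This is a simple ratio heuristic, not the explicit statistical model described in the abstract and not the Solas package's interface. The function names, the `max_ratio` cut-off and the counts are assumptions made up for the example.

```python
from statistics import median


def exon_densities(exon_lengths, exon_reads):
    """Reads per base for each exon of one gene."""
    return {exon: exon_reads[exon] / exon_lengths[exon]
            for exon in exon_lengths}


def candidate_skipped_exons(exon_lengths, exon_reads, max_ratio=0.5):
    """Exons whose density is at most max_ratio times the gene's median."""
    densities = exon_densities(exon_lengths, exon_reads)
    reference = median(densities.values())
    if reference == 0:
        return []  # gene not expressed: nothing to compare against
    return sorted(exon for exon, density in densities.items()
                  if density / reference <= max_ratio)


if __name__ == "__main__":
    # Hypothetical exon lengths (bases) and mapped read counts.
    lengths = {"e1": 100, "e2": 200, "e3": 150, "e4": 100}
    reads = {"e1": 500, "e2": 1000, "e3": 150, "e4": 520}
    print(candidate_skipped_exons(lengths, reads))
```

A fixed ratio ignores sampling noise, which is exactly what a proper statistical model is meant to handle, so this heuristic would over-call skipped exons in weakly expressed genes.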

Regulation of Alternative Splicing by Histone Modifications

Published in Science

Reini F. Luco,1 Qun Pan,2 Kaoru Tominaga,3 Benjamin J. Blencowe,2 Olivia M. Pereira-Smith,3 Tom Misteli1,*

Alternative splicing of pre-mRNA is a prominent mechanism to generate protein diversity, yet its regulation is poorly understood. We demonstrated a direct role for histone modifications in alternative splicing. We found distinctive histone modification signatures that correlate with the splicing outcome in a set of human genes, and modulation of histone modifications causes splice site switching. Histone marks affect splicing outcome by influencing the recruitment of splicing regulators via a chromatin-binding protein. These results outline an adaptor system for the reading of histone marks by the pre-mRNA splicing machinery.