Hugues Richard1,*, Marcel H. Schulz1,2, Marc Sultan3, Asja Nürnberger3, Sabine Schrinner3, Daniela Balzereit3, Emilie Dagand3, Axel Rasche3, Hans Lehrach3, Martin Vingron1, Stefan A. Haas1 and Marie-Laure Yaspo3
1Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 73, 2International Max Planck Research School for Computational Biology and Scientific Computing, and 3Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Ihnestr. 73, 14195 Berlin, Germany
*To whom correspondence should be addressed. Tel: ; Fax: +493084131152; Email: email@example.com
Received July 26, 2009. Revised November 26, 2009. Accepted January 17, 2010.
ABSTRACT
Alternative splicing, polyadenylation of pre-messenger RNA molecules and differential promoter usage can produce a variety of transcript isoforms whose respective expression levels are regulated in time and space, thus contributing specific biological functions. However, the repertoire of mammalian alternative transcripts and their regulation are still poorly understood. Second-generation sequencing is now opening unprecedented routes to the analysis of entire transcriptomes. Here, we developed methods that allow the prediction and quantification of alternative isoforms derived solely from exon expression levels in RNA-Seq data. These methods are based on an explicit statistical model and enable the prediction of alternative isoforms within or between conditions using any known gene annotation, as well as the relative quantification of known transcript structures. Applying these methods to a human RNA-Seq dataset, we validated a significant fraction of the predictions by RT-PCR. The data further showed that these predictions correlated well with information originating from junction reads. A direct comparison with exon arrays indicated improved performance of RNA-Seq over microarrays in the prediction of skipped exons. Altogether, the set of methods presented here comprehensively addresses multiple aspects of alternative isoform analysis. The software is available as an open-source R package called Solas at http://cmb.molgen.mpg.de/
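To make the idea of predicting skipped exons from exon expression levels alone more concrete, the following is a minimal sketch in Python. It is not the statistical model used in Solas, which is not described in this section. It assumes a deliberately simple null model in which every exon of a gene is constitutive, so that reads fall on each exon in proportion to its length, and it flags exons whose read count is unexpectedly low under an exact binomial test. The function names, the threshold `alpha` and the toy counts are all hypothetical.

```python
from math import comb


def binom_lower_tail(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def flag_low_exons(lengths, counts, alpha=0.01):
    """Return indices of exons with fewer reads than expected.

    Assumed null model (illustrative only): all exons are constitutive,
    so each read of the gene lands on exon i with probability
    lengths[i] / sum(lengths).
    """
    total_len = sum(lengths)
    total_reads = sum(counts)
    flagged = []
    for idx, (length, count) in enumerate(zip(lengths, counts)):
        p = length / total_len
        # Lower-tail p-value: how surprising is a count this small?
        if binom_lower_tail(count, total_reads, p) < alpha:
            flagged.append(idx)
    return flagged


# Toy gene with four exons; the second exon has far fewer reads
# than its length would predict.
lengths = [200, 150, 180, 220]
counts = [100, 10, 95, 110]
print(flag_low_exons(lengths, counts))
```

In this toy example only the second exon (index 1) is flagged as a candidate skipped exon. A realistic analysis would also need to account for mappability, sequencing bias and multiple testing across exons and genes, none of which this sketch attempts.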