Conceived and designed the experiments: JS CG PW PB. Performed the experiments: CG JS PW. Analyzed the data: JS CG PW PB. Wrote the paper: JS CG PB. Designed the analysis of the RNA-Seq data: JS PB. Obtained permission for the use of and access to clinical samples: NL.
PB is currently a member of the Board of Directors of PLoS. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.
Most human pre-mRNAs are spliced into linear molecules that retain the exon order defined by the genomic sequence. By deep sequencing of RNA from a variety of normal and malignant human cells, we found RNA transcripts from many human genes in which the exons were arranged in a non-canonical order. Statistical estimates and biochemical assays provided strong evidence that a substantial fraction of the spliced transcripts from hundreds of genes are circular RNAs. Our results suggest that a non-canonical mode of RNA splicing, resulting in a circular RNA isoform, is a general feature of the gene expression program in human cells.
Deep sequencing of RNA from biological samples, “RNA-Seq”, is a powerful tool for discovering and cataloguing novel alterations in the expression, sequence, and structure of transcriptomes. In the present study, we used RNA-Seq in a deliberate search for transcripts that could not be accounted for by conventional splicing of primary transcripts from an unrearranged human genome. Although our initial goal was to discover cancer-specific chromosomal rearrangements by identifying the resulting fused or rearranged transcripts, we also investigated the possibility that the exon order specified by the genome sequence might be rearranged during RNA processing. We were surprised to find numerous examples of transcripts in which the exon order was a circular permutation of the order encoded in the genome. We hypothesized that these anomalous transcripts might the result of intramolecular but non-canonical splicing events that joined a splice donor to an upstream (i.e. toward the 5′ end of the transcript) splice acceptor to produce a circular RNA molecule. Indeed, for many genes, in both cancer and normal human cells, we found RNAs with circularly permuted exon orders at levels comparable to those of the canonical, linear mRNA.
The first observation suggesting that eukaryotic RNAs can exist in circular form was made more than 30 years ago by electron microscopy
The best-characterized circular transcripts are in rodents. The mouse SRY gene, the sex-determining gene in males, consists of a single exon. During development, the RNA exists as a linear transcript that is translated into protein. In the adult testes, the RNA exists primarily as a circular product that is predominantly localized to the cytoplasm and is apparently not translated
All examples of circular transcripts reported to date in humans have been found to be expressed at low levels compared to the dominant canonical linear isoform, requiring sensitive nested PCR experiments for detection; these examples were discovered inadvertently or in an effort to characterize the structure of oncogenes. Circular RNAs have also been reported to be rare isoforms of the human Cytochrome P-450 2C18, and dystrophin transcripts
Two studies have suggested that scrambled exons may occur in more than a handful of transcripts. Roughly 150 sequences representing exon scrambling were found in EST databases; in each case, the authors argued that these sequences were unlikely to derive from circular RNA molecules, but rather represented bona fide examples of exon scrambling within a linear transcript
We recently reported evidence suggesting that local structural rearrangements in the human genome may be under-recognized genetic lesions contributing to the progression of human cancers
This analysis identified hundreds of genes with evidence of exon scrambling: in all, more than 1232 such genes had more than one junctional read. In addition, more than 700 isoforms with scrambled exons were estimated to comprise more than 10% of all transcript isoforms for a comparable number of genes. Genes with evidence of scrambling were shown to be statistically shared between the three leukocyte populations. A complete list of scrambled isoforms is provided in
Two models for the origin of the scrambled exons are diagrammed in
The canonical linear reference transcript is depicted with exons as colored boxes with four exons 1, 2, 3, and 4. Two simple models of RNA structure that could explain scrambled transcripts are depicted at left and right. At left, model 1 depicts how a scrambled exon 3-exon 2 junction could arise from a tandem duplication of exons 3 and 2, positioning the first copy of exon 3 upstream of exon 2. At the RNA level, this event could arise from post-transcriptional exon rearrangement, or a genomic duplication of exons 2 and 3. Under the model of tandem duplication, when one side of a paired-end read maps to the junction between exon 3 and 2, the other may map to any of exons 1, 2, 3 or 4 with probabilities determined by the library's insert length distribution and the exon lengths. Our data supports paired-end mapping between a junction and exons 2 or 3, but not exons 1 and 4. We note that in principle, the scrambled exon 3 - exon 2 junction could arise from other splicing events and does not necessarily entail tandem duplication. At right, model 2 depicts how a scrambled exon 3 - exon 2 junction could arise from splicing of exons 2 and 3 into a circular RNA molecule, again positioning exon 3 upstream of exon 2. In this model, when one side of a paired-end read maps to the junction between exon 3 and 2, the other will map to exon 2 or exon 3.
We therefore analyzed RNA from 3 leukocyte cell types: naïve B cells (CD19+), hematopoietic stem cells (CD34+) and neutrophils from the same individual, for the presence of scrambled exons, by paired-end sequencing (80 bp per end) of ribosomal RNA-depleted (Ribo-zero) total RNA. We obtained roughly 8, 19, and 5 million reads aligning to the UCSC knowngene transcriptome (mapped as single reads) from CD19+, CD34+, and neutrophils, respectively. We found sequence evidence for scrambled exons comprising at least 10% of transcripts from each of more than 800 genes in CD19+ cells, CD34+ cells, and neutrophils, respectively. Genes with evidence of scrambling in one leukocyte cell type were significantly more likely also to have transcripts with scrambled exons in one or more of the other cell types we analyzed.
We also analyzed publicly available paired-end data from poly-A selected HeLa and H9 Human embryonic stem cells
The scrambled exon junctions in transcripts of C1orf58, CCDC126, FKBP8, PCMTD1, SMARCA5, and SETD2 predicted by the primary sequence data, were all directly confirmed by RT-PCR from normal leukocyte RNA, followed by Sanger sequencing. Eleven additional instances of exon scrambling in eight distinct genes in ALL or HeLa were also confirmed by RT-PCR and Sanger Sequencing (see
We estimated the abundance of scrambled isoforms relative to their canonical linear counterparts by two genome-wide statistical methods. For the first estimate, we counted paired-end reads in which one read mapped to a scrambled junction (a junction between exon-X and exon-Y where X≥Y), and the other read also mapped in an orientation and position consistent with the hypothetical scrambled transcript. This number was then compared to the total number of paired-end reads consistent with the splice junctions between exon X and exon Y in the canonical order predicted by the reference genome sequence. The relative abundance of scrambled transcript isoforms as a fraction of all transcripts for a given gene is depicted for three normal leukocyte cell types in
Analysis of paired-end RNA-Seq data from random primed libraries reveals evidence that scrambled exons are present at high stoichiometries compared to the canonical linear transcript transcribed from a large number of human genes. This phenomenon persists across cell types and is illustrated by the expression patterns of 3 leukocyte cell types: CD19 (B cells), CD34 (stem cells) and neutrophils. The fraction of each scrambled transcript as a fraction of total gene expression is computed. The bar plot depicts the number of circular isoforms with estimated abundance relative to all transcripts of the gene in the following ranges: between 0–25%, 25–50%, 50–75% and 75+%. Hundreds of isoforms in each cell type are estimated to represent more than half of all transcripts from each gene.
By this first method (method 1), we identified 229, 207, and 122 scrambled transcripts, representing 481 distinct genes, that were expressed at levels comparable to the canonical linear isoforms in human CD19+ leukocytes, CD34+ leukocytes and neutrophils, respectively. By the same method, we estimated that 513, 275, and 320 scrambled transcripts, representing 880 distinct genes, were at least 10% of the abundance of the canonical isoform, in CD19+ cells, CD34+ cells and neutrophils, respectively. The differences among these cell types in the apparent relative abundance of the scrambled isoform were consistent with sampling variation due to differential expression of the corresponding gene among these cell types (see
Our direct observations of scrambled isoforms indicate that they comprise at least 10% of all transcripts for roughly 1–2% of genes detectably expressed in the CD19+ cells, CD34+ cells and neutrophils. Ignoring relative expression level of the scrambled isoform, we identified a scrambled isoform in roughly 10% of all detectably expressed transcripts. This is likely to be an underestimate of the prevalence of scrambled isoforms, as the pathognomonic junctional reads from inabundantly expressed genes were unlikely to be detected with this sequencing depth - more than half of the distinct transcript isoforms detected in these samples were sequenced at an average depth of one or less (
The most abundant transcripts with scrambled exon order in the CD19+, CD34+, neutrophils samples, respectively, corresponded to KIAA0182 (encoding a putative subunit of the BRAF-HDAC complex), MAN1A2 (encoding an alpha-mannosidase), and CCDC126 (encoding a coiled-coiled containing protein of unknown function). Exon-scrambled isoforms accounted for more than half of all transcripts from each of these genes, as estimated by method 1 above. We also detected low levels of scrambled isoforms of noncoding RNAs (see NR identifiers in
In addition, we used a second statistical method that did not rely on exon annotation to detect and quantitate transcripts with scrambled exons. For each 200 bp window tiling the length of each gene, we compared the number of paired-end reads in two categories; category 1, where side 1 mapped to a fixed window and side 2 mapped in a relative orientation to side 1 inconsistent with the genomic exon order; and category 2, paired-end reads where side 1 mapped to a window with a relative orientation to side 2 consistent with the genomic order. This procedure allowed us to identify candidate exon-scrambled transcripts without relying on a database of hypothetical scrambled exon junction.
We considered two models to explain the observed transcripts with scrambled exon order (
In cases where the size of the hypothetical circular isoform was comparable that of the inserts in our library, we used a statistical approach to evaluate the evidence that observed scrambled exons originated in a circular RNA. To illustrate this method, a junctional read supporting the presence of exon 3 upstream of exon 2 could be explained by a 2-exon circle composed of exons 2 and 3, or a tandem duplication of exons 2 and 3. Only in the latter case would we expect to observe paired reads comprising the non-canonical 2–3 junction paired with a sequence from exon 1 or exon 4.
Reads supporting the tandem duplication hypothesis provide direct evidence that only a very small fraction of transcripts with scrambled exons are consistent with a linear RNA structure: in our sequence data from human leukocytes we found evidence of a linear RNA with scrambled exons for only 23 genes, fewer than 2% of all transcripts with evidence of scrambled exons (see
About 32% of the scrambled exon junctions we observed were consistent with a hypothetical circular RNA larger than the insert lengths that were well-sampled in our library; in such cases, the statistical approach described above could not be used to infer the probable structure of these isoforms. However, as detailed below, biochemical analysis of nine isoforms with scrambled exons, including cases in which the inferred size of the hypothetical circular RNA would be greater than the insert length distribution, provided strong evidence that most scrambled exons represent circular RNA molecules.
We used susceptibility to RNaseR as a biochemical assay for the topology of the RNA molecules containing scrambled exons. RNaseR is a 3′ to 5′ exoribonuclease that degrades essentially all linear RNA molecules, regardless of structure, provided the RNA contains a 7 bp 3′ overhang
Panel A: Total RNA from HeLa cells was digested with RNaseR at varying enzyme concentrations (0, 3, 10, and 100 units) after the RNA was depleted of ribosomal RNA. Primers capable of amplifying the canonical linear transcript and the predicted circular transcript (by outward facing primers within a single exon predicted in the scramble) were used in a RT-PCR experiment for each of the digestion conditions. Canonical transcripts were consistently degraded by RNaseR, only detectable by PCR at 0 units of RNaseR, whereas predicted circular transcripts consistently resisted the RNaseR challenge, providing strong evidence of circularity. FBXW4 and MAN1A2 respectively show 2 and 4 circular isoforms, both of which were predicted by the sequencing data. The predicted lengths of circular isoforms are respectively a 3-2 junction of CAMSAP1 (predicted to produce a 435 bp circle), a 4-2 and 5-2 junction of FBXW4 (predicted to produce 415 and 510 bp circles), a 4-2, 5-2 and 6-2 junction of MAN1A2 (predicted to produce 471, 553, and 648 bp circles), a 3-3 junction in REXO4 (predicted to produce a 338 bp circle), a 2-2 junction of RNF220 (predicted to produce a 742 bp circle) and a 3-2 junction of ZKSCAN1 (predicted to produce a 667 bp circle). Panel B: A northern blot on total and cytoplasmic lysate from HeLa cells shows hybridization of a 481 bp probe complementary to the MAN1A2 5-2 exon scramble. 3.7 and 6.2 ug of total and cytoplasmic RNA were loaded onto a 1% agarose gel and 10 pM of probe was hybridized for 24–48 hours. Detection was performed using the BrightStar BioDetect Kit (Ambion, Austin, TX). The specific band at 553 bp corresponds to the predicted size of a circular RNA containing exons 2,3,4 and 5 of MAN1A2.
Using a publicly available dataset of short sequence reads from matched poly-A selected and poly-A depleted RNA fractions, we further tested the hypotheses that a) a subset of the scrambled exons we identified are contained in linear, poly-adenylated RNA molecules and b) a second subset of the scrambled exons we identified are contained in non-poly-adenylated, presumably circular, molecules. More specifically, the RNA samples from which these sequences derived were obtained by a double poly-A selection; the selected molecules were prepared for sequencing; the flow-through fraction from the second poly-A selection was depleted of ribosomal RNA and also sequenced
Overall, the scrambled exon junctions from suspected circular RNAs were observed at tenfold higher frequency in the poly-A depleted fraction in both HeLa and H9 cells than in the poly-A selected fractions (see
Single-end 76-bp RNA-Seq was performed on matched experiments on HeLa, and H9 Human embryonic stem cell lysates were polyA selected and polyA depleted (data from Yang et al
For a small fraction of the scrambled exon junctions we found compelling RNA-Seq evidence that they derived from linear RNAs. In one case, the apparently “scrambled exons” are explained by a known exon duplication in the human genome (IFI16) which served as a convenient positive control; in another case (STAU2), the exons affected by the apparent tandem duplication correspond to a known human structural variant (see
Previous reports have proposed that the lariat structure containing exons excluded by alternative splicing could undergo subsequent splicing to yield circularly spliced isoforms
Alternatively spliced complements to only 13 of 576 distinct scrambled isoforms were supported by more than 1 sequencing read (2.2%). The failure to detect essentially any evidence for the predicted complementary linear transcripts is circumstantial evidence that most circular transcripts are unlikely to be incidental byproducts of alternative splicing.
In addition to testing for the co-occurrence of complementary alternatively spliced transcripts and circular RNA, we evaluated the organization of genes from which circular transcripts were transcribed. This analysis revealed a statistical enrichment of exon 2 (p value ≪10−4) as the acceptor exon in circular RNA, and a statistically significant enrichment in median intron length among genes with circular isoforms, particularly in the intron upstream of acceptor exons and to a lesser extent in the intron downstream of donor exons (see
To begin investigating cellular localization of circular transcripts, we studied the localization of an isoform of MAN1A2, a ubiquitous and highly expressed circular transcript, and a handful of other isoforms with scrambled exons that we confirmed were circular. HeLa whole cells lysates were fractioned into nuclear and cytoplasmic fractions. The nuclear-localized noncoding RNA, XIST, served as a control for fractionation: as expected, it was enriched in the nuclear fraction and depleted in the cytoplasmic fraction. The total, cytoplasmic, and nuclear RNA were run on a denaturing gel to evaluate the ribosomal RNA components. As expected, the nuclear fraction had diminished 28S and 18S rRNA bands, and ribosomal RNA precursor bands were present in the nuclear, but not the cytoplasmic fraction (see
HeLa whole cells lysates were fractioned into cytoplasmic and nuclear. The nuclear localized noncoding RNA XIST served as a control for fractionation:, and as expected, was enriched in the nuclear fraction. In addition, precursor ribosomal RNA bands were present in the nucleus but not the cytoplasm (see
We hypothesized that circular RNA isoforms were not uniquely prevalent in human cells, but a common feature of the gene expression program in animals and perhaps more broadly. We analyzed publicly available paired-end RNA-Seq data from ribosomal RNA-depleted total RNA, consisting of 51-bp paired-end reads (id SRR029642 in the short read archive). We analyzed these data using the same pipeline we had applied to the leukocyte data (using the mm9 RefSeq and UCSC annotations (downloaded 9/2011) as the reference). We found more than one thousand mouse genes with evidence of exon scrambling (see
For more than 30 years, sporadic reports have described the presence of circular mRNA transcripts in mammals. Many of these circular RNAs were discovered serendipitously, and were largely disregarded as nonspecific byproducts when they were found to be expressed at low levels. In contrast to this prevailing view on the abundance of circular RNA isoforms, we found strong evidence that circular isoforms of hundreds of human transcripts, with out of order splice junctions precisely at normal exon boundaries, were present at levels comparable to their canonical linear counterparts. While the canonical linear transcripts were sensitive to RNaseR treatment in all tested cases, transcripts with scrambled exons were generally resistant to RNaseR treatment, as expected for circular RNAs, but not for RNAs with scrambled exons generated by template switching or other artifacts of reverse transcription from a canonical linear RNA. In an analysis of publicly available RNA-Seq data from HeLa and H9 human embryonic stem cells, we found that RNAs with scrambled exons were enriched in non-poly-adenylated fractions. Taken together, these findings provide strong evidence that most scrambled exon sequences we detected in human RNA were derived from circular molecules.
While we have found that hundreds of human genes express circular RNA isoforms, the limitations of our experimental design may actually have led us to underestimate the prevalence of circular RNA isoforms. First, the size selection step during sequencing library preparation would miss small circular RNAs, or highly structured circular RNAs with fragmentation kinetics incompatible with our size selection. It is also possible that our size selection procedure would enrich for circular RNAs in the selected size range that failed to be fragmented. Second, although our sequencing depth was adequate to detect diagnostic junctional reads for many circular RNAs, it may not have been adequate to accurately identify and quantify rare circular isoforms, and our search for exon-exon junctions was restricted to exons annotated in RefSeq, which we know to be an incomplete catalogue of exon boundaries. Thus, we may still be underestimating the number or prevalence of circular RNAs in human cells.
The previously unappreciated abundance and diversity of circular RNAs in human cells raises important questions: What is the molecular mechanism of circular splicing? Recent evidence suggests that canonical pre-mRNA splicing does not necessarily proceed in sequential order from the 5′ to 3′ end of the RNA
At left: a schematic diagram of the canonical splicing process splicing out the first intron of the a pre-mRNA of a 4 exon gene, and subsequent removal of introns 2 and 3. Canonical splicing of exon 1 to exon 2 occurs when the splicing machinery catalyzes the formation of the intron lariat and the attack of the free 3′ OH of exon 1 on the 3′ splice site upstream of exon 2. This produces a lariat containing intron 1 and a pre-mRNA with exons 1 and 2 spliced together. At right: a model for the production of circular transcripts. If there is a canonical transcriptional start, and if intron excision does not proceed sequentially in time from the 5′ to 3′ direction of the pre-mRNA, non-canonical pairing of 3′ and 5′ splice sites could be generated. Since the sequences of each 5′ splice site of the pre-mRNA contain the same splicing signals, it is possible that the 3′ splice site upstream of exon 2 is paired with the 5′ splice site downstream of exon 3 and splicing proceeds as if this 5′ splice site were paired with the 3′ splice site upstream of exon 4. In this case, exon 3 would be spliced upstream of exon 2, creating a pre-mRNA intermediate comprised of these two exons and intron 2. Canonical splicing would be predicted to excise this intron, leaving a circular RNA composed of exons 2 and 3. Non-canonical transcription start, as suggested in
How widespread and evolutionarily conserved is circular splicing? Our preliminary analysis of ribosomal-RNA depleted RNA from the mouse brain suggests there are hundreds of genes with scrambled isoforms in that organ; we would not be surprised to find that this phenomena is pervasive across the animal kingdom and perhaps more broadly. In addition, while we have not found evidence of differential circular isoform expression between tissues, this area should be further explored.
What functions do circular RNAs serve and what roles might they play in normal human biology or disease? The vast majority of circular RNA molecules we detected were transcribed from a gene that is also known to encode a conventional linear mRNA.
Our data suggest there are at least a handful of known noncoding RNAs with circular isoforms. A circular isoform of the noncoding RNA ANRIL has been reported to correlate with INK4/ARF expression, as well as atherosclerosis risk, suggesting the possibility that circular RNAs might have some role in human disease. An antisense transcript of the CDR1 gene, relatively abundantly expressed in mouse brain, was recently shown to be circular, and to positively regulate the corresponding sense transcript
We cannot yet rule out the possibility that they are incidental to regulated splicing of a conventional linear RNA and perhaps accumulate due to their relative resistance to degradation. However, the high abundance of many circular RNA isoforms relative to their linear counterparts and lack of evidence for the predicted alternatively spliced linear RNA co-product suggest that circular RNA isoforms are not simply accidental byproducts of splicing. Further investigations of origins and activities of circular isoforms of mRNAs are likely to lead us in surprising directions.
Samples were collected following procedures approved by the IRB at Stanford University.
5 hyperdiploid acute lymphoblastic leukemia diagnostic bone marrow samples were obtained on a protocol approved by the Stanford Institutional Review Board. The cells were counted and frozen using Ficoll-Paque Plus (Amersham Biosciences, Piscataway, NJ) at a concentration of 0.5–10 million cells/ml. Peripheral blood was also obtained from the same patients after they were confirmed to be in remission based on the inability to detect minimal residual disease after induction therapy. The buffy coat was isolated by pipetting after centrifugation. CD19+, CD34+, and neutrophils that were isolated from the bone marrow of a single individual were purchased from All Cells Inc (Emeryville, CA). Total RNA was isolated from all samples using Trizol (Life Technologies, Carlsbad, CA) followed by RNEasy column (Qiagen, Valencia, CA) according to the manufacturer's instructions.
A modified protocol from our previous work was used to produce the sequencing libraries
All mapping was performed using Bowtie version 0.12.1. Paired-end RNA-Seq data was mapped to the reference transcriptome (UCSC knowngene annotation) allowing up to three mismatches and any number of alignments. Reads uniquely aligning to the RefSeq annotation database were considered to represent gene expression. Reads failing the alignment to the UCSC knowngene annotation were mapped to a custom database of all intragenic exon-exon junctions in expressed genes with exons determined by the RefSeq database. Paired-end reads where read 1 mapped to an exon-exon junction inconsistent with the reference transcript and read 2 mapped to the same transcript, and were considered as supportive of an exon scramble. The total number of distinct paired offsets and mismatch profiles were summed, and the rate of reads supporting scrambled exons was computed per nucleotide (denoted j) and used as an estimate of the relative abundance of each scrambled isoform. Per-nucleotide gene expression (e) was estimated as the number of single reads mapping to a gene divided by gene length. The quantitative estimates of the relative expression of a scrambled transcript compared to the canonical linear isoform quoted in this paper was made using the ratio of j/e.
In a separate analysis, we employed paired-end mapping ratios to further quantify the relative abundance of scrambled isoforms to canonical linear isoforms. Each gene was tiled by dividing it into even length bins of 200 base pairs. For each pair of bins, the number of reads aligning to bin A in the+orientation and bin B in the - orientation was computed. To detect scrambled and canonical linear transcripts, bin A could be up or downstream of bin B. Conditional on reads aligning to bin A, the relative ratios of reads aligning to each bin B were computed. This procedure is further detailed in
In addition to the data we generated, the following public data was downloaded from the SRA and analyzed: paired-end data from H9 cells (SRR065491, SRR065495, SRR065504, SRR065521, SRR066678, SRR066679) and HeLa (SRR065529) and single end data from
Total RNA reaction was reverse transcribed using the SuperScript III First-Strand Synthesis System (Life Technologies, Carlsbad, CA) with random hexamers according to the manufacturer's instructions. 20 ng of cDNA was then used for each PCR validation using Platinum Blue PCR Supermix (Life Technologies, Carlsbad, CA). PCRs were run for 35 cycles under standard conditions with 30 second 55 degree extension.
The following procedure was used to calculate the per gene probabilities for observing evidence of a linear transcript among genes with evidence of scrambled exons.
Compute the empirical length distribution of all sequenced fragments by aligning reads to the RefSeq annotation.
Compute quantiles from 0–100 of empirical insert length distribution, and call this distribution F.
For each gene g with an exon X to exon Y (X≥Y) junction, compute inferred circle length, sum of length of exons between Y and X on a per gene basis (call this length L).
For each read 1 crossing the exon X- exon Y junction, approximate the probability that, if the scramble were contained in a linear transcript, read 2 would align to sequence upstream of exon y or downstream of exon X by 1-F(L). This is an approximation because the exact alignment offset of the junction (which can vary between 0 and 60) will influence this probability. Call this probability p. In most cases, this approximation will not be biased towards increasing or decreasing the calculated probability p. For n junctional sequences observed for a given gene, the probability of seeing 0 reads aligning to sequences outside of the exon X to exon Y interval is p0(g) = (F(L) )n.
Using the probabilities p0(g) over all genes, we computed the number of total genes expected to have observed reads supporting linearity under the hypothesis that scrambled exons are contained in linear, rather than circular, transcripts by computing the number of genes with p0(g) in the qth quantile and multiplying this by the average p0(g) value. The result establishes an under-representation of reads supporting the hypothesis that scrambled exons are contained in transcripts with exons upstream of exon y or downstream of exon X.
HeLa total RNA was isolated with TRIZOL, followed by rRNA depletion using RiboMinus (Life Technologies). Samples were then treated with DNase I (Fermentas, Glen Burnie, MD) for 15 min at 37°C, then diluted 16-fold into RNase R buffer containing 0, 3,10, or 100 units of RNase R (Epicentre) per µg of RNA. We have observed some batch to batch variation in the results of our RNase R assays and suspect that some batches of RNase R may have contaminating endoribonuclease activity: we include this as a note of caution. Reactions were purified with RNA Clean & Concentrator-5 columns (Zymo Research, Irvine, CA). Reaction product corresponding to 5 µg of input total RNA was reverse-transcribed in a 20 µl Superscript III reaction with random hexamers (Life Technologies). cDNA reaction (1/100th of PCR volume) was used as template in a 35-cycle PCR with Phusion polymerase (New England Biolabs, Ipswich, MA), each cycle being: 94°C 5″; 68°C 15″; 72°C 2′.
Total, nuclear, and cytoplasmic RNA were isolated using the PARIS Kit (Ambion, Austin, TX) according to the manufacturer's instructions. Primer and probe pairs for each qPCR reaction were designed using the RealTime PCR Assay Design Tool (Integrated DNA Technologies, San Jose, CA), see
50 ng of cDNA was added to Taqman Gene Expression Master Mix (Applied Biosystems, Carlsbad, CA) along with primer and probe at final concentrations of 0.5 and 0.25 µM, respectively. Assays were run for 40 cycles using standard conditions on the 7900HT Fast Real-Time PCR System (Applied Biosystems, Carlsbad, CA). Ct values were generated using SDS software. Relative expression levels of linear and circular isoforms were computed as differences in Ct values, and standard errors were taken over biological replicates.
Probe was generated from PCR product of MAN1A2 circular RNA. A band that had previously been confirmed to correspond to the 5-2 junction by Sanger sequencing was isolated using the QUIAquick Gel Extraction Kit (Qiagen, Valencia, CA). The probe was then biotinylated using the Brightstar Psoralen-Biotin Kit (Ambion, Austin, Tx). Total and cytoplasmic RNA were isolated from HeLa using PARIS Kit (Ambion, Austin, TX). Labeled marker, 3.7 and 6.2 ug of total and ctyotplasmic RNA were run on a 1% agarose gel, transferred, and hybridized according to NorthernMax Kit protocol (Ambion, Austin, TX). 10 pM of probe was hybridized for 24–48 hours. Detection was performed using the BrightStar BioDetect Kit (Ambion, Austin, TX).
MAN1A2 Probe (481 bp)
(EPS)
(EPS)
(EPS)
(EPS)
(TAB)
(TAB)
(DOC)
(CSV)
(TAB)
(XLS)
(DOC)
(CSV)
(DOC)
We thank John Coller and Ji Xuhuai (Stanford Functional Genetics Facility) for performing the Illumina sequencing, Raymond Chen for reagents and the Brown Laboratory for useful discussions.