Conceived and designed the experiments: CSA MB J. Chen JBF JSF FK MR. Performed the experiments: CSA J. Chen BK. Analyzed the data: CSA J. Chen JBF. Contributed reagents/materials/analysis tools: GYC J. Chien LCL SL SVN TO JFZ. Wrote the paper: CSA JBF.
JBF, J. Chen, CSA, JSF, BK, MB, FK and MR are employees and shareholders of Illumina, Inc., where this study was conducted. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.
We have developed a high-throughput amplification method for generating robust gene expression profiles using single cell or low RNA inputs.
The method uses tagged priming and template-switching, resulting in the incorporation of universal PCR priming sites at both ends of the synthesized cDNA for global PCR amplification. Coupled with a whole-genome gene expression microarray platform, we routinely obtain expression correlation values of R2∼0.76–0.80 between individual cells and R2∼0.69 between 50 pg total RNA replicates. Expression profiles generated from single cells or 50 pg total RNA correlate well with those generated with higher input (1 ng total RNA) (R2∼0.80). In addition, the assay is sufficiently sensitive to detect, in a single cell, approximately 63% of the number of genes detected with 1 ng input, with approximately 97% of the genes detected in the single-cell input also detected in the higher input.
In summary, our method facilitates whole-genome gene expression profiling in contexts where starting material is extremely limiting, particularly in areas such as the study of progenitor cells in early development and tumor stem cell biology.
Recently, there has been growing interest in obtaining gene expression profiles from single cells, as it has become increasingly evident that the heterogeneity present in cell populations is such that population-based transcriptional profiles may not reflect the regulatory networks functional at the individual cell level
The analysis of single cancer cells can potentially overcome the shortcomings of tumor heterogeneity and help pinpoint driver mutations that spur the initial development of tumors, and identify which mutations lead to metastasis, cancer progression and resistance to therapy. However, a key technological challenge in the transcriptional profiling of single cells is that most whole-genome amplification protocols suffer from significant amplification bias. While there have been several recent advancements in the capture and isolation of single cells, such as cell picking
The underlying RNA or cDNA amplification strategies employed in most of these studies include either linear antisense RNA amplification or homomeric/TdT tailing followed by exponential amplification. While the former approach has been a mainstay for amplifying nanogram amounts of total RNA, there have been relatively few studies in which single cell quantities have been assayed
Many of these approaches have not been widely adopted either because they suffer from amplification bias, are not sufficiently scalable or robust for high-throughput applications, are not suitable in eukaryotic contexts, or a combination of these factors. Here we describe a template-switch-based high-throughput method that is capable of generating robust whole-genome gene expression profiles at the single cell level.
The pre-amplification method described here exploits the template-switching ability of some reverse transcriptases, which allows the 3′ tagging of cDNA and thereby facilitates the incorporation of universal PCR primer sites at both ends of the synthesized cDNAs (
(1) First strand cDNA synthesis is primed with tagged oligo-dT and random 9-mer primers. The tagged oligo-dT primer contains a VN anchor followed by a T-30 stretch with a 5′ PCR tag. The tagged random 9-mer consists of a 9-mer followed by the identical 5′ PCR tag. (2) Upon reaching the 5′ terminus of the mRNAs, the reverse transcriptase, via its terminal transferase activity, adds a few nucleotides (predominantly deoxycytidine) to the 3′ end of the newly synthesized cDNAs. (3) The template-switch primer, which consists of the same 5′ PCR tag as well as a 3′ riboguanine stretch, anneals via complementary GC base-pairing to the 3′ end of the cDNAs, thereby serving as a new template for the reverse transcriptase. (4) After cDNA synthesis, both ends of the cDNAs now contain the identical PCR tag, allowing exponential amplification of the entire cDNA population through single primer PCR (5).
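The primer layout described in this legend can be sketched in a few lines of Python. This is an illustration only: the tag sequence below is a hypothetical placeholder (the actual universal PCR tag is not given here), the number of riboguanines in the template-switch primer is an assumption (the text says only "stretch"), and the function names are invented for the example. Sequences are written 5′ to 3′ using IUPAC codes (V = A, C or G; N = any base).

```python
# Hypothetical placeholder tag; the real universal PCR tag sequence is not stated in the text.
PCR_TAG = "ACGTACGTACGT"

# IUPAC codes used by the primers described in the legend.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "ACG", "N": "ACGT"}


def tagged_oligo_dt(tag=PCR_TAG, t_len=30):
    """5' PCR tag, then a T30 stretch, then a 3' VN anchor."""
    return tag + "T" * t_len + "VN"


def tagged_random_nonamer(tag=PCR_TAG, n_len=9):
    """5' PCR tag followed by a random 9-mer (written as N9)."""
    return tag + "N" * n_len


def template_switch_primer(tag=PCR_TAG, n_ribo_g=3):
    """5' PCR tag followed by a 3' riboguanine stretch.

    n_ribo_g=3 is an assumption; each riboguanine is written as 'rG'.
    """
    return tag + "rG" * n_ribo_g


def degeneracy(seq):
    """Number of distinct DNA sequences encoded by an IUPAC string."""
    count = 1
    for base in seq:
        count *= len(IUPAC[base])
    return count


if __name__ == "__main__":
    print(tagged_oligo_dt())
    print(tagged_random_nonamer(), degeneracy(tagged_random_nonamer()))
    print(template_switch_primer())
```

Because all three oligonucleotides carry the same 5′ tag, every full-length cDNA ends up flanked by that tag and its complement, which is what permits amplification with a single primer.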
Previous template-switching-based amplification protocols utilized oligo-dT-based priming for cDNA synthesis followed by a single-phase PCR amplification reaction
Input | Condition | Self-Reproducibility (R2) | Correlation with 1 ng (R2) | Sensitivity | Probe Concordance (%) |
50 pg UHR total RNA | T30 + one-phase PCR | 0.374 | 0.473 | 6595 | 95.9 |
50 pg UHR total RNA | T30 + two-phase PCR | 0.481 | 0.585 | 8019 | 95.1 |
50 pg H9 total RNA | T30 + N6 + one-phase PCR | 0.626 | 0.695 | 10449 | 92.4 |
50 pg H9 total RNA | T30 + N9 + one-phase PCR | 0.627 | 0.688 | 10332 | 92.2 |
50 pg UHR total RNA | T30 + N9 + two-phase PCR | 0.698 | 0.806 | 13443 | 96.6 |
Single HeLa cells | T30 + N9 + two-phase PCR | 0.757 | 0.801 | 11083 | 97.4 |
Values shown for the self-reproducibility and correlation are derived from all probes.
Sensitivity is calculated as the number of probes detected at p-value<0.01.
Probe concordance is calculated as a percentage of the number of probes with matching detected calls at p-value<0.01 between the low (50 pg or single cell) and standard (1 ng) inputs divided by the total number of probes detected in the lower input.
24 K WG-DASL.
29 K WG-DASL HT.
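The sensitivity and probe concordance definitions in the table notes can be written out as a short Python sketch. It assumes that detection p-values are available as a mapping from probe ID to p-value for each sample; the function names and the toy probe IDs are hypothetical and are not part of GenomeStudio or of the analysis pipeline used here.

```python
def detected(pvalues, threshold=0.01):
    """Return the set of probe IDs with detection p-value below the threshold."""
    return {probe for probe, p in pvalues.items() if p < threshold}


def sensitivity(pvalues, threshold=0.01):
    """Sensitivity as defined in the table notes: number of detected probes."""
    return len(detected(pvalues, threshold))


def probe_concordance(low_pvalues, std_pvalues, threshold=0.01):
    """Percentage of probes detected in the low input that are also
    detected in the standard (1 ng) input."""
    low = detected(low_pvalues, threshold)
    std = detected(std_pvalues, threshold)
    if not low:
        raise ValueError("no probes detected in the low-input sample")
    return 100.0 * len(low & std) / len(low)


if __name__ == "__main__":
    # Toy p-values for four hypothetical probes.
    low = {"p1": 0.001, "p2": 0.005, "p3": 0.2, "p4": 0.009}
    std = {"p1": 0.0001, "p2": 0.5, "p3": 0.001, "p4": 0.002}
    print(sensitivity(low), round(probe_concordance(low, std), 1))
```

Note that the denominator is the number of probes detected in the lower input, so a probe detected only in the 1 ng sample (p3 in the toy data) does not reduce concordance.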
We first assessed the impact of different cDNA priming methods, during the reverse transcriptase step, on the performance of our assay. Here we evaluated three conditions, namely: oligo-dT (T30), oligo-dT + random hexamer (T30+N6) or oligo-dT + random nonamer (T30+N9). Replicate inputs of 50 pg H9 cell total RNA were used for all tested priming conditions after which pre-amplified products were used as inputs for the 24 K WG-DASL Assay. While typical assay reproducibilities of R2∼0.37 were obtained for the T30 condition, improved self-correlations of R2∼0.63 were observed for both the T30+N6 and the T30+N9 priming conditions (
Previous experiments performed with different numbers of PCR cycles (15, 18, 21, 24 and 27 cycles) using different RNA inputs (50 pg and 1 ng) demonstrated that assay performance (reproducibility, sensitivity and correlation with higher inputs) was poorest at the extremes of our chosen cycle range (15 and 27 cycles) and optimal at 21 PCR cycles (data not shown). To reduce the impact of stochastic effects on low copy numbers during the early cycles, we sought to improve the efficiency and fidelity of amplification by applying an altered thermal profile for the first few PCR cycles. We therefore next assessed the effect of two different PCR cycling profiles on our assay performance, namely a single-phase profile with an annealing temperature of 65°C, and a 24 cycle, two-phase profile consisting of an initial five PCR cycles carried out at a lower annealing temperature (58°C), followed by 19 cycles at a higher (65°C) annealing temperature (see
A key performance characteristic of any single cell genomics assay is its ability to discriminate among different samples at low input levels. In order to further characterize our assay we used T30+N9 priming together with the two-phase PCR profile described earlier to assay two different RNA inputs. Triplicate aliquots of UHR and BR, each at 10 pg, 50 pg and 1 ng total RNA were used in conjunction with the 29 K WG-DASL HT Assay. RNA quality was assessed using the Bioanalyzer 2100 and yielded RIN values of 9.6 and 9.2 for the UHR and BR samples, respectively (data not shown). On average, our intra-sample self-reproducibilities were R2∼0.42, R2∼0.69 and R2∼0.96 for the 10 pg, 50 pg and 1 ng UHR and R2∼0.34, R2∼0.61 and R2∼0.95 for the 10 pg, 50 pg and 1 ng BR RNA inputs, respectively (
(A) 50 pg UHR and BR total RNA and (B) single HeLa and brain tumor (BT) cells; 50 cell tumorsphere (TS) and adherent cells (AC). Pair-wise scatterplots of at least two replicates for each input type are shown for all 29 K probes across the full range of raw signal intensities. Correlations are the square of Pearson's correlation coefficient.
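For reference, the correlation metric named in this legend (the square of Pearson's correlation coefficient, computed on raw signal intensities) can be expressed as the following minimal Python sketch. The function name is hypothetical, and the inputs are assumed to be two equal-length lists of raw intensities for the same probes in the same order.

```python
def pearson_r_squared(x, y):
    """Square of Pearson's correlation coefficient between two intensity vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    replicate_1 = [120.0, 850.0, 40.0, 2300.0]  # toy raw intensities
    replicate_2 = [150.0, 790.0, 65.0, 2100.0]
    print(pearson_r_squared(replicate_1, replicate_2))
```

Because the value is squared, it does not distinguish positive from negative association; for replicate intensity data the association is expected to be positive.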
Having obtained robust data using picogram quantities of RNA, we next repeated the experiment, using individual cells as inputs. Here we used single HeLa and primary brain tumor (BT) cells. As before, all samples were processed in triplicate. We observed a similar trend to that obtained for the RNA equivalent inputs, with intra-sample self-reproducibilities of R2∼0.76 (
We next ranked the fold-change differences between the TS and the AC samples and further analyzed the top 100 over-expressed and top 100 under-expressed genes in the tumorspheres relative to their adherent counterparts. Using DAVID
In order to determine the extent to which the gene expression profiles obtained at low input levels correlated with those obtained with higher inputs, we directly compared raw signal intensities between the lower and higher inputs. Correlations between 50 pg and 1 ng total RNA typically yielded R2∼0.80 (
Raw signal intensity correlations between (A) 50 pg (x-axis) and 1 ng (y-axis) UHR total RNA; (B) 10 pg (x-axis) and 1 ng (y-axis) UHR total RNA; (C) single HeLa cell (x-axis) and 1 ng (y-axis) HeLa total RNA. The overlapping sets of detected probes between the low and higher inputs are shown for both the RNA equivalent (D, E) and single cell (F) inputs. All probe values shown are at a threshold of p<0.01.
Over the last few years there have been several reported studies on single cell gene expression profiling using either low gene density (1–100) assays
Our WG-XSC assay is highly reproducible, typically yielding R2∼0.76 and ∼0.69 for single cell and 50 pg RNA inputs, respectively. The transcript representation, as assessed by the correlation between lower inputs and larger standard inputs, yielded R2∼0.80 and ∼0.81 for single cell and 50 pg RNA inputs, respectively. Of the few microarray-based single cell transcriptional studies with self-correlation metrics, the reported R values range from 0.73 to 0.91
Two obvious, but critical steps that could impact levels of reproducibility and representation include the extent of cell lysis as well as the efficiency with which low abundance mRNA molecules are converted to cDNA. In order to minimize the loss of material, and maximize the synthesis of cDNA in an unbiased fashion, our protocol specifically incorporates the use of a phase-switch microfluidics device and low-retention plasticware for single cell isolation, oligo-dT and random priming for cDNA synthesis and a two-phase thermal profile for PCR amplification.
An additional feature of our approach is the ability to process up to 96 samples in parallel, thereby greatly reducing the associated labor costs as well as minimizing variation/bias that may arise from handling individual samples. This feature is of particular relevance for single cell expression profiling where substantial variation in transcript levels among phenotypically identical single cells has been well documented, thereby necessitating the simultaneous analyses of large numbers of individual cells
Our high-throughput assay generates whole-genome gene expression profiles with single cell or low RNA inputs. This robust and scalable method for profiling a variety of cell types at the single cell level can be applied to critical questions in a broad range of areas, including developmental biology and cancer biology. We have used the technology for gene expression profiling in circulating tumor cells isolated from prostate cancer and ovarian cancer patients' blood, as well as molecular and functional characterization of early lineage commitment of human hematopoietic stem cells (data not shown). The ability to obtain genome-wide gene expression data on many individual cells in parallel will be extremely valuable in a variety of contexts, including detailed molecular lineage tracing studies and clinical studies aimed at biomarker discovery.
RNA from the WA09 (H9)
A microfluidics device with a phase-switch feature was used for isolating individual cells. Briefly, cultured cells were harvested by trypsinization and washed with PBS, after which a single cell suspension in PBS was loaded into a phase-switch microfluidics device for encapsulation of individual cells into droplets. Cells were encapsulated from the aqueous phase (PBS) into droplets in the oil phase by either laser-cavitation or T-junction break-up of immiscible threads as previously described
An ovarian cancer cell line, RMG1
All cell lysis and cDNA reactions were performed using 0.2 ml Maxymum Recovery PCR tubes (Axygen, Union City, CA, USA). The cell lysis, reverse transcription, template switching and pre-amplification reactions were all performed in a single tube. Briefly, for cell lysis, 1.8 µl SLB was added directly to the isolated single cell. Tubes were placed in a thermocycler and heated to 72°C for 3 min, followed by 5 min at 4°C. After cell lysis, 3.2 µl Single cell cDNA Synthesis Buffer (SCB, Illumina, Inc.) was added to the lysed single cell. The reverse transcription and template switching reactions were performed at 42°C for 60 min, followed by a 10 min inactivation step at 70°C. After cDNA synthesis, 32 µl of Single cell PCR Mix (SPM, Illumina, Inc.) was added directly to the unpurified products, followed by amplification using a PCR cycling profile which consisted of an initial denaturation at 95°C for 1 min, followed by 5 cycles of (95°C for 20 sec, 58°C for 30 sec and 68°C for 3 min), 9 cycles of (95°C for 20 sec, 65°C for 30 sec and 68°C for 3 min), 10 cycles of (95°C for 30 sec, 65°C for 30 sec and 68°C for 3 min+6 sec/cycle) and 1 cycle of 72°C for 10 min. For cell-equivalent RNA inputs, the SLB and SCB were added directly to the RNA (the cell lysis step was omitted), and the samples were reverse-transcribed and pre-amplified in a manner identical to that described for the cell lysates.
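The two-phase cycling profile above can be encoded as data, which makes it easy to check that it matches the 24 cycle profile (5 cycles at 58°C annealing plus 19 cycles at 65°C) described in the results. The sketch below is illustrative Python, not instrument software; the class and function names are hypothetical. It also assumes that the "+6 sec/cycle" extension increment starts from the second cycle of the final stage, which the text does not specify, and it ignores ramp times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int
    increment_s: int = 0  # extra seconds added per successive cycle of the stage


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple
    amplification: bool = True  # False for single hold steps


# Temperatures and durations are taken from the protocol text above.
PROFILE = (
    Stage(1, (Step(95, 60),), amplification=False),                # initial denaturation
    Stage(5, (Step(95, 20), Step(58, 30), Step(68, 180))),         # low-annealing phase
    Stage(9, (Step(95, 20), Step(65, 30), Step(68, 180))),         # high-annealing phase
    Stage(10, (Step(95, 30), Step(65, 30), Step(68, 180, 6))),     # extension grows 6 s per cycle
    Stage(1, (Step(72, 600),), amplification=False),               # final extension
)


def amplification_cycles(profile):
    """Total number of three-step PCR cycles in the profile."""
    return sum(s.cycles for s in profile if s.amplification)


def cycles_at_annealing(profile, temp_c):
    """Number of cycles whose annealing (second) step is at temp_c."""
    return sum(
        s.cycles for s in profile
        if s.amplification and s.steps[1].temp_c == temp_c
    )


def hold_time_seconds(profile):
    """Sum of programmed hold times, excluding ramping between temperatures."""
    total = 0
    for stage in profile:
        for cycle in range(stage.cycles):
            for step in stage.steps:
                # Assumption: the first cycle of a stage uses the base time.
                total += step.seconds + step.increment_s * cycle
    return total


if __name__ == "__main__":
    print(amplification_cycles(PROFILE))
    print(cycles_at_annealing(PROFILE, 58), cycles_at_annealing(PROFILE, 65))
    print(hold_time_seconds(PROFILE) / 60.0, "min of programmed holds")
```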
For whole-genome gene expression analysis, we used either the Whole-Genome DASL Assay or the Whole-Genome DASL HT Assay, an updated version of the original Whole-Genome DASL Assay
Unless otherwise stated, all data were analyzed in an un-normalized, raw state. All individual samples were assayed a minimum of two times. After scanning, intensity data were imported into GenomeStudio® v2.0 where the data quality was assessed using several assay controls. Detection p-values were computed using several hundred negative controls to determine gene expression detection limits. Assay performance metrics are described further in the
We would like to thank Shujun Luo and Jerry Wang at Illumina, Inc. for helpful discussions.