The authors have declared that no competing interests exist.
Conceived and designed the experiments: KK. Performed the experiments: YK KT. Analyzed the data: KK YK SO FI JU KN TO TK. Contributed reagents/materials/analysis tools: YK FI JU KN TO TK KT. Wrote the manuscript: KK YK. Statistical analysis: KK. Certification and conceptual assistance of statistical analysis: SO.
The detection of rare mutants using next-generation sequencing has considerable potential for diagnostic applications. Detecting circulating tumor DNA is the foremost application of this approach. The major obstacle to its use is the high read error rate of next-generation sequencers. Rather than increasing the accuracy of final sequences, we detected rare mutations using a semiconductor sequencer and a set of anomaly detection criteria based on a statistical model of the read error rate at each error position. Statistical models were deduced from sequence data from normal samples. We detected epidermal growth factor receptor (
For some molecular targeted drugs against cancer, the examination of genomic changes in target genes has become a diagnostic routine and is indispensable for treatment decisions. For example, the strong effects of epidermal growth factor receptor tyrosine kinase inhibitors (EGFR-TKIs; i.e., gefitinib and erlotinib) on non-small-cell lung cancer (NSCLC) are correlated with activating somatic mutations in
Cell-free DNA in the blood consists of DNA derived from cancer tissues and has been studied for non-invasive diagnostic procedures [
Because BEAMing and next-generation sequencers, i.e., massively parallel sequencers, use the same or a very similar template preparation technique, it is possible to apply next-generation sequencers for the same purpose. There have been several studies on the deep sequencing of cell-free DNA [
In this report, we established a method of detecting
Deep sequencing of a PCR-amplified fragment containing a mutation site can be conducted to detect and quantitate mutated alleles among the vast amounts of normal alleles derived from host tissues. The major problem associated with this approach is the frequency of errors introduced during sequencing and PCR amplification. The key issue here is the setting and accurate evaluation of detection limits. When the frequency of a base change at a target locus is higher than a predetermined read error rate (RER), we may judge the change to be due to the presence of a mutant sequence. That is, anomalies that fall significantly outside of the RER distribution are regarded as mutations. The RER is defined as the error rate calculated from final sequence data, including errors in both the sequencing and PCR steps. In anomaly detection [
If read errors occur under a probability distribution, the number of reads required to achieve a certain detection limit can be estimated.
a, Relationship between the read error rate, read depth, and detection limit for mutations when the significance level is p = 2 × 10⁻⁵. Horizontal axis, read depth; vertical axis, detection limit (%). From top to bottom, each line indicates a read error rate (RER) of 1%, 0.2%, 0.05%, or 0.01%. b, Three-dimensional representation of substitution RER. x-axis, base positions of EGFR exons 19–21. From left to right, the arrowheads indicate the positions of T790M, L858R, and L861Q. y-axis, 48 DNA samples from normal individuals. From front to back, conversions to A (green), C (yellow), G (magenta), or T (blue) are aligned for each sample. z-axis, RER (%). c, Three-dimensional representation of the insertion/deletion error. x-axis, base positions of EGFR exons 19–21. The bar indicates the position of the exon 19 deletion. y-axis, 48 DNA samples from normal individuals. Blue, plasma DNA; light blue, WBC DNA (large amount); dark blue, WBC DNA (small amount). z-axis, RER (%). d, Distribution of the RER. White column, substitution error; gray column, insertion/deletion error. Horizontal axis, range of RER (%); vertical axis, incidence (%).
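The relationship between read error rate, read depth, and detection limit can be illustrated with a minimal sketch. It assumes, for illustration only, that the number of erroneous reads at one position follows a Poisson distribution with mean RER × depth; the function names `poisson_threshold` and `detection_limit_percent` are hypothetical and are not taken from the original analysis.

```python
import math


def poisson_threshold(lam, alpha):
    """Smallest count k such that P(X >= k) < alpha for X ~ Poisson(lam)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    cdf = 0.0  # invariant: cdf == P(X <= k - 1)
    k = 0
    while 1.0 - cdf >= alpha:
        # Poisson pmf evaluated in log space to avoid overflow for large lam
        cdf += math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        k += 1
    return k


def detection_limit_percent(rer, depth, alpha=2e-5):
    """Mutant fraction (%) at which the read count first becomes significant.

    Assumption: read errors at one position follow Poisson(rer * depth).
    """
    return 100.0 * poisson_threshold(rer * depth, alpha) / depth


# RER values taken from the figure legend; the depths are arbitrary examples.
for rer in (0.01, 0.002, 0.0005, 0.0001):
    limits = [detection_limit_percent(rer, d) for d in (10_000, 100_000, 1_000_000)]
    print(f"RER {rer:.2%}: " + ", ".join(f"{x:.4f}%" for x in limits))
```

Under this assumption the detection limit approaches the RER itself as depth grows, so additional reads help most when the RER is already low.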
For EGFR-TKI treatment, an activating
We determined the RERs in a 169-base region around the target loci by performing deep sequencing of DNA samples from normal individuals. We used an Ion Torrent PGM [
Due to high insertion/deletion read errors, we employed a specific method to detect the exon 19 deletion mutations. We prepared eight template exon 19 sequences with representative deletions and screened the deletion sequences by matching them with the template sequences. This method was quite effective for screening out read errors; no sequences with deletion read errors were found among the 48 samples tested.
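The template-matching idea can be sketched as follows. This is a toy illustration under stated assumptions: the reference string and deletion coordinates are invented (they are not EGFR exon 19 sequences), matching is done by exact string equality, and all function names are hypothetical. The text does not specify how matching was actually implemented.

```python
def build_deletion_templates(reference, deletions):
    """Return {name: sequence} with each (start, length) span removed."""
    return {
        name: reference[:start] + reference[start + length:]
        for name, (start, length) in deletions.items()
    }


def classify_read(read, templates):
    """Return the name of the template the read equals exactly, else None."""
    for name, seq in templates.items():
        if read == seq:
            return name
    return None


def count_deletion_reads(reads, templates):
    """Count reads per deletion template; unmatched reads are ignored."""
    counts = {name: 0 for name in templates}
    for read in reads:
        hit = classify_read(read, templates)
        if hit is not None:
            counts[hit] += 1
    return counts


# Toy reference and representative deletions (hypothetical values).
REFERENCE = "ACGTTGCAAGGCTTAACCGGTTAGC"
TEMPLATES = build_deletion_templates(REFERENCE, {"del_A": (5, 6), "del_B": (8, 9)})

reads = [
    REFERENCE,                         # wild-type read
    TEMPLATES["del_A"],                # true deletion read
    TEMPLATES["del_A"],
    TEMPLATES["del_B"],
    REFERENCE[:10] + REFERENCE[11:],   # 1-base deletion read error, no match
]
print(count_deletion_reads(reads, TEMPLATES))
```

Because a sporadic insertion/deletion error rarely reproduces one of the predefined deletions exactly, such reads fall through unmatched, which is the screening effect described above.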
We then examined statistical models of read error. In a Poisson distribution model, the average and variance of the number of incidences are expected to be the same and are determined by the intensity parameter
a, Relationship between the average and variance of the substitution error presented as the number per 100,000 reads. Horizontal axis, average; vertical axis, variance. The red line indicates where the average and variance are equal. b, Difference between thresholds calculated according to a negative binomial distribution and a Poisson distribution. The threshold is the minimum number of base changes in 100,000 reads meeting the level of statistical significance (p = 0.01). Horizontal axis, variance/average ratio of the substitution read error; vertical axis, difference between thresholds. The types of substitutions whose variance/average ratio ranged from 1 to 2 are plotted. c, Accuracy of quantitation. Each data point represents the average of three assays. Horizontal axis, fraction of mutant alleles in artificial products; vertical axis, fraction of mutant alleles estimated from deep sequencing. d, Reproducibility of quantitation. Horizontal axis, base change rate in the first trial; vertical axis, base change rate in the second trial.
When the average read error in 100,000 reads was less than 1, a Poisson distribution with λ set to 1 was applied (169 types of substitutions).
When the average was greater than 1 and the variance/average ratio of the read error was less than 1.2, a Poisson distribution was applied (15 types of substitutions).
When the average was greater than 1 and the variance/average ratio of the read error was greater than 1.2, a negative binomial distribution was applied (323 types of substitutions).
The exon 19 deletion and L858R belonged to the first category, while the L861Q and T790M mutation sites belonged to the second and third categories, respectively. At a significance level of p = 2 × 10⁻⁵, the detection limits were less than 0.01% for the exon 19 deletion, less than 0.01% for L858R, 0.01% for L861Q, and 0.05% for T790M. In the following analysis, we used p = 2 × 10⁻⁵ as the significance threshold for each single detection without a multiplicity correction, which corresponds to an expected one false positive in 50,000 samples.
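The three model-selection criteria and the resulting count threshold can be sketched as below. This is an illustrative reconstruction, not the original analysis code: the function names are hypothetical, the handling of the exact boundaries (average equal to 1, ratio equal to 1.2) is an assumption, and the negative binomial parameters are obtained by the method of moments from mean = r(1 − p)/p and variance = r(1 − p)/p².

```python
import math


def moments(counts):
    """Sample mean and unbiased variance of per-sample error counts."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, var


def choose_model(mean, var, ratio_cutoff=1.2):
    """Return (model_name, log_pmf) following the three criteria in the text."""
    if mean < 1.0:
        lam = 1.0   # rare errors: Poisson with lambda fixed at 1
    elif var / mean < ratio_cutoff:
        lam = mean  # variance close to the mean: Poisson with lambda = mean
    else:
        # Overdispersed errors: negative binomial, method of moments
        p = mean / var
        r = mean * mean / (var - mean)

        def nb_log_pmf(k):
            return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
                    + r * math.log(p) + k * math.log(1.0 - p))

        return "negative_binomial", nb_log_pmf

    def pois_log_pmf(k):
        return -lam + k * math.log(lam) - math.lgamma(k + 1)

    return "poisson", pois_log_pmf


def threshold(log_pmf, alpha=2e-5):
    """Smallest count k (per 100,000 reads) with P(X >= k) < alpha."""
    cdf, k = 0.0, 0  # invariant: cdf == P(X <= k - 1)
    while 1.0 - cdf >= alpha:
        cdf += math.exp(log_pmf(k))
        k += 1
    return k


# Hypothetical error counts per 100,000 reads in normal samples.
normal_counts = [0, 1, 0, 0, 2, 0, 1, 0]
name, log_pmf = choose_model(*moments(normal_counts))
print(name, threshold(log_pmf))
```

For the first category (λ = 1), this sketch gives a threshold of 8 base changes per 100,000 reads, i.e. 0.008%, which is consistent with the detection limit of less than 0.01% quoted above.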
The outline of the method is 1) amplification of
First, we examined the method’s quantification ability. We prepared test samples including various fractions of PCR products of mutated
We further evaluated our method using lung cancer biopsy specimens, sampling plasma DNA and the primary lesion simultaneously as part of a prospective study. With tissue biopsy as the reference standard, the results for samples from 22 patients showed 86% concordance (95% confidence interval, 66–95), 78% sensitivity (44–93), and 92% specificity (66–98). These results are promising with respect to the development of a diagnostic tool to complement lung cancer biopsy.
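Confidence intervals for proportions from small samples, such as those above, can be computed in several ways. The text does not state which method was used or give the underlying counts, so the following is only a sketch of one common choice, the Wilson score interval. The function name `wilson_interval` and the example count of 19 out of 22 (chosen because it rounds to 86%) are assumptions for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical count: 19 of 22 rounds to 86% but is not given in the text.
low, high = wilson_interval(19, 22)
print(f"{19 / 22:.0%} ({low:.0%} to {high:.0%})")
```

With only 22 patients the interval is wide regardless of the method chosen, which is why the estimates are best read as preliminary.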
We then analyzed a total of 155 samples: 144 from plasma, eight from cerebrospinal fluid, and one each from urine, pleural effusion, and bronchoalveolar lavage. For plasma, two or more samples were obtained from 32 patients at different time points in the disease course. All of the obtained data are shown in
A considerable number of samples were collected from the same patient at different time points in the disease course. Temporal changes of
Each dot represents a time point of sampling. The diagram is not drawn to a precise time scale; only the order of the dots is meaningful. Figures represent
Data were obtained both before and after the acquisition of EGFR-TKI resistance in seven cases. After resistance was acquired, the level of the activating mutation increased in five patients (218, 226, 259, 61, 66), decreased in one patient (44), and increased after a delay in another patient (178). An increase in the activating mutation may correlate with disease progression. Despite the clear correlation between T790M and EGFR-TKI resistance status in the above validation study, the dynamics of T790M during the disease course were not as clear as those of the activating mutations; T790M often appeared before resistance was acquired.
Three patients are described in more detail. Patient 226 was treated with gefitinib as first-line chemotherapy. The gefitinib treatment was stopped several times due to adverse effects. A radiological response (partial response, PR) was observed from month 1 to month 9, and disease progression was observed in month 10. Prior to gefitinib treatment, the fraction of the mutant allele was very high (>50%), but after only one week of this treatment, the fraction of the mutant allele decreased to 0.3%, prior to any radiological changes (Figure S3a in
We explored the possibility of identifying substitution mutations in the entire target
a, Distribution of the number of different types of substitutions judged as mutations per sample. Horizontal axis, number of the types of substitutions; vertical axis, number of samples. b, Distribution of the number of samples with a substitution type judged as a mutation. Horizontal axis, the number of samples with a substitution type judged as a mutation; vertical axis, number of the types of substitutions.
Rare mutation detection at target loci through the deep sequencing of plasma cell-free DNA has sensitivity comparable to that of BEAMing. The specificity is also acceptable because the
However, it is difficult to extend mutation detection to a larger region. The incidence of false positives is not acceptable for diagnostic applications. Parameter estimation with increased numbers of normal samples and/or more conservative estimation methods, such as Bayesian inference, might decrease false positives. We used mutation-free DNA from normal individuals for the survey of read error, but mutation detection was performed with plasma DNA from lung cancer patients. A possible cause of the inadequate thresholds may be the difference in DNA quality. The recent discovery of artifactual mutations introduced during experimental processes [
Our procedure is optimized for our objectives and social environment, but there is room for technical improvement. In addition to the paired-end method [
In addition to being applied for the non-invasive diagnosis of
Biopsies of advanced cases and repeated biopsies are technically demanding, and replacement with a non-invasive method would be beneficial. In this context, monitoring T790M with our method would have substantial benefits for patient management. For example, detecting the T790M mutation in blood samples would be useful for patient selection for treatment with new EGFR-TKIs for lung cancers that are resistant to gefitinib and erlotinib [
Two recent studies suggest other possibilities of
Patients with activating EGFR mutations in tumor tissues were recruited at Osaka Medical Center for Cancer and Cardiovascular Diseases. Pleural fluid, cerebrospinal fluid and/or urine samples were collected from some patients. In all of the patients, activating
Plasma was prepared via centrifugation of 4-5 ml of EDTA-treated blood at 800
To amplify target regions of the
Sequencing template preparation (emulsion PCR and bead enrichment) from sequencing libraries was carried out using an Ion OneTouch Template Kit (Life Technologies) and the Ion OneTouch system (Ion OneTouch Instrument and Ion OneTouch ES, Life Technologies) according to the manufacturer’s protocol. Prepared templates were sequenced using the Ion Sequencing Kit v2 and the Personal Genome Machine (Life Technologies). The number of nucleotide flows during sequencing was set to 200 (50 cycles). Torrent Suite 2.2 (Life Technologies) was used to convert raw signals into base calls and to extract FASTQ files of sequencing reads. The read depth for one assay mostly exceeded 100,000. Sequencing data were deposited in the DDBJ Sequence Read Archive (accession number: DRA001029).
Reads in FASTQ files were assigned to individual samples using 5-nt indexes with an in-house Perl script. Short reads (<70 bases) were discarded. The remaining reads were aligned to target sequences (exon 19, 20 and 21 of
Using samtools (version 0.1.18) [
For each nucleotide substitution pattern, the parameters of the Poisson and negative binomial distributions were estimated by the method of moments using data from 48 normal DNA samples. Parameters
This study was approved by the ethics committee of Osaka Medical Center for Cancer and Cardiovascular Diseases. Written informed consent was obtained from all patients recruited in this study.
Figure S1. Relationship between the averages and variances of the insertion/deletion errors. Figure S2. Outline of the method. Figure S3. Dynamics of mutant alleles in plasma cell-free DNA during EGFR-TKI treatment.
The authors thank Ms. Shiho Sasaki for excellent technical assistance.