The authors have declared that no competing interests exist.
Conceived and designed the experiments: OG BW ZT BM. Performed the experiments: OG SS KT KL AK. Analyzed the data: BW FS TK. Contributed reagents/materials/analysis tools: TK. Wrote the paper: OG BW.
The early molecular detection of the dysplasia-carcinoma transition may enhance the strength of diagnosis in the case of colonic biopsies. Our aims were to identify characteristic transcript sets in order to develop diagnostic mRNA expression patterns for objective classification of benign and malignant colorectal diseases and to test the classificatory power of these markers on an independent sample set.
Colorectal cancer (CRC) and adenoma-specific transcript sets were identified using HGU133plus2 microarrays and 53 biopsies (22 CRC, 20 adenoma and 11 normal). Ninety-four independent biopsies (27 CRC, 29 adenoma and 38 normal) were analyzed on microarrays to test the classificatory power of the discriminatory genes. Array real-time PCR validation was done on 68 independent samples (24 CRC, 24 adenoma and 20 normal). A set of 11 transcripts (including CXCL1, CHI3L1 and GREM1) was determined which could correctly discriminate between high-grade dysplastic adenoma and CRC samples with 100% sensitivity and 88.9% specificity. The discriminatory power of the marker set proved to be high on independent samples in both microarray and RT-PCR analyses. In discriminant analysis, 95.6% of original and 94.1% of cross-validated samples were correctly classified.
The identified transcripts could correctly characterize the dysplasia-carcinoma transition in biopsy samples, also on a large independent sample set. These markers can establish the basis of gene expression based diagnostic classification of colorectal cancer. Diagnostic RT-PCR cards can become part of the automated routine procedure.
Colorectal cancer (CRC) is the third most common cancer type and the second leading cause of cancer-related mortality in Western countries.
Microarray analyses have already been applied to investigate gene expression changes in many cancer types including CRC
– colorectal carcinogenesis, progression and metastatic development
– different subtypes of CRC with diverse clinicopathological parameters
– limited number of experiments focusing on molecular-based prognosis
Whole genomic microarrays are suitable for high-throughput marker selection, but their high costs and time-consuming execution make their prospective introduction as a diagnostic tool difficult. Furthermore, the evaluation of the huge amount of data collected by microarray analyses requires extensive bioinformatics with multivariate statistical methods.
However, the newer generation of real-time PCR instruments available with multiplex arrays enables the testing and diagnostic utilization of mRNA expression microarray data. These quantitative array real-time PCRs with 384-well plates give an opportunity for testing the selected marker panels on a large set of independent samples, allowing the expression of more than a hundred genes to be measured simultaneously. For the sake of flexibility, quantitative RT-PCR arrays with multiple transcript panels are custom-designed.
Traditional histology may suffer from sampling bias due to biopsy orientation problems; therefore, critical areas including aberrant crypt foci, dysplastic areas or in situ carcinoma may remain hidden. Molecular-based discrimination using mRNA expression can represent the whole sample to avoid this bias and support pathologists in coping with their growing workload of early cancer screening. Furthermore, mRNA expression can reveal functional information beyond microscopy related to the biological behavior, tumor invasion, metastatic spread and therapeutic target expression in colorectal cancer.
In this study, we applied whole genomic microarray analysis in order to identify gene expression profile alterations focusing on the dysplastic adenoma-carcinoma transition. Our aims were to identify characteristic transcript sets in order to develop diagnostic mRNA expression patterns for objective classification of benign and malignant colorectal diseases and to test the classificatory power of these markers on an independent sample set.
After informed consent of untreated patients, colon biopsy samples were taken during endoscopic intervention and stored in RNALater Reagent (Qiagen Inc, Germantown, US) at –80°C. Altogether 147 biopsy specimens (53 in the original set and an additional 94 fresh frozen samples in the independent set) were analyzed in our study. Total RNA was extracted and Affymetrix microarray analysis was performed on biopsies of patients with tubulovillous/villous adenomas (n = 29, 13 high-grade dysplastic and 16 with low-grade dysplasia), colorectal adenocarcinoma (n = 27, 14 early and 13 advanced CRC) and of healthy normal controls (n = 38). Fifty-three microarrays (11 normal, 20 adenoma, 22 CRC) had been hybridized earlier (original sample set); their data files were used in previous studies using different comparisons.
Group | Original set: Affymetrix microarrays (GSE4183, GSE10714) | Independent set: Affymetrix microarrays (GSE37364) | Independent set: array real-time PCR |
Adenoma with low-grade dysplasia | 9 | 16 | 13 |
High-grade dysplastic adenoma | 11 | 13 | 11 |
CRC Dukes A–B | 10 | 14 | 10 |
CRC Dukes C–D | 12 | 13 | 10 |
CRC with unknown stage | - | - | 4 |
Healthy Control | 11 | 38 | 20 |
Total patient numbers | 53 | 94 | 68 |
Total RNA was extracted using the RNeasy Mini Kit (Qiagen) according to the manufacturer's instructions. Quantity and quality of the isolated RNA were tested by measuring the absorbance and by capillary gel electrophoresis using the 2100 Bioanalyzer and RNA 6000 Pico Kit (Agilent Inc, Santa Clara, US). Biotinylated cRNA probes were synthesized from 4.82±0.60 µg total RNA and fragmented using the One-Cycle Target Labeling and Control Kit.
Quality control analyses were performed according to the suggestions of the Tumour Analysis Best Practices Working Group.
Differentially expressed genes between the diagnostic groups were identified by the Significance Analysis of Microarrays (SAM) method. The nearest shrunken centroid method (Prediction Analysis for Microarrays, PAM) was applied for sample classification from gene expression data. The pre-processing, data mining and statistical steps were performed in the R environment with Bioconductor libraries. Correlation-based hierarchical cluster analysis was performed for each comparison. Logistic regression was applied to analyze the dependence of binary diagnostic variables (coded 0 for control, 1 for disease). Discriminant and principal component analyses were also performed. In the discriminant analysis, leave-one-out classification was applied for cross-validation.
Commercially available real-time PCR assays were applied to measure the expression of the 11 discriminatory transcripts.
Assay ID | Gene Symbol | Gene name | Amplicon length | Position | Intron spanning |
103015 | CA7 | carbonic anhydrase VII | 77 | 416–492 | + |
100950 | IL1B | interleukin 1, beta | 87 | 162–248 | + |
103133 | IL1RN | interleukin 1 receptor antagonist | 76 | 343–418 | + |
103136 | IL8 | interleukin 8 | 92 | 879–970 | − |
103109 | GREM1 | gremlin 1 | 111 | 144–254 | + |
105522 | CXCL1 | chemokine (C-X-C motif) ligand 1 | 105 | 340–444 | − |
103070 | CXCL2 | chemokine (C-X-C motif) ligand 2 | 95 | 431–525 | + |
103045 | COL12A1 | collagen, type XII, alpha 1 | 66 | 2287–2352 | + |
103035 | CHI3L1 | chitinase 3-like 1 | 76 | 433–507 | + |
103210 | SLC7A5 | solute carrier family 7, member 5 | 72 | 1500–1571 | + |
103167 | MMP3 | matrix metallopeptidase 3 | 110 | 1210–1319 | + |
101128 | GAPDH | glyceraldehyde-3-phosphate dehydrogenase | 112 | 30–141 | + |
102065 | B2M | beta-2-microglobulin | 76 | 360–435 | + |
102488 | ACTB | actin, beta | 102 | 1047–1148 | + |
102079 | HPRT1 | hypoxanthine phosphoribosyltransferase 1 | 102 | 218–319 | + |
102119 | RPL13A | ribosomal protein L13a | 124 | 317–440 | + |
104092 | RN18S1 | RNA, 18S ribosomal 1, 18S ribosomal RNA | 73 | 982–1054 | − |
102125 | YWHAZ | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta polypeptide | 130 | 453–582 | + |
Relative quantification of gene expression was performed and the fold change values were calculated using the ΔΔCt method. The threshold cycle (Ct) of the 18S ribosomal RNA endogenous control was used to normalize target gene expression (ΔCt) to correct for experimental variation. Logistic regressions were applied to analyze the dependence of binary diagnostic variables (coded 0 for control, 1 for disease) on the ΔCt values from the training set. When P is the probability that a patient sample is diagnosed as “diseased”, a function X = logit(P) can be defined as follows:

$$X = \operatorname{logit}(P) = \ln\left(\frac{P}{1-P}\right) = b_0 + b_1\,\Delta Ct_1 + b_2\,\Delta Ct_2 + \dots + b_n\,\Delta Ct_n$$
The maximum-likelihood fitting method was used to obtain the (empirical) coefficients {bi} that define the relationship between X and the experimental measurements {ΔCti}. The {bi} values were obtained using the MedCalc software program (MedCalc Software). Receiver operating characteristic (ROC) curve analysis was applied to evaluate the discriminatory power of the gene panels.
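The normalization and the logistic model described above can be illustrated with a minimal sketch. It assumes the usual linear form X = b0 + Σ bi·ΔCti for the logit; the Ct values, the intercept and the coefficient below are hypothetical and are not taken from the study, and the function names are chosen here for illustration only.

```python
import math

def delta_ct(ct_target, ct_reference):
    """ΔCt: target Ct normalized to the endogenous control (18S rRNA in the text)."""
    return ct_target - ct_reference

def fold_change(dct_sample, dct_control):
    """Relative expression by the ΔΔCt method: 2 ** -(ΔCt_sample - ΔCt_control)."""
    return 2.0 ** -(dct_sample - dct_control)

def disease_probability(dcts, intercept, coefs):
    """Invert X = logit(P) = b0 + sum(b_i * ΔCt_i) to obtain P."""
    x = intercept + sum(b * d for b, d in zip(coefs, dcts))
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical Ct values, for illustration only (not measurements from the study).
dct_tumor = delta_ct(ct_target=24.0, ct_reference=10.0)
dct_normal = delta_ct(ct_target=27.0, ct_reference=10.0)
fc = fold_change(dct_tumor, dct_normal)  # ΔΔCt = -3, so an 8-fold increase

# Hypothetical intercept and coefficient; the fitted {b_i} are not given in the text.
p = disease_probability([dct_tumor], intercept=15.5, coefs=[-1.0])
```

A lower ΔCt means higher expression, which is why the ΔΔCt enters the exponent with a negative sign.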
Discriminant and principal component analyses were performed. Discriminant analysis was used primarily to predict membership of distinct groups. As a result, “Classification results” tables were prepared, summarizing the number and percentage of subjects classified correctly and incorrectly. Leave-one-out classification was applied as the cross-validation method. The discriminant function analysis was used to maximize the percentage of correct assignments in the classification table.
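The leave-one-out procedure can be sketched as follows. For brevity the sketch uses a plain nearest-centroid rule as a stand-in classifier, which is not the discriminant function used in the study; the two-marker values are invented for illustration.

```python
def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def nearest_centroid(train_x, train_y, sample):
    """Assign the sample to the class whose centroid is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        c = centroid(members)
        dist = sum((a - b) ** 2 for a, b in zip(sample, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and count correct predictions."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Toy two-marker values, invented for illustration (not study data).
xs = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
      [4.0, 4.2], [3.8, 4.1], [4.1, 3.9],
      [7.0, 1.0], [7.2, 1.1], [6.9, 0.8]]
ys = ["normal"] * 3 + ["adenoma"] * 3 + ["crc"] * 3
loo_accuracy = leave_one_out_accuracy(xs, ys)
```

Because each sample is classified by a model that never saw it, the cross-validated percentage is usually equal to or lower than the percentage obtained on the original grouped cases.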
In addition, Principal Components Analysis (PCA) was used as a dimensionality reduction method; it performed a covariance analysis between the determined factors and allowed multiple datasets to be viewed in a two- or three-dimensional figure.
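As an illustration of the covariance-based reduction, the sketch below extracts the first principal component by power iteration on the sample covariance matrix. This is a simplified stand-in, not the statistical software used in the study, and the two-feature data are invented.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return the leading eigenvector of the covariance matrix and the sample scores."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication converges to the leading eigenvector,
    # provided the start vector is not orthogonal to it.
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Project each centered sample onto the component to get its score.
    scores = [sum(c * v for c, v in zip(row, vec)) for row in centered]
    return vec, scores

# Invented data lying close to the line y = 2x.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
component, scores = first_principal_component(data)
```

Plotting the scores of the first two or three components gives the kind of low-dimensional view described in the text.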
Microarray datasets with HGU133 Plus2.0 experiments obtained from colonic biopsy/tissue samples collected by other research groups were downloaded from the Gene Expression Omnibus (GEO) database (dataset IDs: GSE8671 and GSE18105).
Using the original sample group (53 microarrays from 11 normal, 22 CRC and 20 adenoma samples), a set of 11 differentiating transcripts was identified. This set could correctly discriminate not only between the diseased and the normal samples, but could also discriminate between adenoma and CRC samples.
| | | Microarray, original sample set (n = 53) | | | Microarray, independent sample set (n = 94) | | | RT-PCR, independent sample set (n = 68) | | |
Affymetrix ID | Gene Symbol | Gene name | Log2FC (AD vs. N) | Log2FC (CRC vs. N) | Log2FC (CRC vs. AD) | Log2FC (AD vs. N) | Log2FC (CRC vs. N) | Log2FC (CRC vs. AD) | Log2FC (AD vs. N) | Log2FC (CRC vs. N) | Log2FC (CRC vs. AD) |
207504_at | CA7 | carbonic anhydrase VII | −6.3 | −4.9 | 1.5 | −5.4 | −5.1 | 0.2 | −5.8 | −4.1 | 1.7 |
39402_at | IL1B | interleukin 1, beta | 3.4 | 4.5 | 1.1 | 2.2 | 6.1 | 3.9 | 1.7 | 4.7 | 3.0 |
212657_s_at | IL1RN | interleukin 1 receptor antagonist | 3.3 | 4.7 | 1.4 | 1.7 | 5.1 | 3.4 | 1.0 | 3.3 | 2.3 |
202859_x_at | IL8 | interleukin 8 | 5.2 | 6.6 | 1.4 | 4.1 | 9.0 | 4.8 | 2.2 | 4.4 | 2.2 |
218469_at | GREM1 | gremlin 1 | 0.2 | 4.2 | 4.0 | −0.9 | 3.0 | 3.9 | 2.4 | 4.5 | 2.1 |
204470_at | CXCL1 | chemokine (C-X-C motif) ligand 1 | 5.0 | 5.1 | 0.1 | 4.1 | 6.3 | 2.2 | −0.04 | 2.7 | 2.7 |
209774_x_at | CXCL2 | chemokine (C-X-C motif) ligand 2 | 4.6 | 4.1 | −0.5 | 3.7 | 5.7 | 2.0 | 1.1 | 4.6 | 3.5 |
225664_at | COL12A1 | collagen, type XII, alpha 1 | 2.5 | 3.8 | 1.4 | 1.4 | 3.9 | 2.5 | 1.0 | 3.4 | 2.4 |
209395_at | CHI3L1 | chitinase 3-like 1 | 3.4 | 5.3 | 1.9 | 3.3 | 6.3 | 3.0 | 1.4 | 6.0 | 4.6 |
201195_s_at | SLC7A5 | solute carrier family 7, member 5 | 4.6 | 4.2 | −0.4 | 3.2 | 4.7 | 1.5 | 1.8 | 6.3 | 4.5 |
205828_at | MMP3 | matrix metallopeptidase 3 | 8.2 | 9.7 | 1.5 | 4.0 | 8.4 | 4.4 | 1.8 | 3.2 | 1.4 |
FC = fold change.
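Because the values are log2 fold changes, the CRC vs. adenoma column is approximately the difference of the two comparisons against normal, and a linear fold change is obtained as 2 raised to the log2 value. A short sketch using the GREM1 and CA7 values of the original sample set (the function names are illustrative):

```python
def linear_fold_change(log2_fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2_fc

def crc_vs_adenoma(log2_ad_vs_n, log2_crc_vs_n):
    """log2FC(CRC vs. AD) = log2FC(CRC vs. N) - log2FC(AD vs. N)."""
    return log2_crc_vs_n - log2_ad_vs_n

# GREM1, original sample set: AD vs. N = 0.2, CRC vs. N = 4.2 (table: CRC vs. AD = 4.0).
grem1_log2 = crc_vs_adenoma(0.2, 4.2)
grem1_linear = linear_fold_change(grem1_log2)  # roughly a 16-fold increase

# CA7, original sample set: AD vs. N = -6.3, CRC vs. N = -4.9 (table: CRC vs. AD = 1.5).
ca7_log2 = crc_vs_adenoma(-6.3, -4.9)
```

Since the table entries are rounded to one decimal, the computed difference can deviate from the tabulated CRC vs. adenoma value by about 0.1, as for CA7.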
Using PCA, the marker set shows clear separation of adenoma, normal and CRC cases.
A. Original sample set (53 samples, microarray) B. Independent sample set (94 samples, microarray) C. Independent sample set (68 samples, RT-PCR) D. GSE8671 (64 samples, microarray) E. GSE18105 (111 samples, microarray) a = adenoma, n = normal, crc = colorectal cancer.
| | Original sample set (n = 53 microarrays) | | | | Independent sample set (n = 94 microarrays) | | | | Independent sample set (n = 68 RT-PCR reactions) | | | |
Classification | Actual group | Predicted normal | Predicted adenoma | Predicted CRC | Total | Predicted normal | Predicted adenoma | Predicted CRC | Total | Predicted normal | Predicted adenoma | Predicted CRC | Total |
Original, count | Normal | 11 | 0 | 0 | 11 | 38 | 0 | 0 | 38 | 20 | 0 | 0 | 20 |
Original, count | Adenoma | 0 | 20 | 0 | 20 | 2 | 25 | 2 | 29 | 1 | 22 | 1 | 24 |
Original, count | CRC | 1 | 1 | 20 | 22 | 0 | 2 | 25 | 27 | 1 | 0 | 23 | 24 |
Original, % | Normal | 100 | 0 | 0 | 100 | 100 | 0 | 0 | 100 | 100 | 0 | 0 | 100 |
Original, % | Adenoma | 0 | 100 | 0 | 100 | 6.9 | 86.2 | 6.9 | 100 | 4.2 | 91.7 | 4.2 | 100 |
Original, % | CRC | 4.5 | 4.5 | 90.9 | 100 | 0 | 7.4 | 92.6 | 100 | 4.2 | 0 | 95.8 | 100 |
Cross-validated, count | Normal | 11 | 0 | 0 | 11 | 37 | 0 | 1 | 38 | 20 | 0 | 0 | 20 |
Cross-validated, count | Adenoma | 2 | 15 | 3 | 20 | 2 | 25 | 2 | 29 | 1 | 21 | 2 | 24 |
Cross-validated, count | CRC | 1 | 3 | 18 | 22 | 1 | 2 | 24 | 27 | 1 | 0 | 23 | 24 |
Cross-validated, % | Normal | 100 | 0 | 0 | 100 | 97.4 | 0 | 2.6 | 100 | 100 | 0 | 0 | 100 |
Cross-validated, % | Adenoma | 10 | 75 | 15 | 100 | 6.9 | 86.2 | 6.9 | 100 | 4.2 | 87.5 | 8.3 | 100 |
Cross-validated, % | CRC | 4.5 | 13.6 | 81.8 | 100 | 3.9 | 7.4 | 88.9 | 100 | 4.2 | 0 | 95.8 | 100 |
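The overall percentage of correctly classified cases follows from the counts in the classification table: the correctly assigned samples divided by all samples. The sketch below uses the RT-PCR counts, reading the count rows as actual normal, adenoma and CRC groups and the columns as predicted normal, adenoma and CRC (the function names are illustrative).

```python
def percent_correct(confusion):
    """Overall accuracy: sum of the diagonal divided by the total count, in percent."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total

def per_group_percent(confusion):
    """Percentage of each actual group that was assigned to its own class."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(confusion)]

# RT-PCR independent sample set (n = 68); rows: actual normal, adenoma, CRC.
rtpcr_original = [[20, 0, 0], [1, 22, 1], [1, 0, 23]]
rtpcr_cross_validated = [[20, 0, 0], [1, 21, 2], [1, 0, 23]]

original_pct = round(percent_correct(rtpcr_original), 1)
cross_validated_pct = round(percent_correct(rtpcr_cross_validated), 1)
```

With these counts, 65 of 68 original and 64 of 68 cross-validated cases lie on the diagonal, which corresponds to the 95.6% and 94.1% reported for the RT-PCR discriminant analysis.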
ROC analysis was applied when paired comparisons were performed using the 11 differentiating markers. Normal and adenoma samples could be discriminated with 100% specificity and 100% sensitivity. The specificity was 100% and the sensitivity was 95.5% when CRC and normal biopsy samples were separated. Adenoma and CRC samples could also be classified with considerably high specificity and sensitivity (specificity: 100%, sensitivity: 95.5%).
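The quantities reported by a ROC analysis can be sketched as follows: sensitivity and specificity at a chosen cut-off, and the area under the curve computed as the fraction of diseased/control pairs that are ranked correctly. The scores below are invented model outputs, not study data, and the function names are illustrative.

```python
def sens_spec_at(threshold, scores_pos, scores_neg):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for s in scores_pos if s >= threshold)
    tn = sum(1 for s in scores_neg if s < threshold)
    return tp / len(scores_pos), tn / len(scores_neg)

def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve as the share of correctly ranked pairs (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Invented predicted probabilities for diseased and control samples.
diseased = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]

sens, spec = sens_spec_at(0.5, diseased, controls)
auc = roc_auc(diseased, controls)
```

Moving the cut-off trades sensitivity against specificity; the ROC curve is the set of all such pairs.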
The multiple logistic regression equations were applied to the different datasets.
Using the set of the 11 markers resulted in clear differentiation between high-grade dysplastic adenoma (n = 11) and early stage CRC (n = 10) biopsy samples (specificity: 90.9%, sensitivity: 100%).
A. Microarray B. Real-time PCR C. Heat map of real-time PCR D. Real-time PCR considering the changes in the diagnosis E. Heat map of real time PCR considering the changes in the diagnosis; Adenoma HGD = high-grade dysplastic adenoma, CRC early stage = colorectal cancer early stage.
Principal component analysis of microarray data from independent biopsy samples resulted in distinct clusters of normal, adenoma and CRC cases with small overlaps between the diagnostic groups.
In paired comparison using the discriminatory set of 11 classifiers, the independent CRC and normal samples could be clearly separated, with 100% sensitivity and 100% specificity. Using the discriminatory panel, independent adenoma and healthy samples could be distinguished with 100% specificity and 96.6% sensitivity. The marker set was suitable for classification of the independent benign and malignant colon samples with 89.7% specificity and 100% sensitivity.
The independent high-grade dysplastic adenoma (n = 13) and early stage CRC (n = 14) biopsy samples could be discriminated with 92.3% specificity and 100% sensitivity. Youden indices were calculated in order to determine discriminatory strength. These values vary between 0.89 and 1.
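The Youden index is J = sensitivity + specificity − 1, so it equals 1 for perfect separation and 0 for a test with no discriminatory value. A minimal sketch follows; the counts used for the high-grade dysplastic adenoma vs. early CRC comparison (14 of 14 CRC and 12 of 13 adenomas correctly assigned) are an inference that is consistent with the reported percentages, not figures stated in the text.

```python
def sensitivity(tp, fn):
    """Share of diseased samples that are correctly identified."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of non-diseased samples that are correctly identified."""
    return tn / (tn + fp)

def youden_index(sens, spec):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sens + spec - 1.0

# Inferred counts: 14/14 early CRC detected, 12/13 high-grade adenomas correctly negative.
sens_hgd = sensitivity(tp=14, fn=0)
spec_hgd = specificity(tn=12, fp=1)
j_hgd = youden_index(sens_hgd, spec_hgd)

# Benign vs. malignant comparison, using the reported 100% sensitivity and 89.7% specificity.
j_benign_malignant = youden_index(1.0, 0.897)
```

Both values fall inside the reported range of Youden indices.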
Marker panel validation was performed on microarray datasets downloaded from Gene Expression Omnibus database. The microarray dataset GSE8671
By the same classifiers, 94 CRC and 17 healthy tissue samples from the GSE18105 study
The array RT-PCR measurements for selected transcript panels were performed on independent biopsy specimens. According to the lowest standard deviation of ΔCT values, 18S ribosomal RNA was chosen as a reference among the seven housekeeping genes placed on the array real-time PCR plate.
The PCA figure shows that normal, adenoma and CRC biopsy samples are classified into three distinct groups.
Discriminant analysis of the 11 markers on independent RT-PCR samples showed correct classification for 95.6% of the original grouped cases, and 94.1% of the cross-validated cases.
When only two sample groups were compared, the discriminatory power of the gene panel also proved to be considerably high in the ROC curve analysis of CRC and normal samples (sensitivity: 100%, specificity: 100%). The adenoma and healthy samples could be clearly separated with 95.8% sensitivity and 95.0% specificity. In the adenoma vs. CRC comparison, the ROC curve analysis showed separation with 95.8% sensitivity and specificity.
The set of 11 classifiers could classify the 24 high-grade dysplastic adenoma and the 24 early CRC (stage Dukes A or B) samples analyzed on microarrays with 83.3% specificity and 100% sensitivity.
The hierarchical cluster diagram of the real-time PCR samples shows that all 10 CRC samples were correctly classified, and 3 of the 11 adenoma samples were misclustered.
In this study, a characteristic transcript set specific for the colorectal dysplasia-carcinoma transition was determined using whole genomic microarrays in 53 biopsy samples. In order to test the differentiation power of the discriminatory gene panel, an additional 94 microarrays with independent colonic biopsy specimens and microarray datasets downloaded from the Gene Expression Omnibus were also analyzed. Further validation was conducted using array real-time PCR cards that contained the characteristic transcript panel. The identified set of 11 transcripts can be used for separation of CRC, adenoma and normal biopsy samples; moreover, it is suitable for discrimination between high-grade dysplastic adenoma and early stage CRC cases with high specificity and sensitivity.
Whole genomic microarray analysis is an important tool for high-throughput gene expression screening, but equipment and reagent costs do not qualify it as a cost-effective diagnostic tool. Therefore, quantitative array real-time PCR cards with assays for a selected set of classifiers offer a more viable alternative for diagnostic application, with lower costs and the possibility of automating the whole process from RNA isolation to the RT-PCR analysis.
The current method of diagnosing colorectal cancers and adenomas is histological analysis. Colon biopsy specimens are evaluated on 4–5 small sections, each 3–5 µm thick, taken from different areas of the colon. However, critical areas, including aberrant crypt foci in hyperplastic polyps, in situ carcinoma in adenomas, and dysplastic areas and carcinomas in long-standing IBD specimens, may remain hidden in the uncut specimen block or be missed because of inadequate orientation.
In addition, pathologists face a growing workload due to the increasing demand for cancer screening biopsies, molecular testing for targeted therapy and the concomitant sub-specialization. Therefore, an alternative but still reliable method for identifying diseased or negative specimens could be of great importance. The automated evaluation of colon biopsy specimens by mRNA expression profiling could be a valid approach, since much of the methodology, preparation and analysis procedure is already available.
Furthermore, the mRNA expression analysis gives us an insight into altered cellular functions beyond the microscopic level. This information might be related to the biological behaviour of tumors and/or the expression of therapeutic targets, e.g. growth factor receptors. Also the expression of metastasis related genes and those involved in tumor invasiveness may be identified.
The set of 11 classifiers determined in our study showed considerably high discriminatory power on the microarray datafiles of previous studies in CRC vs. normal and in adenoma vs. normal comparisons.
Ten of the 11 discriminatory transcripts, all except COL12A1 (namely IL8, MMP3, IL1B, CHI3L1, GREM1, IL1RN, CXCL1, CXCL2, CA7 and SLC7A5), are thought to be associated with colorectal carcinogenesis and progression. In accordance with our findings, seven of them (IL8, CHI3L1, CXCL1, CXCL2, MMP3, SLC7A5 and CA7) were found to be differentially expressed in CRC compared to normal tissue in previous microarray studies.
Interleukin 8 (IL8) promotes cell proliferation and migration of human colon carcinoma cells through metalloproteinase-mediated cleavage of proHB-EGF.
In summary, this study identified a set of 11 discriminatory transcripts which could correctly classify not only normal, adenoma and CRC biopsies, but also high-grade dysplastic adenoma and early stage CRC samples, even on a large independent sample set. Although 10 of the 11 discriminatory genes are already known to be associated with CRC, these markers are applied as a combined discriminative set for the first time in this study. The identified set of 11 markers proved to be a highly specific and sensitive discriminator of the colorectal dysplasia-carcinoma transition, which is of great clinical importance for the early diagnosis of CRC. These markers can establish the basis of gene expression based diagnostic classification of benign and malignant colorectal diseases and of the development of diagnostic real-time PCR cards; furthermore, they could be utilized for prospective biopsy screening at both mRNA and protein levels.
We would like to thank Tim Allen MSc. from Cranfield University, UK for reviewing the manuscript.