The authors have declared that no competing interests exist.
Conceived and designed the experiments: CW UA BM CL. Analyzed the data: CW PVL JT PR. Contributed reagents/materials/analysis tools: SG TB RH TB UA. Wrote the paper: CW PR BM. Critical revision: SG PVL TB JT RH TB UA PR BM CL.
A number of gene variants have been associated with an increased risk of developing glioma. We hypothesized that the reported risk variants may be associated with tumor genomic instability. To explore potential correlations between germline risk variants and somatic genetic events, we analyzed matched tumor and blood samples from 95 glioma patients by means of SNP genotyping. The generated genotype data was used to calculate genome-wide allele-specific copy number profiles of the tumor samples. We compared the copy number profiles across samples and found two EGFR gene variants (rs17172430 and rs11979158) that were associated with homozygous deletion at the CDKN2A/B locus. One of the EGFR variants (rs17172430) was also associated with loss of heterozygosity at the EGFR locus. Our findings were confirmed in a separate dataset consisting of matched blood and tumor samples from 300 glioblastoma patients, compiled from publically available TCGA data. These results imply there is a functional effect of germline EGFR variants on tumor progression.
Genome-wide association studies (GWAS) have identified common genetic variants that are likely to be involved in the etiology of glioma. There are three published GWAS to date that have identified eight different loci associated with glioma risk
The Cancer Genome Atlas (TCGA) has published a comprehensive genomic analysis of 206 glioblastoma cases
Many of the loci harboring the risk variants (
Discovery (UMU) | Validation (TCGA) | |||||||||||||||||||
Risk variant | Chr | Position | Gene | Major allele | Risk allele | Ref. | Surrogate | Position | LD (r∧2) | Major allele | Risk allele | n (major) | n (rare+hz) | Surrogate | Position | LD (r∧2) | Major allele | Risk allele | n (major) | n (rare+hz) |
rs2736100 | 5 | 1339516 | TERT | A | C |
|
24 | 57 | 46 | 239 | ||||||||||
rs2252586 | 7 | 54946418 | EGFR | G | A |
|
rs6945082 | 54925952 | 0.883 | G | A | 38 | 43 | 116 | 169 | |||||
rs6969537 | 7 | 55049912 | EGFR | G | G |
|
55 | 26 | 222 | 63 | ||||||||||
rs17172430 | 7 | 55090144 | EGFR | G | G |
|
rs1015793 | 55081810 | 0.742 | A | A | 60 | 21 | 236 | 49 | |||||
rs11979158 | 7 | 55126843 | EGFR | A | A |
|
rs10245472 | 55114972 | 1.000 | G | G | 57 | 24 | 209 | 76 | |||||
rs4947979 | 7 | 55163119 | EGFR | A | A |
|
44 | 37 | 184 | 101 | ||||||||||
rs4295627 | 8 | 130754639 | CCDC26 | A | C |
|
rs6470745 | 130711103 | 1.000 | A | G | 43 | 38 | 198 | 87 | |||||
rs1412829 | 9 | 22033926 | CDKN2B | A | G |
|
rs634537 | 22022152 | 1.000 | A | C | 17 | 64 | 89 | 196 | |||||
rs4977756 | 9 | 22058652 | CDKN2A-CDKN2B | A | G |
|
18 | 63 | 92 | 193 | ||||||||||
rs498872 | 11 | 117982577 | PHLDB1 | G | A |
|
31 | 50 | 131 | 154 | ||||||||||
rs1476278 | 17 | 35089769 | ERBB2 | A | G |
|
rs903502 | 35083130 | 1.000 | A | G | 33 | 48 | rs12150298 | 35088067 | 1.000 | G | A | 122 | 163 |
rs2952155 | 17 | 35115244 | ERBB2 | G | A |
|
rs9635726 | 35273667 | 0.678 | G | A | 44 | 37 | rs9635726 | 35273667 | 0.678 | G | A | 186 | 99 |
rs6010620 | 20 | 61780283 | RTEL1 | G | G | 52 | 29 | 201 | 84 |
We hypothesized that reported risk variants are associated with genomic instability. To test this hypothesis, we analyzed matched blood and tumor samples from 95 glioma patients by means of SNP genotyping. Based on the SNP genotyping data, we calculated genome-wide allele-specific copy number in the tumor samples. This enabled us to explore possible correlations between germline risk genotypes and frequencies of somatic aberrations.
This study was based on samples collected from glioma patients diagnosed at Umeå University Hospital, between 1995 and 2008. A total of 197 patients were diagnosed during this period. Ninety-five (95) patients from whom matched blood and tumor samples were available were included in the study. Diagnoses were confirmed by pathology review. This sample set is referred to as the UMU set, and its characteristics are listed in
All samples | Solved by ASCAT | |
Total number of patients | 95 | 81 |
Gender (male/female) | 59/36 | 52/29 |
Age at dianosis median (yrs) | 56 (15–80) | 55 (15–80) |
Male | 57 | 57 |
Female | 54 | 53 |
Histological subtype distribution | ||
GBM | 63 | 55 |
Non GBM | 32 | 26 |
Collection of blood samples, brain tumor tissues and clinico-pathological information from patients was undertaken with written informed consent and the study was approved by our ethical board, in accordance with the Umeå University Hospital guidelines.
DNA was extracted from EDTA-venous blood samples using FlexiGene DNA Kit (QIAGEN GmbH, Hilden, Germany) and brain tumor tissues using QIAmp DNA Mini Kit (QIAGEN GmbH, Hilden, Germany) methodologies. Genotyping was conducted by the SNP&SEQ Technology Platform, Uppsala, Sweden (
The validation dataset for this study was compiled from publically available TCGA data. Illumina idat-files from matched tumor and blood samples were downloaded (13 December 2011) for 334 GBM patients analyzed on the Illumina HumanHap550 array. Samples from 32 patients (all from the same sample plate) were excluded due to a large proportion of failed probes (>5%). Furthermore, we excluded two additional patients due to probable sample mix-ups (the blood raw data profiles appeared very similar to typical tumor samples). In total we found matched tumor and blood samples from 300 GBM patients eligible to use as a validation set.
Generated intensity data was imported into GenomeStudio software. The GenCall Score cutoff was set to 0.15. Log R ratio (LRR) and B allele frequency (BAF) data from each sample and probe was subsequently exported. To avoid downstream difficulties with segmentation, we removed LRR and BAF data from W-probes with LRR<−2 and replaced them with missing value. This was done individually for each sample in the UMU-dataset. The HumanHap550 array does not contain W-probes, hence this does not apply to the TCGA data. Lastly, we adjusted for GC-waves in both datasets
We used the ASCAT-algorithm
Whole genome ASCAT-profiles from two samples in the UMU dataset; one diploid (
Each individual probe was assigned to at least one of eight types of genomic events (
ASCAT calculated tumor ploidy for 81 UMU samples (
Genomic events | Definition |
Normal | nA = 1 AND nB = 1 |
Loss | nTot <2 |
Increased copy number | nTot >2 |
Loss of heterozygosity (LOH) | nA = 0 OR nB = 0 |
Copy number neutral events (CNNE) | nA≠1 AND nB≠1 AND nTot = 2 |
Homozygous deletion (HD) | nTot = 0 |
Amplification | nTot > = 8 |
Simultaneous LOH and increased copy number | (nA = 0 OR nB = 0) AND nTot >2 |
nTot = nA + nB, where nA and nB represent calculated copy numbers for the separate alleles at a given probe.
In the discovery phase, performed on UMU-data, we used two complementary approaches to study correlation between risk variants and the specified genomic events; one global and one focused on selected genes of interest. The global approach was designed to explore correlations anywhere in the genome, with the prerequisite that the event frequency was relatively large. The genes-of-interest approach was designed to explore a set of predefined genes, regardless of event frequency. All steps of the analyses were performed in R (
For each individual probe, we calculated the frequency of a given event in all samples. We then calculated an event frequency cutoff, to locate recurring events in our sample series. The event frequency cutoff was calculated based on permutations of genomic positions. After the genomic positions had been permuted, individually for each sample, the event frequency at each probe was determined. Based on the generated random frequency data, an intermediate event frequency cutoff was set using a false positive rate of 0.5% (which we estimate correspond to a true discovery rate of 90 to 99%). The permutation of genomic positions was repeated a total of 100 times, and we used the mean value of the intermediate event frequency cutoffs as our final cutoff to call recurrent events. X-chromosome data was left out of the permutation procedure.
We split the dataset into two groups based on a given risk variant (
The p-value cutoff was determined through permutation. In each permutation, we randomly assigned all samples to one of the two groups (while maintaining constant group sizes), and thereafter calculated p-values across the genome, as described above. The permutation was repeated 100 times, and for each run the lowest recorded p-value was stored. The p-value cutoff was set to the 95th percentile of the stored p-values from the permutation procedure. This procedure is similar to that described in Lystig et al
This approach was repeated for each event of interest, as well as for all risk variants.
The genes of interest (
Gene | Locus |
CDKN2C | chr1:51206954-51212894 |
MDM4 | chr1:202752129-202793871 |
AKT3 | chr1:241729643-242073207 |
IDH1 | chr2:208809197-208828051 |
PIK3CA | chr3:180349004-180435191 |
PDGFRA | chr4:54790020-54859169 |
TERT | chr5:1306286-1348162 |
PIK3R1 | chr5:67547339-67633405 |
PARK2 | chr6:161688579-163068824 |
EGFR | chr7:55054218-55242525 |
CDK6 | chr7:92072170-92301167 |
MET | chr7:116099694-116225676 |
MYC | chr8:128817496-128822862 |
CCDC26 | chr8:130433119-130761667 |
CDKN2A | chr9:21957750-21965132 |
CDKN2B | chr9:21992901-21999312 |
PTEN | chr10:89613174-89718512 |
PHLDB1 | chr11:117982422-118033958 |
CCND2 | chr12:4253162-4284783 |
CDK4 | chr12:56427776-56432497 |
MDM2 | chr12:67488237-67525479 |
RB1 | chr13:47775883-47954027 |
IDH2 | chr15:88428215-88446712 |
TP53 | chr17:7512444-7531593 |
NF1 | chr17:26446070-26728821 |
ERBB2 | chr17:35109779-35138441 |
RTEL1 | chr20:61759606-61798050 |
For each gene of interest, each sample was classified as positive or negative for a given event. Samples were classified as positive if >50% of the probes within the locus of the gene were positive, and vice versa. We divided the samples into two groups, based on the same principle used in the global approach. Subsequently we applied a one sided Fisher's Exact test to the contingency table of samples with and without event, to test the hypothesis that the group containing the risk allele displayed a higher event frequency than the other group. Gene/event combinations with p<0.05 in the UMU dataset were shortlisted for validation in the TCGA data set.
All events found in the discovery phase were subsequently validated in the TCGA dataset. We used the same method as for the genes-of-interest approach. I.e., for each combination of event and region/gene of interest that was selected for validation, we classified all TCGA samples as either positive or negative for the given event within the region/gene of interest. To be classified as positive, >50% of all probes within the region/gene of interest had to be positive. We thereafter split the dataset in two and performed a one-sided Fisher's Exact test, as previously described.
We inferred copy number profiles of glioma tumor cells from SNP array data, accounting for non-aberrant cell admixture and tumor aneuploidy, using the ASCAT algorithm
We calculated the frequency of the defined somatic events over the whole genome in the UMU dataset, and determined regions where each event was significantly recurring (
Frequency (%) is represented on the y-axis and genomic position on the x-axis. The permutation derived event frequency cutoff (used to establish regions with significantly recurring events among the UMU samples) is illustrated by the red line.
The samples are split into two groups, based on their genotype at the EGFR variant rs1015793. HD frequency among samples homozygous for the major allele is plotted on the upper half of the upper panel (grey bars), and the HD frequency among samples homozygous for the rare allele plus heterozygous samples is plotted on the lower half of the upper panel (grey bars). The frequency difference between the groups is illustrated with overlaying red and green bars. The event frequency cutoff is illustrated by the horizontal red lines. The dark grey, vertical lines represent CDKN2A and CDKN2B. The lower panel illustrates the p-values (−log10 transformed) at each genomic site, with the horizontal red line illustrating the permutation derived p-value cutoff (0.0085).
UMU | TCGA | ||||||||||||||||
Risk variant | Region | Cytoband | Gene | Event | Variant | n (major) |
n (major) event |
n(rare+hz) |
n(rare+hz) event |
P | Variant | n (major) | n (major) event | n(rare+hz) | n(rare+hz) event | P | Approach |
rs17172430 | chr7:55054218-55242525 | 7p11.2 | EGFR | LOH | rs1015793 | 60 (41/19) | 19 (15/4) | 21 (14/7) | 2 (2/0) | 0.0385 | rs17172430 | 236 | 57 | 49 | 6 | 0.0455 | GOI |
rs11979158 | chr9:21957750-21965132 | 9p21.3 | CDKN2A | HD | rs10245472 | 57 (39/18) | 33 (26/7) | 24 (16/8) | 8 (7/1) | 0.0374 | rs11979158 | 209 | 51 | 76 | 10 | 0.0267 | GOI |
rs11979158 | chr9:21992901-21999312 | 9p21.3 | CDKN2B | HD | rs10245472 | 57 (39/18) | 32 (25/7) | 24 (16/8) | 7 (6/1) | 0.0233 | rs11979158 | 209 | 122 | 76 | 32 | 0.0107 | GOI |
rs17172430 | chr9:21992901-21999312 | 9p21.3 | CDKN2B | HD | rs1015793 | 60 (41/19) | 34 (27/7) | 21 (14/7) | 5 (4/1) | 0.0088 | rs17172430 | 236 | 134 | 49 | 20 | 0.0300 | GOI |
rs17172430 | chr9:21961989-21978896 | 9p21.3 | MTAP, CDKN2A | HD | rs1015793 | 60 (41/19) | 35 (28/7) | 21 (14/7) | 5 (4/1) | 0.0062 | rs17172430 | 236 | 135 | 49 | 20 | 0.0264 | global |
rs17172430 | chr9:22010493-22055620 | 9p21.3 | MTAP, CDKN2BAS | HD | rs1015793 | 60 (41/19) | 33 (27/6) | 21 (14/7) | 4 (3/1) | 0.0040 | rs17172430 | 236 | 132 | 49 | 19 | 0.0210 | global |
total number of samples (glioblastoma samples/non-glioblastoma samples).
By the complementary genes-of-interest approach, we found 35 events that were significantly more frequent in the risk group in the discovery phase. Four of these proved significant also in the validation phase (
Among the UMU samples, 60 samples were homozygous for the risk allele at the EGFR variant rs1015793 (which was used as a surrogate for the risk variant rs17172430). Of these, 19 displayed LOH at the EGFR locus, and 35 displayed HD at the CDKN2A locus. Fourteen of the 60 displayed both HD at CDKN2A/B and LOH at the EGFR locus (
Venn diagram of patients that were homozygous for the risk allele at rs1015793, and displayed aberrations at EGFR and CDKN2A/B.
The results are summarized in
Molecular profiling of glioma has identified several important pathways that characterize the different histopathological types of glioma. Increased insight into glioma biology is important to help understand its etiology and initiation processes, which in turn may enable development of preventive and therapeutic strategies. In the present study, we have identified correlations between germline EGFR gene variants (rs17172430 and rs11979158) and loss of heterozygosity (LOH) at the EGFR locus as well as homozygous deletion at the CDKN2A/B locus. This is indicative of a functional effect of a germline variant on tumor progression. Discoveries in genetic etiology have been important for the development of novel treatments in other cancers, such as PARP-2 inhibitors in breast cancer patients carrying mutations in the BRCA1 gene
Early studies showed two major pathways of glioma progression, characterized by EGFR amplification
We have investigated 13 germline gene variants, annotating 8 genes associated with glioma susceptibility (
In this study, we investigated two variants annotating the CDKN2A/B genes (rs1412829 and rs4977756). We found no significant associations between these variants and somatic copy number alterations anywhere in the genome. This is similar to the findings of another recent study, where the same variants were investigated in relation to copy number alterations of the CDKN2A/B genes, and no associations were found
We report a correlation between a risk variant in EGFR intron 1 (rs17172430) and LOH at the EGFR locus. This is in contrast to the report by Sanson et al
Schematic diagram of the EGFR gene structure, marking all gene variants included in the study and their internal LD structure. Pairs of risk variants (red) and surrogate markers used in the UMU data (blue) are marked with dashed rectangles.
EGFR is a large gene (188 kb) with a complex haplotype structure. Intron 1 alone, harboring both rs17172430 and rs11979158, consists of >122 000 bases. Our findings imply that variation in this region has a functional role. However, this needs to be investigated further, for example by targeted re-sequencing of the region, to explore the possibility of germline functional mutations in linkage disequilibrium with the identified risk genotype.
The other main findings of our study were correlations between the EGFR risk genotypes (rs17172430 and rs11979158) and homozygous deletions of CDKN2A/B. Genetic events in CDKN2A/B and EGFR often co-occur, thus it is difficult to disentangle whether these are independent events, or whether the CDKN2A/B events are secondary to the EGFR event.
Several associations from the discovery phase were not significant in the validation set. The UMU sample set (used for discovery) contained all different subsets of glioma, whereas the TCGA sample set (used for validation) contained only glioblastoma. Because of this, the validation procedure would have discarded any associations between germline gene variants and somatic copy number aberrations that were mainly evident among other lower graded glioma. However, as there appears to be no such events found in the discovery phase (
Estimating genome-wide copy numbers from SNP-chip genotype data from tumor samples is complicated by the facts that tumor cells may not be diploid and that samples many times contain DNA from both tumor and stromal cells. The ASCAT algorithm solves both these inherent difficulties and allows for genome-wide allele-specific analysis of copy number from tumor samples
In conclusion, we have found correlations between EGFR gene variants and somatic aberrations of both EGFR and CDKN2A/B. We believe these variants may have a driving effect on glioma progression, and thus provide a novel lead to further understanding of genotype-phenotype correlations in glioma etiology. Additional studies of the direct functional role need to be conducted to elucidate the molecular mechanisms underlying the identified association between germline gene variants and somatic aberrations.
(XLSX)
(XLSX)
The results published here are in part based upon data generated by The Cancer Genome Atlas established by the NCI and NHGRI. Information about TCGA and the investigators and institutions who constitute the TCGA research network can be found at “