Genome-wide association studies (GWAS) have identified multiple single nucleotide polymorphisms (SNPs) associated with prostate cancer risk. However, whether these associations can be consistently replicated, vary with disease aggressiveness (tumor stage and grade) and/or interact with potential non-genetic risk factors or other SNPs is unknown. We therefore genotyped 39 SNPs from regions identified by several prostate cancer GWAS in 10,501 prostate cancer cases and 10,831 controls from the NCI Breast and Prostate Cancer Cohort Consortium (BPC3). We replicated 36 out of 39 SNPs (P-values ranging from 0.01 to 10⁻²⁸). Two SNPs located near KLK3 associated with PSA levels showed differential association with Gleason grade (rs2735839, P = 0.0001 and rs266849, P = 0.0004; case-only test), where the alleles associated with decreasing PSA levels were inversely associated with low-grade (as defined by Gleason grade <8) tumors but positively associated with high-grade tumors. No other SNP showed differential associations according to disease stage or grade. We observed no effect modification by SNP for association with age at diagnosis, family history of prostate cancer, diabetes, BMI, height, smoking or alcohol intake. Moreover, we found no evidence of pair-wise SNP-SNP interactions. While these SNPs represent new independent risk factors for prostate cancer, we saw little evidence for effect modification by other SNPs or by the environmental factors examined.
Citation: Lindstrom S, Schumacher F, Siddiq A, Travis RC, Campa D, et al. (2011) Characterizing Associations and SNP-Environment Interactions for GWAS-Identified Prostate Cancer Risk Markers—Results from BPC3. PLoS ONE 6(2): e17142. doi:10.1371/journal.pone.0017142
Editor: Marie-Pierre Dubé, Université de Montreal, Canada
Received: September 15, 2010; Accepted: January 21, 2011; Published: February 24, 2011
This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
Funding: This work was supported by the US National Institutes of Health, National Cancer Institute [cooperative agreements U01-CA98233-07 to David J. Hunter, U01-CA98710-06 to Michael J. Thun, U01-CA98216-06 to Elio Riboli and Rudolf Kaaks, and U01-CA98758-07 to Brian E. Henderson, and Intramural Research Program of NIH/National Cancer Institute, Division of Cancer Epidemiology and Genetics]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Prostate cancer is the most common non-skin cancer among men in industrialized countries, but beyond age, ethnicity and family history, very little is known about its etiology. Observed familial aggregation, together with evidence from both twin and epidemiological studies, demonstrates a key role for inherited genetic variants.
Genome-wide association studies (GWAS) conducted within the last few years have identified multiple common single nucleotide polymorphisms (SNPs) associated with prostate cancer risk. However, the function of these SNPs (or the causal variants these SNPs serve as proxies for) remains largely unknown and data describing their correlation with clinical factors or their interplay with other genetic and non-genetic factors are sparse, mainly due to the large sample sizes needed for sufficient statistical power.
To this end, we selected 39 SNPs from regions identified in previous GWAS and genotyped them in 10,501 prostate cancer cases and 10,831 controls within the NCI Breast and Prostate Cancer Cohort Consortium (BPC3). We tested each SNP for association with two strongly predictive clinical factors: Gleason grade and tumor stage. We investigated interactions between SNPs and known and potential environmental risk factors. Finally, we performed exploratory analysis to identify possible pair-wise SNP-SNP interactions.
Association between SNPs and prostate cancer risk
Subject characteristics are displayed in Table 1. All 39 SNPs were significantly associated with prostate cancer risk (Table 2) and directions of associations were consistent with previous findings. Although risk estimates varied somewhat between different cohorts (Table S1 and Figure S1), we observed overall no strong statistical evidence for heterogeneity (P>0.01). Risk effects per allele ranged from 1.06 (rs2928679) to 1.44 (rs16901979). Carriers of two copies of the rare ‘A’ allele of rs16901979 had a 3-fold increased risk of developing prostate cancer in this population. The allele frequency of rs16901979 varies widely across ethnicities (HapMap population frequencies: 0.03 in CEU, 0.26 in CHB and 0.58 in YRI), and thus might explain a part of the differences seen in prostate cancer incidence across populations. Based on p-value, we observed the strongest association for rs4430796 located in HNF1B/TCF2 (OR: 0.80 (95% CI: 0.77–0.83), P = 2.09×10⁻²⁸) and the weakest association for rs4961199 near CPNE3 (OR: 1.07 (95% CI: 1.02–1.14), P = 0.012). In addition, rs266849 near KLK3 was only weakly associated with prostate cancer risk (OR: 0.93 (95% CI: 0.89–0.98), P = 0.009). rs266849 was initially identified in a GWAS using controls selected for low prostate-specific antigen (PSA) levels (<0.5 ng/ml) and it has been suggested that rs266849 is a marker for circulating PSA levels rather than for prostate cancer risk.
Table 1. Characteristics of the study populations. doi:10.1371/journal.pone.0017142.t001
Table 2. Associations between selected SNPs and prostate cancer risk. doi:10.1371/journal.pone.0017142.t002
The primary analysis in most GWAS assumes an additive increase in risk for each risk allele carried. rs4961199 (P = 0.02) was the only SNP showing nominally significant evidence of departure from additivity in our data. This was not unexpected since rs4961199 was initially identified using a recessive inheritance model.
Replication in non-CGEMS cohorts
Eleven of the 39 SNPs included in this paper were identified through the Cancer Genetic Markers of Susceptibility (CGEMS) project (http://cgems.cancer.gov/) which has partial overlap with BPC3. Eight of these eleven SNPs were identified through CGEMS stage 2 including ATBC, CPS-II, HPFS and PLCO. We attempted to replicate these SNPs in the other cohorts (EPIC, MCCS, MEC and PHS) which collectively comprise 4,661 prostate cancer cases and 5,288 controls. Out of these eight SNPs, six replicated (Table S2) and two did not (rs4961199 near CPNE3 (OR: 0.99 (95% CI: 0.91–1.08), P = 0.82) and rs4962416 in CTBP2 (OR: 1.01 (95% CI: 0.95–1.08), P = 0.73)). Three SNPs (rs4857841 (EEFSEC), rs7841060 (8q24) and rs620861 (8q24)) were identified in CGEMS stage 3 (which, in addition to ATBC, CPS-II, HPFS and PLCO, included EPIC and MEC). Therefore, we tested these three SNPs in MCCS and PHS only (2,700 cases and 2,412 controls). Both rs7841060 (OR: 1.25 (95% CI: 1.12–1.40), P = 8.8×10⁻⁵) and rs620861 (OR: 0.89 (95% CI: 0.80–0.98), P = 0.01) were associated with prostate cancer risk whereas rs4857841 was not (OR: 1.03 (95% CI: 0.93–1.15), P = 0.52). Since we did not replicate rs4857841, rs4961199 and rs4962416 in the non-CGEMS studies, we did not pursue these SNPs in further analysis.
SNP associations by tumor stage and grade
We next examined whether any SNP was differentially associated with tumor grade or stage at diagnosis (Table 3). A total of 1,717 cases were classified as high-stage (stage C or D at diagnosis) and 1,388 were classified as high-grade (Gleason grade 8–10 or equivalent, i.e. coded as poorly differentiated or undifferentiated). For 15% of the cases, we did not have information about tumor stage or Gleason grade. The minor alleles of two SNPs in the KLK3 gene (rs266849 and rs2735839), which have been previously associated with decreasing PSA levels, were inversely associated with low-grade disease (Gleason <8) (OR: 0.91 (0.86–0.96) for rs266849, OR: 0.84 (0.79–0.89) for rs2735839) but associated with increased risk for high-grade disease (OR: 1.10 (1.00–1.22) for rs266849, OR: 1.07 (0.95–1.19) for rs2735839). The differences in SNP associations between low-grade and high-grade disease were statistically significant in case-only analysis (P = 0.0004 for rs266849 and P = 0.0001 for rs2735839). These results remained significant after Bonferroni correction (P = 0.014 for rs266849 and P = 0.0036 for rs2735839). No other SNP was differentially associated with tumor grade or stage after adjusting for multiple testing.
Table 3. Associations between selected SNPs and Gleason grade and Stage. doi:10.1371/journal.pone.0017142.t003
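The Bonferroni-corrected values reported above can be reproduced with simple arithmetic. The sketch below assumes the correction factor is 36 (the number of SNPs carried forward after excluding the three that did not replicate); the text does not state the factor explicitly, so this is an inference from the reported numbers rather than a documented detail of the analysis.

```python
def bonferroni(p_value, n_tests):
    """Return the Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Assumption: 36 tests, one per SNP retained for further analysis.
N_SNPS = 36

adjusted = {
    "rs266849": bonferroni(0.0004, N_SNPS),
    "rs2735839": bonferroni(0.0001, N_SNPS),
}

for snp, p_adj in adjusted.items():
    print(f"{snp}: adjusted P = {p_adj:.4f}")
```

Under that assumption the adjusted values are 0.0144 and 0.0036, which match the reported 0.014 and 0.0036 after rounding.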
Association between non-genetic factors and prostate cancer risk
We tested for association between prostate cancer risk and potential non-genetic risk factors including family history of prostate cancer, diabetes, BMI, height, smoking and alcohol consumption. As expected, we observed a strong association between family history of prostate cancer and prostate cancer risk (OR: 1.77, 95% CI: 1.59–1.96, P = 1.88×10⁻²⁷) as well as between diabetes and prostate cancer risk (OR: 0.73, 95% CI: 0.64–0.83, P = 1.61×10⁻⁶). Adjusting for BMI did not alter the association between diabetes and prostate cancer (data not shown). BMI was inversely associated with prostate cancer risk (OR: 0.996 (95% CI: 0.994–0.998) per BMI unit increase, P = 0.0004). This association was limited to obese men (BMI>30) compared to normal weight men (BMI<25) (OR: 0.86, 95% CI: 0.79–0.94, P = 0.0009), and we observed no association for being overweight (OR: 0.99, 95% CI: 0.93–1.05, P = 0.64). Adjusting for diabetes and smoking attenuated the association between obesity and prostate cancer risk (OR: 0.89, 95% CI: 0.82–0.98, P = 0.02). The inverse association between BMI and prostate cancer risk was restricted to non-aggressive cases as defined by Gleason grade <8 and tumor stages A and B (data not shown). Height was not associated with prostate cancer risk when analyzed as a continuous variable (OR: 1.001, 95% CI: 1.000–1.002 per cm increase, P = 0.12) or in tertiles (OR: 1.02, 95% CI: 0.99–1.06, P = 0.24). We observed a non-significant reduced prostate cancer risk among both former smokers (OR: 0.95, 95% CI: 0.89–1.01, P = 0.08) and current smokers (OR: 0.91, 95% CI: 0.82–1.00, P = 0.06) compared to never smokers. Adjusting for alcohol consumption or BMI did not change the results (data not shown). Finally, consuming more than 30 g alcohol per day (corresponding to two drinks) was associated with an increased prostate cancer risk (OR: 1.09, 95% CI: 1.01–1.18, P = 0.03). Adjusting for smoking did not alter this association (data not shown).
SNP-environment and SNP-SNP interactions
To investigate if the associations with family history of prostate cancer, diabetes and BMI were stronger in specific genetic strata, we tested for effect modification by including a SNPxE interaction term in the model. We also tested for SNP effect modification of age at diagnosis (studying the main effect of age is not appropriate since our population comprises a series of nested case-control studies matched on age). After adjusting for multiple testing, no SNP showed significant statistical interaction with any of the non-genetic factors tested (Table S3 and Table S4). Of note, two SNPs in the 8q24 region (rs620861, P = 0.05 and rs6983267, P = 0.004) showed nominally significant interactions with age at diagnosis, with the association being stronger in younger men. These results are in line with previous reports of stronger associations with earlier onset of disease for SNPs in the 8q24 region. We observed marginally significant interactions between diabetes and rs10486567 in JAZF1 (P = 0.04) and between BMI and rs10486567 (P = 0.03). This is of particular interest since genetic variation in JAZF1 has been associated with diabetes, albeit not the same genetic variants. In this study, obesity was associated with a reduced risk for prostate cancer. It has been shown that BMI is inversely associated with PSA levels and thus, obese men are less likely to get diagnosed through PSA screening. Because BMI was associated with non-aggressive disease, we also looked at possible SNP-BMI interactions stratified by disease aggressiveness but observed no significant interactions (data not shown).
To assess if the ambiguous associations between prostate cancer risk and height, smoking and alcohol consumption are due to hidden SNP-environment interactions, we conducted a joint test of the environmental main effect and the SNP-environment interaction effect. This test has proven powerful when the non-genetic effect is limited to a specific genetic stratum. Across SNPs, the joint test was not significant for either alcohol or smoking after adjustment for multiple testing (Table S5, Table S6 and Table S7). Similarly, standard interaction tests between SNPs and height, SNPs and smoking and SNPs and alcohol consumption were not significant. Exploratory analyses of all possible pair-wise SNP-SNP interactions revealed no more significant interactions than expected by chance (50 out of 630 tests, Table S8). Furthermore, no SNP-SNP interaction was significant after correcting for multiple testing using a Bonferroni correction (lowest nominal P-value was 0.0005). Yeager and colleagues identified a SNP-SNP interaction between rs4242382 and rs620861 (P = 0.002). We also observed this interaction (P = 0.02), but not when the analysis was restricted to only MCCS and PHS (P = 0.75).
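The 630 pair-wise tests correspond to every unordered pair among 36 SNPs, and the Bonferroni threshold follows directly from that count. A small sketch of the arithmetic, assuming a family-wise alpha of 0.05 (the alpha level is not stated in the text and is an assumption here):

```python
import math

n_snps = 36                      # SNPs carried forward for interaction analysis
n_pairs = math.comb(n_snps, 2)   # number of unordered SNP-SNP pairs

alpha = 0.05                     # assumed family-wise significance level
bonferroni_threshold = alpha / n_pairs

lowest_nominal_p = 0.0005        # lowest nominal P value reported in the text
survives_correction = lowest_nominal_p < bonferroni_threshold

print(n_pairs, f"{bonferroni_threshold:.2e}", survives_correction)
```

With these inputs the threshold is roughly 7.9×10⁻⁵, so a nominal P of 0.0005 does not pass it, in agreement with the statement that no SNP-SNP interaction survived correction.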
In this study, we set out to examine whether SNPs identified in GWAS to be associated with prostate cancer show variation in risk by disease aggressiveness (tumor stage and grade) and/or interact with non-genetic and genetic factors. All 39 SNPs tested were significantly associated with prostate cancer in the overall analysis. However, the CGEMS project, which included four and six BPC3 studies in its second and third stages, respectively, contributed to identification of eleven SNPs investigated in the present study. We tested whether associations for these eleven SNPs could be confirmed in the remaining studies and with the exception of three SNPs, the findings were replicated with risk magnitudes similar to those in the CGEMS analysis. We could not replicate rs4961199 using data from three of the non-CGEMS cohorts. Since rs4961199 was included in CGEMS stage 2 based on its recessive association, we also tested the recessive model in the non-CGEMS studies and observed a non-significant association similar to but weaker than that in CGEMS (OR: 1.10, 95% CI: 0.82–1.47, P = 0.54).
Few of the observed associations differed by disease stage, tumor grade or environmental exposures. The most noteworthy finding was the qualitatively altered association according to Gleason grade for two SNPs near KLK3 (rs266849 and rs2735839), where the minor alleles were associated with lower risk of low-grade disease but higher risk of Gleason 8–10 tumors. This was previously observed by Kader and colleagues who studied 5,000 patients and found a strong association between Gleason grade and rs2735839 (P = 3.7×10⁻⁷). The minor alleles of these SNPs have been associated with lower PSA levels indicating that carriers are less likely to be diagnosed at an early stage through PSA screening. However, we did not observe any difference in the association of these two SNPs by disease stage, suggesting that delayed diagnosis might not fully explain these associations. Interestingly, the significant positive association of these two SNPs with Gleason 8–10 tumors supports the clinical observations that PSA expression is lower in malignant than in normal prostatic epithelium and is further reduced in poorly differentiated tumors. Together, these results suggest that KLK variation might influence high-grade prostate cancer risk through a yet unidentified pathway or simply act as a genetic marker of the probability of a high- versus low-grade prostate cancer diagnosis through its influence on PSA levels. To test this hypothesis, we performed case-only analysis based on year of diagnosis to reflect the introduction of wide-spread PSA screening (up to 1992 (670 men) vs. after 1992 (9831 men)). If the association between Gleason grade and KLK3 variation is due to altered PSA levels, we would expect to see differential associations according to year of diagnosis. We did not observe such differences, however, suggesting that the KLK3-prostate cancer association is not mediated by altered PSA levels.
A recent Icelandic study conducted stratified analysis based on year of diagnosis and noticed that the association with prostate cancer was confined to the group of cases diagnosed in 1992 or later. These results suggest that the association between the KLK3 locus and prostate cancer is driven by the increasing frequency of PSA testing.
After adjusting for multiple testing, no other SNP was associated with clinical sub-types. Earlier studies had failed to link these SNPs to clinical characteristics, suggesting that these SNPs affect prostate cancer risk overall and not solely for more (or less) aggressive or advanced cancer.
We found overall no evidence that these SNPs interact with known or proposed risk factors for prostate cancer including family history of prostate cancer, age of onset, diabetes, BMI, height, smoking or alcohol consumption. Studying the interactions between SNPs and diabetes was of particular interest since genetic variation in JAZF1 and TCF2 has been associated with both prostate cancer and diabetes. We did see a borderline statistically significant interaction between rs10486567 in JAZF1 and diabetes, but this particular SNP has not been associated with diabetes risk. A previous study conducted in CPS-II and PLCO found that diabetes did not mediate the association between JAZF1 and HNF1B/TCF2 SNPs and prostate cancer risk, and we observed no statistical interaction between diabetes and three SNPs in HNF1B/TCF2.
We observed no significant associations between prostate cancer risk and smoking or height and only a weak association between prostate cancer and alcohol consumption, even after accounting for the possibility of differences in the effects of these exposures by genotype. A meta-analysis of 39 studies observed that height was positively associated with risk (RR 1.05 per 10 cm increment, 95% CI 1.02–1.09) but the association was only seen in cohort studies. A recent large meta-analysis of smoking and prostate cancer incidence found overall no evidence of an association but reported an increased risk when considering number of cigarettes smoked. Moreover, they observed a 9% risk increase for former smokers. We did observe a marginal association between alcohol intake and prostate cancer risk. This is in line with earlier results indicating a weak risk increase for men consuming at least 25 grams alcohol per day (OR: 1.05 (95% CI: 1.00–1.08)) and for men consuming at least 50 grams per day (OR: 1.09 (95% CI: 1.02–1.17)).
Overall, these results imply that the lack of robust associations between these environmental factors and prostate cancer risk is not due to interactions between these exposures and variation in any of the 36 SNPs assessed in this study. However, the lack of significant interactions does not rule out that gene-environment interactions exist in prostate cancer. All SNPs under study have been linked to prostate cancer through their main effects. Agnostic approaches such as incorporating gene-environment interactions in a genome-wide association study setting might identify genetic variants that only affect risk when acting with other factors. The lack of significant interactions can also reflect the low power to detect only modest interaction effects despite our sample size of 10,000 cases and 10,000 controls. It is important to note that our results do not rule out small departures from a multiplicative odds model for the joint effect of pairs of individual markers and risk factors, nor does absence of departure from a multiplicative odds model necessarily imply that these genetic loci and risk factors do not interact in some causal manner. Moreover, absence of interaction as defined here does not imply absence of a “public health interaction”, where the benefit from reducing a risk factor in terms of absolute risk reduction differs across genotypes.
This is, to our knowledge, the first large-scale study to explore possible interactions between confirmed prostate cancer susceptibility markers and a broad spectrum of known and possible environmental factors. The SNPs considered in this study show marginal per-allele odds ratios ranging between 1.07 and 1.44. It is possible that these odds ratios might be larger in strata defined by other prostate cancer risk factors, not evaluated in this study. It is well recognized that exploring such interactions requires large study populations with well-defined exposure data. With 10,501 prostate cancer cases, 10,831 controls and prospectively collected data within established cohorts, BPC3 is in a unique position to explore both gene-gene and gene-environment interactions as demonstrated here. For example, in the absence of main effects (which is not the same as assuming no marginal effect and plausibly consistent with modest marginal genetic or environmental effects), the BPC3 has 89% power to detect an interaction effect of 1.2 assuming an allele frequency of 20% and an environmental exposure with a prevalence of 20%.
As with all studies utilizing environmental exposure data, the present investigation would be expected to have some degree of misclassification in the measurement of those factors. It is possible that alternative modeling of the environmental risk factors or more precise exposure quantification would increase statistical power (e.g. analyzing intensity, duration or pack-years of smoking rather than as never/former/current). However, a critical issue in conducting pooled analysis across studies is to harmonize data. As exposure data get more refined, there is an increasing risk of discrepancies between cohorts which increases the risk of “misclassification”. Since our study cohorts (MEC excepted) included predominantly men of European ancestry, we were limited in our ability to study other ethnicities.
Genome-wide association studies have been particularly successful for prostate cancer. Recently published secondary analysis of GWAS has now added ~10 additional prostate cancer SNPs to those presented here. At the time of this study, we did not have genotype data for these SNPs in BPC3 and it remains to be seen if they are differentially associated with clinical subtypes or if they interact with non-genetic factors.
In summary, we independently replicated the association between prostate cancer risk and 36 SNPs identified in multi-stage genome-wide association studies of prostate cancer. Except for SNPs in KLK3 that were differentially associated with Gleason grade, we did not detect any differentiation in SNP associations according to Gleason grade or stage at diagnosis, two clinical factors strongly predictive of disease outcome. Moreover, we found no strong evidence that these SNPs interact with age, family history, diabetes, BMI, height, smoking or alcohol consumption.
Materials and Methods
The BPC3 has been described in detail elsewhere. In brief, the consortium combines resources from seven well-established cohort studies with blood samples collected as follows: the Alpha-Tocopherol, Beta-Carotene Cancer Prevention (ATBC) Study in 1992–1993, the American Cancer Society Cancer Prevention Study II (CPS-II) in 1998, the European Prospective Investigation into Cancer and Nutrition Cohort (EPIC – comprised of cohorts from Denmark, Great Britain, Germany, Greece, Italy, the Netherlands, Spain, and Sweden) in 1993, the Health Professionals Follow-up Study (HPFS) in 1993, the Multi-Ethnic Cohort (MEC) in 1995, the Physicians' Health Study (PHS) in 1982, and the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial in 1993–2001. In addition, the Melbourne Collaborative Cohort Study (MCCS) established in 1990–1994 recently joined the consortium. Together, these eight cohorts collectively include over 265,000 men who provided a blood sample.
Prostate cancer cases were identified through population-based cancer registries or self-reports confirmed by medical records, including pathology reports. Except for the MCCS study, the BPC3 consists of a series of matched nested case-control studies within each cohort; controls were matched to cases on a number of potential confounding factors, such as age, ethnicity, and region of recruitment, depending on the cohort. MCCS used a case-cohort design, with a randomly sampled sub-cohort serving as controls. Written informed consent was obtained from all subjects and each study was approved by the Institutional Review Boards at their respective institutions. The IRBs for each study were as follows: US National Cancer Institute and National Institute for Health and Welfare (Helsinki, Finland) (ATBC), The Emory University School of Medicine Institutional Review Board (CPS-II), Ethikkommission - Medizinische Fakultät Heidelberg and Imperial College Research Ethics Committee (EPIC), The Institutional Review Board of Harvard School of Public Health (HPFS), The Cancer Council Victoria Human Research Ethics Committee (MCCS), The Institutional Review Board at the University of Southern California and the Institutional Review Board at the University of Hawaii (MEC), The Human Subjects Committee at Brigham and Women's Hospital (PHS) and NCI Special Studies Institutional Review Board (PLCO).
The current study was restricted to individuals who self-reported as being Caucasian. We had genotype data for a total of 10,501 prostate cancer cases and 10,831 controls. Data on disease stage and grade at time of diagnosis were collected from each cohort, wherever possible. A total of 1,717 cases were classified as high-stage (stage C or D at diagnosis) and 1,388 were classified as high-grade (Gleason grade >7 or equivalent, i.e. coded as poorly differentiated or undifferentiated). For 15% of the cases, we did not have information about tumor stage or Gleason grade.
Baseline information on height and body weight, family history of prostate cancer, cigarette smoking status (never, past, and current), alcohol intake (g/day) and information about a pre-existing diabetes diagnosis were collected by self-report. Family history, which was defined as having at least one first-degree family member diagnosed with prostate cancer, was available for all but two cohort studies (PHS and EPIC). For some countries in EPIC, weight and height were measured.
Collection and harmonization of non-genetic data
We collected data on family history, diabetes at baseline, smoking, alcohol consumption, height and BMI for each study. Family history of prostate cancer was dichotomized into “yes” (1,780 subjects) or “no” (12,382 subjects). Age was calculated at diagnosis/selection as control except for MCCS (at baseline for controls) and MEC (at blood draw for controls) and further dichotomized into 65 years or younger versus older than 65 years. BMI was calculated based on baseline weight (kg) and height (m) and categorized into 3 categories: normal weight (BMI<25 kg/m², 7,947 subjects), overweight (BMI 25–30 kg/m², 10,206 subjects) and obese (BMI>30 kg/m², 2,771 subjects). Height was analyzed both as a continuous variable and in tertiles (<173 cm (7,221 subjects), 173–180 cm (7,324 subjects) and >180 cm (6,548 subjects)).
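The BMI categorization described above can be sketched as a small helper. The function names are hypothetical, and the handling of values exactly equal to 25 or 30 is an assumption based on the stated cut-points (normal <25, overweight 25–30, obese >30); the text does not say how boundary values were treated in practice.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2 from baseline weight and height."""
    return weight_kg / height_m ** 2

def bmi_category(value):
    """Map a BMI value to the three categories used in the text.

    Assumption: exactly 25 and exactly 30 both fall in 'overweight',
    following the stated ranges (<25, 25-30, >30).
    """
    if value < 25:
        return "normal"
    if value <= 30:
        return "overweight"
    return "obese"

# Illustrative (made-up) subjects, not study data.
for weight, height in [(80, 1.80), (85, 1.75), (95, 1.75)]:
    value = bmi(weight, height)
    print(f"{value:.1f} -> {bmi_category(value)}")
```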
Smoking was categorized into 3 categories: never (7,725 subjects), former (9,457 subjects) and current (3,989 subjects). Alcohol was dichotomized into never and moderate drinkers (<30 g/day or two drinks per day; 17,398 subjects) or heavy drinkers (≥30 g/day or 2 drinks per day; 3,257 subjects). Pre-existing diabetes was dichotomized into “yes” (982 subjects) or “no” (19,643 subjects).
We agreed on a common protocol prior to data collection based on data availability in the studies. Each study was responsible for sending the data in a format as described in the protocol to facilitate data harmonization. We agreed on collecting as detailed information as possible without having to exclude any study due to lack of covariate information (that is, we aimed for the least common denominator for the variables of interest). Inconsistencies or clarifications were handled by a dialogue between the data coordinating center and the individual studies. All studies have published analysis on these variables earlier and details on quality checks can be found in study-specific publications. All statistical analyses were conducted centrally.
SNP selection and genotyping
We selected 39 SNPs based on the literature for prostate cancer GWAS (Table 2). These include (genomic location in parenthesis): rs721048 (2p15), rs1465618 (2p21), rs12621278 (2q31.3), rs2660753 (3q12.1), rs4857841 (3q21.3), rs17021918 (4q22.3), rs12500426 (4q22.3), rs7679673 (4q24), rs9364554 (6q25.3), rs10486567 (7p15.2), rs6465657 (7p21.3), rs1512268 (8p21.2), rs2928679 (8p21.2), rs4961199 (8q21.3), rs1016343 (8q24.21), rs7841060 (8q24.21), rs16901979 (8q24.21), rs620861 (8q24.21), rs6983267 (8q24.21), rs1447295 (8q24.21), rs4242382 (8q24.21), rs7837688 (8q24.21), rs16902094 (8q24.21), rs1571801 (9p33.2), rs10993994 (10q11.23), rs4962416 (10q26.13), rs7127900 (11p15.5), rs12418451 (11q13.2), rs7931342 (11q13.2), rs10896449 (11q13.2), rs11649743 (17q12), rs4430796 (17q12), rs7501939 (17q12), rs1859962 (17q24.3), rs266849 (19p13.33), rs2735839 (19p13.33), rs5759167 (22q13.2), rs5945572 (Xp11.22) and rs5945619 (Xp11.22). For rs12418451, we used genotypes from either rs12418451 or rs10896438 (r² = 0.964 in HapMap CEU population) and for rs2928679 we used genotypes from either rs2928679 or rs13264338 (r² = 0.966 in HapMap CEU population). We did not have genotype data on rs4961199, rs16901979 and rs16902094 for MCCS.
Genotyping was performed using the TaqMan assay (Applied Biosystems, Foster City, CA) in five different genotyping laboratories: Core Genotyping Facility at National Cancer Institute, Harvard School of Public Health, University of Southern California, DKFZ and UK Cancer Research. Blinded duplicated samples indicated no genotyping error. For each autosomal SNP, we tested Hardy-Weinberg equilibrium (HWE) in the controls in each study separately. All autosomal SNPs were in HWE (P>0.01).
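As an illustration of the HWE check among controls, the sketch below uses a 1 d.f. Pearson chi-square goodness-of-fit test on genotype counts. The text does not say which HWE test was used (an exact test is a common alternative), so the choice of test, the function name and the example counts are all assumptions made for illustration.

```python
import math

def hwe_chi2_test(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 d.f.) for Hardy-Weinberg equilibrium.

    Takes counts of the three genotypes and returns (chi2, p_value).
    Assumes both alleles are present, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Made-up control genotype counts for one study (not study data).
chi2, p_value = hwe_chi2_test(250, 500, 250)
print(chi2, p_value, "in HWE" if p_value > 0.01 else "departs from HWE")
```

The 0.01 cut-off mirrors the threshold quoted in the text; applying the test per study, as described, means calling the function once for each cohort's controls.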
We tested the association between prostate cancer risk and each SNP with a likelihood ratio test based on unconditional logistic regression. We adjusted all analyses for study and age at diagnosis or selection as a control in five year intervals using indicator variables. All odds ratios are calculated per copy of the minor alleles (0,1,2) carried. For each SNP, we used Cochran's Q statistic to test for heterogeneity between studies.
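Cochran's Q for between-study heterogeneity can be computed from per-study log odds ratios and their standard errors. The following is a minimal sketch using only the standard library; the helper names and the example estimates are hypothetical and are not taken from Table S1.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # closed form for 2 d.f.
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # closed form for 1 d.f.
    # Step the shape parameter up to df / 2 using the incomplete-gamma recurrence.
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q

def cochran_q(log_ors, std_errs):
    """Return (Q, df, p_value) for inverse-variance weighted estimates."""
    weights = [1 / se ** 2 for se in std_errs]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    return q, df, chi2_sf(q, df)

# Made-up per-study log odds ratios and standard errors for a single SNP.
log_ors = [0.20, 0.15, 0.25, 0.18]
std_errs = [0.05, 0.06, 0.08, 0.07]
q, df, p = cochran_q(log_ors, std_errs)
print(f"Q = {q:.2f} on {df} d.f., P = {p:.3f}")
```

A large P value here would indicate no evidence that the per-allele effect differs between studies, which is how the heterogeneity results are interpreted in the text.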
To estimate odds ratios for high-grade or low-grade disease, we performed multinomial regression with an outcome variable coded as 0 (control), 1 (low-grade) or 2 (high-grade). To test for differential SNP associations between low-grade and high-grade disease, we used a likelihood ratio test based on case-only analysis. We repeated this analysis for high-stage/low-stage disease.
We tested for interaction between SNPs and non-genetic factors by conducting a one degree-of-freedom likelihood ratio test of a single interaction term (SNPxE) as implemented in an unconditional logistic regression. When an environmental factor had more than two categories (as is the case for smoking, BMI and height), we used ordinal coding for the interaction term. To explore whether associations with proposed environmental risk factors may have been masked by effect heterogeneity, we performed a joint (2 d.f.) test of the environmental main effect and the interaction effect. This test has been shown to outperform the standard marginal test when the environmental effect is restricted to a genetic stratum . Cohorts with no variability in exposure (such as ATBC and smoking) were excluded from gene-environment interaction analyses. We tested for pair-wise SNP-SNP interactions using a one degree-of-freedom likelihood ratio test of a single interaction term as described for the SNP-environment interaction tests.
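The interaction and joint tests described above can be written out explicitly. The notation below is introduced here for illustration and does not appear in the original text: G is the minor-allele count (0, 1, 2), E is the environmental factor (binary or ordinally coded), and Z collects the study and age-interval indicator variables used for adjustment.

```latex
\begin{align}
\operatorname{logit}\,\Pr(D = 1 \mid G, E, Z)
  &= \beta_0 + \beta_G\, G + \beta_E\, E + \beta_{GE}\,(G \times E) + \gamma^{\top} Z \\
\text{1 d.f. interaction test:}\quad H_0 &: \beta_{GE} = 0 \\
\text{2 d.f. joint test:}\quad H_0 &: \beta_E = \beta_{GE} = 0 \\
\Lambda = 2\,\bigl(\ell_{\text{full}} - \ell_{\text{null}}\bigr)
  &\sim \chi^2_{k} \quad \text{under } H_0,\; k \in \{1, 2\}
\end{align}
```

Here the full model contains all terms and the null model drops the parameters constrained by the hypothesis being tested. For the pair-wise SNP-SNP analysis, the same form applies with E replaced by the allele count of the second SNP.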
We tested for dominance deviation from an additive model by including an additional SNP covariate coded as (0,1,0) for (homozygote common allele, heterozygote, homozygote rare allele) respectively. Using unconditional logistic regression, we performed a one degree-of-freedom likelihood ratio test in which the full model was tested against a model including only the SNP covariate with additive coding (0,1,2) as described above. All reported P values are two-sided and uncorrected for multiple hypothesis testing. Analyses were conducted in R and SAS version 9.1.
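The covariate coding behind the dominance-deviation test can be illustrated with a short sketch. The function name `code_snp` and the example genotypes are hypothetical. With both covariates in the model, the heterozygote log odds is free to depart from the midpoint of the two homozygotes, and the dominance coefficient measures that departure.

```python
def code_snp(minor_allele_count):
    """Return (additive, dominance_deviation) covariates for one genotype."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor-allele count must be 0, 1 or 2")
    additive = minor_allele_count                      # coded (0, 1, 2)
    dominance = 1 if minor_allele_count == 1 else 0    # coded (0, 1, 0)
    return additive, dominance

# Hypothetical genotypes (minor-allele counts) for five subjects.
genotypes = [0, 1, 2, 1, 0]

# Design-matrix rows (intercept first) for the reduced and full models.
reduced = [[1, code_snp(g)[0]] for g in genotypes]   # additive term only
full = [[1, *code_snp(g)] for g in genotypes]        # additive + dominance

for g, row in zip(genotypes, full):
    print(g, row)
```

Fitting a logistic regression to each design matrix and comparing twice the difference in maximised log-likelihoods with a chi-square distribution on 1 d.f. would give the test described above.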
Study-specific SNP associations with prostate cancer risk. For rs4961199, rs16901979 and rs16902094 we did not have genotype data from MCCS.
Heterogeneity in main effects between studies for selected SNPs.
Associations between SNPs identified through CGEMS and prostate cancer risk stratified on CGEMS membership.
SNP-Environment interactions for family history of prostate cancer and age at diagnosis.
SNP-Environment interactions with diabetes and BMI.
Association between height and prostate cancer risk stratified by SNP genotypes.
Association between smoking and prostate cancer risk stratified by SNP genotypes.
Association between alcohol consumption and prostate cancer risk stratified by SNP genotypes.
Pair-wise SNP-SNP interactions that reached a p-value < 0.05.
Conceived and designed the experiments: SL FS AS RCT DC BH JM LLM DA PK. Performed the experiments: AS DC GS SJC SW MY CAH DH. Analyzed the data: SL FS AS RCT SIB WRD GS CG JM DA PK. Contributed reagents/materials/analysis tools: NA GA BB-d-M DC JMG GGG EG RBH JH MJ RK LNK CN ER CS MS MJT DT JV BH JM LLM DA. Wrote the manuscript: SL FS AS RCT DC JM DA PK. Critical review of manuscript: SIB WRD GS NA GA BB-d-M SC DC MG GGG EG CG CAH RH JH DJH MJ RK LNK CN ER CS MS DOS MJT DT JV SJW MY BH LLM.
- 1. Gronberg H (2003) Prostate cancer epidemiology. Lancet 361: 859–864.
- 2. Duggan D, Zheng SL, Knowlton M, Benitez D, Dimitrov L, et al. (2007) Two genome-wide association studies of aggressive prostate cancer implicate putative prostate tumor suppressor gene DAB2IP. J Natl Cancer Inst 99: 1836–1844.
- 3. Eeles RA, Kote-Jarai Z, Al Olama AA, Giles GG, Guy M, et al. (2009) Identification of seven new prostate cancer susceptibility loci through a genome-wide association study. Nat Genet 41: 1116–1121.
- 4. Eeles RA, Kote-Jarai Z, Giles GG, Olama AA, Guy M, et al. (2008) Multiple newly identified loci associated with prostate cancer susceptibility. Nat Genet 40: 316–321.
- 5. Gudmundsson J, Sulem P, Gudbjartsson DF, Blondal T, Gylfason A, et al. (2009) Genome-wide association and replication studies identify four variants associated with prostate cancer susceptibility. Nat Genet 41: 1122–1126.
- 6. Gudmundsson J, Sulem P, Manolescu A, Amundadottir LT, Gudbjartsson D, et al. (2007) Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24. Nat Genet 39: 631–637.
- 7. Gudmundsson J, Sulem P, Rafnar T, Bergthorsson JT, Manolescu A, et al. (2008) Common sequence variants on 2p15 and Xp11.22 confer susceptibility to prostate cancer. Nat Genet 40: 281–283.
- 8. Gudmundsson J, Sulem P, Steinthorsdottir V, Bergthorsson JT, Thorleifsson G, et al. (2007) Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes. Nat Genet 39: 977–983.
- 9. Hsu FC, Sun J, Wiklund F, Isaacs SD, Wiley KE, et al. (2009) A novel prostate cancer susceptibility locus at 19q13. Cancer Res 69: 2720–2723.
- 10. Kote-Jarai Z, Easton DF, Stanford JL, Ostrander EA, Schleutker J, et al. (2008) Multiple novel prostate cancer predisposition loci confirmed by an international study: the PRACTICAL Consortium. Cancer Epidemiol Biomarkers Prev 17: 2052–2061.
- 11. Sun J, Zheng SL, Wiklund F, Isaacs SD, Li G, et al. (2009) Sequence variants at 22q13 are associated with prostate cancer risk. Cancer Res 69: 10–15.
- 12. Thomas G, Jacobs KB, Yeager M, Kraft P, Wacholder S, et al. (2008) Multiple loci identified in a genome-wide association study of prostate cancer. Nat Genet 40: 310–315.
- 13. Yeager M, Chatterjee N, Ciampa J, Jacobs KB, Gonzalez-Bosquet J, et al. (2009) Identification of a new prostate cancer susceptibility locus on chromosome 8q24. Nat Genet 41: 1055–1057.
- 14. Yeager M, Orr N, Hayes RB, Jacobs KB, Kraft P, et al. (2007) Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat Genet 39: 645–649.
- 15. Zheng SL, Stevens VL, Wiklund F, Isaacs SD, Sun J, et al. (2009) Two independent prostate cancer risk-associated loci at 11q13. Cancer Epidemiol Biomarkers Prev 18: 1815–1820.
- 16. Ahn J, Berndt SI, Wacholder S, Kraft P, Kibel AS, et al. (2008) Variation in KLK genes, prostate-specific antigen and risk of prostate cancer. Nat Genet 40: 1032–1034; author reply 1035–1036.
- 17. Wiklund F, Zheng SL, Sun J, Adami HO, Lilja H, et al. (2009) Association of reported prostate cancer risk alleles with PSA levels among men without a diagnosis of prostate cancer. Prostate 69: 419–427.
- 18. Freedman ML, Haiman CA, Patterson N, McDonald GJ, Tandon A, et al. (2006) Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men. Proc Natl Acad Sci U S A 103: 14068–14073.
- 19. Zheng SL, Sun J, Cheng Y, Li G, Hsu FC, et al. (2007) Association between two unlinked loci at 8q24 and prostate cancer risk among European Americans. J Natl Cancer Inst 99: 1525–1533.
- 20. Skolarus TA, Wolin KY, Grubb RL 3rd (2007) The effect of body mass index on PSA levels and the development, screening and treatment of prostate cancer. Nat Clin Pract Urol 4: 605–614.
- 21. Kraft P, Yen YC, Stram DO, Morrison J, Gauderman WJ (2007) Exploiting gene-environment interaction to detect genetic associations. Hum Hered 63: 111–119.
- 22. Kader AK, Sun J, Isaacs SD, Wiley KE, Yan G, et al. (2009) Individual and cumulative effect of prostate cancer risk-associated variants on clinicopathologic variables in 5,895 prostate cancer patients. Prostate 69: 1195–1205.
- 23. Abrahamsson PA, Lilja H, Falkmer S, Wadstrom LB (1988) Immunohistochemical distribution of the three predominant secretory proteins in the parenchyma of hyperplastic and neoplastic prostate glands. Prostate 12: 39–46.
- 24. Lilja H, Ulmert D, Vickers AJ (2008) Prostate-specific antigen and prostate cancer: prediction, detection and monitoring. Nat Rev Cancer 8: 268–278.
- 25. Gudmundsson J, Besenbacher S, Sulem P, Gudbjartsson DF, Olafsson I, et al. (2010) Genetic correction of PSA values using sequence variants associated with PSA levels. Sci Transl Med 2: 62ra92.
- 26. Fitzgerald LM, Kwon EM, Koopmeiners JS, Salinas CA, Stanford JL, et al. (2009) Analysis of recently identified prostate cancer susceptibility loci in a population-based study: associations with family history and clinical features. Clin Cancer Res 15: 3231–3237.
- 27. Winckler W, Weedon MN, Graham RR, McCarroll SA, Purcell S, et al. (2007) Evaluation of common variants in the six known maturity-onset diabetes of the young (MODY) genes for association with type 2 diabetes. Diabetes 56: 685–693.
- 28. Zeggini E, Scott LJ, Saxena R, Voight BF, Marchini JL, et al. (2008) Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet 40: 638–645.
- 29. Stevens VL, Ahn J, Sun J, Jacobs EJ, Moore SC, et al. (2010) HNF1B and JAZF1 genes, diabetes, and prostate cancer risk. Prostate 70: 601–607.
- 30. Macinnis RJ, English DR (2006) Body size and composition and prostate cancer risk: systematic review and meta-regression analysis. Cancer Causes Control 17: 989–1003.
- 31. Huncharek M, Haddock S, Reid R, Kupelnick B (2010) Smoking as a risk factor for prostate cancer: a meta-analysis of 24 prospective cohort studies. Am J Public Health 100: 693–701.
- 32. Bagnardi V, Blangiardo M, La Vecchia C, Corrao G (2001) A meta-analysis of alcohol drinking and cancer risk. Br J Cancer 85: 1700–1705.
- 33. Siemiatycki J, Thomas DC (1981) Biological models and statistical interactions: an example from multistage carcinogenesis. Int J Epidemiol 10: 383–387.
- 34. Hunter DJ, Riboli E, Haiman CA, Albanes D, Altshuler D, et al. (2005) A candidate gene approach to searching for low-penetrance breast and prostate cancer genes. Nat Rev Cancer 5: 977–985.
- 35. The Alpha-Tocopherol, Beta Carotene Cancer Prevention Study Group (1994) The effect of vitamin E and beta carotene on the incidence of lung cancer and other cancers in male smokers. N Engl J Med 330: 1029–1035.
- 36. Calle EE, Rodriguez C, Jacobs EJ, Almon ML, Chao A, et al. (2002) The American Cancer Society Cancer Prevention Study II Nutrition Cohort: rationale, study design, and baseline characteristics. Cancer 94: 2490–2501.
- 37. Riboli E, Hunt KJ, Slimani N, Ferrari P, Norat T, et al. (2002) European Prospective Investigation into Cancer and Nutrition (EPIC): study populations and data collection. Public Health Nutr 5: 1113–1124.
- 38. Giovannucci E, Pollak M, Liu Y, Platz EA, Majeed N, et al. (2003) Nutritional predictors of insulin-like growth factor I and their relationships to cancer in men. Cancer Epidemiol Biomarkers Prev 12: 84–89.
- 39. Kolonel LN, Altshuler D, Henderson BE (2004) The multiethnic cohort study: exploring genes, lifestyle and cancer risk. Nat Rev Cancer 4: 519–527.
- 40. Chan JM, Stampfer MJ, Ma J, Gann P, Gaziano JM, et al. (2002) Insulin-like growth factor-I (IGF-I) and IGF binding protein-3 as predictors of advanced-stage prostate cancer. J Natl Cancer Inst 94: 1099–1106.
- 41. Hayes RB, Reding D, Kopp W, Subar AF, Bhat N, et al. (2000) Etiologic and early marker studies in the prostate, lung, colorectal and ovarian (PLCO) cancer screening trial. Control Clin Trials 21: 349S–355S.
- 42. Severi G, Morris HA, MacInnis RJ, English DR, Tilley WD, et al. (2006) Circulating insulin-like growth factor-I and binding protein-3 and risk of prostate cancer. Cancer Epidemiol Biomarkers Prev 15: 1137–1141.
- 43. R Development Core Team (2008) R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria.