The authors have the following interests. Hans M. Albertsen, Rakesh Chettier, Pamela Farrington and Kenneth Ward are employed by Juneau Biosciences LLC, the funder of this study. All authors have direct financial interest in Juneau Biosciences. A US provisional patent application that includes the results and inventions reported in the manuscript has been filed by Juneau Biosciences. Title: Genetic Markers Associated with Endometriosis and Use Thereof; Serial No.: 61/707,730; Filing date: September 28, 2012. There are no further patents, products in development or marketed products to declare. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors.
Conceived and designed the experiments: HA RC KW. Performed the experiments: HA RC. Analyzed the data: HA RC PF KW. Contributed reagents/materials/analysis tools: KW. Wrote the paper: HA RC KW.
Endometriosis is a common gynecological condition with complex etiology defined by the presence of endometrial glands and stroma outside the womb. Endometriosis is a common cause of both cyclic and chronic pelvic pain, reduced fertility, and reduced quality-of-life. Diagnosis and treatment of endometriosis are, on average, delayed by 7–10 years from the onset of symptoms. Absence of a timely and non-invasive diagnostic tool is presently the greatest barrier to the identification and treatment of endometriosis. Twin and family studies have documented an increased relative risk in families. To identify genetic factors that contribute to endometriosis we conducted a two-stage genome-wide association study (GWAS) of a European cohort including 2,019 surgically confirmed endometriosis cases and 14,471 controls. Three of the SNPs we identified as associated at P<5×10−8 in our combined analysis belong to two loci: LINC00339-WNT4 on 1p36.12 (rs2235529; P = 8.65×10−9, OR = 1.29, CI = 1.18–1.40) and RND3-RBM43 on 2q23.3 (rs1519761; P = 4.70×10−8, OR = 1.20, CI = 1.13–1.29, and rs6757804; P = 4.05×10−8, OR = 1.20, CI = 1.13–1.29). Using an adjusted Bonferroni significance threshold of 4.51×10−7 we identify two additional loci in our meta-analysis that associate with endometriosis: RNF144B-ID4 on 6p22.3 (rs6907340; P = 2.19×10−7, OR = 1.20, CI = 1.12–1.28), and HNRNPA3P1-LOC100130539 on 10q11.21 (rs10508881; P = 4.08×10−7, OR = 1.19, CI = 1.11–1.27). Consistent with previously suggested associations to WNT4, our study implicates a 150 kb region around WNT4 that also includes LINC00339 and CDC42. A univariate analysis of documented infertility, age at menarche, and family history did not show allelic association with these SNP markers. Clinical data from patients in our study reveal an average delay in diagnosis of 8.4 years and confirm a strong correlation between endometriosis severity and infertility (n = 1182, P<0.001, OR = 2.18).
This GWAS of endometriosis was conducted with high diagnostic certainty in cases, and with stringent handling of population substructure. Our findings broaden the understanding of the genetic factors that play a role in endometriosis.
Endometriosis affects 5–10% of women in their reproductive years, with symptoms including pelvic pain, dyspareunia, dysmenorrhea and infertility.
We conducted a discovery GWAS on surgically confirmed endometriosis patients and population controls using the Illumina OmniExpress BeadChip. The analysis was limited to autosomal SNPs with an Illumina GenTrain score ≥0.65. We further eliminated SNPs with call rate <0.98, Hardy-Weinberg equilibrium (HWE) P<0.001 and minor allele frequency (MAF) <0.01. After filtering, 580,699 SNPs remained. Next, samples with call rates <0.98 were eliminated. The remaining samples were tested for unknown familial relationships using genome-wide identity-by-state (IBS), and samples related more closely than 3rd-degree (π>0.2) were removed. We used ADMIXTURE (ver. 1.22)
The replication samples included 505 cases and 1,811 controls selected using the same criteria as the discovery set. The λ-value for the replication cohort was determined to be 1.01, suggesting no measurable population stratification. After applying the same SNP filters as above we analyzed the top 100 SNPs from the discovery GWAS in the replication set. A significance threshold for the study, allowing for multiple-testing correction, was chosen at 4.51×10−7 (0.05/108,699; 108,699 being the number of independent SNPs in the panel of 580,699 filtered SNPs with r2<0.20). A meta-analysis of the discovery and replication results was performed using the Cochran-Mantel-Haenszel test and revealed 8 SNPs from 4 genomic regions that passed our genome-wide significance threshold, including: LINC00339-WNT4 on 1p36.12 (rs2235529; Pmeta = 3.05×10−9, OR = 1.30); RND3-RBM43 on 2q23.3 (rs6757804; Pmeta = 6.45×10−8, OR = 1.20); RNF144B-ID4 on 6p22.3 (rs6907340; Pmeta = 2.19×10−7, OR = 1.20); and HNRNPA3P1-LOC100130539 on 10q11.21 (rs10508881; Pmeta = 4.08×10−7, OR = 1.19), as shown in detail in
SNP | Gene | Chr | Position | Alleles | Stage | Case MAF | Control MAF | P | OR | 95% CI | Phet |
rs4654783 | WNT4 | 1 | 22,439,520 | a/g | Discovery | 0.34 | 0.295 | 2.43E−07 | 1.23 | 1.14–1.34 | |
rs4654783 | WNT4 | 1 | 22,439,520 | a/g | Replication | 0.323 | 0.298 | 1.31E−01 | 1.13 | 0.97–1.32 | |
rs4654783 | WNT4 | 1 | 22,439,520 | a/g | Meta | | | 1.40E−07 | 1.21 | 1.13–1.30 | 0.332 |
rs4654783 | WNT4 | 1 | 22,439,520 | a/g | Combined Trend | 0.336 | 0.295 | 1.17E−07 | 1.21 | 1.13–1.29 | |
rs2235529 | WNT4 | 1 | 22,450,487 | a/g | Discovery | 0.188 | 0.153 | 1.36E−07 | 1.28 | 1.16–1.41 | |
rs2235529 | WNT4 | 1 | 22,450,487 | a/g | Replication | 0.182 | 0.142 | 1.38E−03 | 1.36 | 1.13–1.64 | |
rs2235529 | WNT4 | 1 | 22,450,487 | a/g | Meta | | | 3.05E−09 | 1.30 | 1.19–1.41 | 0.583 |
rs2235529 | WNT4 | 1 | 22,450,487 | a/g | Combined Trend | 0.186 | 0.151 | 8.65E−09 | 1.29 | 1.18–1.40 | |
rs1519754 | RND3-RBM43 | 2 | 151,619,693 | c/a | Discovery | 0.446 | 0.403 | 5.67E−05 | 1.19 | 1.11–1.30 | |
rs1519754 | RND3-RBM43 | 2 | 151,619,693 | c/a | Replication | 0.45 | 0.405 | 9.63E−03 | 1.20 | 1.04–1.38 | |
rs1519754 | RND3-RBM43 | 2 | 151,619,693 | c/a | Meta | | | 1.75E−07 | 1.20 | 1.12–1.28 | 0.964 |
rs1519754 | RND3-RBM43 | 2 | 151,619,693 | c/a | Combined Trend | 0.447 | 0.403 | 1.15E−07 | 1.20 | 1.12–1.28 | |
rs6734792 | RND3-RBM43 | 2 | 151,624,882 | g/a | Discovery | 0.448 | 0.404 | 3.52E−05 | 1.20 | 1.11–1.29 | |
rs6734792 | RND3-RBM43 | 2 | 151,624,882 | g/a | Replication | 0.453 | 0.406 | 7.48E−03 | 1.21 | 1.05–1.39 | |
rs6734792 | RND3-RBM43 | 2 | 151,624,882 | g/a | Meta | | | 8.18E−08 | 1.20 | 1.12–1.28 | 0.945 |
rs6734792 | RND3-RBM43 | 2 | 151,624,882 | g/a | Combined Trend | 0.449 | 0.404 | 5.19E−08 | 1.20 | 1.12–1.28 | |
rs1519761 | RND3-RBM43 | 2 | 151,633,204 | g/a | Discovery | 0.445 | 0.401 | 3.54E−05 | 1.20 | 1.11–1.29 | |
rs1519761 | RND3-RBM43 | 2 | 151,633,204 | g/a | Replication | 0.452 | 0.403 | 5.85E−03 | 1.21 | 1.05–1.40 | |
rs1519761 | RND3-RBM43 | 2 | 151,633,204 | g/a | Meta | | | 7.30E−08 | 1.20 | 1.12–1.29 | 0.886 |
rs1519761 | RND3-RBM43 | 2 | 151,633,204 | g/a | Combined Trend | 0.447 | 0.401 | 4.70E−08 | 1.20 | 1.13–1.29 | |
rs6757804 | RND3-RBM43 | 2 | 151,635,832 | g/a | Discovery | 0.445 | 0.401 | 3.43E−05 | 1.20 | 1.11–1.29 | |
rs6757804 | RND3-RBM43 | 2 | 151,635,832 | g/a | Replication | 0.452 | 0.403 | 5.44E−03 | 1.21 | 1.06–1.40 | |
rs6757804 | RND3-RBM43 | 2 | 151,635,832 | g/a | Meta | | | 6.45E−08 | 1.20 | 1.13–1.29 | 0.876 |
rs6757804 | RND3-RBM43 | 2 | 151,635,832 | g/a | Combined Trend | 0.446 | 0.4011 | 4.05E−08 | 1.20 | 1.13–1.29 | |
rs6907340 | RNF144B-ID4 | 6 | 19,803,768 | a/g | Discovery | 0.417 | 0.371 | 5.49E−06 | 1.21 | 1.12–1.31 | |
rs6907340 | RNF144B-ID4 | 6 | 19,803,768 | a/g | Replication | 0.412 | 0.378 | 4.54E−02 | 1.15 | 1.00–1.33 | |
rs6907340 | RNF144B-ID4 | 6 | 19,803,768 | a/g | Meta | | | 2.19E−07 | 1.20 | 1.12–1.28 | 0.579 |
rs6907340 | RNF144B-ID4 | 6 | 19,803,768 | a/g | Combined Trend | 0.415 | 0.372 | 1.25E−07 | 1.20 | 1.12–1.28 | |
rs10508881 | HNRNPA3P1-LOC100130539 | 10 | 44,541,565 | a/g | Discovery | 0.45 | 0.405 | 3.18E−05 | 1.20 | 1.11–1.30 | |
rs10508881 | HNRNPA3P1-LOC100130539 | 10 | 44,541,565 | a/g | Replication | 0.42 | 0.387 | 6.06E−02 | 1.15 | 1.00–1.32 | |
rs10508881 | HNRNPA3P1-LOC100130539 | 10 | 44,541,565 | a/g | Meta | | | 4.08E−07 | 1.19 | 1.11–1.27 | 0.589 |
rs10508881 | HNRNPA3P1-LOC100130539 | 10 | 44,541,565 | a/g | Combined Trend | 0.442 | 0.403 | 1.57E−06 | 1.18 | 1.10–1.26 | |
The discovery stage included 1,514 endometriosis cases and 12,660 population controls, and the replication stage included 505 cases and 1,811 controls.
The P-values were determined using the Cochran-Armitage trend test. P-values for the Discovery set reflect PCA-adjusted P trend values.
Odds-ratios (OR) and confidence intervals (CI) are calculated using the non-risk allele as the reference.
The meta-analysis was performed using Cochran-Mantel-Haenszel statistics.
P-values for heterogeneity (Phet) across the discovery and replication stages were calculated using the Breslow-Day test.
Cochran-Armitage trend test P-values for the Combined Trend rows are based on the combined genotypes from the Discovery and Replication data.
The significance threshold is 4.59×10−7, determined as 0.05/108,699, where 108,699 is the number of independent SNPs in the panel with r2 less than 0.20.
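The threshold and the pooled odds ratio described in these notes can be illustrated with a minimal sketch. It assumes that per-stage allele counts can be approximated from the rounded MAFs and sample sizes reported for rs2235529, which is not how the study computed its statistics (PLINK operated on the actual genotypes). The function names are hypothetical. The quotient 0.05/108,699 evaluates to roughly 4.6×10−7.

```python
# Hedged sketch: Bonferroni threshold and Cochran-Mantel-Haenszel (CMH) pooled
# allelic odds ratio. Allele counts are reconstructed from rounded MAFs, so
# they only approximate the study's real counts.

def bonferroni_threshold(alpha, n_independent_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_independent_tests

def allele_table(n_cases, n_controls, case_maf, control_maf):
    """Approximate 2x2 allele counts: (case minor, case major, control minor, control major)."""
    case_alleles = 2 * n_cases
    control_alleles = 2 * n_controls
    a = case_alleles * case_maf
    c = control_alleles * control_maf
    return a, case_alleles - a, c, control_alleles - c

def cmh_odds_ratio(strata):
    """Mantel-Haenszel pooled odds ratio across 2x2 tables (one per stage)."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

threshold = bonferroni_threshold(0.05, 108_699)

# rs2235529: discovery (1,514 cases, 12,660 controls), replication (505, 1,811)
strata = [
    allele_table(1514, 12660, 0.188, 0.153),
    allele_table(505, 1811, 0.182, 0.142),
]
pooled_or = cmh_odds_ratio(strata)

print(f"threshold = {threshold:.2e}, pooled OR = {pooled_or:.2f}")
```

Under these assumptions the pooled estimate lands close to the meta-analysis OR reported for rs2235529; small differences are expected because the inputs are rounded.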
To further characterize the signals from the four most strongly associated regions and the IL33 region, we performed imputation against the 1000 Genomes dataset using IMPUTE2 (ver. 2.2.2)
P-values of genotyped SNPs (•) and imputed SNPs (×) are plotted against their physical position on chromosome 1 (hg19/GRCh37) as -log10(P-value) on the left axis. The plot identifies a 150 kb LD-block (22.35 Mb-22.50 Mb) that shows association with endometriosis and includes WNT4, CDC42 and HSPC157 (gene symbol: LINC00339). Key SNPs are indicated in the Figure with their rsID. Two SNPs, rs16826658 (green arrow) and rs7521902 (green triangle), previously suggested to be associated with endometriosis (Uno et al. 2010; Painter et al. 2011), are located at the right-most boundary of the associated region. A third SNP, rs2473277, located at the left-most boundary of the LD-region, was also tentatively associated by Uno et al. (2010). The genetic recombination rates estimated from 1000 Genomes samples (EUR) are shown with a blue line according to the scale indicated to the right. The chromosomal position is indicated in Mb at the bottom of the figure.
WNT4 has previously been associated with endometriosis, first by Uno et al. (2010), who noted that rs16826658, approximately 16 kb upstream of WNT4, showed a possible association with endometriosis (P = 1.66×10−6, OR = 1.20) in a Japanese population, and again by Painter et al. (2011), who reported that rs7521902, located approximately 22 kb upstream of WNT4, also showed evidence for association in a European cohort (P = 9.0×10−5, OR = 1.16). Our imputation analysis replicates the association of rs7521902 with endometriosis (P = 6.4×10−5, OR = 1.17) and confirms the involvement of the WNT4 region in the pathogenesis of endometriosis. We found only weak evidence of association with rs16826658 (P = 0.05, OR = 1.07), because the minor allele is very common and because of the different ethnic backgrounds between the studies. To further evaluate the signals from the WNT4 region, we performed a haplotype analysis of three key SNPs from our study (rs10917151, rs4654783 and rs2235529) together with rs16826658 and rs7521902 using the imputed data from our population. The haplotype results are summarized in
Uno et al.
Painter et al.
A recent meta-analysis published by Nyholt et al.
A recent candidate gene study investigating the
ENDO1 is a susceptibility locus on chromosome 10q26 (OMIM phenotype number 131200) identified by linkage analysis
Clinical features commonly used to characterize and stratify endometriosis include infertility, pelvic pain, severity, age at menarche and familiality. To determine the diagnostic delay in our patient population we identified a group of women (n = 874) who reported both age at onset of symptoms (mean = 19.04 years) and age at diagnosis (mean = 27.49 years), and observed an average diagnostic delay of 8.44 years, similar to previous studies. We then used logistic regression to examine whether our samples showed any clinical correlations. The analysis revealed strong correlations between severity and infertility (P<0.001, OR = 2.19), and between severity and diagnostic delay (P<0.001, OR = 1.04) as shown in
Clinical Feature | Moderate or Severe endometriosis (n = 842) | Mild endometriosis (n = 1177) | Category | OR | Beta | SE | P |
Infertility (1182) | 525 | 657 | Yes or No | 2.19 | 0.78 | 0.12 | 8.52E-11 |
Family History (1881) | 790 | 1091 | Yes or No | 0.89 | -0.11 | 0.09 | 0.23 |
Age at Menarche (921) | 405 | 516 | ≤12 or >12 yrs | 1.20 | 0.18 | 0.13 | 0.18 |
Diagnostic Delay (874) | 383 | 491 | 0 to 35 | 1.04 | 0.04 | 0.01 | 2.18E-05 |
Clinical features were correlated with severity. Only patients who could be unambiguously categorized were included in the analysis, with total counts provided in parentheses next to the clinical feature. P-values (P) were calculated using the Wald test. Beta is the regression coefficient and SE its standard error from logistic regression.
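The relationship between the Beta, SE, OR and P columns can be sketched as follows, using the rounded infertility row (Beta = 0.78, SE = 0.12) as input. This is an illustration under the assumption of a standard two-sided Wald test on a normal reference distribution; the function names are hypothetical, and because the inputs are rounded the results only approximate the tabulated OR and P.

```python
import math

def odds_ratio_from_beta(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)

def wald_p_value(beta, se):
    """Two-sided Wald P-value: z = beta / SE compared with a standard normal."""
    z = beta / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# Rounded values from the infertility row of the table
infertility_or = odds_ratio_from_beta(0.78)
infertility_p = wald_p_value(0.78, 0.12)

print(f"OR = {infertility_or:.2f}, Wald P = {infertility_p:.2e}")
```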
After removing markers with r2>0.8 among the top 5 associated regions (incl. the IL33 locus), we conducted multivariate logistic regression using the combined set of 2,019 cases and 14,471 controls. All 5 SNPs analyzed (rs10917151, rs6757804, rs6907340, rs10975519 and rs10508881) remained significant, with ORs of 1.3, 1.2, 1.18, 1.17 and 1.17. Each marker appears to be an independent risk factor for endometriosis. Comparison of the OR between the discovery and replication datasets, shown in
A two-stage GWAS and replication study involving 2,019 cases and 14,471 controls was performed. It identified four novel loci strongly associated with endometriosis and confirmed the involvement of a region around WNT4 that has previously been suggested as being associated with endometriosis. Nine other regions identified in the study also hold promise as candidate loci for endometriosis. Utmost care was taken in the clinical classification of patients, and only surgically confirmed cases with >95% European ancestry were considered in this large GWAS of endometriosis. The study is well powered (>90%) to identify a marker at or above 10% minor allele frequency (MAF) with odds-ratio (OR) >1.20, but we estimate that the top 5 loci explain only about 1.5% of the phenotypic variance of endometriosis. Since the few risk loci we detected all have odds ratios <1.30, it must be assumed that any new endometriosis loci that contribute to the “missing heritability” must be rare, recent, or show minimal effect. GWAS, by design, detects only very old founder effects. When a phenotype includes infertility, as endometriosis does, a high mutation rate would be required to replenish the disease-causing alleles lost from the gene pool due to infertility. One suitable avenue to investigate under that scenario is whole genome sequencing of high-risk families rather than SNP-based GWAS. Little is presently known about the pathophysiology of endometriosis, but we hope that a more detailed investigation of the loci presented in this paper will help elucidate the pathogenesis of endometriosis and clarify its genetic underpinnings.
All subjects and controls provided written informed consent in accordance with study protocols approved by Quorum Review IRB (Seattle, WA 98101).
Patients included in the present study were invited to participate via an outreach program at
The inclusion criterion for the endometriosis case population in the present study was a surgically confirmed diagnosis of endometriosis, with laparoscopy being the preferred method. Trained OB/GYN clinicians performed the medical record review and clinical assessment of each individual patient. Patients were considered to be affected if they had biopsy-proven lesions or if operative reports revealed unambiguous gross lesions. Patients were further categorized by severity, clinical history of pelvic pain, infertility, dyspareunia or dysmenorrhea, and family history. Patients were grouped into one of three classes of severity: mild, moderate or severe, following the general guidelines set forth by ASRM
Saliva samples were collected using the Oragene 300 saliva collection kit (DNA Genotek; Ottawa, Ontario, Canada) and DNA was extracted using an automated extraction instrument, AutoPure LS (Qiagen; Valencia, CA), and the manufacturer's reagents and protocols. DNA quality was evaluated by calculating the absorbance ratio OD260/OD280, and DNA was quantified using PicoGreen® (Life Technologies; Grand Island, NY).
The discovery set of 1,514 cases and 12,660 controls and the replication set of 505 cases and 1,811 controls were genotyped using the Illumina Human OmniExpress Chip (Illumina; San Diego, CA) according to protocols provided by the manufacturer.
A Taqman® 7900 instrument (Life Technologies; Grand Island, NY) and manufacturer’s protocols were used to genotype rs61764370. Genotypes were determined using Taqman genotyping software SDS (v2.3) and the genotype cluster was visually inspected. Genotyping QC for rs61764370 passed standard criteria of call rate >95% and no deviation from HWE (p<0.001) was observed.
Samples were excluded from the analysis if they missed any of the following quality thresholds:
Evidence of familial relationship closer than 3rd-degree (π>0.2) using genome-wide Identity-By-State (IBS) estimation implemented in PLINK
Samples with missing genotypes >0.02
Samples with non-European admixture >0.05 as determined by ADMIXTURE
SNPs were excluded from the analysis if they missed any of the following quality thresholds:
SNPs with Illumina GenTrain Score <0.65
SNPs from copy number variant regions or regions with adjacent SNPs
SNPs failing Hardy-Weinberg Equilibrium (HWE) P≤10−3
SNPs with minor allele frequency (MAF) ≤0.01 in the control population
SNP call rate ≤98%
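The SNP-level exclusion criteria listed above can be expressed as a single predicate. The sketch below is illustrative only: the `SnpMetrics` record, its field names, the boolean flag standing in for the copy-number/adjacent-SNP criterion, and the example SNPs are all hypothetical, and in the study itself the filtering was applied to the genotype data rather than to a summary table like this one.

```python
from dataclasses import dataclass

@dataclass
class SnpMetrics:
    """Hypothetical per-SNP quality summary (field names are assumptions)."""
    rsid: str
    gentrain: float       # Illumina GenTrain score
    hwe_p: float          # Hardy-Weinberg equilibrium P-value
    control_maf: float    # minor allele frequency in controls
    call_rate: float      # fraction of samples with a genotype call
    in_cnv_region: bool = False  # stand-in for the CNV / adjacent-SNP criterion

def passes_snp_qc(snp):
    """Return True if the SNP survives every exclusion rule in the list above."""
    return (
        snp.gentrain >= 0.65        # exclude GenTrain score < 0.65
        and not snp.in_cnv_region   # exclude flagged regions
        and snp.hwe_p > 1e-3        # exclude HWE P <= 10^-3
        and snp.control_maf > 0.01  # exclude MAF <= 0.01 in controls
        and snp.call_rate > 0.98    # exclude call rate <= 98%
    )

# Made-up SNPs, each failing at most one criterion
snps = [
    SnpMetrics("snp_ok", 0.80, 0.40, 0.15, 0.995),
    SnpMetrics("snp_low_gentrain", 0.60, 0.40, 0.15, 0.995),
    SnpMetrics("snp_hwe_fail", 0.80, 0.0005, 0.15, 0.995),
    SnpMetrics("snp_rare", 0.80, 0.40, 0.005, 0.995),
    SnpMetrics("snp_low_call_rate", 0.80, 0.40, 0.15, 0.97),
    SnpMetrics("snp_in_cnv", 0.80, 0.40, 0.15, 0.995, in_cnv_region=True),
]
kept = [s.rsid for s in snps if passes_snp_qc(s)]
print(kept)
```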
ADMIXTURE (ver. 1.22) was used to estimate the individual ancestry proportion
PCA was applied to account for population stratification among the European subgroups. We selected the previously identified 33,067 SNPs to infer the axes of variation using EIGENSTRAT
Power calculations were performed using QUANTO (ver. 1.2) with a log-additive model. The analysis included 2,019 cases and 14,471 controls with the following assumptions: Type I error = 0.05, a minor allele frequency ≥0.10 and an odds-ratio ≥1.2.
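As a rough plausibility check on these assumptions, power can be approximated with a normal-approximation test comparing allele frequencies between cases and controls. This is not the QUANTO log-additive calculation; it is a simplified sketch that treats the stated MAF as the control allele frequency and counts two alleles per subject. The function names are hypothetical.

```python
from statistics import NormalDist

def case_allele_frequency(control_maf, odds_ratio):
    """Case allele frequency implied by a per-allele odds ratio."""
    odds = odds_ratio * control_maf / (1 - control_maf)
    return odds / (1 + odds)

def allelic_test_power(n_cases, n_controls, control_maf, odds_ratio, alpha):
    """Approximate power of a two-sided two-proportion z-test on allele counts."""
    p_case = case_allele_frequency(control_maf, odds_ratio)
    n1, n2 = 2 * n_cases, 2 * n_controls  # alleles, not subjects
    se = (p_case * (1 - p_case) / n1 + control_maf * (1 - control_maf) / n2) ** 0.5
    z_effect = (p_case - control_maf) / se
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic falls in either rejection region
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)

power = allelic_test_power(2019, 14471, control_maf=0.10, odds_ratio=1.2, alpha=0.05)
print(f"approximate power = {power:.2f}")
```

Under these simplifying assumptions the approximation is broadly in line with the power of about 90% stated for the study, although the exact figure depends on the model used.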
After data quality was confirmed, genetic association was determined using the whole-genome association analysis toolset, PLINK (ver. 1.07)
Differences in allele frequencies between endometriosis patients and population controls were tested for each SNP by a 1-degree-of-freedom Cochran-Armitage trend test.
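For readers unfamiliar with the trend test, the following sketch computes the 1-degree-of-freedom Cochran-Armitage statistic with additive weights (0, 1, 2 copies of the minor allele). The genotype counts are hypothetical, chosen only so that the totals match the 2,019 cases and 14,471 controls of the combined dataset; they are not the study's data, and the study itself used PLINK for this test.

```python
import math

def cochran_armitage_trend(case_counts, control_counts, weights=(0, 1, 2)):
    """Return (chi-square, P) for the 1-df Cochran-Armitage trend test.

    case_counts / control_counts: genotype counts ordered by the number of
    minor alleles carried (0, 1, 2).
    """
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    n = n_cases + n_controls

    sum_tr = sum(t * r for t, r in zip(weights, case_counts))
    sum_tn = sum(t * m for t, m in zip(weights, totals))
    sum_t2n = sum(t * t * m for t, m in zip(weights, totals))

    numerator = n * (n * sum_tr - n_cases * sum_tn) ** 2
    denominator = n_cases * n_controls * (n * sum_t2n - sum_tn ** 2)
    chi2 = numerator / denominator
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts (0, 1, 2 minor alleles); illustration only
cases = (1339, 610, 70)         # sums to 2,019
controls = (10431, 3710, 330)   # sums to 14,471
chi2, p_value = cochran_armitage_trend(cases, controls)
print(f"chi2 = {chi2:.2f}, P = {p_value:.2e}")
```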
The allelic odds ratios were calculated with a confidence interval of 95%. SNPs that passed the quality control parameters were used to calculate the genomic inflation factor (λ) as well as to generate Quantile-Quantile (QQ) plots (
Control samples include both male and female samples in approximately equal proportions. The allele frequencies for the 8 strongly associated SNPs and the 15 SNPs with suggested associations did not show any significant gender bias.
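The allelic odds ratio and 95% confidence interval described in this section can be illustrated with a short sketch. It assumes Woolf's log-based standard error, which the text does not specify, and it reconstructs allele counts from the rounded combined MAFs reported for rs2235529 (0.186 in 2,019 cases, 0.151 in 14,471 controls), so the output is only an approximation of the published values. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def allelic_odds_ratio(n_cases, n_controls, case_maf, control_maf, level=0.95):
    """Allelic OR with a Woolf (log-scale) confidence interval.

    Allele counts are approximated from sample sizes and MAFs; the major
    (non-risk) allele is the reference.
    """
    a = 2 * n_cases * case_maf            # minor alleles in cases
    b = 2 * n_cases - a                   # major alleles in cases
    c = 2 * n_controls * control_maf      # minor alleles in controls
    d = 2 * n_controls - c                # major alleles in controls

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

odds_ratio, ci_low, ci_high = allelic_odds_ratio(2019, 14471, 0.186, 0.151)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```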
Haplotype-based association tests were calculated by 1-degree of freedom χ2-test, along with their respective odds ratios using PLINK.
The variance explained by the logistic regression model was calculated using the Cox-Snell and Nagelkerke pseudo-R2 methods, which are analogous to the R2 of linear regression
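A minimal sketch of the two pseudo-R2 measures follows, written in terms of the log-likelihoods of the intercept-only and fitted logistic models. The null log-likelihood is derived from the case and control totals; the fitted log-likelihood is a hypothetical placeholder (an improvement of 50 log-likelihood units), not a value from the study, so the printed numbers are not study results.

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """Cox-Snell pseudo-R2 from null and fitted log-likelihoods."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo-R2: Cox-Snell rescaled by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2

n_cases, n_controls = 2019, 14471
n = n_cases + n_controls

# Log-likelihood of the intercept-only logistic model
ll_null = n_cases * math.log(n_cases / n) + n_controls * math.log(n_controls / n)

# HYPOTHETICAL fitted log-likelihood, for illustration only
ll_model = ll_null + 50.0

cs = cox_snell_r2(ll_null, ll_model, n)
nk = nagelkerke_r2(ll_null, ll_model, n)
print(f"Cox-Snell R2 = {cs:.4f}, Nagelkerke R2 = {nk:.4f}")
```

Nagelkerke's version is always at least as large as Cox-Snell's because the latter cannot reach 1 for a binary outcome.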
IMPUTE2 (ver. 2.2.2) was used for imputing SNPs against the 1000 Genomes reference (version 3 of the Phase 1 integrated data). Samples were pre-phased with IMPUTE2 using actual genotypes and then imputed for SNPs included in the 1000 Genomes reference panel to form imputed haplotypes. Imputation was carried out within +/−250 kb of the main marker of interest. Only SNPs that passed an imputation confidence score of ≥0.9, a call rate of 0.95 and MAF>0.01 are reported. The imputation was performed on the total dataset of 2,019 cases and 14,471 control subjects.
PLINK (version 1.07;
QUANTO (version 1.2;
R (version 2.15.0;
Impute2 (version 2.2.2;
LocusZoom (version 1.1;
EIGENSTRAT (version 3.0;
a Severity of endometriosis is independent of the five top loci by logistic regression analysis. b Severity of endometriosis is independent of the top five loci by association analysis.
We would like to extend our sincere gratitude to all our study participants. It is because of them that this study has been possible. We also extend our thanks to our colleagues in Juneau’s Family Studies Group and our Laboratory for their untiring efforts on this endometriosis project.