Conceived and designed the experiments: NP AD KNH MS JFG JL JBW RHM TF ALD. Performed the experiments: NP AD KNH MS KFD JFG WCN ALD. Analyzed the data: NP AD JL JBW KNH ALD. Contributed reagents/materials/analysis tools: KNH KFD CH WCN JFG. Wrote the paper: NP AD KNH MS JL JBW CH KFD JFG WCN RHM TF ALD. Take responsibility for the PSG-PROGENI Investigators, Coordinators and Molecular Genetic Laboratories: NP WCN TF. Take responsibility for the GenePD Investigators, Coordinators and Molecular Genetic Laboratories: JFG RHM ALD.
The authors have declared that no competing interests exist.
Copy number variants (CNVs) are known to cause Mendelian forms of Parkinson disease (PD), most notably in
Copy number variants (CNVs), defined as structural changes in DNA consisting of deletions or duplications of segments larger than 1 kb compared to a reference genome
Parkinson disease (PD) is the second most common neurodegenerative disorder, affecting approximately 500,000 Americans. The genetic etiology of PD is complex, although mutations in five genes have been identified that cause either an autosomal dominant or an autosomal recessive form of the disease
Mutations in
In the present study, we replicate in an independent dataset the association of a single dosage mutation in
The final sample included 816 cases and 856 controls (
| | PD cases (n = 816): PROGENI (n = 486) | PD cases (n = 816): GenePD (n = 330) | Controls: NINDS Coriell Repository (n = 856) |
|---|---|---|---|
| Average age at onset (cases) or at enrollment (controls), years | 62.1±10.4 | 61.4±11.6 | 54.8±13.1 |
| % Male | 60.3% | 57.8% | 40.1% |
| % with parent reported to have PD | 36.0% | 23.9% | 0% |
The Illumina HumanCNV370 array provided intensity data for 370,404 probes. Illumina's BeadStudio software transformed the signal intensity data for these probes into Log R ratio (LRR) and B allele frequency (BAF) measures that could be used to generate CNV calls by PennCNV
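To make concrete what the two measures encode, the idealized relationships can be sketched as follows. This is a minimal illustration, not BeadStudio's or PennCNV's code, and it assumes that total intensity scales linearly with copy number; real array signals are compressed, so observed LRR shifts are usually smaller than these theoretical values. The function names are hypothetical.

```python
import math
from fractions import Fraction


def log_r_ratio(r_observed: float, r_expected: float) -> float:
    """Log R ratio: log2 of the observed total signal intensity over the
    intensity expected for a normal two-copy genotype at that marker."""
    return math.log2(r_observed / r_expected)


def expected_baf_clusters(copy_number: int) -> list:
    """Idealized B allele frequency clusters for a given copy number.

    With n copies a genotype carries k B alleles (k = 0..n), so BAF sits
    near k/n: a single-copy deletion leaves only 0 and 1 (no heterozygotes),
    and a three-copy duplication gives 0, 1/3, 2/3 and 1.
    """
    if copy_number < 1:
        return []  # no copies: BAF carries no genotype information
    return [Fraction(k, copy_number) for k in range(copy_number + 1)]


# Assumed model: intensity proportional to copy number, normal = 2 copies.
for n in (1, 2, 3):
    print(n, [str(f) for f in expected_baf_clusters(n)],
          round(log_r_ratio(n / 2, 1.0), 3))
```

Under this model a single-copy deletion lowers LRR to -1 and removes the BAF = 0.5 band, while a duplication raises LRR and splits the heterozygous band into 1/3 and 2/3, which is the pattern the CNV callers look for.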
A. A good marker, with three distinct clusters and males and females equally distributed in each cluster; B. Complete co-hybridization to a sex chromosome, where all females are called as homozygotes and all males are called as heterozygotes; C. A monomorphic marker (i.e., a CNV probe) exhibiting partial co-hybridization to a sex chromosome, such that individuals cluster by gender but mean Log R ratios do not differ by gender (p = 0.49); D. A polymorphic SNP exhibiting partial co-hybridization to a sex chromosome, producing multiple distinct clusters separated by gender.
Whole chromosomal arm mosaicism was detected by analyzing the BAF distribution of each chromosomal arm of each individual. There were 44 chromosomal arms from 35 samples (13 cases, 22 controls) that were flagged as outliers and demonstrated evidence of mosaicism (
Sample A harbors a large duplication in the region indicated by a blue bar: the average Log R ratio is increased, and the B allele frequency (BAF; the estimated proportion of alleles that are the B allele) matches the four expected proportions of 0.0 = AAA, 0.33 = AAB, 0.66 = ABB, and 1.0 = BBB. Sample B harbors a large deletion in the region indicated by a purple bar (decreased Log R ratio, with no heterozygotes at BAF = 0.50); however, because the BAF values are not restricted to 0 and 1, the deletion appears to be mosaic.
Gray dot = normal Log R ratio (LRR) and B allele frequency (BAF) distributions; Green dot = mosaic pattern with a significant enough deviation in mean LRR to be called as a CNV – such chromosomal arms were removed from analyses; Purple dot = mosaic pattern without significant LRR deviation – the CNV calling algorithms did not call these as CNVs; Blue dot = faint mosaics – not called; Red dot = multiple distinct mosaicism events – chromosomal arms removed from analyses; Black dot = normal LRR and BAF distributions – the reason they are outliers is unknown; Pink dot = no mosaicism pattern, but very noisy – all of these samples had already been flagged as having unacceptably high LRR standard deviations and had already been removed from analyses.
Previously, an increased risk of PD was found for those having a single
Of the ten cases in which CNVs were identified, two had already been molecularly tested for
PennCNV called 22,685 CNVs that had a confidence value of 10 or higher and spanned at least 5 SNPs. Of these, 2,195 (9.7%) met the Conservative criteria (≥20 probes, ≥100 kb) and 20,073 (88.5%) met the Common criteria (≥5 probes and containing at least one SNP/CNV probe that was observed to be deleted or duplicated in our dataset 3 or more times). The intersection of the two filters contained 1,883 CNVs and the union contained 20,385 CNVs. There were 312 calls meeting the Conservative criteria that were restricted to regions where only one or two individuals carried variants and thus failed the Common criteria. There were 2,300 CNV calls (10.1%) that met neither the Conservative nor the Common criteria and were therefore excluded from the union set. Of these, 1,568 contained 5–9 markers (avg. size 36 kb), 498 contained 10–14 markers (avg. size 71 kb), 199 contained 15–19 markers (avg. size 110 kb), and 35 contained 20 or more markers but spanned less than 100 kb (avg. size 79 kb). The Gene-centric approach, which takes the union of the two other approaches and retains the subset overlapping a portion of at least one RefSeq gene, contained 8,746 CNVs (42.9% of the union set).
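The filter counts can be cross-checked by inclusion–exclusion. The short sketch below recomputes the union, the Conservative-only calls, the calls excluded from the union, and the reported percentages directly from the counts given above; the variable and function names are illustrative only.

```python
# Counts reported above for the PennCNV calls (confidence >= 10, >= 5 SNPs).
total_calls = 22_685
conservative = 2_195
common = 20_073
both = 1_883            # calls passing both filters
gene_centric = 8_746

union = conservative + common - both        # inclusion-exclusion
conservative_only = conservative - both     # large calls in rare regions
neither = total_calls - union               # calls excluded from the union set
neither_by_bin = 1_568 + 498 + 199 + 35     # 5-9, 10-14, 15-19, >= 20 markers


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(union, conservative_only, neither, neither_by_bin)
print(pct(conservative, total_calls), pct(common, total_calls),
      pct(neither, total_calls), pct(gene_centric, union))
```

All counts agree (union 20,385; 312 Conservative-only calls; 2,300 excluded calls matching the marker-bin breakdown), and the recomputed Conservative fraction is 2,195/22,685 ≈ 9.7%.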
Genome-wide analyses of locus-specific CNV associations were performed using CNV calls from two algorithms (PennCNV and QuantiSNP) and two test methods (specific position and 400 kb window). Multiple comparisons were corrected within each analysis via permutation testing. To control for multiple testing across the four resulting analyses, a Bonferroni-corrected threshold of 0.0125 (0.05/4) was applied for study-wide significance; this correction is conservative given the correlation among the analyses.
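The difference between the position and window tests, and the origin of the 0.0125 threshold, can be sketched as below. The exact window definition used in the study is not given here, so this sketch assumes a window of 400 kb centred on the tested position and counts a sample as a carrier if any of its calls overlaps that window; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    sample: str
    chrom: str
    start: int  # inclusive, bp
    end: int    # inclusive, bp


def carriers_at_position(cnvs, chrom, pos):
    """Position test: samples with a CNV call spanning this exact position."""
    return {c.sample for c in cnvs
            if c.chrom == chrom and c.start <= pos <= c.end}


def carriers_in_window(cnvs, chrom, pos, width=400_000):
    """Window test (assumed definition): samples with a CNV call overlapping
    a window of `width` bp centred on pos."""
    lo, hi = pos - width // 2, pos + width // 2
    return {c.sample for c in cnvs
            if c.chrom == chrom and c.start <= hi and c.end >= lo}


# Two calling algorithms x two test methods = four analyses.
N_ANALYSES = 2 * 2
STUDY_WIDE_ALPHA = 0.05 / N_ANALYSES  # 0.0125
```

For example, two neighbouring deletions that do not share any base pair contribute no carriers at a position between them, but both are counted by the window test; this is why window tests can pick up clustered CNVs whose called boundaries differ between samples.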
PennCNV showed a trend at the
| Location | Gene | Test | PennCNV: Union | PennCNV: Gene-centric | QuantiSNP: Union | QuantiSNP: Gene-centric |
|---|---|---|---|---|---|---|
| chr1:173049146–173078950 | 85 kb from | P | 0.17 | 0.10 | 1.00 | 1.00 |
| | | W | 0.13 | 0.07 | 1.00 | 1.00 |
| chr4:71528873–71716513 | overlapping | P | 1.00 | 1.00 | 0.17 | 0.12 |
| | | W | 1.00 | 1.00 | 0.02 | 0.01 |
| chr5:151389412–151513092 | 105 kb from | P | 0.02 | 1.00 | 0.65 | 1.00 |
| | | W | 0.02 | 1.00 | 0.39 | 1.00 |
| chr6:162471089–162677104 | within | P | 0.83 | 0.64 | 0.91 | 0.79 |
| | | W | 0.04 | 0.02 | 0.13 | 0.08 |
| chr8:7575048–7575048 | gene desert | P | 1.00 | 1.00 | 1.00 | 1.00 |
| | | W | 0.23 | 0.13 | 1.00 | 0.99 |
| chr8:25038472–25171648 | within | P | | | | |
| | | W | | | | |
| chr11:84948993–84958207 | within | P | 1.00 | 1.00 | 0.04 | 0.03 |
| | | W | 0.20 | 0.11 | 0.02 | 0.009 |
| chr17:55581582–55809920 | within | P | | | | |
| | | W | | | | |
Those p-values in
Tests were either based on a specific position (P) or based on a 400 kb Window (W).
All CNV calls for the
We also used MLPA as an alternate validation method. Six samples with and fifteen samples without a PennCNV deletion call in the
A two color overlay shows a representation of the capillary electrophoresis peak profiles from one sample with a PennCNV deletion call (shown in red) and one sample without a PennCNV deletion call (shown green) in the
Analysis of the underlying sequence of the
This image of 16 samples is representative of all 95 samples run. PennCNV calls are listed for each sample (normal, deletion, duplication). The mean Log R ratio for the 6 monomorphic CNV probes in
Despite increasing study of CNVs as potential disease risk factors, there is still no consensus on the best approach for detecting or analyzing CNVs. A prior genome-wide study of CNVs in Parkinson disease, which used a relatively small sample (273 cases and 275 controls) and identified CNVs by visual inspection of LRR and BAF, found CNVs within
Here, we present the results of the first systematic genome-wide analysis of CNVs for PD using CNV calling algorithms. We replicated the association of PD susceptibility with
The
The finding of
Duplications and triplications of
Both of the CNV calling algorithms generated CNVs in
As described above, multiple filters were used to improve the quality of CNV calls. In retrospect, the Conservative approach (>100 kb and ≥20 markers) had the lowest false positive rate, since it did not flag
Strengths of this study include the exclusive use of familial PD cases, which are likely to have a greater genetic contribution and therefore provide greater power to detect association than idiopathic PD. In addition, careful quality assessment was performed for the samples analyzed for CNVs, the markers used in the CNV calling algorithms, and the filters applied to the called CNVs. Limitations of the study include the relatively sparse marker set of the Illumina 370Duo compared with newer arrays and the confounding of DNA source with affection status (all cases from whole blood, all controls from lymphoblastoid cell lines).
In summary, we have detected association of PD with CNVs in
A genome-wide case control association design was employed to identify genes contributing to PD susceptibility
Genotyping was performed by the Center for Inherited Disease Research (CIDR) using the Illumina HumanCNV370 version1_C BeadChips (Illumina, San Diego, CA, USA) and the Illumina Infinium II assay protocol
All cases were known to be negative for the
Due to the limitations of CNV calling algorithms, markers within regions of known instability (telomeres, centromeres and immunoglobulin regions, boundaries previously delineated by Need et al.
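Excluding markers that fall inside predefined unstable regions is a simple point-in-interval filter. The sketch below assumes the regions are supplied per chromosome as non-overlapping, inclusive (start, end) intervals; the actual boundaries from Need et al. are not reproduced, and the coordinates in any usage are hypothetical.

```python
import bisect


def mask_markers(markers, excluded_regions):
    """Drop markers falling inside any excluded region.

    markers: iterable of (chrom, position) tuples.
    excluded_regions: dict chrom -> list of (start, end) intervals,
    inclusive and non-overlapping (e.g. telomeres, centromeres,
    immunoglobulin loci).
    """
    regions_by_chrom = {ch: sorted(r) for ch, r in excluded_regions.items()}
    kept = []
    for chrom, pos in markers:
        regions = regions_by_chrom.get(chrom, [])
        # Index of the last region whose start is <= pos.
        i = bisect.bisect_right(regions, (pos, float("inf"))) - 1
        if i >= 0 and regions[i][0] <= pos <= regions[i][1]:
            continue  # inside an unstable region: exclude this marker
        kept.append((chrom, pos))
    return kept
```

Sorting the intervals once and using binary search keeps the cost per marker logarithmic, which matters when filtering several hundred thousand probes.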
CNVs that span an entire chromosomal arm have also been detected. These typically result from somatic loss or gain and often show a mosaic pattern, with some cells retaining a normal karyotype. Loss of an entire chromosomal arm is a frequent artifact of lymphoblast immortalization, which is relevant here because all controls were derived from lymphoblastoid cell lines (LCLs) and all cases from whole blood; however, in our sample, mosaicism was frequently seen in DNA derived from whole blood. To detect whole chromosomal arm mosaicism, each arm of each chromosome of each individual was analyzed separately. As described in
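The study's exact arm-level procedure is not reproduced in this section, so the following is only a generic sketch of one way to flag mosaic arms from BAF. It assumes a per-arm statistic (mean distance of heterozygous-range BAF values from 0.5, which rises when a mosaic gain or loss splits the heterozygous band) and flags arms that are robust outliers by median/MAD z-score. The statistic, the 0.1–0.9 heterozygous range, and the z cutoff of 5 are all illustrative assumptions.

```python
import statistics


def het_baf_deviation(baf_values, lo=0.1, hi=0.9):
    """Mean |BAF - 0.5| over markers in the heterozygous range.

    In normal diploid DNA heterozygous markers sit near 0.5; a mosaic gain
    or loss splits them into two bands symmetric about 0.5, raising this
    statistic.
    """
    het = [b for b in baf_values if lo < b < hi]
    if not het:
        return 0.0
    return statistics.fmean(abs(b - 0.5) for b in het)


def flag_outlier_arms(arm_stats, z_cutoff=5.0):
    """Return (sample, arm) keys whose statistic is a robust high outlier.

    arm_stats: dict mapping (sample, arm) -> deviation statistic.
    """
    values = list(arm_stats.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []
    scale = 1.4826 * mad  # makes MAD comparable to an SD under normality
    return sorted(k for k, v in arm_stats.items()
                  if (v - med) / scale > z_cutoff)
```

Arms flagged this way would still need visual review, since, as the figure legend notes, some outliers show no mosaic pattern at all.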
There is currently no consensus regarding the best algorithm to call CNVs. Therefore, two frequently used CNV calling algorithms were employed, PennCNV
Two complementary filtering approaches were applied to minimize false positive CNV calls. The first approach, which we refer to as Conservative, focused on large CNVs that were greater than 100 kb and spanned at least 20 markers. Similar to previous studies
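The two filters can be sketched as index sets over the call list, so that their union and intersection follow directly. This is a minimal illustration, not the study's code: the `Call` type is hypothetical, and the Common criterion assumes that a probe's count includes the call itself and pools deletions with duplications (consistent with calls seen in only one or two individuals failing the filter).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    sample: str
    probes: tuple    # IDs of the SNP/CNV probes spanned by the call
    length_bp: int


def conservative_set(calls, min_probes=20, min_length_bp=100_000):
    """Indices of calls longer than 100 kb that span >= 20 markers."""
    return {i for i, c in enumerate(calls)
            if len(c.probes) >= min_probes and c.length_bp > min_length_bp}


def common_set(calls, min_probes=5, min_times=3):
    """Indices of calls spanning >= 5 markers, at least one of which lies
    inside CNV calls at least `min_times` times across the dataset.

    Assumption: counts include the call itself and pool deletions with
    duplications.
    """
    probe_counts = Counter(p for c in calls for p in set(c.probes))
    return {i for i, c in enumerate(calls)
            if len(c.probes) >= min_probes
            and any(probe_counts[p] >= min_times for p in c.probes)}
```

The union set analyzed is then `conservative_set(calls) | common_set(calls)`; working with indices rather than the call objects keeps identical calls from different samples distinct.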
We then performed secondary analyses that limited our analyses to those CNVs that overlapped a portion of at least one RefSeq gene (Gene-centric approach). We did not require the CNVs to specifically overlap an exon, since deletions molecularly confirmed to span an exon could be called by PennCNV as having boundaries that are exclusively intronic due to the relatively sparse marker set employed in the current study.
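The Gene-centric restriction only asks whether a CNV interval overlaps any part of a gene's span, so an entirely intronic call still qualifies. A minimal sketch, assuming genes are given as (name, start, end) tuples per chromosome; the input format and the gene name in any example are hypothetical.

```python
def gene_centric(cnvs, genes_by_chrom):
    """Keep CNVs that overlap any portion of at least one gene.

    cnvs: list of (chrom, start, end) tuples, inclusive coordinates.
    genes_by_chrom: dict chrom -> list of (name, start, end) gene spans.
    Exon overlap is deliberately not required.
    """
    kept = []
    for chrom, start, end in cnvs:
        hits = tuple(name for name, g_start, g_end in genes_by_chrom.get(chrom, [])
                     if start <= g_end and end >= g_start)
        if hits:
            kept.append((chrom, start, end, hits))
    return kept
```

Two closed intervals overlap exactly when each one starts no later than the other ends, which is the single comparison used above.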
Those samples that did not overlap with the previous study of
To test the hypothesis that particular CNVs would be found at increased frequency in PD cases as compared with controls (one-sided Fisher's exact test with significance determined via permutation), we performed two analyses using PLINK
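The test logic (a one-sided Fisher's exact test for excess carriers in cases, with significance assessed by permuting case/control labels) can be sketched as follows. This is a generic max-statistic permutation scheme written for illustration, not PLINK's implementation, which may differ in detail; the function names are hypothetical.

```python
import random
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for an excess of carriers in cases.

    2x2 table: a = case carriers, b = case non-carriers,
               c = control carriers, d = control non-carriers.
    Sums hypergeometric probabilities of all tables with >= a case carriers.
    """
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    top = min(n_cases, n_carriers)
    tail = sum(comb(n_carriers, k) * comb(n - n_carriers, n_cases - k)
               for k in range(a, top + 1))
    return tail / comb(n, n_cases)


def locus_p(carrier, is_case):
    """Build the 2x2 table for one locus (0/1 vectors, same sample order)."""
    a = sum(1 for x, y in zip(carrier, is_case) if x and y)
    b = sum(1 for x, y in zip(carrier, is_case) if not x and y)
    c = sum(1 for x, y in zip(carrier, is_case) if x and not y)
    d = len(is_case) - a - b - c
    return fisher_greater(a, b, c, d)


def genome_wide_p(loci, is_case, n_perm=1000, seed=0):
    """Max-statistic permutation: each locus's observed p-value is compared
    with the smallest p-value across all loci in each label permutation."""
    observed = [locus_p(carrier, is_case) for carrier in loci]
    rng = random.Random(seed)
    labels = list(is_case)
    exceed = [0] * len(loci)
    for _ in range(n_perm):
        rng.shuffle(labels)
        min_p = min(locus_p(carrier, labels) for carrier in loci)
        for i, p in enumerate(observed):
            if min_p <= p:
                exceed[i] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

Comparing against the minimum p-value over all loci in each permutation yields genome-wide corrected p-values within one analysis; the Bonferroni step across analyses is applied on top of these.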
We sought to molecularly validate large statistically significant deletions and duplications identified in
We used MLPA as a second approach to molecularly validate the inferred large CNVs. A custom assay with 11 probes was designed to capture exons 2, 3, 5, 12, and 13 of
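A common way to read MLPA results is a dosage quotient per target probe: the probe's peak is normalized to control probes within the same sample and then compared with the same ratio in normal-copy reference samples, so that about 0.5 indicates a heterozygous deletion and about 1.5 a duplication. The sketch below assumes a simple sum-of-controls normalization and cutoffs of 0.7 and 1.3; both are illustrative choices, not the analysis settings used in this study, and the probe names in any example are hypothetical.

```python
import statistics


def dosage_quotients(sample_peaks, reference_peaks, target_probes, control_probes):
    """Dosage quotient per target probe (simplified normalization).

    sample_peaks: dict probe -> peak area for the test sample.
    reference_peaks: list of such dicts for normal-copy reference samples.
    Each target peak is divided by the sum of control-probe peaks in the same
    sample, then by the mean of that normalized value across references.
    """
    def normalize(peaks, probe):
        return peaks[probe] / sum(peaks[c] for c in control_probes)

    return {p: normalize(sample_peaks, p)
            / statistics.fmean(normalize(r, p) for r in reference_peaks)
            for p in target_probes}


def classify(dq, del_cut=0.7, dup_cut=1.3):
    """Assumed cutoffs: ~0.5 expected for one copy, ~1.5 for three copies."""
    if dq < del_cut:
        return "deletion"
    if dq > dup_cut:
        return "duplication"
    return "normal"
```

Normalizing within each sample first removes differences in overall DNA input between runs, which is why the comparison with references is made on ratios rather than raw peak areas.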
We also sought to molecularly validate the length of the 32 bp repeats identified in
Comparison of genome-wide results across CNV filters for regions with an empirical genome-wide p-value <0.20 for any test.
We particularly thank Justin Paschall from the NCBI dbGaP staff for his assistance in developing the dataset available at dbGaP. The data generated from this case control study are available at
The following are members of the PROGENI Steering Committee. University of Tennessee Health Science Center: R. F. Pfeiffer; University of Rochester: F. Marshall, D. Oakes, A. Rudolph, A. Shinaman; Columbia University Medical Center: K. Marder; Indiana University School of Medicine: P.M. Conneally, T. Foroud, C. Halter; University of Kansas Medical Center: K. Lyons; Eli Lilly & Company: E. Siemers; Medical College of Ohio: L. Elmers; University of California, Irvine: N. Hermanowicz.
The following are members of the GenePD Steering Committee. University of Virginia Health System: G.F. Wooten; UMDNJ-Robert Wood Johnson Medical School: L. Golbe; Center for Human Genetic Research, Massachusetts General Hospital, Harvard Medical School: J.F. Gusella; Boston University School of Medicine: R.H. Myers.
We thank the subjects for their participation in this research study.