Best Practices and Joint Calling of the HumanExome BeadChip: The CHARGE Consortium

  • Megan L. Grove ,

    Megan.L.Grove@uth.tmc.edu

    Affiliation School of Public Health, Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America

  • Bing Yu,

    Affiliation School of Public Health, Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America

  • Barbara J. Cochran,

    Affiliation School of Public Health, Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America

  • Talin Haritunians,

    Affiliation Medical Genetics Institute, Cedars-Sinai Medical Center, Los Angeles, California, United States of America

  • Joshua C. Bis,

    Affiliation Cardiovascular Health Research Unit, University of Washington, Seattle, Washington, United States of America

  • Kent D. Taylor,

    Affiliation Medical Genetics Institute, Cedars-Sinai Medical Center, Los Angeles, California, United States of America

  • Mark Hansen,

    Affiliation Illumina, Inc., San Diego, California, United States of America

  • Ingrid B. Borecki,

    Affiliation Division of Statistical Genomics, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America

  • L. Adrienne Cupples,

    Affiliations Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America, Framingham Heart Study of the National Heart, Lung, and Blood Institute, Framingham, Massachusetts, United States of America

  • Myriam Fornage,

    Affiliation Institute of Molecular Medicine, Center for Human Genetics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America

  • Vilmundur Gudnason,

    Affiliations Icelandic Heart Association, Research Institute, Kopavogur, Iceland, Faculty of Medicine, University of Iceland, Reykjavík, Iceland

  • Tamara B. Harris,

    Affiliation Laboratory of Population Science, National Institute on Aging, Bethesda, Maryland, United States of America

  • Sekar Kathiresan,

    Affiliations Center for Human Genetic Research and Cardiovascular Research Center, Massachusetts General Hospital, Boston, Massachusetts, United States of America, Harvard Medical School, Boston, Massachusetts, United States of America, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America

  • Robert Kraaij,

    Affiliation Department of Internal Medicine, Erasmus University Medical Center, Rotterdam, The Netherlands

  • Lenore J. Launer,

    Affiliation Laboratory of Population Science, National Institute on Aging, Bethesda, Maryland, United States of America

  • Daniel Levy,

    Affiliation Framingham Heart Study of the National Heart, Lung, and Blood Institute, Framingham, Massachusetts, United States of America

  • Yongmei Liu,

    Affiliation Department of Epidemiology and Prevention, Wake Forest University School of Medicine, Winston-Salem, North Carolina, United States of America

  • Thomas Mosley,

    Affiliation Department of Medicine, University of Mississippi Medical Center, Jackson, Mississippi, United States of America

  • Gina M. Peloso,

    Affiliations Center for Human Genetic Research and Cardiovascular Research Center, Massachusetts General Hospital, Boston, Massachusetts, United States of America, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America

  • Bruce M. Psaty,

    Affiliations Cardiovascular Health Research Unit, University of Washington, Seattle, Washington, United States of America, Department of Epidemiology, University of Washington, Seattle, Washington, United States of America, Department of Medicine, University of Washington, Seattle, Washington, United States of America, Department of Health Services, University of Washington, Seattle, Washington, United States of America, Group Health Research Institute, Seattle, Washington, United States of America

  • Stephen S. Rich,

    Affiliation Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America

  • Fernando Rivadeneira,

    Affiliations Department of Internal Medicine, Erasmus University Medical Center, Rotterdam, The Netherlands, ErasmusAGE and Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands, Netherlands Consortium for Healthy Aging, Netherlands Genomics Initiative, Leiden, The Netherlands

  • David S. Siscovick,

    Affiliation Cardiovascular Health Research Unit, University of Washington, Seattle, Washington, United States of America

  • Albert V. Smith,

    Affiliations Icelandic Heart Association, Research Institute, Kopavogur, Iceland, Faculty of Medicine, University of Iceland, Reykjavík, Iceland

  • Andre Uitterlinden,

    Affiliations ErasmusAGE and Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands, Netherlands Consortium for Healthy Aging, Netherlands Genomics Initiative, Leiden, The Netherlands

  • Cornelia M. van Duijn,

    Affiliations ErasmusAGE and Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands, Netherlands Consortium for Healthy Aging, Netherlands Genomics Initiative, Leiden, The Netherlands

  • James G. Wilson,

    Affiliation Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, Mississippi, United States of America

  • Christopher J. O’Donnell,

    Affiliations Framingham Heart Study of the National Heart, Lung, and Blood Institute, Framingham, Massachusetts, United States of America, Cardiology Division, Massachusetts General Hospital, Boston, Massachusetts, United States of America

  • Jerome I. Rotter,

    Affiliation Medical Genetics Institute, Cedars-Sinai Medical Center, Los Angeles, California, United States of America

  • Eric Boerwinkle

    Affiliations School of Public Health, Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America, Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America


Abstract

Genotyping arrays are a cost-effective approach when typing previously identified genetic polymorphisms in large numbers of samples. One limitation of genotyping arrays with rare variants (e.g., minor allele frequency [MAF] <0.01) is the difficulty that automated clustering algorithms have in accurately detecting and assigning genotype calls. Combining intensity data from large numbers of samples may increase the ability to accurately call the genotypes of rare variants. Approximately 62,000 ethnically diverse samples from eleven Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium cohorts were genotyped with the Illumina HumanExome BeadChip across seven genotyping centers. The raw data files for the samples were assembled into a single project for joint calling. To assess the quality of the joint calling, concordance of genotypes in a subset of individuals having both exome chip and exome sequence data was analyzed. After exclusion of low performing SNPs on the exome chip and non-overlap of SNPs derived from sequence data, genotypes of 185,119 variants (11,356 were monomorphic) were compared in 530 individuals that had whole exome sequence data. A total of 98,113,070 pairs of genotypes were tested and 99.77% were concordant, 0.14% had missing data, and 0.09% were discordant. We report that joint calling makes it possible to accurately genotype rare variation using array technology when large sample sizes are available and best practices are followed. The cluster file from this experiment is available at www.chargeconsortium.com/main/exomechip.

Introduction

Exome and whole-genome sequencing are becoming increasingly affordable and allow for detection and genotyping of rare variants in the human genome. Yet, genotyping arrays remain a cost-effective approach when investigating genetic polymorphisms previously identified in large populations. A limitation of using arrays to genotype rare variants is the difficulty that automated clustering algorithms have in accurately detecting and assigning genotype calls [1], [2]. Large sample sizes increase the number of occurrences of rare variants and, therefore, should facilitate automated clustering and genotyping.

An array focused on rare and low frequency coding variation, hereafter referred to as the exome chip, has been developed by querying the exomes sequenced in ∼12,000 individuals and aggregating the variation that is seen in more than two individuals in more than two sequencing efforts (http://genome.sph.umich.edu/wiki/Exome_Chip_Design). Participating studies in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium [3] consented to have their Illumina Infinium HumanExome BeadChip intensity data analyzed collectively (n = 62,266) in order to increase the accuracy of rare variant genotype calls. The resulting cluster file (.egt) is publicly available and we show that its use, along with best practices, increases genotype accuracy compared to other methods alone.

Results

Genotypes were obtained for 238,876 successful variants in accordance with our best practices (96.4% SNP pass rate) which were converted to PLINK format [4] by cohort and combined into a single aggregate file for further analyses. Of the 62,266 samples genotyped, 1,380 (2.2%) had a GenCall quality score in the lower 10th percentile of the distribution across all variants genotyped (p10GC) <0.38 or call rate <0.97 and were excluded from allele frequency calculations. Because founder effects and unique population structure have been previously observed in Icelandic samples [5], [6], the Age, Gene/Environment, Susceptibility-Reykjavik study was excluded from subsequent steps. Known duplicated samples, individuals without self-reported race, and the HapMap controls were also removed. After excluding duplicate variants (n = 811), the minor allele frequencies (MAF) for 238,065 successful SNPs and 56,407 samples by self-reported race are described in Table 1. There were 10,693 monomorphic SNPs (4.5%), and 78.6% of the variants on the exome chip have a MAF <0.005. Allele frequencies for each variant by race group are reported in the SNP information file (see Methods and Data Access sections). Ethnicity specific HapMap allele frequencies for the 96 controls (48 CEU and 48 YRI) and genotypes are also available.

Table 1. Exome chip minor allele frequency distribution by race.

https://doi.org/10.1371/journal.pone.0068095.t001
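
The sample exclusion rule and the allele frequency calculation described above can be sketched as follows. This is a minimal illustration rather than the consortium's pipeline: the function names are hypothetical, genotypes are assumed to be coded as B-allele counts (0, 1, 2) for autosomal diploid sites, and no-calls are assumed to be represented as None.

```python
def passes_sample_qc(p10gc, call_rate, p10gc_min=0.38, call_rate_min=0.97):
    """True if the sample is kept: p10GC >= 0.38 and call rate >= 0.97."""
    return p10gc >= p10gc_min and call_rate >= call_rate_min


def minor_allele_frequency(genotypes):
    """MAF for one variant.

    genotypes: B-allele counts per sample (0, 1 or 2), None for a no-call.
    Returns None when no sample has a call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    b_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of A or B is less frequent.
    return min(b_freq, 1.0 - b_freq)


def is_monomorphic(genotypes):
    """True when only one allele is observed among the called genotypes."""
    return minor_allele_frequency(genotypes) == 0.0
```

Applied per self-reported race group, minor_allele_frequency would give the kind of per-group frequencies summarized in Table 1, with samples failing passes_sample_qc left out of the input.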

To evaluate the performance of the rare variant calling approach (see Methods), we compared exome chip genotypes derived from three calling methods to available exome sequencing data in 530 ARIC individuals. First, exome chip genotypes were called with the Illumina issued cluster file HumanExome-12v1.egt (see Data Access section for file location) (Dataset I). Second, we used zCall (threshold set to 7) [7] to determine genotypes for the missing variant calls in Dataset I to create Dataset Z. Third, we used the CHARGE best practices (see Data Access) and joint calling approach described to ascertain exome chip genotypes (Dataset C). A total of 185,119 variants that were present in the exome sequence dataset and passed our best practices were compared using genotype concordance and uncertainty coefficient tests. Results are presented in Table 2. The uncertainty coefficients indicate that we can predict 86.4% of the information (entropy) in the exome sequence data when using the Illumina cluster file, 91.2% when using the zCall algorithm, and 93.4% when the CHARGE clustering method was utilized.

Table 2. Results of missing data, genotype discordance, uncertainty coefficients and frequencies of exome chip data ascertained by three calling methods and compared to exome sequence genotypes.

https://doi.org/10.1371/journal.pone.0068095.t002

These data demonstrate the importance of implementing stringent laboratory quality control measures in addition to the clustering algorithms and rare variant calling approaches tested. The complete list of 8,994 failing SNPs identified in the jointly called exome chip project is available for download on the CHARGE public website. Genotypes ascertained with the CHARGE jointly called exome chip cluster file (Dataset C) were 99.77% concordant with sequence data, 0.14% were missing in exome chip data, in the exome sequence data, or both, and 0.09% were discordant (Figure 1). Heterozygotes in Dataset C were most often misclassified when compared to the common allele homozygote, and mismatches were attributed equally to both sequencing and genotyping (Table 2).

Figure 1. Results of CHARGE exome chip genotype calls compared to exome sequence data in 530 individuals.

https://doi.org/10.1371/journal.pone.0068095.g001
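
The three-way breakdown reported above (concordant, missing, discordant) can be illustrated with a short sketch. The function name and the genotype coding are assumptions made for the example: each genotype is taken to be an allele count (0, 1, 2), with None standing for a call that is missing in either dataset.

```python
from collections import Counter


def compare_genotypes(chip, seq):
    """Classify paired genotypes and return the fraction in each category.

    chip, seq: equal-length sequences of genotypes for the same
    sample/variant pairs; None marks a missing call.
    """
    counts = Counter()
    for c, s in zip(chip, seq):
        if c is None or s is None:
            counts["missing"] += 1      # missing in chip, sequence, or both
        elif c == s:
            counts["concordant"] += 1
        else:
            counts["discordant"] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    return {k: counts[k] / total
            for k in ("concordant", "missing", "discordant")}
```

For example, compare_genotypes([0, 1, 2, None, 0], [0, 1, 1, 0, 0]) classifies three pairs as concordant, one as missing, and one as discordant.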

We also tested the ability of the CHARGE exome chip cluster file to accurately assign genotypes in the three rarest variant bins: singletons (minor allele count = 1), doubletons (minor allele count = 2), and tripletons (minor allele count = 3). We observed high concordance between exome chip singletons (99.99%), doubletons (99.98%), and tripletons (99.97%) when compared to their respective sequence genotypes in the same 530 ARIC individuals previously described (data not shown). These results are consistent with the global concordance tests which suggest we are able to accurately call very rare variants.
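
As a rough illustration of how variants might be assigned to these three bins, the sketch below counts minor alleles among called genotypes. The function names and the genotype coding (B-allele counts 0, 1, 2, with None for a no-call) are assumptions for the example and are not taken from the CHARGE analysis.

```python
def minor_allele_count(genotypes):
    """Number of copies of the less frequent allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    b_count = sum(called)
    a_count = 2 * len(called) - b_count
    return min(a_count, b_count)


def rare_bin(genotypes):
    """Return 'singleton', 'doubleton', 'tripleton', or None otherwise."""
    labels = {1: "singleton", 2: "doubleton", 3: "tripleton"}
    return labels.get(minor_allele_count(genotypes))
```

Note that a single minor-allele homozygote counts as a doubleton under this definition, because the bins are defined by allele count rather than by the number of carriers.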

Discussion

The results presented here demonstrate that rare variants on the exome chip can be called more accurately with a large, combined cluster file and the best practices described here than with the existing clustering algorithms and rare variant calling methods tested. The joint calling protocol, accompanying cluster file, list of poor performing variants on the chip, and annotation data are a valuable resource for the scientific community and will be of great utility to those having smaller sample sets where the calling of rare variants is problematic. All new projects will require user decisions based on their own cohort data, and the metrics and best practices presented here should be updated accordingly.

Materials and Methods

Subjects

Data from 62,266 participants from the following eleven studies in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium [3] were included in this joint calling experiment and study descriptions were published previously: Age, Gene/Environment, Susceptibility-Reykjavik (AGES) Study [8], Atherosclerosis Risk in Communities (ARIC) Study [9], Cardiac Arrest Blood Study (CABS) [10], Cardiovascular Health Study (CHS) [11], [12], Coronary Artery Risk Development in Young Adults (CARDIA) [13], [14], Multi-Ethnic Study of Atherosclerosis (MESA) [15], Family Heart Study (FamHS) [16], Framingham Heart Study (FHS) [17], Health, Aging, and Body Composition (HABC) Study [18], Jackson Heart Study (JHS) [19], and the Rotterdam Study (RS) [20]–[23]. In addition, we genotyped 96 unrelated HapMap samples (48 CEU and 48 YRI) with each cohort and the list of sample IDs is available as a reference on the CHARGE exome chip public website.

Ethics Statement

All subjects provided written and informed consent to participate in genetic studies, and all study sites received approval to conduct this research from their local respective Institutional Review Boards (IRB) as follows: “The National Bioethics Committee” and “The Data Protection Authority” (AGES); University of Mississippi Medical Center IRB (ARIC – Jackson Field Center), Wake Forest University Health Sciences IRB (ARIC – Forsyth County Field Center), University of Minnesota IRB (ARIC – Minnesota Field Center), and Johns Hopkins University (Bloomberg School of Public Health) IRB (ARIC – Washington County Field Center); University of Washington IRB (CABS); Wake Forest University Health Sciences IRB (CHS – Forsyth County Field Center), University of California, Davis IRB (CHS – Sacramento County Field Center), Johns Hopkins University (Bloomberg School of Public Health) IRB (CHS – Washington County Field Center), and University of Pittsburgh IRB (CHS – Pittsburgh Field Center); University of Alabama at Birmingham (CARDIA – Birmingham Field Center), Northwestern University IRB (CARDIA – Chicago Field Center), University of Minnesota IRB (CARDIA – Minneapolis Field Center), and Kaiser Permanente IRB (CARDIA – Oakland Field Center); Washington University IRB (FamHS); Boston University IRB (FHS); Wake Forest University Health Sciences IRB (HABC); University of Mississippi Medical Center IRB (JHS); Columbia University IRB (MESA – New York Field Center), Johns Hopkins University IRB (MESA – Baltimore Field Center), Northwestern University IRB (MESA – Chicago Field Center), University of California IRB (MESA – Los Angeles Field Center), University of Minnesota IRB (MESA – Twin Cities Field Center), Wake Forest University Health Sciences IRB (MESA – Winston-Salem Field Center) and the National Heart, Lung, and Blood Institute; Medisch Ethische Toetsings Commissie (METC) at the Erasmus Medical Center, and the Netherlands Ministry of Health, Welfare and Sport (VWS) (RS).
Joint calling of the array data was approved by the Committee for the Protection of Human Subjects (CPHS) which serves as the IRB for the University of Texas Health Science Center at Houston.

Genotyping

Study samples were processed on the HumanExome BeadChip v1.0 (Illumina, Inc., San Diego, CA) querying 247,870 variable sites described elsewhere (see Data Access) using standard protocols suggested by the manufacturer at the following seven genotyping centers: Broad Institute (JHS), Cedars-Sinai Medical Center (CHS, FamHS and MESA), Erasmus Medical Center (RS), Illumina Fast Track Services (FHS), University of Texas Health Science Center at Houston (AGES, ARIC and CARDIA), University of Washington (CABS), and Wake Forest University (HABC). Each center genotyped a common set of 96 HapMap samples to be utilized for quality control and determination of batch effects. The two channel raw data files (.idat) for all samples were transferred to a central location and assembled into a single project for joint calling. A summary of the samples genotyped within each cohort by race and gender is described in Table 3. The following variables were provided for each sample included in the project: study specific sample ID, cohort name, sample type (DNA or WGA), race (self-reported), gender, sample plate, sample well, chip barcode, chip position, replicate ID, father and mother IDs (if applicable).

Table 3. Sample sizes of cohorts participating in joint calling effort by gender and self-reported race.

https://doi.org/10.1371/journal.pone.0068095.t003

Clustering, Genotype Calling and Laboratory Quality Control

The Illumina GenomeStudio v2011.1 software was utilized with the GenTrain 2.0 clustering algorithm. Genomic DNA study samples and HapMap controls with call rates >99% (n = 55,142) were used to define genotype clusters with races combined and reruns excluded. The no-call threshold was set to 0.15 and we excluded female Y SNPs when calculating SNP statistics. The genotype quality score, representing the 10th percentile of the distribution of GenCall scores across all SNPs genotyped (p10GC), was visually examined in a scatter plot across all samples (Index vs. p10GC). Samples with an empirically determined p10GC <0.38 were identified as outliers and flagged for exclusion. The SNP parameters “Expected Number of Clusters of Y SNPs” and “Expected Number of Clusters of mtSNPs” were set to 2. Following automated clustering, all variants meeting the criteria provided in Table 4 (n = 107,175) were visually inspected and manually clustered, if possible, by two independent laboratory technicians. AA and BB theta deviation cutoffs were determined empirically. Variants removed from the HumanExome BeadChip v1.1 (n = 4,969) and cautious sites, as defined by the exome chip design committee (n = 333) (ftp://share.sph.umich.edu/exomeChip/IlluminaDesigns/cautiousSites/cautiousSite.sorted.sites), were also inspected. Samples with a call rate between 0.95 and 0.99 that had been previously excluded were brought back in to the project and re-inspected based on the criteria listed in Table 4. This additional review was necessary as the CHARGE exome chip project contains samples from multiple DNA sources and ethnicities that were genotyped at several centers. SNPs exhibiting obvious batch effects were excluded. After joint calling, reproducibility and heritability statistics, SNP statistics and sample statistics were updated and the SNP-level quality control criteria described in Table 5 were implemented. 
SNPs with reproducibility (rep) errors >2, parent-parent-child (PPC) error >1, or parent-child (PC) error >1 were not excluded, but were flagged and reported back to the participating studies for further investigation. A list of the 8,994 excluded variants is provided on the CHARGE exome chip website as cluster positions for these sites are zeroed out in the .egt file (note: all SNP statistics for these sites will be converted to zero when the cluster file is imported into the GenomeStudio project). A portion of excluded SNPs may be recoverable in projects with a homogeneous population substructure, and we recommend clustering and reviewing the subset of variants with the user’s high quality samples. Table 6 describes the exome chip content and number of variants excluded by functional category (see Annotation). Importantly, the v1.0 cluster file should not be used for calling the Illumina v1.1 exome chip as the two versions were manufactured with different bead pools.
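
The two sample-level metrics used in this section, p10GC and call rate, can be approximated from a sample's per-SNP GenCall scores as in the sketch below. Several details are assumptions rather than documented GenomeStudio behavior: the function names are hypothetical, the percentile uses the nearest-rank convention, and a genotype is treated as a no-call when its GenCall score falls below the no-call threshold of 0.15.

```python
def p10gc(gencall_scores):
    """10th percentile of one sample's GenCall scores (nearest-rank method)."""
    scores = sorted(gencall_scores)
    if not scores:
        raise ValueError("no GenCall scores supplied")
    # Nearest rank: ceil(0.10 * n), computed with integers to avoid
    # floating-point rounding surprises.
    rank = (len(scores) + 9) // 10
    return scores[rank - 1]


def call_rate(gencall_scores, no_call_threshold=0.15):
    """Fraction of SNPs whose GenCall score reaches the no-call threshold."""
    if not gencall_scores:
        raise ValueError("no GenCall scores supplied")
    called = sum(1 for s in gencall_scores if s >= no_call_threshold)
    return called / len(gencall_scores)
```

Plotting p10gc against a sample index, as described above, is what would make low-quality outliers stand out before any fixed cutoff is chosen.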

Table 4. Best practices criteria used to identify SNPs for visual inspection and manual reclustering.

https://doi.org/10.1371/journal.pone.0068095.t004

Table 6. Exome chip content and CHARGE excluded variants by functional category.

https://doi.org/10.1371/journal.pone.0068095.t006

Exome Chip Performance

Genotypes derived from available exome sequencing of 540 ARIC participants were used as the comparison dataset to test the performance of the exome chip. We excluded 10 individuals from the sequencing dataset due to a high missing data rate <0.90, or non-overlap of individuals with existing exome chip data. Exome sequencing data is accessible via dbGaP as part of the National Heart Lung and Blood Institute (NHLBI) GO-ESP: Heart Cohorts Component of the Exome Sequencing Project (ARIC) (Study Accession: phs000398.v1.p1).

The following variants were excluded from the exome chip dataset as they were not available in the genotype data derived from exome sequencing results: replicate sites that were determined as triallelic or duplicates on opposite strands, short insertion/deletions, XY chromosome SNPs, Y chromosome SNPs, mitochondrial SNPs, or sites not identified in the exome sequencing dataset (n = 56,042). Poor performing variants identified by our best practices criteria were removed if not previously excluded (n = 6,709), thus a total of 185,119 variants were available for concordance analyses in 530 individuals.
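
The variant counts quoted in this paper can be reconciled with simple arithmetic. The sketch below uses only numbers stated in the text; the way the subtractions are chained is our reading of the description, and the constant names are hypothetical.

```python
# Counts quoted in the text (HumanExome BeadChip v1.0, CHARGE joint calling).
SITES_ON_CHIP = 247_870              # variable sites queried by the array
FAILED_BEST_PRACTICES = 8_994        # variants zeroed out in the cluster file
DUPLICATE_VARIANTS = 811             # removed before allele frequency tables
NOT_IN_SEQUENCE_DATA = 56_042        # chip sites absent from exome sequence calls
POOR_PERFORMERS_REMAINING = 6_709    # failing variants not already excluded
SEQUENCED_INDIVIDUALS = 530          # ARIC individuals with both data types

# Variants passing best practices, and the corresponding SNP pass rate.
passing_variants = SITES_ON_CHIP - FAILED_BEST_PRACTICES
snp_pass_rate = passing_variants / SITES_ON_CHIP

# Variants used for the minor allele frequency summary (Table 1).
unique_passing = passing_variants - DUPLICATE_VARIANTS

# Variants and genotype pairs available for the concordance analysis.
concordance_variants = (SITES_ON_CHIP
                        - NOT_IN_SEQUENCE_DATA
                        - POOR_PERFORMERS_REMAINING)
genotype_pairs = concordance_variants * SEQUENCED_INDIVIDUALS
```

Under this reading the figures agree with those reported: 238,876 passing variants (96.4%), 238,065 after removing duplicates, 185,119 variants in the concordance set, and 98,113,070 genotype pairs.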

Since concordance results are potentially high due to rare variation on the exome chip, we also calculated uncertainty coefficients [24] to determine the degree of association between each of the exome chip calling methods and exome sequence data. The uncertainty coefficient is a measure of association that is based on information entropy [25], or the uncertainty in a random variable, that is, a variable subject to chance variations. Uncertainty coefficients are useful when evaluating results obtained from clustering algorithms since genotype classification is usually random (all minor alleles are not classified as either AA or BB), thus the algorithm is not susceptible to rare variation bias in which the more common genotype could have been called by chance alone. See Press et al. (1992), pp. 758–762, for further clarification of the uncertainty coefficient metric [26].
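
To make the metric concrete, the sketch below computes an uncertainty coefficient U(seq | chip) = (H(seq) − H(seq | chip)) / H(seq) from paired genotype calls. It is an illustration under stated assumptions, not the analysis code used here: the function names are hypothetical, the direction of conditioning (how much of the sequence entropy the chip calls explain) is inferred from the Results, and pairs with a missing call in either dataset are simply dropped.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def uncertainty_coefficient(seq, chip):
    """Fraction of the entropy in seq that is explained by chip.

    seq, chip: equal-length genotype lists; None marks a missing call.
    Returns None when seq carries no information (entropy of zero).
    """
    pairs = [(s, c) for s, c in zip(seq, chip)
             if s is not None and c is not None]
    seq_calls = [s for s, _ in pairs]
    chip_calls = [c for _, c in pairs]
    h_seq = entropy(seq_calls)
    if h_seq == 0:
        return None
    # H(seq | chip) = H(seq, chip) - H(chip)
    h_cond = entropy(pairs) - entropy(chip_calls)
    return (h_seq - h_cond) / h_seq
```

A small case shows why this complements concordance for rare variants: if the sequence data contain one heterozygote among 100 samples and the chip calls every sample homozygous for the common allele, concordance is 99% yet the uncertainty coefficient is 0, because the chip calls carry no information about which sample is the carrier.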

Annotation

Annotation of the v1.0 exome chip variants was performed with dbNSFP [27]. The dbNSFP v2.0 annotations are available on the CHARGE exome chip public website in the SNP information file. dbSNP rs information has been curated and a look up table with the associated Illumina SNP name is also available. The reason for inclusion of the variant on the exome chip by the design team is also provided in the SNP info file (ftp://share.sph.umich.edu/exomeChip/IlluminaDesigns/annotatedList.txt).

Data Access

The following CHARGE supporting documents are located at chargeconsortium.com/main/exomechip: CHARGE_ExomeChip_Best_Practices.pdf, CHARGE_ExomeChip_v1.0_Cluster_File.egt (cluster file for v1.0 chip), CHARGE_ExomeChip_v1.0_Excluded_Variants.txt (list of 8,994 zeroed out variants in cluster file), and CHARGE_ExomeChip_SNP_Info_File.tsv.txt with its Read Me file, which includes Illumina annotation, dbNSFP annotation, dbSNP rs numbers, overlapping sites between the HumanExome BeadChip v1.0 and v1.1, reason for inclusion, and race specific allele frequencies for each variant, including HapMap controls. Sample identifiers (CHARGE_ExomeChip_HapMap96_Control_List.csv) and genotypes for the 96 unrelated HapMap controls (CHARGE_ExomeChip_HapMap96_Genotype_Data.csv) are also available.

The Illumina genotyping protocol (Infinium_Best_Practices_370-2009-010.pdf) and cluster file (HumanExome-12v1.egt) are available with a MyIllumina login at https://icom.illumina.com/. The exome chip content data sheet is publicly available at http://www.illumina.com/documents/products/datasheets/datasheet_humanexome_beadchips.pdf.

zCall is a rare variant caller for array-based genotyping provided by Goldstein et al. and available for download at github.com/jigold/zCall [7]. PLINK is a freely available analysis toolset at http://pngu.mgh.harvard.edu/purcell/plink/ [4].

Acknowledgments

We thank all participating cohorts and institutions for their collaboration in this large-scale effort, and acknowledge the important role of the CHARGE (Cohorts for Heart and Aging Research in Genomic Epidemiology) consortium in the development and support of this manuscript. We would also like to recognize the following individuals at UT for their expertise and participation in the genotype calling, data management and analyses, respectively: Irina Strelets, Genesis Williams, and Kim Lawson. Also, we are thankful to Jennifer Brody at the University of Washington for her contributions to the exome chip supporting documentation.

The CHARGE investigators request that publications resulting from these exome chip data also cite their original publication: Psaty BM, O’Donnell CJ, Gudnason V, Lunetta KL, Folsom AR, Rotter JI, Uitterlinden AG, Harris TB, Witteman JC, Boerwinkle E; CHARGE Consortium. Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium: Design of prospective meta-analyses of genome-wide association studies from five cohorts. Circ Cardiovasc Genet 2: 73–80, 2009.

Author Contributions

Conceived and designed the experiments: JCB KDT IBB LAC MF VG SK DL YL BMP SSR DSS AVS AU CMVD JGW CJO JIR EB. Performed the experiments: MLG BJC TH MH YL RK FR. Analyzed the data: MLG BY EB. Contributed reagents/materials/analysis tools: KDT IBB MF VG SK YL GMP BMP SSR DSS AVS AU CMVD JGW CJO JIR EB. Wrote the paper: MLG BY BJC TH JCB KDT MH IBB LAC MF VG TBH SK RK LJL DL YL TM GMP BMP SSR FR DSS AVS AU CMVD JGW CJO JIR EB.

References

  1. Korn JM, Kuruvilla FG, McCarroll SA, Wysoker A, Nemesh J, et al. (2008) Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet 40: 1253–1260.
  2. Ritchie ME, Liu R, Carvalho BS, Irizarry RA (2011) Comparing genotyping algorithms for Illumina’s Infinium whole-genome SNP BeadChips. BMC Bioinformatics 12: 68.
  3. Psaty BM, O’Donnell CJ, Gudnason V, Lunetta KL, Folsom AR, et al. (2009) Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium: design of prospective meta-analyses of genome-wide association studies from 5 cohorts. Circ Cardiovasc Genet 2: 73–80.
  4. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575.
  5. Gudnason V, Sigurdsson G, Nissen H, Humphries SE (1997) Common founder mutation in the LDL receptor gene causing familial hypercholesterolaemia in the Icelandic population. Hum Mutat 10: 36–44.
  6. Thorlacius S, Olafsdottir G, Tryggvadottir L, Neuhausen S, Jonasson JG, et al. (1996) A single BRCA2 mutation in male and female breast cancer families from Iceland with varied cancer phenotypes. Nat Genet 13: 117–119.
  7. Goldstein JI, Crenshaw A, Carey J, Grant G, Maguire J, et al. (2012) zCall: a rare variant caller for array-based genotyping. Bioinformatics 28: 2543–2545.
  8. Harris TB, Launer LJ, Eiriksdottir G, Kjartansson O, Jonsson PV, et al. (2007) Age, Gene/Environment Susceptibility-Reykjavik Study: multidisciplinary applied phenomics. Am J Epidemiol 165: 1076–1087.
  9. The ARIC Investigators (1989) The Atherosclerosis Risk in Communities (ARIC) study: design and objectives. Am J Epidemiol 129: 687–702.
  10. Siscovick DS, Raghunathan TE, King I, Weinmann S, Wicklund KG, et al. (1995) Dietary intake and cell membrane levels of long-chain n-3 polyunsaturated fatty acids and the risk of primary cardiac arrest. JAMA 274: 1363–1367.
  11. Fried LP, Borhani NO, Enright P, Furberg CD, Gardin JM, et al. (1991) The Cardiovascular Health Study: design and rationale. Ann Epidemiol 1: 263–276.
  12. Tell GS, Fried LP, Hermanson B, Manolio TA, Newman AB, et al. (1993) Recruitment of adults 65 years and older as participants in the Cardiovascular Health Study. Ann Epidemiol 3: 358–366.
  13. Friedman GD, Cutter GR, Donahue RP, Hughes GH, Hulley SB, et al. (1988) CARDIA: study design, recruitment, and some characteristics of the examined subjects. J Clin Epidemiol 41: 1105–1116.
  14. Cutter GR, Burke GL, Dyer AR, Friedman GD, Hilner JE, et al. (1991) Cardiovascular risk factors in young adults. The CARDIA baseline monograph. Control Clin Trials 12: 1S–77S.
  15. Bild DE, Bluemke DA, Burke GL, Detrano R, Diez Roux AV, et al. (2002) Multi-ethnic study of atherosclerosis: objectives and design. Am J Epidemiol 156: 871–881.
  16. Higgins M, Province M, Heiss G, Eckfeldt J, Ellison RC, et al. (1996) NHLBI Family Heart Study: objectives and design. Am J Epidemiol 143: 1219–1228.
  17. Dawber TR, Meadors GF, Moore FE Jr (1951) Epidemiological approaches to heart disease: the Framingham Study. Am J Public Health Nations Health 41: 279–281.
  18. Park SW, Goodpaster BH, Strotmeyer ES, de Rekeneire N, Harris TB, et al. (2006) Decreased muscle strength and quality in older adults with type 2 diabetes: the Health, Aging, and Body Composition Study. Diabetes 55: 1813–1818.
  19. Taylor HA Jr, Wilson JG, Jones DW, Sarpong DF, Srinivasan A, et al. (2005) Toward resolution of cardiovascular health disparities in African Americans: design and methods of the Jackson Heart Study. Ethnic Dis 15: S6–4-17.
  20. Hofman A, Grobbee DE, de Jong PTVM, van den Ouweland FA (1991) Determinants of disease and disability in the elderly: the Rotterdam Study. Eur J Epidemiol 1991: 403–422.
  21. Hofman A, Breteler MMB, van Duijn CM, Krestin GP, Pols HA, et al. (2007) The Rotterdam Study: objectives and design update. Eur J Epidemiol 22: 819–829.
  22. Hofman A, Breteler MMB, van Duijn CM, Janssen HL, Krestin GP, et al. (2009) The Rotterdam Study: 2010 objectives and design update. Eur J Epidemiol 24: 553–572.
  23. Hofman A, van Duijn CM, Franco OH, Ikram MA, Janssen HL, et al. (2011) The Rotterdam Study: 2012 objectives and design update. Eur J Epidemiol 26: 657–686.
  24. Mills P (2011) Efficient statistical classification of satellite measurements. Int J Remote Sens 32: 6109–6132.
  25. Cover TM, Thomas JA (2006) Elements of Information Theory, 2nd Edition. Hoboken, NJ: John Wiley & Sons, Inc. 748 p.
  26. Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1992) Numerical Recipes: the Art of Scientific Computing, 3rd Edition. New York, NY: Cambridge University Press. 1235 p.
  27. Liu X, Jian X, Boerwinkle E (2011) dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat 32: 894–899.