Conceived and designed the experiments: GK NT PC JPAI. Performed the experiments: GK NT PC JPAI. Analyzed the data: GK NT PC JPAI. Contributed reagents/materials/analysis tools: GK NT PC JPAI. Wrote the paper: GK NT PC JPAI.
Georgios D. Kitsios was a recipient of a Pfizer/Tufts Medical Center career development award. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.
The agnostic screening performed by genome-wide association studies (GWAS) has uncovered associations for previously unsuspected genes. Knowledge about the functional role of these genes is crucial and laboratory mouse models can provide such information. Here, we describe a systematic juxtaposition of human GWAS-discovered loci versus mouse models in order to appreciate the availability of mouse model data, to gain biological insights into the role of these genes and to explore the extent of concordance between these two lines of evidence. We perused publicly available data (NHGRI database for human associations and Mouse Genome Informatics database for mouse models) and employed two alternative approaches for cross-species comparisons, phenotype- and gene-centric. A total of 293 single gene-phenotype human associations (262 unique genes and 69 unique phenotypes) were evaluated. In the phenotype-centric approach, we identified all mouse models and related ortholog genes for the 51 human phenotypes with a comparable phenotype in mice. A total of 27 ortholog genes were found to be associated with the same phenotype in humans and mice, a concordance that was significantly larger than expected by chance (p<0.001). In the gene-centric approach, we were able to locate at least 1 knockout model for 60% of the 262 genes. The knockouts for 35% of these orthologs displayed pre- or post-natal lethality. For the remaining non-lethal orthologs, the same organ system was involved in mice and humans in 71% of the cases (p<0.001). Our project highlights the wealth of available information from mouse models for human GWAS, catalogues extensive information on plausible physiologic implications for many genes, provides hypothesis-generating findings for additional GWAS analyses and documents that the concordance between human and mouse genetic associations is larger than expected by chance and can be informative.
Genome-wide association studies (GWAS) have led to the discovery of hundreds of associations between genetic loci and complex human diseases or traits
One of the most extensive and readily available sources of such evidence is provided by mouse model organisms. The mouse has a fully sequenced genome, almost all (99%) mouse genes have orthologs in humans, and multiple tools are available for manipulating its genome, allowing genes to be altered efficiently and precisely. Knowledge gained from mouse models can facilitate biomedical discoveries, by uncovering the functional role of genes and enabling cross-species comparisons. Currently, the Mouse Genome Informatics (MGI) database represents the most comprehensive public resource providing integrated access to genetic and phenotypic information for thousands of curated mouse mutations
Some investigators have performed focused comparisons between gene-disease associations emerging from GWAS and the mouse phenotypes observed when the respective gene loci are knocked out
We used the NHGRI catalogue of GWAS, a comprehensive database of all published GWAS
Of the remaining, streamlined set of GWAS-discovered associations, we selected those where only one gene had been implicated and excluded those associations that mapped to loci with multiple potentially implicated genes. In this selection, we followed the arbitration of the GWAS authors and the curators of the NHGRI catalogue. When a single gene is listed, this does not necessarily mean that this gene is the causal one; rather, the investigators of the GWAS and the NHGRI curators considered that the identified SNP is located in this specific gene, which is therefore more likely to be the culprit than neighbouring or distant candidates. For each of the eligible genes, we recorded the investigated human phenotypes and the individual GWAS results, as provided by the NHGRI.
All necessary information on mouse models was extracted from the MGI database (
In order to search comprehensively for laboratory mouse models for all eligible human gene-disease associations, we applied two alternative and independent approaches: a phenotype-centric, where our search sample was defined by the human phenotypes studied in GWAS, and a gene-centric approach, where the search sample was formed by the GWAS-implicated genes (
MPO: Mammalian Phenotype Ontology.
In this approach, we mapped the human phenotypes associated with GWAS-discovered loci to their corresponding mouse phenotypes, and we assembled a comprehensive list of mouse genes associated with these phenotypes. Then, we evaluated the extent of overlap between the mouse and human orthologs that have been associated with the same phenotype (
Previous work has created hierarchical systems of human heritable phenotypes and has integrated phenotype ontologies across species, including humans and mice
In the next step, we performed systematic searches in the ‘Alleles and Phenotypes’ reports of the MGI Data and Statistical Reports for all mouse models (considering all types of mutations, apart from gene trapped markers that had been studied only in cell lines and not in living organisms) associated with each MP accession number in order to identify all mouse genes that have been associated with each particular MP. For phenotypes with descendant nodes, we also included the MP accession numbers of the descendant nodes in our searches. Finally, for each phenotype, we compared the associated human and mouse genes and we recorded all instances where the same orthologs were associated with the same phenotype in both species (“concordant orthologs”).
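The search-and-compare step described above can be illustrated with a minimal sketch. It assumes the ontology and the gene annotations have already been loaded into plain dictionaries; the function names, the term identifiers and the gene symbols below are hypothetical placeholders, not MGI data or an MGI API.

```python
from collections import deque


def expand_with_descendants(term, children):
    """Return the term plus all of its descendant nodes (breadth-first walk)."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def concordant_orthologs(term, children, mouse_genes_by_term,
                         human_genes, ortholog_of):
    """Mouse orthologs of the human genes that are also annotated to the
    term (or any descendant term) in mouse models."""
    mouse_genes = set()
    for t in expand_with_descendants(term, children):
        mouse_genes |= mouse_genes_by_term.get(t, set())
    # Human genes without a known ortholog cannot be concordant.
    mapped = {ortholog_of[g] for g in human_genes if g in ortholog_of}
    return mapped & mouse_genes


# Hypothetical toy data: one parent term with two descendant nodes.
children = {"MP:parent": ["MP:child1"], "MP:child1": ["MP:grandchild"]}
mouse_genes_by_term = {
    "MP:parent": {"Gene1"},
    "MP:grandchild": {"Gene2"},
    "MP:unrelated": {"Gene3"},
}
ortholog_of = {"GENE1": "Gene1", "GENE2": "Gene2", "GENE3": "Gene3"}
human_genes = {"GENE2", "GENE3", "GENE4"}

result = concordant_orthologs("MP:parent", children, mouse_genes_by_term,
                              human_genes, ortholog_of)
```

In this toy example only the ortholog annotated to a descendant node is counted as concordant, which shows why the descendant accession numbers need to be included in the search.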
With this second approach, our search started from the orthologs of the GWAS-derived human genes, for which we identified all knockout models and evaluated their phenotypic expression (
For each ortholog of the GWAS-derived human genes, we recorded the availability of knockout models, and we catalogued all available information on observed phenotypes in these mouse models. The observed knockout mouse phenotypes were categorized according to the anatomical system(s) affected, as described above. In cases where a knockout model displayed lethality, this was noted separately. For knockout models not expressing lethality, we explored instances of phenotypic concordance (at the level of affected anatomical system) between human gene associations and corresponding knockouts.
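As a rough illustration of concordance at the level of the affected anatomical system, the following sketch counts, for each non-lethal ortholog, how many systems are shared between the human association and the knockout phenotypes. The gene labels and their system assignments are hypothetical, and the helper names are ours.

```python
def shared_systems(human_systems, mouse_systems):
    """Anatomical systems affected in both species for one ortholog."""
    return set(human_systems) & set(mouse_systems)


def summarize_concordance(genes):
    """Count orthologs concordant in at least one and in at least two systems.

    genes maps an ortholog label to a pair
    (systems implicated in humans, systems affected in knockout mice).
    """
    counts = {"any": 0, "two_or_more": 0}
    for human, mouse in genes.values():
        n_shared = len(shared_systems(human, mouse))
        if n_shared >= 1:
            counts["any"] += 1
        if n_shared >= 2:
            counts["two_or_more"] += 1
    return counts


# Hypothetical annotations for three non-lethal orthologs.
genes = {
    "GENE_A": ({"skeleton"}, {"skeleton", "immune system"}),
    "GENE_B": ({"immune system", "homeostasis/metabolism"},
               {"immune system", "homeostasis/metabolism"}),
    "GENE_C": ({"skeleton"}, {"immune system"}),
}

summary = summarize_concordance(genes)
```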
For both the phenotype- and the gene-centric approach, we tested whether the observed concordance between human and mouse data differed significantly from the concordance expected by chance. The expected concordance by chance was calculated by considering the marginal and grand totals of 2×2 tables with juxtaposed human and mouse data.
In the phenotype-centric approach, the expected concordance for a given phenotype X was calculated as: [(number of mouse genes associated with X) * (number of human genes associated with X)/(total number of ortholog pairs with available mouse models)]. The denominator (grand total) is approximated by the total number of orthologous genes that have been studied in a laboratory mouse model and thus have a chance to be associated with the same phenotype in humans and mice. According to MGI 4.33, the denominator was set to 12,526. Then, the expected concordances of all phenotypes were summed and compared to the overall number of observed concordances.
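The calculation above can be written out as a short sketch. The grand total of 12,526 is the value stated in the text; the per-phenotype gene counts in the example are illustrative only and are not taken from the supplementary tables.

```python
# Orthologous genes studied in a mouse model, as stated in the text.
GRAND_TOTAL = 12526


def expected_concordance(n_mouse_genes, n_human_genes, grand_total=GRAND_TOTAL):
    """Expected number of concordant orthologs for one phenotype by chance:
    the product of the marginal totals divided by the grand total."""
    return n_mouse_genes * n_human_genes / grand_total


def total_expected(gene_counts, grand_total=GRAND_TOTAL):
    """Sum the expected concordances over all phenotypes.

    gene_counts is an iterable of (mouse gene count, human gene count) pairs.
    """
    return sum(expected_concordance(m, h, grand_total) for m, h in gene_counts)


# Illustrative counts for three phenotypes (hypothetical values).
example_counts = [(17, 3), (36, 6), (4, 1)]
expected_sum = total_expected(example_counts)
```

Because the grand total is large relative to the number of genes per phenotype, each phenotype contributes only a small fraction of one expected concordant ortholog, so even a modest number of observed concordances can exceed the chance expectation.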
In the gene-centric approach, comparisons of phenotypic concordance were performed at the level of the anatomical system affected. Consequently, the expected concordance for a given gene Y was calculated as: [(number of anatomical systems associated with Y in mice) * (number of anatomical systems associated with Y in humans)/(total number of anatomical systems affected in mice and humans)]. The total number of anatomical systems equals 31. Then, the expected concordances of all genes were summed and compared to the overall number of observed concordances using a chi-square test with 1 degree of freedom.
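A minimal sketch of this comparison follows. The per-gene formula is the one given above; the exact form of the chi-square test is not spelled out in the text, so the two-category goodness-of-fit statistic used here (concordant versus non-concordant comparisons) is an assumption, and the input numbers in the example are hypothetical.

```python
import math

# Number of anatomical systems, as stated in the text.
N_SYSTEMS = 31


def expected_gene_concordance(n_mouse_systems, n_human_systems,
                              n_systems=N_SYSTEMS):
    """Expected concordance for one gene by chance."""
    return n_mouse_systems * n_human_systems / n_systems


def chi_square_1df(observed, expected, n_comparisons):
    """Assumed test form: goodness of fit on two categories
    (concordant, non-concordant), which has 1 degree of freedom."""
    stat = ((observed - expected) ** 2 / expected
            + ((n_comparisons - observed) - (n_comparisons - expected)) ** 2
            / (n_comparisons - expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical inputs: (mouse systems, human systems) for four genes.
per_gene = [(5, 1), (10, 2), (3, 1), (8, 1)]
expected_total = sum(expected_gene_concordance(m, h) for m, h in per_gene)

# Hypothetical observed count of concordant genes among the four.
stat, p_value = chi_square_1df(observed=3, expected=expected_total,
                               n_comparisons=len(per_gene))
```

The p-value uses the identity that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, which avoids any dependency beyond the standard library.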
A flowchart of the selection process of GWAS associations eligible for comparisons with knockout mice models is provided in
Of the 69 phenotypes investigated in humans, we reached consensus on a final list of 51 phenotypes that were considered to have a mammalian equivalent phenotype (
Each mammalian phenotype has been associated with a median of 21 mouse models (interquartile range (IQR), 15–61 models), corresponding to a median of 17 different genes (IQR, 4–36) per phenotype. In the corresponding human phenotypes, a median of 3 genes (IQR, 1–6) per phenotype was implicated in GWAS.
When comparing the orthologs involved in the human and the mammalian phenotypes, 27 concordant orthologs were found in 10 phenotypes (
Although human GWAS associations have been documented in agnostic experiments, the mouse models are typically constructed to test a specific hypothesis, which is usually based on various types of biological evidence. Consequently, the creation of certain mouse models may have been informed by human genetic associations that had already been recognized in the candidate gene era before the advent of GWAS. In order to control for this, we classified the human GWAS associations into novel ones and associations proposed by candidate-gene studies (
Our human GWAS sample of 293 gene-disease associations involved a total of 262 unique genes, since 15 genes were associated with more than one phenotype. Orthologs were identified for 250 (95%) of them. We subsequently searched for knockout mouse models constructed for these orthologs and we were able to locate at least 1 knockout model for 150 of the 250 orthologs (60%); 73 of these orthologs had more than one knockout model available (range 2–11). Overall, 295 knockout models for the 150 orthologs were found in the MGI database, with variable types of gene deletion techniques, genetic backgrounds and phenotypic information for various allelic combinations (heterozygous, homozygous, conditional genotypes etc.). All available information on phenotypes was merged at the ortholog gene level to allow comparisons with humans. The entire range of phenotypic expression of each knocked out ortholog was catalogued (
Thirty of the 31 anatomical systems of the MPO were affected in at least 1 knocked out ortholog. The most commonly affected anatomical systems were the immune system, the hematopoietic system, and homeostasis/metabolism, which were involved in more than 40% of the examined knocked out orthologs (
Fifty three of the 150 orthologs (35%) with knockout models displayed a lethal phenotype: 34 orthologs were associated with prenatal/perinatal lethality, 11 orthologs with postnatal lethality and 8 orthologs with both types of lethality (
Prenatal/perinatal lethality | Postnatal lethality | Both pre- and postnatal lethality |
PTCH1, STAT3, APOB, ANGPTL3, HIST1H1D, BCL11A, BMP4, JAK2, ALPL, GATA2, HBB, HHEX, LPL, SH2B1, CYP17A1, HNF1A, HNF4A, ATG16L1, ATP2B1, CDK6, CXCL12, KCNJ2, KIF1B, MAFB, SLC2A9, TNIP1, BRSK1, GNA12, HMGCR, NKX2-1, SOX17, TCF7L2, HNF1B, LMTK2 | HFE, TNFRSF11B, LDLR, GLIS3, INS, TNFAIP3, FTO, NKX2-3, FOXE1, IKZF2, LEF1 | FGFR2, ABCA1, BDNF, ERBB3, GCK, PTGER4, SMAD7, MAF |
We subsequently compared the phenotypic expression of the remaining 97 orthologs (i.e. those with knockout models that did not display lethality) and the corresponding phenotypes associated with these orthologs in the human GWAS. We restricted these comparisons to the level of the affected anatomical system, counting as agreement any case in which an ortholog affected the same one of the 31 anatomical systems in both species (e.g. the ESR1 gene affected skeleton phenotypes in both species and was thus considered an ortholog with concordant phenotypic information). For 69 orthologs (71%), the same anatomical system was affected in humans and mice, and for 13 of these 69 orthologs, mice and humans were concordant in two anatomical systems (
In a sensitivity analysis, we considered only those orthologs for which no prior association had been proposed with the phenotype of interest by candidate-gene studies. There were 62 orthologs available (
Our project represents a systematic comparison of GWAS-derived associations in humans and corresponding information from mouse models, based on curated and publicly available data. We used comprehensive databases from human and mouse research fields and we performed cross-species comparisons with two distinct approaches
This project builds on a conceptual framework of gene-disease comparisons between different species, as developed by previous studies. Zhang et al.
By meticulously reviewing the content of the MPO
The number of concordant orthologs was much larger than expected by chance, although this difference was much attenuated and lost nominal significance when focusing strictly on novel GWAS findings. This suggests that genes that were identified as associated with various diseases and phenotypes in the candidate gene era have been extensively and purposefully investigated in mouse models. It is also possible that the mouse models have been searched more stringently to identify relevant phenotypes proposed by candidate genes. Alternatively, for agnostically discovered genes from GWAS, there is less concordance with mouse models to date. Nevertheless, concordance at the gene level may underestimate true biological similarity between species. Although the specific sets of genes associated with the same phenotype in humans and mice may be different, these genes may operate within networks that determine the same biological function. Such similarities can potentially be demonstrated by future analyses that use molecular pathway ontology systems for the genes of interest, e.g. Gene Ontology. Thus, the orthologs found in our analyses to be associated with the examined phenotypes in mice only can further inform secondary analyses (either gene-focused or pathway-based) of existing datasets in humans
In the gene-centric approach, we found that for the majority of the GWAS-derived genes, there are already available knockout models with deposited phenotypic information in MGI. We catalogued this phenotypic information and found extensive concordance between humans and mice, showing that certain orthologs can affect the same anatomical systems, and potentially the same biological function in both species. The concordance was still present when we excluded candidate-era genes.
This significant concordance is striking in view of the vast differences in the underlying genetic variants compared between humans and mice. In the GWAS, most of the associations for the common variants are likely due to variations altering gene function in relatively subtle ways. In contrast, knockout mouse models involve complete ablation of gene function, abolishing any activity of the corresponding protein. Furthermore, certain gene deletions were lethal in mice and thus were excluded from analyses. Despite these factors, we observed that the same anatomical systems were commonly affected in the two species; thus, our estimates of concordance may under-represent the true biological similarity that underlies the genetic associations in humans and mice.
The common variants studied in GWAS genotyping chips may tag rare variants that constitute the molecular basis of the observed associations
Although phenotypes in many mouse models may not be agnostically or comprehensively ascertained
Translation of GWAS discoveries into clinically meaningful diagnostic and therapeutic modalities will require an understanding of the underlying biology
Phenotypes investigated in human genome-wide association studies (GWAS). Column A: initial list of eligible phenotypic entries as provided in the NHGRI catalog of GWAS; Column B: final list of 69 non-overlapping phenotypes that were obtained after merging similar phenotypes. Merged phenotypic entries are highlighted in gray.
(0.13 MB DOC)
The 31 Anatomical Systems according to the Mammalian Phenotype Ontology.
(0.05 MB DOC)
Mapping of human phenotypes to the Mammalian Phenotype Ontology.
(0.09 MB DOC)
Comparisons of the sets of ortholog genes associated with the same phenotype in humans and mice.
(0.12 MB DOC)
All orthologs associated with human and mammalian corresponding phenotypes.
(0.18 MB DOC)
GWAS-derived associations distinguished into novel ones and associations proposed by candidate gene studies.
(0.46 MB DOC)
Comparisons of the sets of orthologs associated with the same phenotype in humans and mice (considering only the novel GWAS associations).
(0.14 MB DOC)
Detailed phenotypic expression of all knocked out orthologs.
(4.11 MB DOC)
Orthologs that displayed lethality in knocked out models.
(0.13 MB DOC)
Comparisons of phenotypic expression between human GWAS genes and ortholog knocked out genes in mice (after excluding knockout genes associated with lethality).
(0.22 MB DOC)
Comparisons of phenotypic expression between human GWAS genes and ortholog knocked out genes in mice (after excluding associations that had already been proposed in the candidate-gene era).
(0.18 MB DOC)
Anatomical Systems affected in the knockout models for the ortholog genes.
(0.39 MB TIF)