Characterization of population genetic variation and structure can be used as a tool for research in human genetics, and population isolates are of great interest in this context. The aim of the present study was to characterize the genetic structure of Xavante Indians and compare it with that of other populations. The Xavante, an indigenous population living in the Brazilian Central Plateau, are one of the largest native groups in Brazil. A subset of 53 unrelated subjects was selected from an initial sample of 300 Xavante Indians. Using 86,197 markers, the Xavante were compared with all populations of the HapMap Phase III and HGDP-CEPH projects and with a Southeast Brazilian population sample to establish their population structure. Principal Components Analysis showed that the Xavante Indians are concentrated in the Amerindian axis near other populations of known Amerindian ancestry, such as Karitiana, Pima, Surui and Maya, and a low degree of genetic admixture was observed. This is consistent with historical records of population bottlenecks and cultural isolation. By calculating pair-wise Fst statistics, we characterized the genetic differentiation between Xavante Indians and representative populations of the HapMap and HGDP-CEPH projects. We found that the genetic differentiation between Xavante Indians and populations of Amerindian, Asian, European, and African ancestry increased progressively, in that order. Our results indicate that the Xavante population has remained genetically isolated over the past decades and may offer advantages for genome-wide mapping studies of inherited disorders.
Citation: Kuhn PC, Horimoto ARVR, Sanches JM, Vieira Filho JPB, Franco L, et al. (2012) Genome-Wide Analysis in Brazilian Xavante Indians Reveals Low Degree of Admixture. PLoS ONE 7(8): e42702. doi:10.1371/journal.pone.0042702
Editor: Dennis O’Rourke, University of Utah, United States of America
Received: April 13, 2012; Accepted: July 10, 2012; Published: August 10, 2012
Copyright: © Kuhn et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This study was funded by grants from Fundação de Amparo à Pesquisa do Estado de São Paulo (Fapesp), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and INCT-Obesidade e Diabetes. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Knowledge of the genetic diversity patterns of human populations provides important insights into their evolutionary history and is useful in genetic mapping studies of complex diseases and their component traits. In this context, isolated populations are of particular interest because they may help overcome some of the challenges of genetic investigations. Here we report the first genome-wide SNP-based study of the genetic structure of the Xavante Indians.
The Xavante are an indigenous population living in Mato Grosso state, Central Brazil. They comprise approximately 10,000 individuals, making them one of the largest indigenous groups in Brazil, and are a Jê-speaking people. The earliest contact of the Xavante with Western culture occurred during the 18th century in the Brazilian Central Plateau, a period marked by epidemics, armed conflicts and forced labor imposed by the Portuguese colonial government. In the middle of the 19th century, in an attempt to escape mistreatment, they moved westward to their present habitat, where they remained relatively isolated until the early 20th century. In the 1940s, the Brazilian government decided to stimulate settlement of its central region, then considered sparsely populated, aiming to promote greater integration of this area with the rest of the country. As a result of this political decision, the Xavante groups were forced to deal with the settlers, and the post-contact period was characterized by epidemics and conflicts that severely reduced their population. By the end of the 1950s, the Xavante had been reduced to small, scattered settlements. However, in recent decades, with the demarcation of their lands, health programs and the establishment of peaceful contacts with non-Indians, their population has increased. Importantly, despite interaction with outsiders, the Xavante maintain their own complex social organization and cultural values, which have been preserved over the years.
The aim of the present study was to characterize the genetic structure of Xavante Indians, providing baseline data for future genetic studies, and to compare it with that of a Southeast Brazilian population, populations of the Human Genome Diversity Panel (HGDP-CEPH), and populations of the HapMap Project, Phase III, which include individuals of Asian, African, European, and Mexican ancestry.
Material and Methods
A cross-sectional study was conducted in the Sangradouro Reservation, Mato Grosso state, Brazil. The initial sample comprised 300 Xavante Indians, and blood samples were collected from each subject. Genomic DNA was extracted from peripheral blood leukocytes using a commercial kit (Puregene DNA Isolation Kit, Gentra Systems, USA).
Individuals were genotyped for 731,442 SNPs using the HumanOmniExpress BeadChip platform (Illumina, San Diego, CA, USA). Genome-wide pair-wise identity-by-descent (IBD) estimates obtained with the PLINK package (http://pngu.mgh.harvard.edu/~purcell/plink/) confirmed the presence of related individuals in this sample. The genetic data were then used to obtain maximum likelihood estimates of the relationships among pairs of individuals with the ML-Relate software (http://www.montana.edu/kalinowski/Software/MLRelate.htm). This approach uses simulation to determine which relationships are consistent with the empirical genotype data, comparing putative relationships with possible alternatives. The software identifies only close relationships, namely parent-offspring, full-sibling, and half-sibling pairs. Because ML-Relate was unable to estimate relationships using the complete SNP panel (731,442 SNPs), we selected a reduced set of 730 independent SNPs (approximately 0.1% of the panel) for this analysis. Unrelated individuals were selected in several steps. First, the individual with the greatest number of relationships was identified and removed from the Xavante sample. Relatedness was then recalculated, generating new relationship structures, and again the individual with the greatest number of relationships was removed. This process was repeated until only unrelated individuals remained. We identified 53 unrelated Xavante Indians in our sample.
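The iterative removal procedure can be sketched as a greedy loop. This is a minimal Python illustration, not the code used in the study: it assumes the close-relative pairs have already been called (for example, exported from ML-Relate), and, unlike the study, which recalculated relatedness with ML-Relate after each removal, it assumes the pairwise calls do not change when an individual is dropped. The function name `prune_related` and the toy IDs are hypothetical.

```python
from collections import defaultdict


def prune_related(pairs):
    """Greedily remove individuals until no related pair remains.

    pairs: iterable of (id_a, id_b) tuples already called as close relatives.
    Returns the set of removed individual IDs.

    Assumption: the pairwise calls stay fixed after each removal (the study
    instead recomputed relatedness with ML-Relate at every step).
    """
    remaining = {frozenset(p) for p in pairs if p[0] != p[1]}
    removed = set()
    while remaining:
        # Count how many related pairs each individual still belongs to.
        degree = defaultdict(int)
        for pair in remaining:
            for ind in pair:
                degree[ind] += 1
        # Drop the most-connected individual; ties go to the smallest ID.
        worst = max(sorted(degree), key=lambda ind: degree[ind])
        removed.add(worst)
        remaining = {pair for pair in remaining if worst not in pair}
    return removed


individuals = {"A", "B", "C", "D", "E"}
related = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
unrelated = individuals - prune_related(related)
print(sorted(unrelated))  # ['B', 'C', 'E']
```

Ties are broken by ID here only to make the sketch reproducible; the source does not state how ties were resolved.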
Genotype data from the HapMap project (Phase 3), the Human Genome Diversity Panel (HGDP-CEPH), and a Brazilian population sample were used to study the genetic structure of Xavante Indians. The HapMap Phase III data set is composed of 1,301 individuals from 11 populations genotyped for almost 1.6 million genetic markers. The HGDP-CEPH database is composed of 1,068 individuals from 55 isolated populations. The Brazilian sample consists of 172 unrelated, highly admixed individuals selected from residents of the municipality of São Paulo, the largest metropolitan area of the country. The Brazilian samples were genotyped using the Affymetrix SNP array 6.0 (Affymetrix, Santa Clara, CA, USA). All 11 HapMap panels and all 55 HGDP-CEPH populations were considered in our study. The PLINK package was used for data management and quality control of markers. Founder individuals were retained only if at least 95% of their markers were genotyped with high quality. Markers were filtered using a maximum per-marker missing rate of 0.01 and a minor allele frequency (MAF) greater than 1%. The final dataset was composed of 1,198 HapMap, 940 HGDP, 172 São Paulo, and 53 Xavante individuals genotyped for 86,197 markers. Table 1 shows the populations and their respective number of individuals included in our study.
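The sample and marker filters can be illustrated with NumPy. The study applied them in PLINK; this hypothetical `qc_filter` function only mirrors the stated thresholds (at least 95% of genotypes called per individual, at most 1% missingness per marker, MAF above 1%). The order in which the filters are applied and the NaN encoding of missing genotypes are assumptions.

```python
import numpy as np


def qc_filter(genotypes, min_sample_call=0.95, max_marker_missing=0.01, min_maf=0.01):
    """Return boolean masks (individuals kept, markers kept).

    genotypes: (n_individuals, n_markers) array of 0/1/2 allele counts,
    with NaN marking a missing genotype (an assumed encoding).
    """
    g = np.asarray(genotypes, dtype=float)

    # 1. Individuals: keep those with at least 95% of markers called.
    sample_call_rate = 1.0 - np.isnan(g).mean(axis=1)
    keep_ind = sample_call_rate >= min_sample_call
    g = g[keep_ind]

    # 2. Markers: missingness among the retained individuals.
    marker_missing = np.isnan(g).mean(axis=0)

    # 3. Minor allele frequency from the called genotypes only.
    p = np.nanmean(g, axis=0) / 2.0
    maf = np.minimum(p, 1.0 - p)

    keep_snp = (marker_missing <= max_marker_missing) & (maf > min_maf)
    return keep_ind, keep_snp


toy = np.array([
    [0, 1, 2, 0],
    [1, 1, 2, 0],
    [2, 0, 2, 0],
    [np.nan, np.nan, 2, 0],  # only 50% of markers called: removed
])
keep_ind, keep_snp = qc_filter(toy)
print(keep_ind.tolist(), keep_snp.tolist())
# [True, True, True, False] [True, True, False, False]
```

The last two toy markers are monomorphic among the retained individuals (MAF of 0), so they fail the MAF filter.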
Table 1. Populations and number of individuals (N) included in the study. doi:10.1371/journal.pone.0042702.t001
The Indian leaders and the study participants were informed about the purposes of this study and gave their consent. Most participants gave written consent; for those who were illiterate (14%), fingerprint impressions were used to document their approval. A Xavante health agent served as an interpreter when necessary. This study was approved by the Ethics Committee of Escola Paulista de Medicina, Universidade Federal de São Paulo, and by the Brazilian National Ethics Commission (CONEP).
Principal component analysis (PCA) was applied to the genotype data to infer continuous axes of genetic variation using the SmartPCA program of the Eigensoft package. The axes of variation are defined as the top eigenvectors of a covariance matrix among samples, and they reduce the data to a small number of dimensions. PCA was initially carried out on different subsets of populations to evaluate which populations were relevant for estimating the population structure of Xavante Indians. These datasets were composed of the Xavante sample and, respectively: (a) other Amerindian samples; (b) Amerindian and Asian samples; (c) Amerindian, Asian and São Paulo samples; (d) the complete dataset. In these subsets, Han Chinese in Beijing, China (CHB), Chinese in Metropolitan Denver, Colorado (CHD), Gujarati Indians in Houston, Texas (GHI), and Japanese in Tokyo, Japan (JPT) (HapMap populations), together with Cambodian (HGDP-CEPH), represented Asian ancestry, and Karitiana, Pima, Surui, and Maya (HGDP-CEPH) were used as representatives of Amerindian ancestry. Mexican ancestry in Los Angeles, California (MEX, HapMap) was also included in all analyzed subsets. The MEX and São Paulo samples represent admixed populations. The first three principal components were plotted against each other and the resulting plots were analyzed. The São Paulo and GHI samples did not contribute effectively to revealing the genetic population structure of the Xavante sample and were removed from the final dataset. Thus, the final dataset for PCA was composed of Xavante Indians and all populations of the HapMap and HGDP projects, excluding the GHI and São Paulo samples. A PCA was then performed using this final dataset.
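As a rough illustration of what the PCA step computes, the sketch below standardizes each SNP and takes the top eigenvectors of the sample covariance matrix through a singular value decomposition. The per-SNP scaling by sqrt(p(1 − p)), with p = (1 + allele count)/(2 + 2n), is our assumption of a SmartPCA-like normalization; SmartPCA itself offers further options (such as outlier removal) that are not reproduced here, and `genotype_pca` is a hypothetical name.

```python
import numpy as np


def genotype_pca(genotypes, n_components=3):
    """Top principal components of a genotype matrix.

    genotypes: (n_individuals, n_markers) array of 0/1/2 counts, NaN = missing.
    Returns (eigenvectors, eigenvalues) for the first n_components axes.
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)

    # Centre each SNP on its mean over called genotypes.
    mean = np.nanmean(g, axis=0)
    # Assumed SmartPCA-like scaling: p = (1 + allele count) / (2 + 2 * n_called).
    p = (1.0 + np.nansum(g, axis=0)) / (2.0 + 2.0 * called.sum(axis=0))
    x = (g - mean) / np.sqrt(p * (1.0 - p))
    x[~called] = 0.0  # missing genotypes are set to the mean (zero after centring)

    # Eigenvectors of X X^T / m are the left singular vectors of X.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    eigenvalues = s ** 2 / x.shape[1]
    return u[:, :n_components], eigenvalues[:n_components]


# Toy data: two groups with opposite allele counts at 20 SNPs.
group_a = np.zeros((5, 20))
group_a[:, 10:] = 2
group_b = 2 - group_a
pcs, ev = genotype_pca(np.vstack([group_a, group_b]), n_components=2)
# PC1 separates the two groups; the sign of an eigenvector is arbitrary.
```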
Pair-wise Fst estimates were also computed with the SmartPCA program to characterize the genetic differentiation between Xavante Indians and the populations of the final dataset.
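Pair-wise Fst can be illustrated with Hudson's estimator, combining SNPs as a ratio of averages. This is a swapped-in, commonly used estimator and not necessarily the exact formula implemented in SmartPCA; the hypothetical `hudson_fst` function assumes complete genotype data and counts sampled chromosomes as twice the number of individuals.

```python
import numpy as np


def hudson_fst(geno1, geno2):
    """Hudson's Fst between two populations, combined over SNPs as a ratio of averages.

    geno1, geno2: (individuals, markers) arrays of 0/1/2 counts, no missing data.
    """
    g1 = np.asarray(geno1, dtype=float)
    g2 = np.asarray(geno2, dtype=float)
    n1, n2 = 2 * g1.shape[0], 2 * g2.shape[0]  # sampled chromosomes
    p1 = g1.sum(axis=0) / n1
    p2 = g2.sum(axis=0) / n2
    # Numerator: squared frequency difference minus its expected sampling noise.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Denominator: probability that one allele drawn from each population differs.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()


# Two populations fixed for different alleles at every SNP.
fixed = hudson_fst(np.zeros((4, 3)), np.full((4, 3), 2))
print(fixed)  # 1.0
```

Averaging numerators and denominators separately before dividing keeps rare SNPs from dominating the estimate; for two identical small samples the estimator can be slightly negative, which is expected sampling behavior.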
Next, we investigated the genome-wide genetic distance among individuals of all populations using a distance matrix computed with the PLINK package. This matrix was obtained from the complementary values (1 − IBS) of pair-wise identity-by-state (IBS). A neighbour-joining tree was constructed using PHYLIP (http://evolution.genetics.washington.edu/phylip.html) and visualized with HyperTree, a Java phylogenetic tree viewer (http://kinase.com/tools/HyperTree.html).
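The IBS-based distance can be written out directly. For genotypes coded as 0, 1 or 2 copies of an allele, two individuals share 2 − |g_i − g_j| alleles IBS at a locus, so the average IBS proportion is the mean of 1 − |g_i − g_j|/2 over markers called in both, and the distance is its complement. This is a sketch assuming NaN-coded missing data; `ibs_distance_matrix` is a hypothetical name meant to show the quantity, not to replace PLINK.

```python
import numpy as np


def ibs_distance_matrix(genotypes):
    """Pairwise 1 - IBS distances from 0/1/2 genotypes (NaN = missing)."""
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            both = ~np.isnan(g[i]) & ~np.isnan(g[j])  # markers called in both
            # Alleles shared IBS at a locus: 2 - |g_i - g_j|, rescaled to 0..1.
            ibs = np.mean(1.0 - np.abs(g[i, both] - g[j, both]) / 2.0)
            dist[i, j] = dist[j, i] = 1.0 - ibs
    return dist


toy = [[0, 0, 2, 2],
       [0, 1, 2, 1],
       [2, 2, 0, 0]]
d = ibs_distance_matrix(toy)
# d[0, 1] = 0.25, d[0, 2] = 1.0, d[1, 2] = 0.75
```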
We expected familial structure in the Xavante sample. Genealogical relationships can be represented mathematically as the probabilities that two individuals share zero, one, or two alleles identical-by-descent. All 731,442 markers were used to estimate genome-wide pair-wise identity-by-descent with the PLINK package, confirming the presence of related individuals in our sample, as shown in Figure 1A (initial Xavante sample). In an unrelated sample, pairs of individuals should be concentrated in the lower-right portion of the plot, reflecting a high probability of sharing zero alleles and a low probability of sharing one allele identical-by-descent. In our initial sample, the pairs are scattered across the plot. The circle at position (0,0) represents a duplicated sample included for quality control; because there were no monozygotic twins in this population, a pair with zero probability of sharing zero or one allele (that is, sharing two alleles identical-by-descent at every locus) can only be a duplicate.
Figure 1. Probabilities that pairs of individuals, represented by circles, share zero (IBD = 0) vs. one (IBD = 1) allele identical-by-descent.
Panel A represents the sharing of alleles among the initial sample of Xavantes, and panel B represents the sharing of alleles among unrelated individuals only. doi:10.1371/journal.pone.0042702.g001
The ML-Relate software was used to select unrelated individuals. Relatedness and the relationships of close relative pairs (parent-offspring, full-siblings, and half-siblings) are estimated by maximum likelihood. After the removal process, a subset of 53 unrelated Xavante Indians was selected for the genetic population structure analysis. Figure 1B shows the allele-sharing probabilities among pairs of individuals in this subset. Because ML-Relate identifies only close relationships, more distantly related individuals remain in the sample. Indeed, the maximum probability of sharing one allele identical-by-descent (IBD = 1) found among the retained individuals is 0.29, slightly greater than the value expected for first cousins (0.25), confirming the presence of more distant relationships in our final Xavante sample.
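The comparison with the value 0.29 can be made concrete with the standard expected IBD-sharing probabilities for outbred relative pairs. The sketch below tabulates them, computes a PLINK-style PI_HAT (P(IBD = 2) + P(IBD = 1)/2), and finds the relationship whose expected P(IBD = 1) is closest to an observed value. These expectations assume no inbreeding, which may not hold in an isolated population; the dictionary and function names are hypothetical.

```python
# Expected probabilities (Z0, Z1, Z2) that a pair shares 0, 1 or 2 alleles
# identical-by-descent, for relatives in an outbred pedigree.
EXPECTED_IBD = {
    "duplicate / monozygotic twins": (0.0, 0.0, 1.0),
    "parent-offspring": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "half siblings, avuncular, grandparent-grandchild": (0.5, 0.5, 0.0),
    "first cousins": (0.75, 0.25, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def pi_hat(z0, z1, z2):
    """Proportion of alleles shared IBD, as in PLINK's PI_HAT (z0 is unused)."""
    return z2 + 0.5 * z1


def closest_by_z1(z1_observed):
    """Relationship whose expected P(IBD = 1) is nearest to the observed value."""
    return min(EXPECTED_IBD, key=lambda rel: abs(EXPECTED_IBD[rel][1] - z1_observed))


for rel, z in EXPECTED_IBD.items():
    print(f"{rel:50s} Z1 = {z[1]:.2f}  PI_HAT = {pi_hat(*z):.3f}")
print(closest_by_z1(0.29))  # first cousins
```

Note that half siblings and avuncular pairs share the same expected values, so pairwise IBD alone cannot distinguish them.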
The group of 53 Xavante Indians was merged with HapMap, HGDP-CEPH, and São Paulo databases and the set of markers genotyped in all datasets was determined. The merged dataset is composed of 2,363 individuals (1,198 HapMap, 940 HGDP, 172 São Paulo, and 53 Xavante individuals) genotyped in 86,197 markers.
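Determining the set of markers genotyped in all datasets amounts to intersecting the marker lists. Because the datasets come from different arrays, this sketch also checks that the allele pairs agree. It assumes markers are identified by shared IDs (for example rs numbers) and does not attempt strand flipping: a flipped non-palindromic SNP is dropped, while palindromic A/T or C/G SNPs pass even if strands differ. The function name and toy marker IDs are hypothetical.

```python
def shared_markers(*datasets):
    """Marker IDs present in every dataset with the same (unordered) allele pair.

    Each dataset is a dict mapping marker ID -> (allele1, allele2).
    Strand flips are not resolved: a flipped non-palindromic SNP is dropped,
    while palindromic A/T or C/G SNPs pass even if strands differ.
    """
    first, *others = datasets
    kept = []
    for marker, alleles in first.items():
        if all(marker in d and set(d[marker]) == set(alleles) for d in others):
            kept.append(marker)
    return kept


illumina = {"rs1": ("A", "G"), "rs2": ("C", "T"), "rs3": ("A", "C")}
affy = {"rs1": ("G", "A"), "rs2": ("A", "G"), "rs3": ("A", "C")}
hgdp = {"rs1": ("A", "G"), "rs3": ("C", "A"), "rs4": ("G", "T")}
print(shared_markers(illumina, affy, hgdp))  # ['rs1', 'rs3']
```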
Ten continuous axes of genetic variation (eigenvectors) were computed using the Eigensoft package. PCA on different subsets of populations was first performed to investigate which populations were important for determining the genetic structure of the Xavante Indians. These datasets were composed of Xavante Indians and, respectively: (a) only Amerindian samples; (b) Amerindian and Asian samples; (c) Amerindian, Asian and São Paulo samples; (d) the complete dataset. Three-dimensional plots of the first three principal components were produced for all analyses. After comparing the plots for each subset, we removed the São Paulo and GHI samples from the final PCA dataset, since these two populations did not improve the resolution of the Xavante sample along the eigenvectors. Thus, the final dataset for PCA was composed of Xavante Indians and all populations of the HapMap and HGDP projects, excluding only the São Paulo and GHI samples. A principal component analysis was then performed using this final dataset. Although the São Paulo sample was removed from our final dataset, we plotted the first three eigenvectors including it to illustrate the ancestry of the general Brazilian population, as shown in Figure 2. As expected, the Xavante Indians (purple points) are concentrated in the Amerindian axis near the other populations of known Amerindian ancestry (Karitiana, Pima, Surui, and Maya). The São Paulo sample (black dots) is located predominantly between the European and African axes, showing a high degree of genetic admixture. This finding confirms earlier results and corroborates the long history of intermarriage between people of European and African descent in the Brazilian population.
Figure 2. Three-dimensional plot of first three principal components (PC1, PC2, and PC3) computed from the merged dataset of populations.
Yoruba, Maasai in Kinyawa, Kenya (MKK), African ancestry in Southwest USA (ASW), and BantuKenya populations represent Africans; Utah residents with Northern and Western European ancestry from the CEPH collection (CEU), Adygei, and Basque populations represent Europeans; and Han Chinese in Beijing, China (CHB), Chinese in Metropolitan Denver, Colorado (CHD), Japanese in Tokyo, Japan (JPT), and Cambodian represent Asians. doi:10.1371/journal.pone.0042702.g002
We also studied genetic similarity at the individual level using a genetic distance matrix obtained from the complementary values of the genome-wide average proportion of alleles shared identical-by-state among pairs of individuals from all studied samples. The results of a neighbour-joining tree analysis are shown in Figure 3, with individuals color-coded by the geographic region of their population. As expected, the Xavante Indians (in purple) clustered with the other Native American populations (in pink), corroborating the results of the principal component analysis.
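The neighbour-joining step can be sketched in a few lines. The study used PHYLIP; the hypothetical function below is an independent, minimal implementation of the Saitou-Nei neighbour-joining algorithm that returns an unrooted tree in Newick format, and it is not PHYLIP's code. It assumes a symmetric distance matrix with at least three taxa, such as a 1 − IBS matrix.

```python
import numpy as np


def neighbour_joining(dist, labels):
    """Saitou-Nei neighbour joining; returns an unrooted tree in Newick format.

    dist: symmetric (n x n) distance matrix with n >= 3; labels: n taxon names.
    """
    d = np.asarray(dist, dtype=float).copy()
    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        r = d.sum(axis=1)
        # Q(i, j) = (n - 2) d(i, j) - r_i - r_j; join the pair minimising Q.
        q = (n - 2) * d - r[:, None] - r[None, :]
        np.fill_diagonal(q, np.inf)
        i, j = np.unravel_index(np.argmin(q), q.shape)
        # Branch lengths from the new internal node to i and j.
        li = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        new_node = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        # Distance from the new node to every other node.
        d_new = 0.5 * (d[i] + d[j] - d[i, j])
        keep = [k for k in range(n) if k not in (i, j)]
        d = np.vstack([
            np.hstack([d[np.ix_(keep, keep)], d_new[keep][:, None]]),
            np.append(d_new[keep], 0.0),
        ])
        nodes = [nodes[k] for k in keep] + [new_node]
    # Join the last three nodes at a single internal node.
    l0 = 0.5 * (d[0, 1] + d[0, 2] - d[1, 2])
    l1 = 0.5 * (d[0, 1] + d[1, 2] - d[0, 2])
    l2 = 0.5 * (d[0, 2] + d[1, 2] - d[0, 1])
    return f"({nodes[0]}:{l0:.4f},{nodes[1]}:{l1:.4f},{nodes[2]}:{l2:.4f});"


taxa = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbour_joining(D, taxa))
# (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

With thousands of individuals, as in the merged dataset, a dedicated tool is preferable: this sketch rebuilds the distance matrix at every step and is meant only to show the algorithm.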
Figure 3. Neighbour-joining tree for the final dataset.
The individuals were color labeled according to geographical distribution of their populations. ASW: African ancestry in Southwest USA; CEU: Utah residents with Northern and Western European ancestry from the CEPH collection; CHB: Han Chinese in Beijing, China; CHD: Chinese in Metropolitan Denver, Colorado; JPT: Japanese in Tokyo, Japan; LWK: Luhya in Webuye, Kenya; MEX: Mexican ancestry in Los Angeles, California; MKK: Maasai in Kinyawa, Kenya; TSI: Toscani in Italy; YRI: Yoruba in Ibadan, Nigeria. doi:10.1371/journal.pone.0042702.g003
Pair-wise Fst estimates were also computed with the SmartPCA program of the Eigensoft package for all populations in our final dataset with more than 6 sampled individuals. Table 2 shows the genetic differentiation between Xavante Indians and representative populations of European, Asian, African, and Amerindian ancestry. We selected the HapMap populations CEU (Utah residents with Northern and Western European ancestry from the CEPH collection) to represent European ancestry, CHB, CHD, and JPT to represent Asian ancestry, and YRI (Yoruba in Ibadan, Nigeria) to represent African ancestry. Colombian and Maya from the HGDP-CEPH project represented Amerindian ancestry. These results confirm the expected pattern of differences and similarities among the studied populations: the genetic differentiation between Xavante Indians and populations of Amerindian, Asian, European, and African ancestry increased progressively, in that order.
In this study we compared the genetic variation of 2,363 individuals (1,198 HapMap, 940 HGDP, 172 São Paulo, and 53 Xavante individuals) genotyped for 86,197 markers. Our results showed that the Xavante population has a low level of genetic admixture. This is consistent with historical records of population bottlenecks and of cultural isolation with consequently reduced gene flow. Although differences in methodology make direct comparison difficult, our data are consistent with previous studies that showed a low level of admixture in the Xavante population, suggesting that no significant changes in this village's gene pool have occurred over the last decades.
Genetic studies of isolated populations have been of interest because they may help to map genes underlying both monogenic and complex diseases. In isolated populations, monogenic disorders are less likely to show non-allelic heterogeneity. The use of these populations in mapping complex diseases has several advantages, such as low genetic diversity, a high degree of linkage disequilibrium (LD), restricted allelic and locus heterogeneity, reduced haplotype complexity, and greater potential for identifying rare variants. These benefits, together with cultural and environmental homogeneity, make this population a promising setting in which to identify novel susceptibility alleles for complex diseases.
Principal component analyses showed that the Xavante population is a distinct ethnic group, most closely related to individuals of Amerindian ancestry and genetically distinct from the other HapMap and HGDP populations and from the São Paulo individuals.
By calculating pairwise Fst statistics, we found that the genetic differentiation between the Xavante population and representative populations of Amerindian, Asian, European and African ancestry increased progressively. These results are consistent with the history of the peopling of the Americas, which suggests a main colonization event from Siberia. Humans migrated from Eurasia to the Americas via the Bering Strait and spread throughout North, Central and South America, diversifying into several culturally distinct native populations.
The findings from this study add to our understanding of genomic variation across South American native populations and confirm that the Xavante are a genetically isolated Indian population that can provide a unique opportunity for genome-wide mapping studies of inherited disorders.
Conceived and designed the experiments: PCK ARVRH ACP RSM. Performed the experiments: PCK. Analyzed the data: PCK ARVRH JMS ACP RSM. Contributed reagents/materials/analysis tools: ARVRH JMS LJF ACP. Wrote the paper: PCK ARVRH ACP RSM. Collected the data: PCK JPBVF LF LJF ADF RSM.
- 1. Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, et al. (2002) Genetic structure of human populations. Science 298: 2381–2384. doi: 10.1126/science.1078311
- 2. Tishkoff SA, Verrelli BC (2003) Patterns of human genetic diversity: implications for human evolutionary history and disease. Annu Rev Genomics Hum Genet 4: 293–340. doi: 10.1146/annurev.genom.4.070802.110226
- 3. Cavalli-Sforza LL, Feldman MW (2003) The application of molecular genetic approaches to the study of human evolution. Nat Genet 33 Suppl: 266–275.
- 4. Pereira NOM, Santos RV, Welch JR, Souza LG, Coimbra CEA Jr (2009) Demography, territory, and identity of indigenous peoples in Brazil: The Xavante indians and the 2000 Brazilian National Census. Human Organization 68: 166–180.
- 5. Coimbra CEA Jr, Flowers NM, Salzano FM, Santos RV (2002) The Xavant in transition: health, ecology and bioanthropology in Central Brazil. Ann Arbor: University of Michigan Press. 344 p.
- 6. Neel JV, Salzano FM, Junqueira PC, Maybury-Lewis D (1964) Studies on the Xavante Indians of the Brazilian Mato Grosso. Am J Hum Genet 16: 52–140.
- 7. Garfield S (2001) Indigenous struggle at the heart of Brazil: state policy, frontier expansion, and the Xavante Indians. Durham: Duke University Press. 361 p.
- 8. Giolo SR, Soler JM, Greenway SC, Almeida MA, de Andrade M, et al. (2012) Brazilian urban population genetic structure reveals a high degree of admixture. Eur J Hum Genet 20: 111–116. doi: 10.1038/ejhg.2011.144
- 9. Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, et al. (2008) Worldwide human relationships inferred from genome-wide patterns of variation. Science 319: 1100–1104. doi: 10.1126/science.1153717
- 10. Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, et al. (2010) Integrating common and rare genetic variation in diverse human populations. Nature 467: 52–58. doi: 10.1038/nature09298
- 11. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575. doi: 10.1086/519795
- 12. Kalinowski ST, Wagner AP, Taper ML (2006) ML-Relate: a computer program for maximum likelihood estimation of relatedness and relationship. Molecular Ecology Notes 6: 576–579. doi: 10.1111/j.1471-8286.2006.01256.x
- 13. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38: 904–909. doi: 10.1038/ng1847
- 14. Bingham J, Sudarsanam S (2000) Visualizing large hierarchical clusters in hyperbolic space. Bioinformatics 16: 660–661. doi: 10.1093/bioinformatics/16.7.660
- 15. Ward RH, Salzano FM, Bonatto SL, Hutz MH, Coimbra CEA, et al. (1996) Mitochondrial DNA polymorphism in three Brazilian Indian tribes. Am J Hum Biol 8: 317–323. doi: 10.1002/(sici)1520-6300(1996)8:3<317::aid-ajhb2>3.3.co;2-f
- 16. Salzano FM, Franco MHLP, Weimer TA, Callegari-Jacques SM, Mestriner MA, et al. (1997) The Brazilian Xavante Indians revisited: new protein genetic studies. Am J Phys Anthropol 104: 23–34. doi: 10.1002/(sici)1096-8644(199709)104:1<23::aid-ajpa2>3.0.co;2-e
- 17. Friedrich DC, Callegari-Jacques SM, Petzl-Erler M, Tsuneto L, Salzano FM, et al. (2012) Stability or variation? Patterns of lactase gene and its enhancer region distributions in Brazilian Amerindians. Am J Phys Anthropol 147: 427–432. doi: 10.1002/ajpa.22010
- 18. Sheffield VC, Stone EM, Carmi R (1998) Use of isolated inbred human populations for identification of disease genes. Trends Genet 14: 391–396. doi: 10.1016/s0168-9525(98)01556-x
- 19. Kristiansson K, Naukkarinen J, Peltonen L (2008) Isolated populations and complex disease gene identification. Genome Biol 9: 109. doi: 10.1186/gb-2008-9-8-109
- 20. Peltonen L, Palotie A, Lange K (2000) Use of population isolates for mapping complex traits. Nat Rev Genet 1: 182–190. doi: 10.1038/35042049
- 21. Jorde LB, Watkins WS, Kere J, Nyman D, Eriksson AW, et al. (2000) Gene mapping in isolated populations: new roles for old friends? Hum Hered 50: 57–65. doi: 10.1159/000022891
- 22. Wang S, Lewis CM Jr, Jakobsson M, Ramachandran S, Ray N, et al. (2007) Genetic variation and population structure in native Americans. PLoS Genet 3: 2049–2067. doi: 10.1371/journal.pgen.0030185
- 23. Mulligan CJ, Hunley K, Cole S, Long JC (2004) Population genetics, history, and health patterns in native Americans. Annu Rev Genomics Hum Genet 5: 295–315. doi: 10.1146/annurev.genom.5.061903.175920
- 24. Zegura SL, Karafet TM, Zhivotovsky LA, Hammer MF (2004) High-resolution SNPs and microsatellite haplotypes point to a single, recent entry of native american Y chromosomes into the Americas. Mol Biol Evol 21(1): 164–175. doi: 10.1093/molbev/msh009
- 25. Kolman CJ, Nyamkhishig S, Bermingham E (1996) Mitochondrial DNA analysis of Mongolian populations and implications for the origin of New World founders. Genetics 142: 1321–1334.
- 26. Santos FR, Pandya A, Tyler-Smith C (1999) The central Siberian origin for native American Y chromosomes. Am J Hum Genet 64: 619–628. doi: 10.1086/302242
- 27. Tamm E, Kivisild T, Reidla M, Metspalu M, Smith DG, et al. (2007) Beringian standstill and spread of native American founders. PLoS One 2(9): e829. doi: 10.1371/journal.pone.0000829