Conceived and designed the experiments: LF SA RFC CD AG. Performed the experiments: KB. Analyzed the data: LF SA SH JD MLP. Contributed reagents/materials/analysis tools: NA MB MDCRP VF JR RFC EJP-S JD EZ GT-M JLM. Wrote the paper: LF SA MV CRG MCS EZ EGB EJP-S.
The authors have declared that no competing interests exist.
The population of Argentina is the result of intermixing between several groups, including Indigenous American, European and African populations. Despite the commonly held idea that the population of Argentina is of mostly European origin, multiple studies have shown that this process of admixture had an impact on the entire Argentine population. In the present study we characterized the distribution of Indigenous American, European and African ancestry among individuals from different regions of Argentina and evaluated the level of discrepancy between self-reported grandparental origin and genetic ancestry estimates. A set of 99 autosomal ancestry informative markers (AIMs) was genotyped in a sample of 441 Argentine individuals to estimate genetic ancestry. We used non-parametric tests to evaluate statistical significance. The average ancestry for the Argentine sample overall was 65% European (95%CI: 63–68%), 31% Indigenous American (28–33%) and 4% African (3–4%). We observed statistically significant differences in European ancestry across Argentine regions [Buenos Aires province (BA) 76%, 95%CI: 73–79%; Northeast (NEA) 54%, 95%CI: 49–58%; Northwest (NWA) 33%, 95%CI: 21–41%; South 54%, 95%CI: 49–59%; p<0.0001] as well as between the capital and immediate suburbs of Buenos Aires city compared to more distant suburbs [80% (95%CI: 75–86%) versus 68% (95%CI: 58–77%), p = 0.01]. European ancestry among individuals who declared all grandparents born in Europe was 91% (95%CI: 88–94%) compared to 54% (95%CI: 51–57%) among those with no European grandparents (p<0.001). Our results demonstrate the range of variation in genetic ancestry among Argentine individuals from different regions in the country, highlighting the importance of taking this variation into account in genetic association and admixture mapping studies in this population.
The current population of Argentina is the result of generations of intermixing between various groups, including Indigenous Americans who originally resided in this part of South America, Spanish conquistadores and Africans brought as slaves starting in the early and late 1500s respectively, and a large European immigrant population that arrived between 1870 and 1950
In spite of this rich history of immigration and admixture, most of the Argentine population self-identifies as being of European descent, with only 1% of the total population self-identifying as descendants of an indigenous group (INDEC, 2006). In contrast to this perception, it has been reported that a considerable proportion of the Argentine population has at least one Indigenous American ancestor
In this study we investigated the distribution of genetic ancestry in four regions of Argentina in a relatively large number of individuals (n = 441), taking into account information on the origin of each individual's grandparents. The latter allowed us to evaluate the level of concordance between grandparental origin and genetic ancestry estimates. We also compared the distribution of individual ancestry in Buenos Aires city (N = 168), the largest urban area in Argentina, to those of two other large urban areas in Latin America: Mexico City (N = 502) and San Juan de Puerto Rico (N = 133) to contextualize the observed level of variation in individual ancestry proportions of Buenos Aires with those observed in other major Latin American cities.
All participants provided written informed consent. The study was approved by the Human Research Protection Program, Committee of Human Research of the University of California, San Francisco, the Ethics Committee of the Hospital Italiano of Buenos Aires and the Ponce School of Medicine & Health Sciences Institutional Review Board. The Argentine Ministry of Health approved the study and the shipment of samples from Argentina to UCSF for analysis.
Argentine men and women were randomly identified between the years 2000 and 2010 from blood donor banks within major hospitals in different regions of the country and invited to participate. All individuals were asked to donate a sample of peripheral blood, and were asked to provide information about the region/country of birth of all grandparents (
Each individual is represented by a vertical bar on the X-axis. Bars are divided into percent European (blue), Indigenous American (red) and African ancestry (green). BA = Buenos Aires province; NEA = Northeast; NWA = Northwest; South = South. Individuals on the X-axis are sorted by increasing Indigenous American ancestry. In the lower right corner we include a map of Argentina indicating the location of the samples. For analysis we grouped samples by region: black: BA, pink: South, grey: NWA, orange: NEA. Samples of individuals from the NEA region (orange) were obtained from the hospitals in Buenos Aires.
Data on genetic ancestry from Mexico City was obtained from 502 healthy Mexican women enrolled in a breast cancer case-control study
A set of 106 single nucleotide polymorphisms (SNPs) that can discriminate Indigenous American, African, and European ancestry was used to estimate the proportion of genetic ancestry in individuals from Argentina, Mexico City and San Juan de Puerto Rico. Simulation studies have shown that 100 ancestry informative markers (AIMs) with allele frequency differences similar to the ones we used here are required to achieve a correlation higher than 0.9 with true ancestry
Six of the 106 AIMs were excluded from the analysis because they had a call rate lower than 90%. Even though AIMs are expected to violate Hardy-Weinberg equilibrium more than other markers, we excluded an additional SNP due to its deviation from expected frequencies under equilibrium (p<0.0005). The final analysis included a total of 99 AIMs. We genotyped 558 individuals from Argentina, 502 from Mexico and 141 from Puerto Rico. We found complete concordance among 10 genotyped duplicates. We excluded individuals with a genotype call rate of <70% (117 from Argentina, and 8 from Puerto Rico). The final analysis included 441 samples from Argentina, 502 from Mexico and 133 from Puerto Rico. The data used in the present study is available from the authors upon request.
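The call-rate filters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: genotypes are assumed to be coded as 0, 1 or 2 copies of a reference allele with `None` for a missing call, the sample call rate is assumed to be computed over the retained markers only, and the function names and toy matrix are hypothetical. The last three lines simply reproduce the bookkeeping reported in the text.

```python
from typing import List, Optional

Genotype = Optional[int]  # 0, 1 or 2 reference-allele copies; None = no call (assumed coding)


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of non-missing genotype calls in a list."""
    if not calls:
        return 0.0
    return sum(g is not None for g in calls) / len(calls)


def filter_markers(matrix: List[List[Genotype]], min_rate: float = 0.90) -> List[int]:
    """Indices of markers (columns) whose call rate is at least min_rate."""
    n_markers = len(matrix[0]) if matrix else 0
    return [j for j in range(n_markers)
            if call_rate([row[j] for row in matrix]) >= min_rate]


def filter_samples(matrix: List[List[Genotype]], marker_idx: List[int],
                   min_rate: float = 0.70) -> List[int]:
    """Indices of samples (rows) whose call rate over the kept markers is at least min_rate."""
    return [i for i, row in enumerate(matrix)
            if call_rate([row[j] for j in marker_idx]) >= min_rate]


# Toy data: 4 samples x 3 markers (hypothetical values, not study data).
toy = [
    [0, 1, None],
    [2, 1, None],
    [1, None, None],
    [None, None, 0],
]
# A looser marker threshold is used here only because the toy matrix is tiny.
kept_markers = filter_markers(toy, min_rate=0.75)
kept_samples = filter_samples(toy, kept_markers, min_rate=0.70)

# Bookkeeping reported in the text.
final_aims = 106 - 6 - 1         # six call-rate exclusions, then one HWE exclusion
final_argentina = 558 - 117      # samples below the 70% genotype call rate removed
final_puerto_rico = 141 - 8
```

Filtering markers before samples mirrors the order in which the exclusions are reported; applying the two filters in the opposite order could retain a slightly different set of individuals.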
As part of a different ongoing study, fifty-four out of the 441 individuals were also genotyped with an Affymetrix 250 K StyI array (∼238,000 SNPs). We excluded SNPs with more than 5% missing data and SNPs with a Hardy-Weinberg equilibrium p<0.00005. Since the model in ADMIXTURE
Individual genetic ancestry was estimated using a maximum likelihood (ML) approach
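As an illustration only, and not necessarily the algorithm or software used here, the idea behind maximum likelihood estimation of individual ancestry can be sketched with an expectation-maximization loop. The sketch assumes that the reference-allele frequencies in each ancestral population are known and that markers are independent; the function name `estimate_ancestry` and the toy frequencies are hypothetical.

```python
from typing import List, Sequence


def estimate_ancestry(genotypes: Sequence[int],
                      freqs: Sequence[Sequence[float]],
                      n_iter: int = 200) -> List[float]:
    """EM estimate of one individual's ancestry proportions.

    genotypes[m] is the count (0, 1 or 2) of the reference allele at marker m.
    freqs[m][k] is the reference-allele frequency at marker m in ancestral
    population k, treated as known (an assumption of this sketch).
    """
    k = len(freqs[0])
    q = [1.0 / k] * k                      # start from equal proportions
    for _ in range(n_iter):
        totals = [0.0] * k
        n_alleles = 0
        for g, f in zip(genotypes, freqs):
            # Each genotype contributes g reference and 2 - g alternate alleles.
            for copies, is_ref in ((g, True), (2 - g, False)):
                if copies == 0:
                    continue
                like = [fk if is_ref else 1.0 - fk for fk in f]
                mix = sum(qk * lk for qk, lk in zip(q, like))
                for j in range(k):
                    # E-step: posterior probability the allele came from population j.
                    totals[j] += copies * q[j] * like[j] / mix
                n_alleles += copies
        # M-step: new proportions are the average posterior memberships.
        q = [t / n_alleles for t in totals]
    return q


# Toy panel of 30 markers, 10 informative for each of 3 populations (hypothetical).
toy_freqs = [[0.9, 0.1, 0.1]] * 10 + [[0.1, 0.9, 0.1]] * 10 + [[0.1, 0.1, 0.9]] * 10
# An individual carrying the genotypes most typical of the first population.
toy_genotypes = [2] * 10 + [0] * 20
q = estimate_ancestry(toy_genotypes, toy_freqs)
```

Because each update is an average of probabilities, the estimated proportions always stay between 0 and 1 and sum to 1, so an ancestry component can never be estimated as negative.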
Multidimensional Scaling with pairwise allele sharing distances as implemented in the program PLINK
The difference in mean European/Indigenous American ancestry between the different Argentine regions was tested using the two-sample Kolmogorov-Smirnov test for equality of distribution functions. The significance of the difference in mean Indigenous American/European ancestry between the five categories defined by the presence of 0, 1, 2, 3, or 4 grandparents born in a particular region of Argentina was evaluated with the Kruskal-Wallis test. Non-parametric approaches were selected because the distribution of genetic ancestry deviated from normality (Shapiro-Wilk test, p<0.05). Both analyses were conducted with the program STATA 11
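For readers unfamiliar with the two non-parametric tests named above, their test statistics can be sketched in a few lines. This is a minimal illustration in plain Python rather than the STATA routines used in the study; the function names are hypothetical, the Kruskal-Wallis statistic is given without the correction for ties, and p-values are not computed (library routines such as `scipy.stats.ks_2samp` and `scipy.stats.kruskal` return them).

```python
from bisect import bisect_right
from typing import List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two ECDFs."""
    sa, sb = sorted(a), sorted(b)
    return max(abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
               for x in sa + sb)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic (without the correction for ties)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum * rank_sum / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * h - 3 * (n + 1)


# Hypothetical ancestry percentages for two regions and three grandparent categories.
d = ks_statistic([10, 20, 30], [40, 50, 60])
h = kruskal_wallis_h([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
```

Both statistics depend only on the ordering of the observations, which is why they remain valid when ancestry proportions are far from normally distributed.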
The distribution of genetic ancestry among the 441 Argentine individuals included in our study varied from 0 to 100% Indigenous American, 0 to 100% European, and 0 to 35% African ancestry (
The blue boxes represent the European component, the red boxes the Indigenous American component and the green boxes the African component. BA = Buenos Aires province; NEA = Northeast; NWA = Northwest; South = South.
Individuals from Buenos Aires city were ascertained from two large hospitals, one private (n = 79) and one public (n = 89). Individuals from the private hospital had more European ancestry (80%; 95%CI: 76–85%) compared to individuals ascertained from the public hospital (76%; 95%CI: 72–80%), p = 0.028. To investigate the potential cause for this heterogeneity within the city of Buenos Aires, we determined the association between place of residence and genetic ancestry. Individuals recruited in these hospitals were residents of either the city of Buenos Aires proper, or the immediate surrounding urban areas (urban belts 1 and 2,
Place of residence | European, % (95% CI) | P | Indigenous American, % (95% CI) | P | African, % (95% CI) | P |
All individuals combined | | | | | | |
Buenos Aires City (n = 98) | 79 (76–83) | 0.006 | 17 (13–20) | 0.006 | 4 (3–5) | 0.431 |
1st urban belt (n = 47) | 80 (75–86) | | 16 (11–22) | | 3 (2–5) | |
2nd urban belt (n = 22) | 68 (58–77) | | 29 (20–38) | | 3 (1–6) | |
Italian hospital (private) | | | | | | |
Buenos Aires City (n = 48) | 84 (79–89) | 0.729 | 11 (7–16) | 0.602 | 5 (3–6) | 0.784 |
1st urban belt (n = 22) | 75 (64–86) | | 21 (10–31) | | 4 (2–6) | |
2nd urban belt (n = 7) | 73 (52–94) | | 20 (3–38) | | 6 (1–12) | |
Clinicas' hospital (public) | | | | | | |
Buenos Aires City (n = 45) | 77 (71–82) | 0.015 | 20 (15–25) | 0.015 | 3 (2–4) | 0.303 |
1st urban belt (n = 25) | 85 (79–91) | | 12 (6–18) | | 2 (1–4) | |
2nd urban belt (n = 15) | 65 (53–77) | | 33 (22–44) | | 2 (0–4) | |
p value for the two-sample Kolmogorov-Smirnov test comparing the 2nd urban belt to a group that includes the Capital and the 1st urban belt.
We collected information about the region/country of birth of each individual's grandparents and we compared the mean estimated proportion of European, African and Indigenous American genetic ancestry between individuals who had 0 to 4 grandparents having been born in a particular region of Argentina, in any other Latin American country or in Europe (
Region | Ancestry | 0 | N | 1 | N | 2 | N | 3 | N | 4 | N | p |
Europe | African | 4 (5) | 271 | 4 (4) | 59 | 3 (4) | 49 | 3 (4) | 22 | 3 (3) | 40 | <0.001 |
Europe | European | 54 (25) | | 80 (15) | | 79 (20) | | 86 (13) | | 91 (9) | | |
Europe | Indigenous | 42 (25) | | 16 (15) | | 18 (20) | | 11 (13) | | 5 (8) | | |
AMBA | African | 4 (5) | 356 | 3 (5) | 23 | 3 (4) | 27 | 3 (4) | 24 | 5 (5) | 11 | <0.001 |
AMBA | European | 61 (26) | | 78 (19) | | 87 (13) | | 89 (10) | | 83 (13) | | |
AMBA | Indigenous | 35 (26) | | 19 (18) | | 10 (13) | | 8 (11) | | 12 (12) | | |
Center | African | 3 (4) | 324 | 3 (3) | 33 | 6 (8) | 40 | 3 (5) | 21 | 3 (4) | 23 | <0.001 |
Center | European | 61 (28) | | 80 (16) | | 75 (18) | | 80 (15) | | 76 (16) | | |
Center | Indigenous | 35 (28) | | 17 (16) | | 19 (16) | | 17 (16) | | 21 (15) | | |
NWA | African | 4 (5) | 372 | 5 (7) | 8 | 3 (4) | 22 | 5 (5) | 15 | 4 (5) | 24 | <0.001 |
NWA | European | 69 (24) | | 68 (18) | | 51 (25) | | 26 (20) | | 35 (17) | | |
NWA | Indigenous | 27 (24) | | 27 (15) | | 46 (25) | | 69 (22) | | 61 (18) | | |
NEA | African | 3 (4) | 396 | 3 (3) | 7 | 8 (8) | 12 | 4 (5) | 12 | 6 (4) | 14 | 0.003 |
NEA | European | 66 (27) | | 81 (12) | | 55 (18) | | 59 (10) | | 49 (14) | | |
NEA | Indigenous | 30 (27) | | 16 (13) | | 37 (13) | | 37 (9) | | 44 (13) | | |
South | African | 4 (5) | 386 | 4 (6) | 6 | 3 (3) | 21 | 2 (2) | 6 | 1 (2) | 22 | <0.001 |
South | European | 68 (25) | | 64 (12) | | 59 (23) | | 44 (29) | | 34 (26) | | |
South | Indigenous | 28 (25) | | 32 (15) | | 38 (23) | | 54 (29) | | 65 (27) | | |
Center-west | African | 4 (5) | 416 | 5 (7) | 7 | 3 (3) | 10 | 0 (0) | 1 | 3 (4) | 7 | 0.651 |
Center-west | European | 65 (27) | | 70 (22) | | 71 (20) | | 25 (0) | | 66 (18) | | |
Center-west | Indigenous | 31 (26) | | 25 (19) | | 26 (20) | | 75 (0) | | 31 (17) | | |
South America | African | 4 (5) | 336 | 3 (4) | 20 | 5 (5) | 31 | 3 (4) | 8 | 3 (4) | 46 | <0.001 |
South America | European | 69 (26) | | 63 (29) | | 51 (24) | | 63 (16) | | 47 (18) | | |
South America | Indigenous | 27 (26) | | 34 (29) | | 44 (24) | | 34 (18) | | 50 (19) | | |
Columns 0 to 4 = number of grandparents born in the region; N = number of individuals in that category.
AMBA = Buenos Aires Metropolitan Area.
NWA = Northwest.
NEA = Northeast.
South America = Origin from other South American countries.
P value for Kruskal-Wallis equality of populations rank test evaluating the significance of the difference in mean Indigenous American/European ancestry between the 0 to 4 origin of grandparent categories for each region.
In an effort to validate the ancestry estimates obtained with our set of AIMs, we next compared individual genetic ancestry estimates obtained with the set of 99 AIMs to those obtained with a set of 118,192 SNPs in a group of 54 individuals within our Buenos Aires sample. The correlation coefficients for the European and Indigenous American estimates were 0.90 and 0.93 respectively (
We projected the 168 samples from Buenos Aires City, 502 samples from Mexico City, 133 Puerto Rican samples from San Juan and 109 ancestral individuals (Indigenous Americans, Africans and Europeans) within the same space defined by the 1st and 2nd dimensions of a multidimensional scaling analysis (
We investigated the individual genetic ancestry proportions among individuals from four regions in Argentina and demonstrated their variation across and within regions, with the NWA region having the most striking difference in European and Indigenous American ancestry proportions compared to all other regions. Moreover, we found that within Buenos Aires City there were modest but statistically significant differences in genetic ancestry across different urban regions. In this respect, our results add to the previously published descriptions of genetic ancestry distribution in Argentina by providing more extensive sampling and genetic ancestry estimates that are based on a relatively large set of AIMs.
The genetic diversity of various Latin American populations has been previously described
There have been previous studies that focused on the genetic admixture characteristics of the Argentine population. Our group has previously investigated the proportion of African ancestry in individuals from Buenos Aires City
A more recent study by Corach
In our study, we found the most significant differences in genetic ancestry proportions among individuals from NWA when compared to individuals from all other regions investigated. Currently, the Argentine northwest has approximately 5,000,000 inhabitants, who represent about one eighth of the country's total population. When the Spanish conquistadores arrived in the early 1500s, this region had the largest population of the Argentine territory, with an estimated population of 200,000
In the present study we were able to compare the distribution of genetic ancestry of the two northern regions of the country (NWA and NEA). We observed that individuals from NEA had a greater proportion of European ancestry compared to individuals from NWA. However, we also observed a smaller degree of variation of the Indigenous American and European components in NEA compared to NWA (NEA Standard Deviation (SD) = 0.13 and NWA SD = 0.24; variance ratio test p = 0.0014). This can be interpreted as the result of more widespread admixture in NEA. The historical data seems to support this assertion. Specifically, although the Spanish authorities tried to restrict inter-ethnic marriages and the use of indigenous languages across all regions of their colonies, the NEA represented a marginal area where this control was not very effective
The Buenos Aires Metropolitan Area (Buenos Aires City and surrounding urban areas) is the third most populated metropolis in Latin America, after Mexico City and Sao Paulo. In Buenos Aires City two historical events had a strong influence on the genetic composition of its inhabitants. The first was the arrival, between 1870 and 1950, of a large number of European immigrants, mostly from Italy and Spain, who intermixed with the smaller local population of Buenos Aires City. This was a population that had already resulted from several generations of admixture between the original Indigenous Americans, Africans brought as slaves during the Spanish conquest in the 16th and 17th centuries, and the earlier Spanish conquistadores
We validated our ancestry estimates with two approaches. First, we compared ancestry estimates obtained with our panel of 99 AIMs to those obtained with 118,192 SNPs in a subset of individuals. Overall, we observed that the individual ancestry estimates that we obtained using information from 99 AIMs were strongly correlated with those obtained with genome-wide data for the major ancestral components (European and Indigenous American). However, this was not the case for African ancestry, which showed a correlation coefficient of about 0.12. This low level of correlation is likely the result of an overestimation of the African component by the 99 AIMs panel. Since genetic ancestry estimates have statistical variance, when the proportion of ancestry is close to zero the estimates of ancestry tend to be biased towards higher numbers (since the model does not allow for <0 ancestry). Therefore, care should be taken in interpreting ancestry estimates when the overall proportion of that ancestral group is low (<5%).
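The upward bias near zero described above can be illustrated with a small simulation. This is a simplified sketch rather than a model of the estimator used in the study: estimation error is assumed to be Gaussian, the lower bound is imposed by simple truncation to the interval [0, 1], and the function name, the error standard deviation of 0.03 and the seed are all hypothetical.

```python
import random


def mean_bounded_estimate(true_q: float, sd: float, n: int = 10000, seed: int = 1) -> float:
    """Mean of n noisy estimates of true_q after forcing each one into [0, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        noisy = true_q + rng.gauss(0.0, sd)      # estimate if no bounds were imposed
        total += min(1.0, max(0.0, noisy))       # the model cannot return < 0 or > 1
    return total / n


low = mean_bounded_estimate(0.0, sd=0.03)    # true proportion at the boundary
mid = mean_bounded_estimate(0.5, sd=0.03)    # true proportion far from any boundary
```

Under these assumptions the average estimate for a true proportion of zero is clearly positive, because errors below zero are cut off while errors above zero are kept, whereas the average for a true proportion of 0.5 stays close to 0.5.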
Another way of validating our results was to compare the genealogical information we obtained to the estimated proportions of genetic ancestry, and to investigate how informative one would be of the other. We observed that the average estimated European ancestry among individuals with at least one European grandparent was higher than that of individuals with no European-born grandparents (80% versus 54%). Therefore, our data showed that the number of grandparents born in Europe is highly correlated with the proportion of European ancestry as measured by genetic markers. Interestingly, our data indicated that the largest change in average genetic ancestry when considering the number of grandparents born in Europe was between 0 and 1 grandparent. This probably reflects the effect of assortative mating
Our study contrasted the similarities and differences in the ancestral composition of three Latin American cities with very different demographic histories: Mexico City, with a strong Indigenous American component, Buenos Aires City, with a strong European component, and San Juan de Puerto Rico, with a strong European as well as a relatively important African component. Despite the differences in the average genetic ancestry proportions between the three cities, our results show that the distributions of individual ancestry estimates have a similar degree of dispersion.
One limitation of the present study is that we ascertained individuals through a limited network of blood donor centers at hospitals and clinics, instead of using a population-based approach. Therefore, we cannot generalize the ancestry proportions obtained from each group of individuals to those in the region of origin of each group. A larger network of hospitals and clinics for our ascertainment, or a general population-based approach (e.g. random-digit dialing) would have given us a more precise picture of the distribution of ancestry proportions at the regional level. However, in spite of this limitation we were able to achieve our aims of describing the level of heterogeneity within the country as well as testing the reliability of genetic ancestry estimates using AIMs when comparing to grandparental origin.
In summary, our results suggest, in concordance with previous studies, that genetic epidemiological research in Latin America should take genetic ancestry into account, preferably by directly estimating it using AIMs or comparable genetic markers (e.g. GWAS data). Studies that are unable to obtain information about genetic ancestry should, at a minimum, take into consideration not only the countries and regions of origin of all participating individuals, but also the cities they come from. As we report here, demographic variations at the local level could also affect admixture patterns, and thus confound associations. Self-reported information about grandparents' origins may be a useful surrogate, especially in regions with recent immigration patterns. However, this information has to be taken with extreme caution given the potential overestimation of European ancestry by self-report.
The authors want to thank the study participants.