The Americas were the last continents to be populated by humans, and their colonization represents a very interesting chapter in our species' evolution in which important issues are still contentious or largely unknown. One difficult topic concerns the details of the early peopling of Beringia, such as how long it was occupied before people moved into the Americas and the demography of this occupation. A recent work using mitochondrial genome (mtDNA) data presented evidence for a so-called “three-stage model” consisting of a very early expansion into Beringia followed by ~20,000 years of population stability before the final entry into the Americas. However, these results are in disagreement with other recent studies using similar data and methods. Here, we reanalyze their data to check the robustness of this model and test the ability of Native American mtDNA to discriminate details of the early colonization of Beringia. We apply the Bayesian Skyline Plot approach to recover the past demographic dynamics underpinning these events using different mtDNA data sets. Our results refute the specific details of the “three-stage model”, since the early stage of expansion into Beringia followed by a long period of stasis could not be reproduced in any mtDNA data set cleaned of non-Native American haplotypes. Nevertheless, they are consistent with a moderate population bottleneck in Beringia associated with the Last Glacial Maximum followed by strong population growth around 18,000 years ago, as suggested by other recent studies. We suggest that this bottleneck erased the signals of ancient demographic history from the recent Native American mtDNA pool, and conclude that the proposed early expansion and occupation of Beringia is an artifact caused by the misincorporation of non-Native American haplotypes.
Citation: Fagundes NJR, Kanitz R, Bonatto SL (2008) A Reevaluation of the Native American MtDNA Genome Diversity and Its Bearing on the Models of Early Colonization of Beringia. PLoS ONE 3(9): e3157. doi:10.1371/journal.pone.0003157
Editor: Henry Harpending, University of Utah, United States of America
Received: June 20, 2008; Accepted: August 8, 2008; Published: September 17, 2008
Copyright: © 2008 Fagundes et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This study was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) grants to SLB, and by CNPq scholarships to RK and NJRF.
Competing interests: The authors have declared that no competing interests exist.
The Americas were the last continents to be settled by modern humans, most probably from northeast Asia through Beringia, the landmass that connected Asia and the Americas during periods of low sea-level . Archeological data suggest that the continent was colonized in the late Pleistocene after the Last Glacial Maximum (LGM). The oldest sites for North and South America are about 14.5 ky old , possibly suggesting a fast southward movement of the initial settlers. However, the scarcity of late Pleistocene human remains in northeast Asia makes it difficult to evaluate the details of the population processes in Beringia that ultimately led to the peopling of the New World.
As an alternative to the study of archeological data, molecular data have been extensively used to infer when and how modern humans colonized the world reviewed in , . Since the pioneering study by Cann et al. , mitochondrial DNA (mtDNA) has become the most widely used genetic marker to study recent human evolution . In Native Americans, early studies of mtDNA variation found that these populations have five distinct major mtDNA haplogroups (A, B, C, D and X) , , all of Asian origin. Although most of these studies seemed to converge on a model suggesting a single pre-Clovis migration –, no consensus emerged for details such as the timing and pace of the putative occupation event under this scenario. These controversies notwithstanding, the divergence between Native American and Asian sequences for each mtDNA haplogroup led some authors to suggest that the Native American founder population stayed isolated in Beringia from the remaining Asian populations prior to their entry into the Americas , the so-called “out of Beringia” model. Under this scenario, Beringia played a key role in the differentiation of the mtDNA haplogroups presently found in Native Americans.
The study of complete mtDNA sequences from Native Americans – has allowed investigators to examine mtDNA variation in the New World with much greater resolution. The first systematic survey of coding-region mtDNA sequences including individuals of Native American ancestry was carried out by Herrnstadt et al. , who studied individuals sampled from an urban population and relied on the screening of an incomplete set of HVS-I markers to identify mtDNA haplotypes as being of Native American origin . Afterwards, a thorough revision of these sequences partially changed the original classification of the Native American genomes . Further studies, using samples obtained mainly from Native American populations, revealed new putative founder haplotypes , , and suggested an average coalescence time for the most common haplogroups of around 13.5 thousand years ago (kya) or 19 kya, depending on whether the estimates were based on only synonymous transitions  or on all substitutions , respectively. The most extensive work to date  showed that all five major haplogroups have a similar pattern of genetic diversity, and that they expanded together towards the end of the LGM around 18 kya.
The development of new analytical methods allowed the estimation of past demographic changes using the Bayesian Skyline Plot (BSP) approach, which allows inference of past population size changes without assuming any a priori demographic scenario . So far, four roughly synchronous studies , –, using partially overlapping data sets, have applied such an analysis to putative Native American mtDNA genomes. Adjusting for the different substitution rates used, all of them showed evidence for quick and strong population growth in the late Pleistocene, likely near the end of the LGM, preceded by a population bottleneck that lasted for a few thousand years e.g., .
Interestingly, only one of these studies concluded that the BSP of Native American mtDNAs suggested an additional and more ancient period of population growth followed by a long period of population stability . The authors interpreted these results as representing the expansion out of Central Asia into Beringia after divergence from Asians (~43–36 kya) followed by a long period (~20,000 years) of population stability in Beringia and finally by the strong population growth stage (~16 kya) after the LGM associated with the peopling of the Americas. They called this scenario the “three-stage” colonization model for the peopling of the Americas, even though several of the more recent colonization models could likewise be described as having “three stages”: a first stage from Asia into Beringia, a second stage of isolation in Beringia and a third stage with an expansion out of Beringia into the Americas , , –, . These studies highlight the importance of a stage in Beringia prior to the peopling of the Americas, and one of them even provided a rough estimate of the time the Native American founder population spent in Beringia using the number of diagnostic substitutions found in Native American mtDNA sub-haplogroups . However, what differentiates the study of Kitchen et al.  from the others is that it was the only one that estimated a detailed demographic history for the first two stages, although it used evidence and methods (BSP) similar to those employed by the other works. Nonetheless, one possible problem with this study is that the data set of mtDNA genomes they used consisted primarily of the original data set of Herrnstadt et al. , in which several mtDNA genomes regarded as Native American are most likely of non-Native ancestry  and which also includes several sequence errors detected recently but not corrected in the mtDB database that they used , .
In this study, we provide a reanalysis of the BSP results from Kitchen et al.  using a rigorous criterion for defining Native American ancestry , ,  to determine if the specific three-stage model suggested by these authors is still supported when a more reliable data set is used. We also investigate the likely source of the early expansion detected by that study.
Materials and Methods
Initially, the original Kitchen et al.  data set (n = 77) was used to replicate their original findings using the evolutionary model specified in that report and another one (see below). Based on information detailed in Bandelt et al.  and Achilli et al. , we removed seven individuals who are very likely of Asian ancestry. These include three individuals assigned to haplogroup E in Herrnstadt et al. (Figure 2 in ), as well as four individuals belonging to the Asian sub-haplogroups B1 or B4c . Another individual (Herrn552; see ) was removed, since it actually belongs to the West European haplogroup H. This individual was probably incorporated into the data set by mistake, as individual 532 (from Native American haplogroup A2) is absent from their data set. Finally, another individual (Kiv2870) was removed, since it had all the diagnostic coding-region mutations for West European sub-haplogroup X2b (8393, 13708, 15927), and none of the diagnostic coding-region mutations for Native American sub-haplogroup X2a (8913, 12397, 14502), and thus it is likely of recent European ancestry. These modifications resulted in a new, corrected data set of 68 sequences of likely Native American origin, and in a third data set containing only the nine sequences that had been misincorporated into the original data set of 77 sequences analyzed by Kitchen et al. . Finally, to better test whether the anomalous early expansion seen in the Kitchen et al.  BSP results could be explained by the non-Native American mtDNA genomes, we created a set of 10 alignments with nine genomes each, randomly selected from the mtDNA genomes from the macrohaplogroups M and N .
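The exclusion rule applied to the haplogroup X individual, and the resulting split of the 77 sequences into 68 retained and nine removed, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the representation of a genome as a set of variant positions, and the placeholder sample identifiers are hypothetical and are not taken from the original analysis. Only the diagnostic positions and the data set sizes come from the text.

```python
# Diagnostic coding-region positions quoted in the text.
X2B_SITES = frozenset({8393, 13708, 15927})  # West European sub-haplogroup X2b
X2A_SITES = frozenset({8913, 12397, 14502})  # Native American sub-haplogroup X2a


def classify_x2(variant_positions):
    """Classify a haplogroup X genome from its variant positions.

    Hypothetical helper: a genome is called X2b only if it carries all
    X2b diagnostic sites and none of the X2a sites (and vice versa).
    Anything else is reported as 'ambiguous'.
    """
    variants = set(variant_positions)
    has_all_x2a = X2A_SITES <= variants
    has_all_x2b = X2B_SITES <= variants
    has_no_x2a = not (X2A_SITES & variants)
    has_no_x2b = not (X2B_SITES & variants)
    if has_all_x2b and has_no_x2a:
        return "X2b"
    if has_all_x2a and has_no_x2b:
        return "X2a"
    return "ambiguous"


def partition_dataset(sample_ids, excluded_ids):
    """Split sample identifiers into (kept, removed), preserving order."""
    excluded = set(excluded_ids)
    kept = [s for s in sample_ids if s not in excluded]
    removed = [s for s in sample_ids if s in excluded]
    return kept, removed


if __name__ == "__main__":
    # Placeholder identifiers only; the real sample names are not listed here.
    all_ids = [f"sample_{i:02d}" for i in range(77)]
    excluded = all_ids[:9]
    kept, removed = partition_dataset(all_ids, excluded)
    print(len(kept), len(removed))
```

A strict "all of one set, none of the other" rule is used here because it mirrors the wording of the text; partially matching genomes are deliberately left unclassified rather than guessed.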
Bayesian Skyline Plot
BSPs  were constructed in the program BEAST 1.4.7 (http://beast.bio.ed.ac.uk/). For all analyses, Markov Chain Monte Carlo (MCMC) samples were based on 100,000,000 generations, logging every 2,500 steps, with the first 10,000,000 generations discarded as the burn-in. All analyses were run multiple times to check for convergence. Following Fagundes et al. , we used the HKY+Γ evolutionary model and a log-normal relaxed molecular clock with a mean substitution rate of 1.26×10⁻⁸ mutations/site/year  for the complete coding sequence. The scaled effective population size was converted to the effective female population size Nef, assuming a generation time of 25 years. Importantly, assumptions about the mutation rate and the generation time will only affect the scale of the BSP, but not its shape.
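As a rough check on the sampling scheme and the population size conversion described above, the arithmetic can be written out in a few lines. This sketch assumes that the skyline output is the product of effective size and generation time (in years) when the clock rate is given per year, and it ignores whether the initial state is also logged; the function names are hypothetical.

```python
# MCMC settings quoted in the text.
CHAIN_LENGTH = 100_000_000
LOG_EVERY = 2_500
BURN_IN = 10_000_000
GENERATION_TIME_YEARS = 25


def retained_samples(chain_length=CHAIN_LENGTH, log_every=LOG_EVERY,
                     burn_in=BURN_IN):
    """Number of logged states left after discarding the burn-in.

    Assumption: one state is logged every `log_every` steps and the
    initial state is not counted.
    """
    return (chain_length - burn_in) // log_every


def scaled_size_to_nef(ne_times_tau, generation_time=GENERATION_TIME_YEARS):
    """Convert a scaled population size (Ne x generation time, in years)
    to an effective female population size.

    Assumption: the skyline reports Ne multiplied by the generation time.
    """
    return ne_times_tau / generation_time


if __name__ == "__main__":
    print(retained_samples())            # 36000 retained samples
    print(scaled_size_to_nef(250_000))   # 10000.0 for an illustrative input
```

Because the generation time enters only as a divisor, changing it shifts the whole curve up or down on a log axis without altering its shape, which is the point made at the end of the paragraph above.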
Our analysis of the BSP for the original Kitchen et al. data set  reproduced their original results, which showed two periods of population growth (Figure 1A) with a long stasis between them. However, the corrected data set provided evidence for a single, post-LGM, population growth (Figure 1C), in close accordance with the other BSP analyses using only mtDNA of Native American origin , , . That is, there was only a long tail of roughly constant population size between the time to the most recent common ancestor (TMRCA) of each Native American haplogroup around the LGM and the sample TMRCA.
Figure 1. Bayesian Skyline Plots using different sequence sets.
BSPs estimated with 100 million MCMC iterations sampled every 2,500 steps using a log-normal relaxed clock and the HKY model plus gamma (eight categories) with the standard substitution rate of 1.26×10⁻⁸ mutations/site/yr and a generation time of 25 yr. The y axis represents the female effective population size on a log scale and the x axis shows time in thousands of years ago (kya). The thicker blue lines are the median for population size and the thinner black lines represent the 95% highest posterior density (HPD) intervals. (A) BSP using the original 77 individuals from . (B) BSP for the nine misincorporated non-Native American sequences. (C) BSP for the 68 confirmed Native American haplotypes in  in blue and black; and the BSP from  in red dashed (median) and gray lines (95% HPD interval). doi:10.1371/journal.pone.0003157.g001
More strikingly, when we considered only those individuals that were removed from the original data set, the signal for the earlier expansion reappeared, despite the very low sample size in this data set (n = 9) (Figure 1B). Using our substitution rate, this expansion began ~60 kya. These results clearly show that the signal for a population expansion that they detected  in Native American mtDNAs well before the LGM is an artifact caused by the incorporation of non-Native American haplotypes into the analysis. Since these nine haplotypes seem to be of Asian and European origin, from macrohaplogroups M and N, we conjecture that this signal of expansion may be related to the demographic expansion out-of-Africa that gave rise to Eurasians. The BSPs of the ten data sets from genomes selected from these two macrohaplogroups (Figure 2) showed a very similar expansion pattern, corroborating this hypothesis.
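The construction of the ten replicate data sets of nine genomes each, drawn at random from a pool of macrohaplogroup M and N genomes, can be sketched as below. The function name, the fixed seed, and the placeholder pool are assumptions made for illustration; the text does not state how the random draws were implemented or whether a genome could appear in more than one replicate (this sketch allows that, while sampling without replacement within each replicate).

```python
import random


def draw_replicates(genome_ids, n_replicates=10, sample_size=9, seed=0):
    """Draw `n_replicates` random subsets of `sample_size` genomes.

    Sampling is without replacement within a replicate; the same genome
    may appear in several replicates. The seed is fixed only so that the
    illustration is reproducible.
    """
    rng = random.Random(seed)
    pool = list(genome_ids)
    return [rng.sample(pool, sample_size) for _ in range(n_replicates)]


if __name__ == "__main__":
    # Placeholder pool; real accession names are not given in the text.
    pool = [f"genome_{i:03d}" for i in range(50)]
    for i, replicate in enumerate(draw_replicates(pool), start=1):
        print(i, replicate)
```

Matching the replicate size to the nine removed sequences keeps the comparison with Figure 1B fair, since skyline estimates from very small samples are expected to be noisy.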
Figure 2. Bayesian Skyline Plot for ten replicates of nine random non-African haplotypes.
Ten BSPs using random samples of nine non-African individuals from  belonging to macrohaplogroups M and N, showing a similar pattern of expansion between ~80 and ~40 kya. All BSPs were calculated with 100 million MCMC generations sampled every 2,500 steps using the same model applied to the BSPs in Figure 1. Axes and lines are as in Figure 1. doi:10.1371/journal.pone.0003157.g002
Our results strongly suggest that the demographic expansion putatively associated with the geographical expansion out of Central Asia and the initial peopling of Beringia, as well as the estimation of ~20 ky of occupation of Beringia by human groups before they entered the Americas , are merely database artifacts caused by the incorporation of mtDNA genomes of non-Native American ancestry in the analysis. Because the mtDNA haplogroups and sub-haplogroups have a strong and extensively studied geographic association e.g., , it is possible to identify almost unambiguously the ancestry of most haplotypes . While this may also be true for the Y-chromosome , it is certainly not for most other nuclear markers, which typically display low levels of interpopulation differentiation and extensive haplotype sharing among populations e.g., , . Possible applications of the BSP to autosomal or sex-linked haplotypes must carefully select the sampled populations to avoid incorporating into the analysis those recently introduced by admixture. Our analyses suggest that even a relatively small proportion of 12% (9/77) of “admixed” (or misassigned) haplotypes may significantly bias the overall result.
The population expansion that began 60–55 kya when non-Native American haplotypes are incorporated into the analysis most likely reflects, at least in part, the early expansion and diversification of macrohaplogroups M and N in Eurasia , which is unrelated to the specific process of the peopling of Beringia. As a consequence, Kitchen et al.'s estimation of a period of ~20 ky of occupation of Beringia based on the time interval between the “two expansions” is meaningless in the context of the peopling of Beringia or the Americas. In addition, it is important to stress that, because the mtDNA haplogroups currently found in the Americas represent derivations of both macrohaplogroup M (C, D) and N (A, B, X) e.g., , their TMRCA reflects the TMRCA of macrohaplogroups M and N in Asia (~60 kya) . Therefore, the >40 ky of constant population size found in the corrected data sets, extending back in time from the LGM bottleneck to the TMRCA of all Native American mtDNA haplogroups, does not offer any detailed view of the demographic history of Native Americans before the bottleneck. The genetic bottleneck associated with human isolation in Beringia ,  may have erased from the recent non-Beringian Native American mtDNA data most of the details of its pre-Beringian demographic history. In this regard, discerning the population size changes during this period would mostly require acquiring mtDNA information from ancient samples from this time.
Interestingly, an almost identical pattern of population size change was found with the Kitchen et al. corrected data set and our previous analyses of mostly Native South American mtDNAs . These results, therefore, strongly corroborate the mtDNA scenario for the peopling of the Americas presented in Fagundes et al.  and the integrated model that we suggested elsewhere . This model suggests that the ancestral population colonized Beringia more than five thousand years before the LGM, remained isolated there during the LGM, and likely experienced a population reduction and loss of genetic diversity by drift. The strong population expansion shown to have started around the end of the LGM (~18 kya) probably reflects the fast migration south of the Laurentide and Cordilleran ice sheets. Taking into account that the ice-free corridor between the ice sheets had not opened completely by this time interval, and that it could not have supported a viable human population earlier than 14 kya , , these findings support a coastal route as the major pathway for the peopling of the Americas, in agreement with recently published data from a panel of STR markers  and archeological data , .
However, time estimates are dependent on the evolutionary rate used in the analysis. The mtDNA evolutionary rate that we used  has been the most extensively used estimate in studies of human evolution e.g., , , , . Nevertheless, other calibrations are available, although they are usually faster than ours , , . The use of an internal calibration  results in a rate similar to that used by Kitchen et al. , which pinpointed the post-LGM population growth at ~16 kya. Another rate, which uses only synonymous substitutions , suggests a mean coalescent age of ~14 ky for the major Native American haplogroups . However, the corroboration of the human occupation of southern Chile ~14.5 kya  strongly suggests that haplogroup coalescences and the expansion out of Beringia should have occurred >16 kya, implying that the faster rates are unlikely to be accurate. In any case, the choice of a rate affects only the absolute numeric estimates, and does not change the shape of the BSP. Thus, our refutation of the Kitchen et al.  model for the peopling of Beringia is independent of all the controversies about the correct mtDNA substitution rate e.g., –.
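The claim that the choice of rate changes the scale but not the shape of the BSP can be illustrated with a simple rescaling. This sketch assumes an idealized proportional relationship in which both the time axis and the scaled population size are multiplied by the ratio of the rate used to the alternative rate; a full reanalysis under a relaxed clock would only approximately follow this. The function name and the alternative rate in the example are hypothetical and are not values from the cited studies.

```python
def rescale_skyline(times_kya, sizes, rate_used, rate_alternative):
    """Rescale a skyline curve from one substitution rate to another.

    Assumption: times and scaled population sizes are both inversely
    proportional to the substitution rate, so a single factor applies to
    every point and ratios between points are unchanged.
    """
    factor = rate_used / rate_alternative
    new_times = [t * factor for t in times_kya]
    new_sizes = [n * factor for n in sizes]
    return new_times, new_sizes


if __name__ == "__main__":
    rate_used = 1.26e-8          # rate quoted in the text
    rate_hypothetical = 2.52e-8  # illustrative value only (twice as fast)
    times = [18.0, 60.0]         # kya, illustrative points on the curve
    sizes = [1000.0, 4000.0]     # arbitrary illustrative sizes
    new_times, new_sizes = rescale_skyline(times, sizes, rate_used,
                                           rate_hypothetical)
    print(new_times, new_sizes)
```

Since every point is multiplied by the same factor, the ratio between any two population sizes and the relative spacing of events in time are preserved, so a single post-LGM expansion remains a single expansion under any rate.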
We would like to thank an anonymous reviewer for improving an earlier version of the manuscript.
Conceived and designed the experiments: NJRF RK SLB. Performed the experiments: NJRF RK SLB. Analyzed the data: NJRF RK SLB. Contributed reagents/materials/analysis tools: SLB. Wrote the paper: NJRF RK SLB.
- 1. Goebel T, Waters MR, O'Rourke DH (2008) The Late Pleistocene Dispersal of Modern Humans in the Americas. Science 319: 1497–1502.
- 2. Schurr T (2004) The Peopling of the New World: Perspectives from Molecular Anthropology. Annu Rev Anthropol 33: 551–583.
- 3. Salzano FM (2007) The prehistoric colonization of the Americas. In: Crawford MH, editor. Anthropological Genetics: Theory, Methods, and Applications. Cambridge: Cambridge University Press. pp. 433–455.
- 4. Cann RL, Stoneking M, Wilson AC (1987) Mitochondrial DNA and human evolution. Nature 325: 31–36.
- 5. Torroni A, Achilli A, Macaulay V, Richards M, Bandelt HJ (2006) Harvesting the Fruit of the Human MtDNA Tree. Trends in Genetics 22: 339–345.
- 6. Schurr TG, Ballinger SW, Gan YY, Hodge JA, Merriwether DA, et al. (1990) Amerindian mitochondrial DNAs have rare Asian mutations at high frequencies, suggesting they derived from four primary maternal lineages. Am J Hum Genet 46: 613–623.
- 7. Brown MD, Hosseini SH, Torroni A, Bandelt HJ, Allen JC, et al. (1998) MtDNA Haplogroup X: an Ancient Link between Europe/Western Asia and North America? Am J Hum Genet 63: 1852–1861.
- 8. Forster P, Harding R, Torroni A, Bandelt HJ (1996) Origin and Evolution of Native American MtDNA Variation: a Reappraisal. Am J Hum Genet 59: 935–945.
- 9. Bonatto SL, Salzano FM (1997) A Single and Early Migration for the Peopling of the Americas Supported by Mitochondrial DNA Sequence Data. Proc Natl Acad Sci USA 94: 1866–1871.
- 10. Bonatto SL, Salzano FM (1997) Diversity and age of the four major mtDNA haplogroups, and their implications for the peopling of the New World. Am J Hum Genet 61: 1413–1423.
- 11. Herrnstadt C, Elson JL, Fahy E, Preston G, Turnbull DM, et al. (2002) Reduced-Median-Network Analysis of Complete Mitochondrial DNA Coding-Region Sequences for the Major African, Asian, and European Haplogroups. Am J Hum Genet 70: 1152–1171.
- 12. Tamm E, Kivisild T, Reidla M, Metspalu M, Smith DG, et al. (2007) Beringian Standstill and Spread of Native American Founders. PLoS One 2: e829.
- 13. Fagundes NJR, Kanitz R, Eckert R, Valls ACS, Bogo MR, et al. (2008) Mitochondrial Population Genomics Supports a Single Pre-Clovis Origin with a Coastal Route for the Peopling of the Americas. Am J Hum Genet 82: 583–592.
- 14. Achilli A, Perego UA, Bravi CM, Coble MD, Kong QP, et al. (2008) The Phylogeny of the Four Pan-American MtDNA Haplogroups: Implications for Evolutionary and Disease Studies. PLoS One 3: e1764.
- 15. Bandelt HJ, Herrnstadt C, Yao YG, Kong QP, Rengo C, et al. (2003) Identification of Native American Founder mtDNAs Through the Analysis of Complete mtDNA Sequences: Some Caveats. Ann Hum Genet 67: 512–524.
- 16. Kivisild T, Shen P, Wall DP, Do B, Sung R, et al. (2006) The Role of Selection in the Evolution of Human Mitochondrial Genomes. Genetics 172: 373–387.
- 17. Mishmar D, Ruiz-Pesini E, Golik P, Macaulay V, Clark AG, et al. (2003) Natural Selection Shaped Regional MtDNA Variation in Humans. Proc Natl Acad Sci USA 100: 171–176.
- 18. Drummond AJ, Rambaut A, Shapiro B, Pybus OG (2005) Bayesian Coalescent Inference of Past Population Dynamics from Molecular Sequences. Mol Biol Evol 22: 1185–1192.
- 19. Kitchen A, Miyamoto MM, Mulligan CJ (2008) Utility of DNA Viruses for Studying Human Host History: Case Study of JC Virus. Mol Phylogen Evol 46: 673–682.
- 20. Kitchen A, Miyamoto MM, Mulligan CJ (2008) A Three-Stage Colonization Model for the Peopling of the Americas. PLoS One 3: e1596.
- 21. Atkinson QD, Gray RD, Drummond AJ (2008) MtDNA Variation Predicts Population Size in Humans and Reveals a Major Southern Asian Chapter in Human Prehistory. Mol Biol Evol 25: 468–474.
- 22. Bandelt HJ, Kong QP, Richards M, Macaulay V (2006) Estimation of Mutation Rates and Coalescence Times: Some Caveats. In: Bandelt HJ, Macaulay V, Richards M, editors. Human Mitochondrial DNA and the Evolution of Homo sapiens. Berlin: Springer. pp. 47–90.
- 23. Bandelt H-J, Yao Y-G, Salas A, Kivisild T, Bravi CM (2007) High penetrance of sequencing errors and interpretative shortcomings in mtDNA sequence analysis of LHON patients. Biochem Biophys Res Commun 352: 283–291.
- 24. Ingman M, Kaessmann H, Pääbo S, Gyllensten U (2000) Mitochondrial Genome Variation and the Origin of Modern Humans. Nature 408: 708–713.
- 25. The Y chromosome Consortium (2002) A Nomenclature System for the Tree of Human Y-Chromosomal Binary Haplogroups. Genome Res 12: 339–348.
- 26. Fagundes NJR, Salzano FM, Batzer MA, Deininger PL, Bonatto SL (2005) Worldwide Genetic Variation at the 3′-UTR Region of the LDLR Gene: Possible Influence of Natural Selection. Ann Hum Genet 69: 389–400.
- 27. Battilana J, Cardoso-Silva L, Barrantes R, Hill K, Hurtado AM, et al. (2007) Molecular Variability of the 16p13.3 Region in Amerindians and its Anthropological Significance. Ann Hum Genet 71: 64–76.
- 28. Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, et al. (2005) Single, Rapid Coastal Settlement of Asia Revealed by Analysis of Complete Mitochondrial Genomes. Science 308: 1034–1036.
- 29. Battilana J, Fagundes NJR, Heller AH, Goldani A, Freitas LB, et al. (2006) Alu insertion polymorphisms in Native Americans and Related Asian Populations. Ann Hum Biol 33: 142–160.
- 30. Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. (2007) Statistical Evaluation of Alternative Models of Human Evolution. Proc Natl Acad Sci USA 104: 17614–17619.
- 31. Gonzales-José R, Bortolini MC, Santos FR, Bonatto SL (2008) The Peopling of America: Craniofacial Shape Variation on a Continental Scale and its Interpretation From an Interdisciplinary View. Am J Phys Anthropol. In press. doi: 10.1002/ajpa.20854.
- 32. Fladmark KR (1979) Routes: Alternate Migration Corridors for Early Man in North America. American Antiquity 44: 55–69.
- 33. Dixon EJ (1993) Quest for the Origins of the First Americans. Albuquerque: University of New Mexico Press.
- 34. Wang S, Lewis CM, Jakobsson M, Ramachandran S, Ray N, et al. (2007) Genetic Variation and Population Structure in Native Americans. PLoS Genet 3: e185.
- 35. Gilbert MTP, Jenkins DL, Götherstrom A, Naveran N, Sanchez JJ, et al. (2008) DNA from Pre-Clovis Human Coprolites in Oregon, North America. Science 320: 786–789.
- 36. Dillehay TD, Ramírez C, Pino M, Collins MB, Rossen J, et al. (2008) Monte Verde: Seaweed, Food, Medicine, and the Peopling of South America. Science 320: 784–786.
- 37. Achilli A, Rengo C, Magri C, Battaglia V, Olivieri A, et al. (2004) The Molecular Dissection of mtDNA Haplogroup H Confirms That the Franco-Cantabrian Glacial Refuge Was a Major Source for the European Gene Pool. Am J Hum Genet 75: 910–918.
- 38. Ho SYW, Phillips MJ, Cooper A, Drummond AJ (2005) Time Dependency of Molecular Rate Estimates and Systematic Overestimation of Recent Divergence Times. Mol Biol Evol 22: 1561–1568.
- 39. Ho SYW, Shapiro B, Phillips MJ, Cooper A, Drummond AJ (2007) Evidence for Time Dependency of Molecular Rate Estimates. Syst Biol 56: 515–522.
- 40. Bandelt HJ, Parson W (2008) Consistent treatment of length variants in the human mtDNA control region: a reappraisal. Int J Legal Med 122: 11–21.