Conceived and designed the experiments: AS AT HB AA. Performed the experiments: AA UP. Analyzed the data: AS AT HB CB QK AA MC UP SW. Contributed reagents/materials/analysis tools: AT. Wrote the paper: AS AT HB CB QK AA MC UP SW.
The authors have declared that no competing interests exist.
Only a limited number of complete mitochondrial genome sequences belonging to Native American haplogroups were available until recently, which left America as the continent with the least amount of information about sequence variation of entire mitochondrial DNAs. In this study, a comprehensive overview of all available complete mitochondrial DNA (mtDNA) genomes of the four pan-American haplogroups A2, B2, C1, and D1 is provided by revising the information scattered throughout GenBank and the literature, and adding 14 novel mtDNA sequences. The phylogenies of haplogroups A2, B2, C1, and D1 reveal a large number of sub-haplogroups but suggest that the ancestral Beringian population(s) contributed only six (successful) founder haplotypes to these haplogroups. The derived clades are overall starlike with coalescence times ranging from 18,000 to 21,000 years (with one exception) using the conventional calibration. The average of about 19,000 years somewhat contrasts with the corresponding lower age of about 13,500 years that was recently proposed by employing a different calibration and estimation approach. Our estimate indicates a human entry and spread of the pan-American haplogroups into the Americas right after the peak of the Last Glacial Maximum and comfortably agrees with the undisputed ages of the earliest Paleoindians in South America. In addition, the phylogenetic approach also indicates that the pathogenic status proposed for various mtDNA mutations, which actually define branches of Native American haplogroups, was based on insufficient grounds.
America was the last continent to be colonized by humans, and molecular data provided by different genetic systems
Since the early studies, the interpretation of mtDNA data has been rather controversial, with scenarios postulating anywhere from one to multiple migratory events from Beringia at very different times (between 11,000 and 40,000 years ago) (for a review, see
Among the novel mtDNA sequences, there are 265 from “Hispanics” and “African-Americans” that recently became available in GenBank
To define the phylogeny of A2, B2, C1, and D1 at the highest level of molecular resolution, that of complete mtDNA sequences, it is necessary to evaluate (and possibly to expand) the current data set of published mtDNA sequences with regard to reliability, as well as to update and correct the nomenclature (
The tree is rooted on the haplogroup L3 founder and the position of the revised Cambridge reference sequence (rCRS)
The complete variation of all available mtDNA sequences belonging to haplogroups A2, B2, C1, and D1 is displayed in the phylogenies of
The sequencing procedure for the novel complete sequences and the phylogeny construction were performed as described elsewhere
For additional information, see the legends for
The phylogeny of haplogroup B2 (
As for haplogroup C1, all sequences appear to fall into one of the three subhaplogroups C1b, C1c, and C1d (
As for D1 (
Overall, the four phylogenies appear to be quite starlike, the B2 and D1 trees in particular having high indices (∼0.5) of starlikeness (
| Haplogroup | No. of mtDNAs | No. of base substitutions | Average substitutions from root | Standard error | Starlikeness | T (years) | ΔT (years) |
|  | 96 | 321+3 | 3.340 | 0.322 | 0.332 | 17,200 | 1,700 |
|  | 86+1 | 304+3 | 3.529 | 0.348 | 0.335 | 18,100 | 1,800 |
|  | 27+16 | 116+61 | 4.116 | 0.463 | 0.447 | 21,200 | 2,400 |
|  | 42+13 | 198+57 | 4.636 | 0.836 | 0.121 | 23,800 | 4,300 |
|  | 21+4 | 86+14 | 4.000 | 1.150 | 0.121 | 20,600 | 5,900 |
|  | 15+7 | 63+23 | 3.909 | 0.695 | 0.368 | 20,100 | 3,600 |
|  | 6+2 | 13+4 | 2.125 | 0.573 | 0.809 | 10,900 | 2,900 |
|  | 17+17 | 67+56 | 3.618 | 0.441 | 0.547 | 18,600 | 2,300 |
|  | 172+47 | 684+177 | 3.932 | 0.311 | 0.186 | 20,200 | 1,600 |
|  | 172+47 | 649+161 | 3.699 | 0.274 | 0.225 | 19,000 | 1,400 |
First summand refers to the complete mtDNA sequences displayed in
The average number of base substitutions in the mtDNA coding region (between positions 577 and 16023) from the root sequence type.
Standard error calculated from an estimate of the genealogy
Starlikeness (“effective star size”
Estimate of the time to the most recent common ancestor of each cluster, using an evolutionary rate estimate of 1.26 ± 0.08 × 10⁻⁸ base substitutions per nucleotide per year in the coding region
This includes one Apache A2a mtDNA (#1 in
Without A2a and A2b mtDNAs.
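The conversion from the tabulated average distances to ages can be checked with a few lines of arithmetic. The sketch below is ours, not the authors' code, and rests on stated assumptions: the coding region is taken as positions 577 to 16023 inclusive (15,447 bp); T and ΔT are obtained by dividing the average number of substitutions from the root (ρ) and its standard error (σ) by μL, ignoring the ±0.08 uncertainty of the rate; and starlikeness is taken as ρ/(nσ²), i.e. the effective star size ρ/σ² divided by the sample size n. The helper names are hypothetical.

```python
# Sketch: convert average root-to-tip distances into coalescence times.
# Assumptions (ours, not stated in full in the footnotes):
#   coding region = positions 577..16023 inclusive,
#   T = rho / (mu * L) and Delta T = sigma / (mu * L) (rate uncertainty ignored),
#   starlikeness = rho / (n * sigma**2).

MU = 1.26e-8                          # substitutions per nucleotide per year
CODING_LENGTH = 16023 - 577 + 1       # 15,447 bp
YEARS_PER_SUBSTITUTION = 1.0 / (MU * CODING_LENGTH)  # about 5,138 years


def coalescence_time(rho: float, sigma: float) -> tuple[float, float]:
    """Return (T, Delta T) in years from the average distance rho and its standard error sigma."""
    return rho * YEARS_PER_SUBSTITUTION, sigma * YEARS_PER_SUBSTITUTION


def starlikeness(rho: float, sigma: float, n: int) -> float:
    """Effective star size rho / sigma**2 divided by the number n of sampled mtDNAs."""
    return rho / (n * sigma ** 2)


def round_to_hundred(years: float) -> int:
    """Round to the nearest 100 years, as in the table."""
    return int(round(years, -2))


if __name__ == "__main__":
    # Second table row: 86+1 mtDNAs, average distance 3.529, standard error 0.348
    t, dt = coalescence_time(3.529, 0.348)
    print(round_to_hundred(t), round_to_hundred(dt))  # 18100 1800
    print(round(starlikeness(3.529, 0.348, 87), 3))   # 0.335
```

Under these assumptions one substitution corresponds to roughly 5,140 years, and the same conversion agrees with the tabulated T and ΔT to the nearest 100 years for most rows.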
In some of the newly defined Native American branches, one can identify mutations for which a pathogenic role was suggested in the medical literature. The seemingly ‘detrimental’ status of mutations G3316A and G13708A, defining haplogroups A2f and A2e respectively, has already been questioned and discussed at length in the East Asian mtDNA context
A particularly interesting case of a mutational motif marking a Native American branch of the mtDNA phylogeny is the T3308A transversion followed by the insertion of one C (3308+C), which together characterize haplogroup A2i. The insertion, first reported in a patient with dystonia, causes a frameshift for which a pathogenic role was proposed
A different case concerns the homoplasmic mutation T9205C, detected in one mtDNA (no. 54) belonging to haplogroup A2 (
Another illustrative case of a hypothesized association between mtDNA mutations and a complex disorder is the G1888A transition, which could play some role in the pathogenesis of type 2 diabetes
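The mutation labels in this passage follow the usual notation of original base, position, and derived base (e.g. G1888A), with insertions written as position plus inserted base (3308+C). A transition exchanges a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T); any other substitution is a transversion. The hypothetical parser below is a minimal sketch that recognizes only these two label forms and makes the classification explicit.

```python
import re
from dataclasses import dataclass
from typing import Optional

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Hypothetical patterns for the two label forms used in this section
SUBSTITUTION = re.compile(r"^([ACGT])(\d+)([ACGT])$")  # e.g. G3316A
INSERTION = re.compile(r"^(\d+)\+([ACGT]+)$")          # e.g. 3308+C


@dataclass
class Mutation:
    position: int
    kind: str                  # "transition", "transversion" or "insertion"
    original: Optional[str]    # None for insertions
    derived: str


def parse_mutation(label: str) -> Mutation:
    """Classify a mutation label as a transition, transversion or insertion."""
    match = SUBSTITUTION.match(label)
    if match:
        original, position, derived = match.group(1), int(match.group(2)), match.group(3)
        if original == derived:
            raise ValueError(f"no base change in {label!r}")
        bases = {original, derived}
        # Same chemical class (both purines or both pyrimidines) means a transition
        same_class = bases <= PURINES or bases <= PYRIMIDINES
        kind = "transition" if same_class else "transversion"
        return Mutation(position, kind, original, derived)
    match = INSERTION.match(label)
    if match:
        return Mutation(int(match.group(1)), "insertion", None, match.group(2))
    raise ValueError(f"unrecognized mutation label {label!r}")


if __name__ == "__main__":
    for label in ["G3316A", "G13708A", "T3308A", "3308+C", "T9205C", "G1888A"]:
        print(label, parse_mutation(label).kind)
```

Applied to the labels above, it reports T3308A as the only transversion; G3316A, G13708A, T9205C, and G1888A are transitions, and 3308+C is an insertion. Whether an insertion causes a frameshift depends on the gene and reading frame, which the sketch does not model.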
The estimated ages (18–24 ky) of the four pan-American haplogroups A2, B2, C1, and D1 are quite similar, with an average value of 20 ky. Thus, if A2, B2, C1, and D1 entered the Americas without variation in the coding region, that is, each with only a single (successful) founder sequence (the root haplotype), their entry into the Americas would have occurred right after the peak of the Last Glacial Maximum (LGM, centered at ∼21.0 kya and extending from 19.0 to at least 23.0 kya
In any case, none of the above-mentioned scenarios supports the ‘Clovis-first’ hypothesis, whereas all are in good agreement with the undisputed ages of the earliest Paleoindians in South America
Our snapshot of the phylogenies for haplogroups A2, B2, C1, and D1 is only partially representative of Native American mtDNA variation, since it most likely includes the variation of Native American populations from Central and South America only marginally. Despite this limitation, it is clear that one has to anticipate a pronounced starlike pattern near the root of each founder haplogroup or branch. The starlike pattern enhances the precision with which the human entry into the Americas can be dated, but the date itself inevitably hinges on the calibration employed and, perhaps more importantly, on a detailed founder analysis across the double continent. Uncovering all of the most basal variation in the Native American mtDNA haplogroups will therefore require major sampling and sequencing efforts, targeting, where possible, both the general mixed populations of national states and autochthonous Native American groups, especially in Central and South America.
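Why a starlike pattern sharpens the dating can be seen from the standard error of the average root-to-tip distance. The sketch below assumes the genealogy-based estimator of Saillard et al. (2000), which we take to be the one behind the table's standard errors: with n sampled sequences, ρ is the average number of mutations from the root, and σ² = Σ d_i² / n², summed over all mutations, where d_i is the number of sampled sequences carrying mutation i. The two toy trees are hypothetical: both have n = 8 and ρ = 3, but in one every mutation is private (a perfect star), while in the other two mutations are shared by all eight sequences.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Node:
    """A node of a genealogy; `mutations` lie on the branch leading to it."""
    mutations: int = 0
    children: list["Node"] = field(default_factory=list)


def _walk(node: Node) -> tuple[int, int, int]:
    """Return (leaves below node, sum of m*d, sum of m*d**2) for the subtree."""
    if not node.children:
        leaves, s1, s2 = 1, 0, 0
    else:
        leaves = s1 = s2 = 0
        for child in node.children:
            c_leaves, c_s1, c_s2 = _walk(child)
            leaves += c_leaves
            s1 += c_s1
            s2 += c_s2
    # Each of the m mutations on this branch is carried by all `leaves` descendants
    s1 += node.mutations * leaves
    s2 += node.mutations * leaves ** 2
    return leaves, s1, s2


def rho_sigma(root: Node) -> tuple[int, float, float]:
    """Return (n, rho, sigma) with sigma**2 = sum(d_i**2) / n**2 (assumed estimator)."""
    n, s1, s2 = _walk(root)
    return n, s1 / n, sqrt(s2) / n


def starlikeness(rho: float, sigma: float, n: int) -> float:
    """Effective star size rho / sigma**2 divided by n (equals 1 for a perfect star)."""
    return rho / (n * sigma ** 2)


if __name__ == "__main__":
    star = Node(0, [Node(3) for _ in range(8)])              # 3 private mutations per tip
    shared = Node(0, [Node(2, [Node(1) for _ in range(8)])])  # 2 shared + 1 private
    for name, tree in [("star", star), ("shared", shared)]:
        n, rho, sigma = rho_sigma(tree)
        print(name, n, round(rho, 3), round(sigma, 3), round(starlikeness(rho, sigma, n), 3))
```

The star yields σ ≈ 0.61 and a starlikeness of 1, whereas the tree with shared mutations yields σ ≈ 1.46 and a starlikeness of about 0.18, so the same ρ is dated with more than twice the uncertainty.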
Widespread knowledge of the specifics of the Native American haplogroups can also prevent the publication of effectively mutilated or distorted mtDNA sequences from complete sequencing efforts in clinical studies
The sources of the sequence data (171 complete mtDNA sequences) employed for the phylogeny construction are listed in
The 101 complete mtDNA sequences
Accession numbers and URLs for data presented herein are as follows: GenBank,
Mistakes, phantom mutations and discrepancies in literature and public databases
(0.06 MB DOC)
Further information from mtDNA control-region and RFLP data
(0.08 MB DOC)
Additional information concerning mtDNA disease studies
(0.04 MB DOC)
Additional information for
(0.04 MB DOC)
Additional references
(0.04 MB DOC)
Source of the complete mtDNA sequences
(0.39 MB DOC)
We would also like to thank all the donors for providing biological specimens and the people involved in their collection.