Conceived and designed the experiments: HJJ CTS WK. Performed the experiments: HJJ. Analyzed the data: HJJ CTS WK. Contributed reagents/materials/analysis tools: WK. Wrote the paper: HJJ CTS WK.
The authors have declared that no competing interests exist.
The Koreans are generally considered a northeast Asian group because of their geographical location. However, recent findings from Y chromosome studies showed that the Korean population contains lineages from both southern and northern parts of East Asia. To understand the genetic history of the Korean population and its relationships with neighboring groups more fully, additional data and analyses are necessary.
We analyzed mitochondrial DNA (mtDNA) sequence variation in the hypervariable segments I and II (HVS-I and HVS-II) and haplogroup-specific mutations in coding regions in 445 individuals from seven East Asian populations (Korean, Korean-Chinese, Mongolian, Manchurian, Han (Beijing), Vietnamese and Thai). In addition, published mtDNA haplogroup data (N = 3307), mtDNA HVS-I sequences (N = 2313), Y chromosome haplogroup data (N = 1697) and Y chromosome STR data (N = 2713) were analyzed to elucidate the genetic structure of East Asian populations. All the mtDNA profiles studied here were classified into subsets of haplogroups common in East Asia, with just two exceptions. In general, the Korean mtDNA profiles were similar to those of other northeastern Asian populations in analyses of individual haplogroup distributions, genetic distances between populations and molecular variance, although a minor southern contribution was also suggested. Reanalysis of Y-chromosomal data confirmed the overall similarity to other northeastern populations and also indicated a larger paternal contribution from southeastern populations.
The present work provides evidence that the peopling of Korea can be seen as a complex process, interpreted as an early northern Asian settlement with at least one subsequent male-biased southern-to-northern migration, possibly associated with the spread of rice agriculture.
An understanding of the evolutionary history of East Asian populations has long been a subject of interest in the field of human evolutionary genetics. Based on results of classical genetic markers, there is significant separation between southern and northern populations of East Asia.
The Korean Peninsula is located to the north of the Yellow and Yangtze Rivers of China, and bounded to the northeast by Russia. Therefore, the Koreans are geographically a northeast Asian group. Anthropological and archeological evidence suggests that the early Korean population was related to Mongolian ethnic groups who inhabited the general area of the Altai Mountains and Lake Baikal regions of southeast Siberia.
Studies of classical genetic markers showed that Koreans tend to have a close genetic affinity with Mongolians among East Asians.
To understand the genetic history of Korea better, more data from additional genetic markers from Korea and its surrounding regions are necessary. Mitochondrial DNA (mtDNA), like the Y chromosome, can also provide valuable information about the phylogeography of human populations due to its special features of haploidy and uniparental inheritance.
In this study, we present new data on the mtDNA sequence variation of the hypervariable segments I and II (HVS-I and HVS-II) and haplogroup-specific mutations in coding regions in 445 individuals from seven East Asian populations, including Korea. In addition, mtDNA haplogroup data (N = 3307), mtDNA HVS-I sequences (N = 2313), Y chromosome haplogroup data (N = 1697) and Y chromosome STR data (N = 2713) from the literature were analyzed to elucidate wider aspects of the genetic structure of East Asian populations.
We analyzed a total of 445 individuals, collected from seven East Asian populations (Korean, Korean-Chinese (people of Korean origin now living in China), Mongolian, Manchurian, Chinese Han (Beijing), Vietnamese, and Thai). The DNA samples included subsets of the samples examined by Jin et al.
In addition to our mtDNA data sets, mtDNA haplogroup data for 2862 individuals, mtDNA HVS-I sequence data for 1868 individuals, Y chromosome haplogroup data for 1697 individuals and Y chromosome STR data (ten Y-STR loci: DYS19, DYS389I, DYS389b, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438 and DYS439) for 2716 individuals were retrieved from the literature.
(A) mtDNA haplogroups. (B) mtDNA HVS-I sequences. (C) Y-chromosome haplogroups. (D) Y-chromosome STRs. aWen et al.
PCR amplification of the HVS-I and HVS-II of mtDNA control region was performed using two primer sets as described by Yao et al.
Primer Pair | Primer sequences (5′ to 3′) | Annealing Temperature (°C) | Polymorphisms at/in |
L29/H408 | | 54 | HVS-II |
L4499/H5099 | | 60 | +4831 |
L4887/H5442 | | 60 | 5176 |
L8215/H8297 | | 57 | 9-bp deletion |
L10170/H10660 | | 59 | 10171-10659 |
L12114/H12338 | | 58 | 12308 |
L14054/H14591 | | 57 | 14178, 14308, 14318, 14470 |
L14575/H15086 | | 57 | 14668, 14766, 14783, 15043 |
L15996/H16498 | | 60 | HVS-I |
PCR conditions were an initial denaturation at 94°C for 5 min; 35 cycles of amplification at 94°C for 45 sec, the annealing temperature shown above for each primer pair for 45 sec, and 72°C for 1 min; and a final incubation at 72°C for 5 min.
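The cycling conditions can be written out as a simple step list, which makes the total hold time easy to check. This is a minimal sketch, not the authors' thermocycler program: the `Step` class and the function names are hypothetical, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    seconds: int


def pcr_program(annealing_temp_c: float, cycles: int = 35) -> list:
    """Expand the cycling conditions into a flat list of steps."""
    program = [Step("initial denaturation", 94.0, 5 * 60)]
    for _ in range(cycles):
        program.append(Step("denaturation", 94.0, 45))
        program.append(Step("annealing", annealing_temp_c, 45))
        program.append(Step("extension", 72.0, 60))
    program.append(Step("final incubation", 72.0, 5 * 60))
    return program


def total_minutes(program: list) -> float:
    """Sum of hold times only; ramping between temperatures is ignored."""
    return sum(step.seconds for step in program) / 60


# Annealing temperature for the HVS-I primer pair L15996/H16498 (60 °C in the table).
hvs1_program = pcr_program(annealing_temp_c=60.0)
print(len(hvs1_program), total_minutes(hvs1_program))
```

For the HVS-I primer pair this gives 107 steps and 97.5 minutes of hold time; only the annealing temperature changes between primer pairs.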
After PCR amplification, each PCR product was purified using the Wizard® PCR Preps DNA Purification System (Promega, WI, USA) and then cycle-sequenced on either a MegaBACE 1000 sequencer (Amersham Bioscience, USA) with DYEnamic ET Dye Terminator (Amersham Bioscience, USA) or an ABI PRISM™ 310 Genetic Analyzer (Applied Biosystems, CA, USA) with BigDye™ Terminator (PE Biosystems, USA). DNA sequences of the PCR amplicons were determined from both forward and reverse sequence data using the original primer pairs. Only the sequences from nucleotide position (np) 16024 to 16365 in HVS-I and from np 73 to 340 in HVS-II were used, because electropherograms for the 20–30 nucleotides nearest the primers were frequently ambiguous.
The intergenic COII/tRNALys 9-bp deletion was analyzed as described in Jin et al.
Sequences were aligned and compared with the revised Cambridge Reference Sequence (rCRS).
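Comparing an aligned sequence with a reference amounts to listing the positions where the two differ. The sketch below illustrates this for substitutions only. The function name is hypothetical, the two 10-base strings are toy data rather than real rCRS or sample sequence, and indels are not handled.

```python
def variants_against_reference(reference: str, sample: str, start_np: int) -> list:
    """List substitutions in `sample` relative to `reference`.

    Both strings must already be aligned and of equal length; `start_np`
    is the nucleotide position of their first base. Indels are not handled.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for offset, (ref_base, obs_base) in enumerate(zip(reference, sample)):
        if ref_base != obs_base:
            variants.append(f"{ref_base}{start_np + offset}{obs_base}")
    return variants


# Toy 10-base strings for illustration; 16024 is the first HVS-I position read.
toy_reference = "TTCTTTCATG"
toy_sample = "TTCTCTCATA"
print(variants_against_reference(toy_reference, toy_sample, start_np=16024))
```

With these toy inputs the function reports `['T16028C', 'G16033A']`, that is, each difference labelled by reference base, position and observed base.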
The genetic differentiation between different population samples and its statistical significance were assessed via
Haplogroup-specific median-joining networks
The admixture proportions of the northeast Asian and southeast Asian parental populations in the Korean population were estimated for mtDNA and the Y chromosome using the Admix 2.0 software.
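To illustrate what an admixture proportion means, the sketch below uses a plain least-squares estimator on haplogroup frequencies. This is a deliberately simpler technique than whatever Admix 2.0 computes internally, and it is not a reimplementation of that software. The function name and all frequencies are hypothetical, invented so that the expected answer is known.

```python
def least_squares_admixture(parent1, parent2, hybrid):
    """Least-squares estimate of parent1's contribution m, where the
    hybrid frequencies are modelled as m * parent1 + (1 - m) * parent2."""
    haplogroups = set(parent1) | set(parent2) | set(hybrid)
    numerator = 0.0
    denominator = 0.0
    for hg in haplogroups:
        p1 = parent1.get(hg, 0.0)
        p2 = parent2.get(hg, 0.0)
        h = hybrid.get(hg, 0.0)
        numerator += (h - p2) * (p1 - p2)
        denominator += (p1 - p2) ** 2
    if denominator == 0.0:
        raise ValueError("parental populations have identical frequencies")
    return numerator / denominator


# Hypothetical haplogroup frequencies, invented for illustration only.
north = {"D4": 0.30, "A": 0.20, "G": 0.20, "B4": 0.10, "F1a": 0.20}
south = {"D4": 0.05, "A": 0.05, "G": 0.00, "B4": 0.40, "F1a": 0.50}
# Build a hybrid pool that is exactly 65% "north" and 35% "south".
mixed = {hg: 0.65 * north[hg] + 0.35 * south[hg] for hg in north}

print(round(least_squares_admixture(north, south, mixed), 2))
```

Because the hybrid pool is constructed as an exact 65:35 mixture, the estimator recovers 0.65. With real data the fit is never exact, which is why estimates of this kind are reported with a standard deviation.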
Almost all of the mtDNA lineages analyzed here could be assigned to the East Asian-specific (sub)haplogroups described recently.
Haplogroup | Korean-Chinese | Mongolian | Manchurian | Han (Beijing) | Vietnamese | Thais | Korean |
A | 3 | 1 | 3 | ||||
A4 | 4 | 2 | 1 | 1 | 1 | 6 | |
A5 | 1 | 1 | 5 | ||||
A5a | 1 | ||||||
B | 2 | 1 | |||||
B4 | 2 | 2 | 3 | 7 | |||
B4a | 2 | 1 | 1 | 1 | 11 | ||
B4b | 2 | 2 | 1 | ||||
B4b1 | 4 | ||||||
B4c | 1 | ||||||
B5a | 1 | 1 | 3 | 2 | |||
B5b | 1 | 1 | 2 | ||||
C | 1 | 8 | 1 | 2 | 4 | 3 | |
C3 | 2 | ||||||
D | 1 | 2 | 1 | ||||
D4 | 11 | 5 | 8 | 5 | 7 | 1 | 44 |
D4a | 3 | 2 | 3 | ||||
D4b | 1 | 3 | |||||
D5 | 2 | 1 | 3 | 1 | 6 | ||
D5a | 1 | 2 | 1 | 3 | |||
F | 1 | 3 | 3 | 1 | |||
F1a | 1 | 3 | 2 | 4 | 10 | 8 | |
F1b | 1 | 3 | 2 | 2 | 8 | 8 | |
F1c | 1 | 1 | |||||
F2 | 2 | ||||||
F2a | 1 | ||||||
G | 2 | 1 | 1 | ||||
G1a | 3 | 1 | 1 | ||||
G2 | 1 | 1 | 7 | ||||
G2a | 1 | 5 | 1 | 2 | 6 | ||
G3 | 1 | 1 | 4 | ||||
M | 3 | 1 | 1 | 5 | 1 | ||
M7a1 | 7 | ||||||
M7b | 1 | 1 | 3 | ||||
M7b1 | 2 | 2 | 1 | ||||
M7b2 | 2 | 1 | 4 | ||||
M7c | 1 | 1 | 1 | 1 | |||
M7c1 | 1 | 1 | 1 | 1 | 1 | 6 | |
M8a | 1 | 1 | 1 | 1 | 2 | ||
M9a | 1 | 1 | 1 | 3 | |||
M10 | 2 | 1 | 3 | 1 | |||
M11 | 1 | 1 | |||||
N | 1 | ||||||
N9a | 5 | 2 | 2 | 3 | 3 | 1 | 12 |
R | 2 | ||||||
R11 | 1 | ||||||
T | 1 | ||||||
U5a | 1 | ||||||
Y1 | 1 | 1 | |||||
Y2 | 1 | ||||||
pre-Z | 1 | ||||||
Z | 1 | 1 | 4 | 1 | 2 | 1 | |
n | 51 | 47 | 40 | 40 | 42 | 40 | 185 |
Total | 445 |
Population | Gene diversity (haplogroup data) | Gene diversity (HVS-I/II sequence data) | Pairwise difference (HVS-I/II sequence data) | Nucleotide diversity (HVS-I/II sequence data) |
Korean | 0.9239 ± 0.0132 | 0.9988 ± 0.0007 | 10.07 ± 4.62 | 0.039 ± 0.020 |
Korean-Chinese | 0.9357 ± 0.0219 | 0.9992 ± 0.0041 | 10.21 ± 4.74 | 0.039 ± 0.020 |
Mongolian | 0.9454 ± 0.0172 | 0.9991 ± 0.0046 | 10.80 ± 5.00 | 0.042 ± 0.021 |
Manchurian | 0.9462 ± 0.0221 | 0.9974 ± 0.0063 | 10.88 ± 5.05 | 0.042 ± 0.022 |
Han (Beijing) | 0.9526 ± 0.0135 | 1.0000 ± 0.0056 | 11.38 ± 5.27 | 0.044 ± 0.022 |
Vietnamese | 0.9152 ± 0.0290 | 0.9919 ± 0.0079 | 9.66 ± 4.52 | 0.037 ± 0.020 |
Thai | 0.9269 ± 0.0214 | 1.0000 ± 0.0056 | 11.53 ± 5.33 | 0.045 ± 0.023 |
HVS-I: np 16024–16365; HVS-II: np 73–340.
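Gene diversity values like those in the table are commonly computed with the unbiased estimator n/(n−1) × (1 − Σp²), where p are haplotype frequencies in a sample of n individuals. The sketch below assumes that estimator; the text here does not state which formula or software produced the tabulated values. The function name and haplotype labels are placeholders.

```python
from collections import Counter


def gene_diversity(haplotypes):
    """Unbiased gene diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# 40 individuals, each with a distinct (placeholder) haplotype label.
all_unique = [f"hap{i}" for i in range(40)]
# 40 individuals in which two placeholder haplotypes are each carried twice.
two_pairs = ["hap0", "hap1"] + [f"hap{i}" for i in range(38)]

print(round(gene_diversity(all_unique), 4))
print(round(gene_diversity(two_pairs), 4))
```

Under this assumption, a sample of 40 with no shared haplotypes gives 1.0, as tabulated for the Han (Beijing) and Thai sequence data, and two shared pairs among 40 give 0.9974, which is numerically the value tabulated for the Manchurian sample. The actual haplotype counts are not listed here, so this is a consistency check rather than a reconstruction.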
The highest frequency in the Korean mtDNA pool (23.8%) was observed for haplogroup D4, which is widespread in northern East Asia, especially among the Korean-Chinese (21.6%) and Manchurians (20.0%). In total, haplogroup D lineages, including the subhaplogroups D4, D4a, D4b, D5 and D5a, accounted for 32.4% of the Korean mtDNA pool. In addition, the Koreans present moderate frequencies of (sub)haplogroup A (8.1%) and (sub)haplogroup G (10.3%) lineages, which are mostly prevalent in northeast Asia and southeast Siberia.
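The percentages quoted above follow directly from the haplogroup counts and the Korean sample size of 185. The short check below assumes that the last value in each row of the haplogroup table is the Korean count, which is an assumption because empty cells in the table are not individually marked. Variable and function names are hypothetical.

```python
KOREAN_SAMPLE_SIZE = 185

# Korean counts for haplogroup D lineages, as read from the haplogroup table
# (assumed to be the last value in each row).
korean_d_counts = {"D": 1, "D4": 44, "D4a": 3, "D4b": 3, "D5": 6, "D5a": 3}


def percent(count, total):
    """Frequency as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


d4_frequency = percent(korean_d_counts["D4"], KOREAN_SAMPLE_SIZE)
d_total_frequency = percent(sum(korean_d_counts.values()), KOREAN_SAMPLE_SIZE)
print(d4_frequency, d_total_frequency)
```

This prints 23.8 and 32.4, matching the text. Note that the 32.4% total is reached only when the one individual assigned to unsubdivided D is counted along with the five named subhaplogroups.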
We then investigated the mtDNA and Y-chromosomal relationships between the East Asian populations, using both the new and published data. In these analyses mtDNA haplogroups, mtDNA HVS-I sequences, Y-SNPs and Y-STRs were compared (Supplementary
In contrast, the results of Y chromosome analyses (based on Y-SNPs and Y-STRs) of Korean populations revealed closer relationships with both northeast and southeast Asian populations (Supplementary
(B) Network of 7 Y-STRs within Haplogroup O3. Circle areas are proportional to haplotype frequency. Lines represent the mutational differences between haplotypes. Colors in the network denote the following populations: purple, far north Asian populations (Daur, Ewenki, Han (Xinjiang), inner Mongolians, Oroqen, outer Mongolians, Uygur (Yili), Uygur (Urumqi) and Xibe); blue, Koreans; white, far south Asian populations (Buyi, Han (Guangdong), Han (Sichuan), Han (Yunnan), Hani, Indonesians, Li, Filipinos, She, Thais, Vietnamese, Yao (Bama) and Yao (Liannan)).
The genetic differences between the Koreans and other East Asians were examined by AMOVA (
Markers | Grouping | Among groups: % of variance (P value) | Among populations within groups: % of variance (P value) | Within populations: % of variance (P value) |
mtDNA haplogroups | Korean vs. NEAs | −0.03 (0.20332) | 1.32 (<0.00001) | 98.71 (<0.00001) |
mtDNA haplogroups | Korean vs. SEAs | 2.29 (<0.00001) | 2.47 (<0.00001) | 95.23 (<0.00001) |
mtDNA haplogroups | Korean vs. NEAs vs. SEAs | 2.16 (<0.00001) | 2.51 (<0.00001) | 95.33 (<0.00001) |
mtDNA haplogroups | NEAs vs. SEAs | 3.22 (<0.00001) | 2.98 (<0.00001) | 93.81 (<0.00001) |
mtDNA HVS-I sequences | Korean vs. NEAs | −0.39 (0.98436) | 1.29 (<0.00001) | 99.11 (<0.00001) |
mtDNA HVS-I sequences | Korean vs. SEAs | −0.23 (0.66373) | 1.72 (<0.00001) | 98.51 (<0.00001) |
mtDNA HVS-I sequences | Korean vs. NEAs vs. SEAs | 2.18 (<0.00001) | 2.04 (<0.00001) | 95.79 (<0.00001) |
mtDNA HVS-I sequences | NEAs vs. SEAs | 0.26 (<0.00001) | 1.58 (<0.00001) | 98.16 (<0.00001) |
Y-chromosome haplogroups | Korean vs. NEAs | −0.21 (0.46237) | 9.43 (<0.00001) | 90.78 (<0.00001) |
Y-chromosome haplogroups | Korean vs. SEAs | 1.34 (0.17889) | 10.78 (<0.00001) | 87.89 (<0.00001) |
Y-chromosome haplogroups | Korean vs. NEAs vs. SEAs | 2.89 (<0.00001) | 10.35 (<0.00001) | 86.77 (<0.00001) |
Y-chromosome haplogroups | NEAs vs. SEAs | 4.60 (<0.00001) | 10.95 (<0.00001) | 84.45 (<0.00001) |
Y-chromosome STRs | Korean vs. NEAs | 3.36 (0.08016) | 6.25 (<0.00001) | 90.39 (<0.00001) |
Y-chromosome STRs | Korean vs. SEAs | 7.58 (0.00293) | 2.48 (<0.00001) | 89.94 (<0.00001) |
Y-chromosome STRs | Korean vs. NEAs vs. SEAs | 4.99 (0.00098) | 5.65 (<0.00001) | 89.36 (<0.00001) |
Y-chromosome STRs | NEAs vs. SEAs | 5.40 (0.00098) | 6.82 (<0.00001) | 87.78 (<0.00001) |
Our study documents the genetic relationships of the Koreans with their neighboring populations in unprecedented detail. Two major findings emerge. First, the Koreans are overall more similar to northeast Asians than to southeast Asians. This conclusion would be expected from the general correlation between genetic variation and geography observed for human populations, and is supported here by an examination of individual mtDNA haplogroups (
Markers | Parental contribution: Northeast Asians (SD) | Parental contribution: Southeast Asians (SD) |
MtDNA haplogroups | 0.65 (0.25) | 0.35 (0.25) |
Y-chromosome haplogroups | 0.17 (0.14) | 0.83 (0.14) |
Mt-HG & Y-HG | 0.48 (0.21) | 0.52 (0.21) |
SD: standard deviation.
The predominant genetic relationship with northern East Asians is consistent with other lines of evidence. Xue et al.
What could be the origin of the male-biased southern contribution to the Korean gene pool illustrated, for example, by haplogroups O-M122 (42.2%) and O-SRY465 (20.1%)?
Asian populations studied
(0.05 MB XLS)
mtDNA-haplogroup distributions in East Asian populations
(0.05 MB XLS)
Y-haplogroup distribution in East Asian populations
(0.03 MB XLS)
We would like to thank H.J. Bandelt for crucial comments and advice on the mtDNA study. We are grateful to all volunteers for providing DNA samples. We also thank K.D. Kwak and S.B. Hong for technical assistance.