
Powerful Haplotype-Based Hardy-Weinberg Equilibrium Tests for Tightly Linked Loci

  • Wei-Gao Mao,

    Affiliation Department of Biostatistics, School of Public Health and Tropical Medicine, Southern Medical University, Guangzhou, Guangdong, China

  • Hai-Qiang He,

    Affiliation Department of Biostatistics, School of Public Health and Tropical Medicine, Southern Medical University, Guangzhou, Guangdong, China

  • Yan Xu,

    Affiliation Department of Biostatistics, School of Public Health and Tropical Medicine, Southern Medical University, Guangzhou, Guangdong, China

  • Ping-Yan Chen,

    Affiliation Department of Biostatistics, School of Public Health and Tropical Medicine, Southern Medical University, Guangzhou, Guangdong, China

  • Ji-Yuan Zhou

    zhoujiyuan@gmail.com

    Affiliation Department of Biostatistics, School of Public Health and Tropical Medicine, Southern Medical University, Guangzhou, Guangdong, China

Abstract

Recently, many case-control methods have been proposed to test for association between haplotypes and disease, and they require the Hardy-Weinberg equilibrium (HWE) assumption for haplotype frequencies. As such, haplotype inference from unphased genotypes and the development of haplotype-based HWE tests are crucial prior to fine mapping. The goodness-of-fit test is a frequently used method to test for HWE at multiple tightly linked loci. However, its degrees of freedom increase dramatically with the number of loci, which may reduce its power. Therefore, to improve the power of tests for haplotype-based HWE, we first write out two likelihood functions of the observed data based on Niu's model (NM) and the inbreeding model (IM), respectively, two models that allow departure from HWE. Then, we use two expectation-maximization algorithms and one expectation-conditional-maximization algorithm to estimate the model parameters under the HWE, IM and NM models, respectively. Finally, we propose the likelihood ratio tests $\mathrm{LRT}_{\rho}$ and $\mathrm{LRT}_{F}$ for haplotype-based HWE under the NM and IM models, respectively. We simulate the HWE, Niu's, inbreeding and population stratification models to assess the validity and compare the performance of these two LRT tests. The simulation results show that both tests control the type I error rates well in testing for haplotype-based HWE. If the NM model is true, then LRT is more powerful, whereas if the true model is the IM model, then LRT has better power. Under the population stratification model, LRT is still more powerful. To this end, LRT is generally recommended. Application of the proposed methods to a rheumatoid arthritis data set further illustrates their utility for real data analysis.

Introduction

In studies of genetic epidemiology, complex diseases are often associated with multiple (interacting) markers [1]–[3]. As such, haplotype-based analysis has gained increasing attention because it can be more efficient than single-marker analysis [4]–[9]. Haplotype inference from unphased genotypes is therefore expected to play an important role in disease fine mapping [10]. Many statistical and computational methods are now available for inferring haplotypes from different types of data, such as genotypes of unrelated individuals. One popular approach is the likelihood method, and maximum likelihood estimation via the expectation-maximization (EM) algorithm [11] is frequently employed for haplotype inference. For genotype data of unrelated individuals, an EM-based maximum likelihood method for estimating haplotype frequencies was first proposed by Excoffier and Slatkin [12]; for brevity, we refer to it simply as the EM algorithm in this paper. However, the EM algorithm assumes that the population under study is in Hardy-Weinberg equilibrium (HWE); otherwise, the estimates of haplotype frequencies may be biased.

Recently, many case-control methods have been proposed to test for association between haplotypes and disease. A likelihood ratio test (LRT) for haplotype-disease association has been constructed from the maximized likelihood functions for cases, controls and the pooled sample of cases and controls; it requires HWE in the pooled sample [3]. Prospective likelihood methods based on logistic regression or generalized linear models were investigated by Schaid et al. [13], Stram et al. [14], Zaykin et al. [15], and others. These methods treat unobserved haplotypes as covariates in a regression model and compute the conditional expectation of the covariates given the genotype observations under the null hypothesis of no association, assuming HWE in the pooled sample of cases and controls. Zhao et al. [16] proposed a prospective estimating-equation approach for assessing disease association with haplotypes with adjustment for covariates, which requires the HWE assumption for haplotype frequencies only in the control sample; the pooled sample of cases and controls need not be in HWE. A retrospective likelihood method can also be used to detect haplotype-disease association in a case-control study, and it likewise requires HWE only in the control population [17]. Therefore, detecting departures from haplotype-based HWE is crucial prior to fine mapping and positional cloning studies with case-control designs.

The goodness-of-fit test is a frequently used method to test for HWE at multiple tightly linked loci. However, as the number of loci under study increases, its degrees of freedom increase dramatically, which may reduce its power. Therefore, to develop more powerful haplotype-based HWE tests, we first recall three models that can cause Hardy-Weinberg disequilibrium (HWD). The first was originally proposed by Niu et al. [6]; it includes a parameter $\rho$ and is called Niu's model (NM) in this paper for convenience. The second is the inbreeding model (IM), which incorporates the inbreeding coefficient $F$ [18]. The third is a population stratification (PS) model, which can also lead to HWD. We then write out two likelihood functions of the observed data based on the NM and IM models, respectively. We develop an expectation-conditional-maximization (ECM) algorithm [19] for the NM model to estimate $\rho$ and the haplotype frequencies, and suggest an EM algorithm for the IM model (denoted the IEM algorithm here) to estimate $F$ and the haplotype frequencies. Because $\rho = 1$ (under the NM model) or $F = 0$ (under the IM model) means that HWE holds, we further propose two LRT tests, $\mathrm{LRT}_{\rho}$ and $\mathrm{LRT}_{F}$, to test for haplotype-based HWE under the NM and IM models, respectively. We simulate the HWE, Niu's, inbreeding and population stratification models to assess the validity and compare the performance of these two LRT tests. The simulation results show that both tests control the size well in testing for haplotype-based HWE. If Niu's model is true, then LRT is more powerful, whereas if the inbreeding model is true, then LRT has better power. Under the population stratification model, LRT is still more powerful. Therefore, LRT is generally recommended. In addition, we compute the sum of absolute differences (SAD) between the true and estimated haplotype frequencies [20] to compare the performance of the EM, ECM and IEM algorithms in estimating haplotype frequencies. If the true model is Niu's model, the ECM algorithm gives more accurate estimates of haplotype frequencies than the EM and IEM algorithms. For all the other simulation settings, however, the EM algorithm is little affected by the departure from HWE, and the EM and IEM algorithms perform almost identically, with smaller SADs than the ECM algorithm. Application of the proposed methods to the Rheumatoid Arthritis (RA) data set from the North American Rheumatoid Arthritis Consortium (NARAC) further illustrates their utility for real data analysis.

Materials and Methods

Likelihood Function and EM Algorithm under HWE

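Consider a sample of $n$ unrelated individuals genotyped at $L$ single nucleotide polymorphism (SNP) markers. Assume that the SNPs are tightly linked, so that the recombination fraction between any pair of SNPs is zero. Each SNP has two alleles, 1 and 2. Let $\mathcal{H} = \{H_1, \dots, H_m\}$ be the set of all possible haplotypes at these $L$ loci, where $m = 2^L$. Let $p_k$ be the frequency of haplotype $H_k$ ($k = 1, \dots, m$), so that the set of haplotype frequencies is $\mathbf{p} = (p_1, \dots, p_m)$ with $\sum_{k=1}^{m} p_k = 1$. Let $\mathbf{G} = \{G_1, \dots, G_n\}$ be the set of observed genotypes, where $G_i$ is the genotype of the $i$th individual. For the $i$th individual, let $c_i$ be the number of haplotype combinations (unordered haplotype pairs) compatible with $G_i$; if $G_i$ is heterozygous at $s_i \ge 1$ loci, then $c_i = 2^{s_i - 1}$, and otherwise $c_i = 1$. The likelihood function of the sample can then be expressed as

$$L(\mathbf{p}) = \prod_{i=1}^{n} \sum_{j=1}^{c_i} P(h_{ij}), \qquad (1)$$

where $h_{ij}$ denotes the $j$th haplotype combination compatible with genotype $G_i$ for the $i$th individual.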
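As an illustration of Equation (1), the Python sketch below enumerates the unordered haplotype pairs compatible with an unphased genotype and evaluates the observed log-likelihood under HWE. The genotype coding (number of copies of allele 2 at each locus) and the names `compatible_pairs`, `pair_prob_hwe` and `log_likelihood` are our own illustrative assumptions, not part of HAP-HWE. Because `math.log` is applied to each individual's total, every genotype needs at least one compatible pair with positive probability.

```python
import math
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs compatible with an unphased genotype.

    genotype: tuple with the number of copies of allele 2 at each locus (0, 1, 2).
    Haplotypes are returned as tuples of alleles coded 1 or 2.
    """
    het = [l for l, g in enumerate(genotype) if g == 1]
    pairs = []
    # Fix the first heterozygous locus to allele 1 on the first haplotype so each
    # unordered pair is listed once: 2**(s - 1) pairs for s >= 1 heterozygous loci.
    for bits in product((1, 2), repeat=max(len(het) - 1, 0)):
        phase = dict(zip(het[1:], bits))
        if het:
            phase[het[0]] = 1
        h1 = tuple(phase[l] if g == 1 else g // 2 + 1 for l, g in enumerate(genotype))
        h2 = tuple(3 - phase[l] if g == 1 else g // 2 + 1 for l, g in enumerate(genotype))
        pairs.append((h1, h2))
    return pairs


def pair_prob_hwe(h1, h2, freq):
    """P(h1, h2) under HWE: p^2 for a homozygous pair, 2 p p' otherwise."""
    p1, p2 = freq.get(h1, 0.0), freq.get(h2, 0.0)
    return p1 * p1 if h1 == h2 else 2.0 * p1 * p2


def log_likelihood(genotypes, freq):
    """Logarithm of the observed-data likelihood in Equation (1) under HWE."""
    return sum(
        math.log(sum(pair_prob_hwe(h1, h2, freq) for h1, h2 in compatible_pairs(g)))
        for g in genotypes
    )
```

For example, `compatible_pairs((1, 1))` returns the two phasings `((1, 1), (2, 2))` and `((1, 2), (2, 1))`, in line with $c_i = 2^{2-1} = 2$.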

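Because haplotype phases are unobserved, the EM algorithm [11] is employed to make haplotype frequency estimation feasible. Let $\mathbf{Z} = \{Z_1, \dots, Z_n\}$ be the true haplotype combinations of the sample, which are unobserved, where $Z_i$ is the true haplotype combination of the $i$th individual. Then the log-likelihood function of the complete data is

$$\log L_c(\mathbf{p}) = \sum_{i=1}^{n}\sum_{j=1}^{c_i} I(Z_i = h_{ij})\,\log P(h_{ij}), \qquad (2)$$

where $I(\cdot)$ is an indicator function, with $I(Z_i = h_{ij}) = 1$ if $Z_i = h_{ij}$ and 0 otherwise. Under HWE, the probability of the unordered haplotype pair $(H_k, H_l)$ is $P(H_k, H_l) = 2p_k p_l$ if $k \neq l$ and $p_k^2$ otherwise. Excoffier and Slatkin [12] proposed the following EM algorithm to obtain the maximum likelihood estimates of $p_k$ ($k = 1, \dots, m$) at iteration $t+1$:

$$p_k^{(t+1)} = \frac{1}{2n}\sum_{i=1}^{n}\sum_{j=1}^{c_i} \delta_{ijk}\,\frac{P^{(t)}(h_{ij})}{\sum_{j'=1}^{c_i} P^{(t)}(h_{ij'})},$$

where $\delta_{ijk}$ is the number of times that haplotype $H_k$ occurs in the haplotype combination $h_{ij}$ of the $i$th individual and takes the value 0, 1 or 2, and $P^{(t)}(h_{ij})$ is the value of the probability $P(h_{ij})$ based on the haplotype frequencies estimated at iteration $t$.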
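The Excoffier–Slatkin update can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the HAP-HWE implementation: genotypes are coded as counts of allele 2 per locus, haplotypes as allele tuples, and the helper `compatible_pairs`, the uniform starting values and the tolerance are our choices. The E-step computes the posterior weight of each compatible pair; the M-step adds these weights once per haplotype copy (so a homozygous pair contributes $\delta_{ijk} = 2$) and divides by $2n$.

```python
import math
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs compatible with a genotype (allele-2 counts per locus)."""
    het = [l for l, g in enumerate(genotype) if g == 1]
    pairs = []
    for bits in product((1, 2), repeat=max(len(het) - 1, 0)):
        phase = dict(zip(het[1:], bits))
        if het:
            phase[het[0]] = 1          # avoid listing each unordered pair twice
        h1 = tuple(phase[l] if g == 1 else g // 2 + 1 for l, g in enumerate(genotype))
        h2 = tuple(3 - phase[l] if g == 1 else g // 2 + 1 for l, g in enumerate(genotype))
        pairs.append((h1, h2))
    return pairs


def em_haplotype_freqs(genotypes, max_iter=1000, tol=1e-8):
    """EM estimates of haplotype frequencies under HWE (Excoffier and Slatkin)."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}          # uniform starting values
    n, prev_ll = len(genotypes), -math.inf
    for _ in range(max_iter):
        counts, ll = defaultdict(float), 0.0
        for pairs in pair_lists:
            probs = [freq[a] ** 2 if a == b else 2 * freq[a] * freq[b] for a, b in pairs]
            total = sum(probs)
            ll += math.log(total)
            for (a, b), pr in zip(pairs, probs):
                counts[a] += pr / total   # E-step weight of this pair; a homozygous
                counts[b] += pr / total   # pair adds two copies of the same haplotype
        freq = {h: counts[h] / (2 * n) for h in haps}   # M-step
        if abs(ll - prev_ll) < tol:    # observed log-likelihood has stabilized
            break
        prev_ll = ll
    return freq
```

When every genotype is phase-unambiguous, one iteration simply returns the observed haplotype counts divided by $2n$.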

Two Forms of HWD

The HWE assumption is strong, and HWE often does not hold in practice. One may consider the following form of HWD:

$$P(H_k, H_l) = \begin{cases} F p_k + (1-F)\,p_k^2, & k = l,\\ 2(1-F)\,p_k p_l, & k \neq l, \end{cases} \qquad (3)$$

where $F$ is the inbreeding coefficient, which is generally positive [21]. Equation (3) reduces to HWE when $F = 0$. For convenience, we call this form of HWD the “inbreeding model (IM)” in this paper.
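To make Equation (3) concrete, the sketch below tabulates the unordered haplotype-pair probabilities under the inbreeding model for a hypothetical set of haplotype frequencies (illustrative values, not those of Table 1; the function name is ours). The probabilities sum to 1 for any $F$ in $[0, 1]$ because $\sum_k [F p_k + (1-F)p_k^2] + \sum_{k<l} 2(1-F)p_k p_l = F + (1-F)\left(\sum_k p_k\right)^2 = 1$, and $F = 0$ recovers the HWE probabilities.

```python
from itertools import combinations_with_replacement


def pair_probs_inbreeding(freq, F):
    """Unordered haplotype-pair probabilities under the inbreeding model, Equation (3)."""
    probs = {}
    for a, b in combinations_with_replacement(sorted(freq), 2):
        if a == b:
            # Homozygous pair: identical by descent with probability F.
            probs[(a, b)] = F * freq[a] + (1 - F) * freq[a] ** 2
        else:
            probs[(a, b)] = 2 * (1 - F) * freq[a] * freq[b]
    return probs


# Hypothetical frequencies of four three-SNP haplotypes (illustrative only).
freq = {"111": 0.4, "112": 0.3, "221": 0.2, "222": 0.1}
probs = pair_probs_inbreeding(freq, F=0.05)
```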

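Another form of departure from HWE was originally proposed by Niu et al. [6], as follows. Assume that the probability of the unordered haplotype pair $(H_k, H_l)$ is proportional to $\rho_1 p_k^2$ if $k = l$ and to $2\rho_2 p_k p_l$ otherwise, with two positive parameters $\rho_1$ and $\rho_2$. Obviously, the HWE assumption holds if $\rho_1 = \rho_2$. Because the sum of these terms over all haplotype pairs at the $L$ loci need not equal 1, HWD can be defined in the following normalized form:

$$P(H_k, H_l) = \begin{cases} \rho_1 p_k^2 / D, & k = l,\\ 2\rho_2\, p_k p_l / D, & k \neq l, \end{cases} \qquad (4)$$

where $D = \rho_1 \sum_{k=1}^{m} p_k^2 + \rho_2\left(1 - \sum_{k=1}^{m} p_k^2\right)$.

Let $\rho = \rho_1/\rho_2$. Dividing the numerator and denominator of Equation (4) by $\rho_2$ shows that the model depends on $(\rho_1, \rho_2)$ only through $\rho$: $P(H_k, H_k) = \rho p_k^2 / C(\rho, \mathbf{p})$ and $P(H_k, H_l) = 2p_k p_l / C(\rho, \mathbf{p})$ for $k \neq l$, where $C(\rho, \mathbf{p}) = 1 + (\rho - 1)\sum_{k=1}^{m} p_k^2$. We then assume $\rho \ge 1$, in analogy with a positive inbreeding coefficient $F$. For convenience, we call this form of HWD “Niu's model (NM)”.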
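Similarly, here is a minimal sketch of Niu's model, assuming the one-parameter form in which, before normalization, a homozygous pair $(H_k, H_k)$ receives weight $\rho p_k^2$ and a heterozygous pair $(H_k, H_l)$ receives weight $2p_k p_l$. The normalizing constant then equals $1 + (\rho - 1)\sum_k p_k^2$. The function name and the haplotype frequencies are hypothetical.

```python
from itertools import combinations_with_replacement


def pair_probs_niu(freq, rho):
    """Normalized unordered haplotype-pair probabilities under Niu's model."""
    raw = {}
    for a, b in combinations_with_replacement(sorted(freq), 2):
        # rho > 1 inflates homozygous pairs relative to HWE.
        raw[(a, b)] = rho * freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]
    norm = sum(raw.values())   # equals 1 + (rho - 1) * sum(p_k ** 2)
    return {pair: w / norm for pair, w in raw.items()}, norm


freq = {"111": 0.4, "112": 0.3, "221": 0.2, "222": 0.1}   # hypothetical
probs, norm = pair_probs_niu(freq, rho=1.3)
```

With these frequencies $\sum_k p_k^2 = 0.30$, so the normalizing constant is $1 + 0.3 \times 0.30 = 1.09$.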

Likelihood Function and Haplotype-Based HWE Test under Niu's Model

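Using Equations (2) and (4), the log-likelihood function of the complete data under Niu's model can be expressed as

$$\log L_c(\boldsymbol{\theta}) = \sum_{i=1}^{n}\sum_{j=1}^{c_i} I(Z_i = h_{ij})\,\log P(h_{ij}; \boldsymbol{\theta}), \qquad (5)$$

where $\boldsymbol{\theta} = (\rho, \mathbf{p})$ and $P(h_{ij}; \boldsymbol{\theta})$ is the probability of haplotype pair $h_{ij}$ under Niu's model, that is, $\rho p_k^2 / C(\rho, \mathbf{p})$ for a homozygous pair $(H_k, H_k)$ and $2p_k p_l / C(\rho, \mathbf{p})$ for a heterozygous pair $(H_k, H_l)$, with $C(\rho, \mathbf{p}) = 1 + (\rho - 1)\sum_{k=1}^{m} p_k^2$. Compared with the likelihood function under HWE, Equation (5) contains only one additional parameter, $\rho$. We therefore propose the following expectation-conditional-maximization (ECM) algorithm to estimate the haplotype frequencies and $\rho$. Each iteration consists of one expectation step (E-step) and $m$ conditional-maximization steps (CM-steps). In the E-step at iteration $t+1$, taking the conditional expectation of Equation (5) given the observed genotype data $\mathbf{G}$ and the current estimate $\boldsymbol{\theta}^{(t)}$ gives

$$Q(\boldsymbol{\theta} \mid \boldsymbol{\theta}^{(t)}) = \sum_{i=1}^{n}\sum_{j=1}^{c_i} w_{ij}^{(t)}\,\log P(h_{ij}; \boldsymbol{\theta}), \qquad w_{ij}^{(t)} = \frac{P(h_{ij}; \boldsymbol{\theta}^{(t)})}{\sum_{j'=1}^{c_i} P(h_{ij'}; \boldsymbol{\theta}^{(t)})}, \qquad (6)$$

where $w_{ij}^{(t)}$ is the conditional probability of the haplotype pair $h_{ij}$ given $G_i$ and $\boldsymbol{\theta}^{(t)}$, which is 0 if the haplotype pair is not compatible with genotype $G_i$.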

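In the CM-steps, we maximize $Q(\boldsymbol{\theta} \mid \boldsymbol{\theta}^{(t)})$ in Equation (6) to estimate $\boldsymbol{\theta}$. Let $\boldsymbol{\theta}^{(t+s/m)}$ be the estimate of $\boldsymbol{\theta}$ after the $s$th of the $m$ CM-steps at iteration $t+1$ ($s = 1, \dots, m$). The detailed CM-steps are as follows: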

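• Give the initial value $\boldsymbol{\theta}^{(0)} = (\rho^{(0)}, \mathbf{p}^{(0)})$, where $\mathbf{p}^{(0)} = (p_1^{(0)}, \dots, p_m^{(0)})$.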

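• At iteration $t+1$, fix $\mathbf{p} = \mathbf{p}^{(t)}$ in the first CM-step and maximize $Q$ with respect to $\rho$ by setting its first-order derivative to zero, $A^{(t)}/\rho - nS^{(t)}/[1 + (\rho - 1)S^{(t)}] = 0$. This gives

$$\rho^{(t+1)} = \frac{A^{(t)}\,(1 - S^{(t)})}{S^{(t)}\,(n - A^{(t)})},$$

where $A^{(t)}$ is the expected number of individuals whose haplotype pair is homozygous (two copies of the same haplotype), given $\mathbf{G}$ and $\boldsymbol{\theta}^{(t)}$, and $S^{(t)} = \sum_{k=1}^{m} \left(p_k^{(t)}\right)^2$. So, $\boldsymbol{\theta}^{(t+1/m)} = (\rho^{(t+1)}, \mathbf{p}^{(t)})$.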

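• Note that the constraint $\sum_{k=1}^{m} p_k = 1$ must be respected when we maximize $Q$ to estimate the haplotype frequencies $\mathbf{p}$. Thus, from the second CM-step to the $m$th CM-step, $p_1, \dots, p_{m-1}$ are estimated one at a time, and $p_m$ is then obtained from the constraint. Let $\mathbf{p}_{-(s)}$ be the set of current haplotype frequency estimates for all haplotypes except $H_{s-1}$ and $H_m$ in the $s$th CM-step, and let $R_s = 1 - \sum_{k \notin \{s-1,\, m\}} p_k$ be computed from $\mathbf{p}_{-(s)}$. Then $p_m = R_s - p_{s-1}$. For example, $p_1$ is estimated in the second CM-step, with $\mathbf{p}_{-(2)} = \{p_2, \dots, p_{m-1}\}$. As such, in the $s$th CM-step ($s = 2, \dots, m$), it is shown in Text S1 that maximizing $Q$ with respect to $p_{s-1}$ yields a cubic equation in $p_{s-1}$,

$$a_s\,p_{s-1}^3 + b_s\,p_{s-1}^2 + c_s\,p_{s-1} + d_s = 0, \qquad (7)$$

where the coefficients $a_s$, $b_s$, $c_s$ and $d_s$ are functions of $\rho^{(t+1)}$, the conditional probabilities from the E-step and the current estimates in $\mathbf{p}_{-(s)}$, written in terms of an auxiliary vector and matrix (see Text S1).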

Moreover, this cubic equation is always solvable, and its roots can be obtained by Shengjin's formulas [22]. Because the likelihood function converges regardless of the initial values of $\boldsymbol{\theta}$, if there are two or three roots between 0 and 1, we choose the one closest to the value of $p_{s-1}$ from the previous step. After this step, $p_m = R_s - p_{s-1}^{(t+1)}$.

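• For $k = 1, \dots, m$, let $p_k^{(t+1)}$ be the estimate of $p_k$ obtained after the $m$th CM-step. Then $\boldsymbol{\theta}^{(t+1)} = (\rho^{(t+1)}, \mathbf{p}^{(t+1)})$.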

• Repeat the steps above until the observed log-likelihood function of Equation (1) converges.
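The two kinds of CM-step can be sketched in Python under the normalized form of Niu's model, in which homozygous pairs have weight $\rho p_k^2$ and heterozygous pairs $2p_k p_l$. With $\mathbf{p}$ fixed, $Q$ depends on $\rho$ through $A\log\rho - n\log[1 + (\rho - 1)S]$. With everything else fixed, $Q$ depends on $x = p_{s-1}$ (and $p_m = R - x$) through $\alpha\log x + \beta\log(R - x) - n\log C(x)$. Here $A$ is the expected number of homozygous pairs, $\alpha$ and $\beta$ are the expected numbers of copies of $H_{s-1}$ and $H_m$, and $S_r$ is the sum of squares of the other fixed frequencies; these names are ours. The closed-form $\rho$ update and the cubic coefficients below come from our own differentiation and may be arranged differently from the expressions in Text S1. We also swap Shengjin's formulas for `numpy.roots`, keep only real roots in $(0, R)$, and pick the one nearest the previous value. This sketch does not enforce $\rho \ge 1$.

```python
import numpy as np


def cm_step_rho(A, n, S):
    """First CM-step: maximize A*log(rho) - n*log(1 + (rho - 1)*S) over rho.

    A: expected number of homozygous haplotype pairs (0 < A < n).
    S: sum of squared haplotype frequencies, held fixed (0 < S < 1).
    """
    return A * (1.0 - S) / (S * (n - A))


def cm_step_freq(alpha, beta, n, rho, R, S_r, x_prev):
    """Later CM-step: update x = p_{s-1}, with p_m = R - x, by solving a cubic.

    The cubic comes from setting the derivative of
    alpha*log(x) + beta*log(R - x) - n*log(C(x)) to zero, where
    C(x) = 1 + (rho - 1) * (S_r + x**2 + (R - x)**2).
    """
    D = rho - 1.0
    K = 1.0 + D * (S_r + R ** 2)
    coeffs = [
        2 * D * (2 * n - alpha - beta),                      # x^3
        2 * D * R * (2 * alpha + beta - 3 * n),              # x^2
        2 * D * R ** 2 * (n - alpha) - (alpha + beta) * K,   # x^1
        alpha * R * K,                                       # x^0
    ]
    roots = np.roots(coeffs)   # leading zeros (rho = 1) reduce the degree
    real = [r.real for r in roots if abs(r.imag) < 1e-9 and 0 < r.real < R]
    # With alpha, beta > 0 the cubic changes sign on (0, R), so `real` is non-empty.
    return min(real, key=lambda r: abs(r - x_prev))
```

When $\rho = 1$ the cubic collapses to a linear equation whose root, $x = \alpha R/(\alpha + \beta)$, is the familiar EM-type conditional update.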

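Under Niu's model, Equation (1) can be written as $L(\rho, \mathbf{p}) = \prod_{i=1}^{n}\sum_{j=1}^{c_i} P(h_{ij}; \rho, \mathbf{p})$. HWE holds when $\rho = 1$ and is violated otherwise. Therefore, a likelihood ratio test (LRT) for HWE is naturally constructed from the estimated parameters as

$$\mathrm{LRT}_{\rho} = -2\left[\log L(1, \hat{\mathbf{p}}_0) - \log L(\hat{\rho}, \hat{\mathbf{p}})\right], \qquad (8)$$

where $L(1, \hat{\mathbf{p}}_0)$ and $L(\hat{\rho}, \hat{\mathbf{p}})$ are the values of the observed likelihood function under the null hypothesis of HWE (with $\hat{\mathbf{p}}_0$ from the EM algorithm) and under the HWD alternative (with $\hat{\rho}$ and $\hat{\mathbf{p}}$ from the ECM algorithm), respectively. This LRT statistic asymptotically follows a chi-square distribution with 1 degree of freedom when HWE holds.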
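Given the maximized log-likelihoods under HWE and under the HWD alternative, the statistic in Equation (8) and its asymptotic P value can be computed as below. This is a generic sketch with a function name of our own. For one degree of freedom, the chi-square survival function equals $\operatorname{erfc}(\sqrt{x/2})$, so only the Python standard library is needed. Truncating negative values (which can arise from incomplete convergence) at 0 is our own convention.

```python
import math


def lrt_one_df(loglik_null, loglik_alt):
    """LRT = -2 (log L0 - log L1) and its asymptotic chi-square(1) P value."""
    stat = max(0.0, -2.0 * (loglik_null - loglik_alt))
    p_value = math.erfc(math.sqrt(stat / 2.0))   # P(chi-square_1 > stat)
    return stat, p_value
```

For example, `lrt_one_df(-100.0, -98.0)` gives a statistic of 4.0 and a P value of about 0.0455. The same function applies to the inbreeding-model test, which also has 1 degree of freedom.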

Likelihood Function and Haplotype-Based HWE Test under Inbreeding Model

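Borrowing the idea of Zeng and Lin [18] on estimating haplotype frequencies from case-control data for association testing, we rewrite the likelihood function for the unrelated individuals under study and then propose a haplotype-based HWE test under the inbreeding model. Let $Z_i$ be a random variable that takes values among the $c_i$ possible haplotype combinations compatible with $G_i$ for the $i$th individual. Suppose that $Z_i = X_i/X_i$ if $\Delta_i = 1$ and $Z_i = X_i/Y_i$ if $\Delta_i = 0$, where $\Delta_i$ is a Bernoulli variable with success probability $F$. Let $P(X_i = H_k) = p_k$ and $P(Y_i = H_k) = p_k$, where $X_i$, $Y_i$ and $\Delta_i$ are mutually independent, $X_i$ and $Y_i$ are discrete random variables, and the haplotype before “/” is paternal and the haplotype after “/” is maternal. Then $Z_i$ has the same distribution as in Equation (3), and we treat $\Delta_i$, $X_i$ and $Y_i$ as missing. The log-likelihood function of the complete data under the inbreeding model is

$$\log L_c(F, \mathbf{p}) = \sum_{i=1}^{n}\left[\Delta_i \log F + (1 - \Delta_i)\log(1 - F) + \sum_{k=1}^{m} N_{ik}\log p_k\right], \qquad (9)$$

where $N_{ik} = I(X_i = H_k) + (1 - \Delta_i)\, I(Y_i = H_k)$.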

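To estimate the parameters in Equation (9), the EM algorithm is applied. In the E-step at iteration $t+1$, the $Q$ function is

$$Q(F, \mathbf{p} \mid F^{(t)}, \mathbf{p}^{(t)}) = \sum_{i=1}^{n}\left[\hat{\Delta}_i^{(t)}\log F + \left(1 - \hat{\Delta}_i^{(t)}\right)\log(1 - F) + \sum_{k=1}^{m}\hat{N}_{ik}^{(t)}\log p_k\right],$$

where $\hat{\Delta}_i^{(t)} = E(\Delta_i \mid G_i, F^{(t)}, \mathbf{p}^{(t)})$ and $\hat{N}_{ik}^{(t)} = E(N_{ik} \mid G_i, F^{(t)}, \mathbf{p}^{(t)})$.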

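In the M-step, the estimate of $F$ at iteration $t+1$ is obtained by solving

$$\frac{\partial Q}{\partial F} = \sum_{i=1}^{n}\left[\frac{\hat{\Delta}_i^{(t)}}{F} - \frac{1 - \hat{\Delta}_i^{(t)}}{1 - F}\right] = 0.$$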

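So, $F$ can be estimated by

$$F^{(t+1)} = \frac{1}{n}\sum_{i=1}^{n}\hat{\Delta}_i^{(t)},$$

where the estimates $F^{(t)}$ and $\mathbf{p}^{(t)}$ of $F$ and $\mathbf{p}$ at iteration $t$ enter through $\hat{\Delta}_i^{(t)}$. The haplotype frequencies can be estimated by

$$p_k^{(t+1)} = \frac{1}{N^{(t)}}\sum_{i=1}^{n}\hat{N}_{ik}^{(t)}, \qquad N^{(t)} = \sum_{i=1}^{n}\sum_{k=1}^{m}\hat{N}_{ik}^{(t)} = 2n - \sum_{i=1}^{n}\hat{\Delta}_i^{(t)},$$

where $N^{(t)}$ is a normalizing constant, and $\hat{\Delta}_i^{(t)}$ and $\hat{N}_{ik}^{(t)}$ can be calculated as follows, with $P(h_{ij}; F^{(t)}, \mathbf{p}^{(t)})$ given by Equation (3):

$$\hat{\Delta}_i^{(t)} = \frac{\sum_{j=1}^{c_i} \pi_{ij}^{(t)}}{\sum_{j=1}^{c_i} P(h_{ij}; F^{(t)}, \mathbf{p}^{(t)})}, \qquad \hat{N}_{ik}^{(t)} = \frac{\sum_{j=1}^{c_i}\left[\delta_{ijk}\,P(h_{ij}; F^{(t)}, \mathbf{p}^{(t)}) - I\big(h_{ij} = (H_k, H_k)\big)\,\pi_{ij}^{(t)}\right]}{\sum_{j=1}^{c_i} P(h_{ij}; F^{(t)}, \mathbf{p}^{(t)})},$$

where $\pi_{ij}^{(t)} = F^{(t)} p_a^{(t)}$ if $h_{ij} = (H_a, H_a)$ is a homozygous pair and 0 otherwise, and $\delta_{ijk}$ is the number of copies of $H_k$ in $h_{ij}$, as in the EM algorithm above.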

We call this procedure the IEM algorithm to distinguish it from the EM algorithm under HWE.
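One IEM iteration, following the missing-data formulation of Equation (9), can be sketched as follows. For a homozygous pair $(H_k, H_k)$ there are two latent states. With $\Delta_i = 1$ (joint probability $F p_k$), a single copy of $H_k$ is drawn. With $\Delta_i = 0$ (joint probability $(1-F)p_k^2$), two copies are drawn. A heterozygous pair can only arise with $\Delta_i = 0$. The function name and input format (one list of compatible unordered pairs per individual) are our own assumptions, and this is an illustration rather than the HAP-HWE code.

```python
from collections import defaultdict


def iem_step(pair_lists, freq, F):
    """One IEM iteration under the inbreeding model; returns updated (freq, F).

    pair_lists: for each individual, the unordered haplotype pairs compatible
    with the genotype. freq: current haplotype frequencies. F: current F.
    """
    n = len(pair_lists)
    sum_delta = 0.0                  # sum over individuals of E[Delta_i | G_i]
    counts = defaultdict(float)      # expected number of independent draws of each haplotype
    for pairs in pair_lists:
        states = []
        for a, b in pairs:
            if a == b:
                p_ibd, p_non = F * freq[a], (1 - F) * freq[a] ** 2
            else:
                p_ibd, p_non = 0.0, 2 * (1 - F) * freq[a] * freq[b]
            states.append((a, b, p_ibd, p_non))
        total = sum(p_ibd + p_non for _, _, p_ibd, p_non in states)
        for a, b, p_ibd, p_non in states:
            sum_delta += p_ibd / total
            counts[a] += (p_ibd + p_non) / total   # first draw X_i is always counted
            counts[b] += p_non / total             # second draw Y_i only when Delta_i = 0
    F_new = sum_delta / n
    norm = 2 * n - sum_delta         # expected total number of independent draws
    return {h: counts[h] / norm for h in freq}, F_new
```

Starting from $F = 0$, the step leaves $F$ at 0 and reduces to the ordinary EM update, which is a convenient sanity check.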

Under the IM model, HWE holds when $F = 0$ and is violated when $F \neq 0$. Therefore, we propose the following LRT to test for haplotype-based HWE:

$$\mathrm{LRT}_{F} = -2\left[\log L(0, \hat{\mathbf{p}}_0) - \log L(\hat{F}, \hat{\mathbf{p}})\right],$$

where $L(0, \hat{\mathbf{p}}_0)$ and $L(\hat{F}, \hat{\mathbf{p}})$ are the values of the observed likelihood function under the null hypothesis of HWE and under the HWD alternative, respectively. This LRT statistic asymptotically follows a chi-square distribution with 1 degree of freedom when HWE holds.

Software Implementation

Based on the EM, ECM and IEM algorithms above, we have written the software HAP-HWE to conduct the proposed haplotype-based HWE tests; it is implemented in R (http://www.r-project.org) and is freely available at http://www.echobelt.org/web/UploadFiles/HAP-HWE.html. For each of the EM, ECM and IEM algorithms, let $m^*$ denote the number of distinct haplotypes that occur in all the possible haplotype combinations compatible with the observed genotypes in the sample. The initial values of all these haplotype frequencies are taken as $1/m^*$ at $t = 0$. For the ECM and IEM algorithms, the initial values of $\rho$ and $F$ are taken as 1 and 0.01, respectively. The algorithms are judged to have converged when the absolute difference between the values of the log-likelihood function at two consecutive iterations falls below a preset threshold; the default maximum number of iterations is 1000. The final estimates $\hat{\mathbf{p}}$, $\hat{\rho}$ and $\hat{F}$ are taken as the maximum likelihood estimates of $\mathbf{p}$, $\rho$ and $F$, respectively. Consequently, the values of $\mathrm{LRT}_{\rho}$ and $\mathrm{LRT}_{F}$ and the corresponding P values are obtained.

The input data file is a standard linkage pedigree file containing pedigree relationships, genotypes and phenotype information, with one row per individual. HAP-HWE uses only the founders in the sample and automatically excludes nonfounders from the analysis. In addition, a haplotype block file is needed, with each row representing one haplotype block; such a file can easily be exported from existing software such as Haploview [23]. HAP-HWE then analyzes the haplotype blocks one by one. See Text S2 for the usage of HAP-HWE and other details.

HAP-HWE outputs: (i) the convergence process of the log-likelihood function under the EM, ECM and IEM algorithms; (ii) the haplotypes whose frequency estimates exceed a preset threshold, together with their frequency estimates under the three algorithms; (iii) the estimate of $\rho$, the value of $\mathrm{LRT}_{\rho}$ and the corresponding P value under Niu's model; and (iv) the estimate of $F$, the value of $\mathrm{LRT}_{F}$ and the corresponding P value under the inbreeding model. The results are saved in a text file named “results.txt” in the working directory. Like other haplotype frequency estimation methods, our methods face running-time and storage problems because of the large number of possible haplotypes. To reduce storage, our software represents each haplotype by an integer rather than by a vector of alleles.
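The integer representation of haplotypes can be illustrated with a bit encoding in which allele 1 maps to bit 0 and allele 2 maps to bit 1. The encoding actually used in HAP-HWE is not described here, so this scheme and the function names are assumptions. A convenient consequence is that the two haplotypes of a compatible pair differ exactly at the heterozygous loci, so the partner of `h1` is `h1 ^ het_mask`.

```python
def encode(haplotype):
    """Map a haplotype such as (1, 2, 2) to an integer (allele 1 -> bit 0, allele 2 -> bit 1)."""
    code = 0
    for allele in haplotype:
        code = (code << 1) | (allele - 1)
    return code


def decode(code, n_loci):
    """Inverse of encode: recover the allele tuple from the integer."""
    return tuple(((code >> (n_loci - 1 - l)) & 1) + 1 for l in range(n_loci))


def compatible_pairs_int(genotype):
    """Compatible unordered pairs as integers; genotype gives allele-2 counts per locus."""
    n_loci = len(genotype)
    base, het_mask, het_bits = 0, 0, []
    for l, g in enumerate(genotype):
        bit = 1 << (n_loci - 1 - l)
        if g == 2:
            base |= bit              # homozygous for allele 2 on both haplotypes
        elif g == 1:
            het_mask |= bit
            het_bits.append(bit)
    free = het_bits[1:]              # first heterozygous locus fixed to allele 1 on h1
    pairs = []
    for k in range(2 ** len(free)):
        h1 = base
        for i, bit in enumerate(free):
            if (k >> i) & 1:
                h1 |= bit
        pairs.append((h1, h1 ^ het_mask))   # partner differs at every heterozygous locus
    return pairs
```

For example, `encode((1, 2, 2))` is 3, and `compatible_pairs_int((1, 1))` yields `[(0, 3), (1, 2)]`, the integer forms of the two phasings of a double heterozygote.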

Results

Simulation Settings

To assess the validity and compare the performance of the two LRT tests for haplotype-based HWE, we consider three models with three tightly linked SNPs that can lead to HWD: Niu's model (NM), the inbreeding model (IM) and the population stratification (PS) model. For both the NM and IM models, the true marginal haplotype distribution is given in Table 1. For the NM model, $\rho$ takes values from 1.0 to 1.5 in increments of 0.05; we first calculate the probabilities of all haplotype combinations from Equation (4) and then randomly draw one haplotype combination for each individual. For the IM model, the inbreeding coefficient $F$ takes values from 0 to 0.1 in increments of 0.01; we first calculate the probabilities of all haplotype combinations from Equation (3) and then randomly draw one haplotype combination for each individual. Finally, the two haplotypes of each individual are combined to form the unphased genotype. To investigate how population admixture affects the performance of the two haplotype-based HWE tests, we consider a PS model with two subpopulations, I and II, whose haplotype distributions are given in Table 2. The proportion of subpopulation I is taken to be 0.6 or 0.8.
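The data-generating step for the NM and IM models can be sketched as follows. First compute the probability of every unordered haplotype pair, then draw one pair per individual, then add the two haplotypes locus by locus to obtain the unphased genotype (coded as the number of copies of allele 2). The haplotype frequencies below are hypothetical stand-ins for Table 1, and the function names and seed handling are our own. For Niu's model the sketch assumes the one-parameter form with homozygous weight $\rho p_k^2$ and heterozygous weight $2p_kp_l$.

```python
import random
from itertools import combinations_with_replacement


def pair_probs(freq, model, param):
    """Normalized pair probabilities: IM uses param = F, NM uses param = rho."""
    probs = {}
    for a, b in combinations_with_replacement(sorted(freq), 2):
        pa, pb = freq[a], freq[b]
        if model == "IM":
            w = param * pa + (1 - param) * pa * pa if a == b else 2 * (1 - param) * pa * pb
        elif model == "NM":
            w = param * pa * pa if a == b else 2 * pa * pb
        else:
            raise ValueError("model must be 'IM' or 'NM'")
        probs[(a, b)] = w
    total = sum(probs.values())   # 1 for IM; 1 + (rho - 1) * sum(p_k ** 2) for NM
    return {pair: w / total for pair, w in probs.items()}


def simulate_genotypes(freq, n, model, param, seed=None):
    """Draw n unphased genotypes (allele-2 counts per locus) from the chosen model."""
    rng = random.Random(seed)
    probs = pair_probs(freq, model, param)
    pairs = list(probs)
    drawn = rng.choices(pairs, weights=[probs[p] for p in pairs], k=n)
    # Alleles are coded 1/2, so a1 + a2 - 2 counts the copies of allele 2.
    return [tuple(a1 + a2 - 2 for a1, a2 in zip(h1, h2)) for h1, h2 in drawn]


# Hypothetical frequencies of four three-SNP haplotypes (not Table 1).
freq = {(1, 1, 1): 0.4, (1, 1, 2): 0.3, (2, 2, 1): 0.2, (2, 2, 2): 0.1}
genos = simulate_genotypes(freq, n=500, model="IM", param=0.05, seed=1)
```

Setting `param=1.0` under the IM model makes every individual homozygous at all loci, which is a quick check that the inbreeding term is wired in correctly.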

Table 1. Haplotype distribution for Niu's model and inbreeding model.

https://doi.org/10.1371/journal.pone.0077399.t001

Table 2. Haplotype distribution for population stratification model.

https://doi.org/10.1371/journal.pone.0077399.t002

HWE holds when $\rho = 1$ for the NM model and when $F = 0$ for the IM model. We therefore simulate the type I error rates of the proposed HWE tests with $\rho = 1$ or $F = 0$, and compare their power with $\rho > 1$ and $F > 0$. The PS model is also used to simulate the power of both tests. For all models, we generate samples of $n$ unrelated individuals at the three loci, with sample sizes $n = 500$, 1000 and 1500. The number of simulation replicates is fixed at 1000, and the significance level is 5%.

As an additional analysis, we compare the accuracy of the EM, ECM and IEM algorithms in haplotype inference. The accuracy of haplotype frequency estimates is assessed by the sum of absolute differences (SAD) between the true and estimated frequencies, proposed by Fallin and Schork [20] and defined as

$$\mathrm{SAD} = \frac{1}{2}\sum_{k=1}^{m}\left|p_k - \hat{p}_k\right|,$$

where $p_k$ and $\hat{p}_k$ are the true and estimated frequencies of haplotype $H_k$, respectively. It ranges from 0 (perfect estimation) to 1.
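A direct computation of SAD is sketched below. Because the text states that SAD ranges from 0 to 1, we assume a factor of 1/2 in front of the sum of absolute differences (without it the maximum would be 2). Haplotypes missing from either dictionary are treated as having frequency 0; the function name is ours.

```python
def sad(true_freq, est_freq):
    """Half the sum of absolute differences between true and estimated frequencies."""
    haps = set(true_freq) | set(est_freq)
    return 0.5 * sum(abs(true_freq.get(h, 0.0) - est_freq.get(h, 0.0)) for h in haps)
```

For instance, estimating frequencies of 0.6 and 0.4 for two haplotypes whose true frequencies are both 0.5 gives an SAD of about 0.1, while two distributions with no haplotype in common give the maximum value of 1.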

Simulation Results

Table 3 lists the estimates of $\rho$, the mean SAD of the haplotype frequency estimates, and the simulated size and power of the two HWE tests for different values of $\rho$ and different sample sizes under Niu's model. The mean estimate of $\rho$ over 1000 replicates is close to its true value. When $\rho = 1$ (i.e., HWE holds), the type I error rate of LRT is close to the nominal 5% level, while the size of LRT is below 0.05. That is, in testing for haplotype-based HWE under the NM model, LRT controls the size well and LRT is conservative. For a fixed sample size, the power of both tests increases as $\rho$ increases from 1.1 to 1.5. However, LRT is more powerful than LRT. In addition, when $\rho = 1$ and $n$ is fixed, the EM, ECM and IEM algorithms perform similarly in estimating haplotype frequencies. As $\rho$ increases, however, the SAD of the ECM algorithm changes little and is much smaller than those of the EM and IEM algorithms. The SADs of the EM and IEM algorithms are very close to each other and grow as $\rho$ increases. Moreover, as the sample size increases, the SADs of all three algorithms decrease and both proposed LRT tests become more powerful.

Table 3. Mean and standard deviation (SD) of $\rho$ and $F$ estimates, mean of sum of absolute differences (SAD) of haplotype frequency estimates for EM, ECM and IEM algorithms, simulated size and powers of two HWE tests for different values of $\rho$ and $n$, under Niu's model.

https://doi.org/10.1371/journal.pone.0077399.t003

Table 4 shows the estimates of $F$, the mean SAD of the haplotype frequency estimates, and the simulated size and power of the two HWE tests for different values of the inbreeding coefficient $F$ and different sample sizes under the inbreeding model. The mean estimate of $F$ over 1000 replicates is close to its true value. As in Table 3, LRT controls the size better than LRT under the IM model. However, LRT is more powerful than LRT in this situation. The EM and IEM algorithms perform the same, and their SADs are stable across the values of $F$ (0 to 0.1). In contrast, the SAD of the ECM algorithm increases with $F$ and is larger than those of the EM and IEM algorithms. With larger sample sizes, the SADs become smaller and both proposed LRT tests become more powerful.

Table 4. Mean and standard deviation (SD) of $\rho$ and $F$ estimates, mean of sum of absolute differences (SAD) of haplotype frequency estimates for EM, ECM and IEM algorithms, simulated size and powers of two HWE tests for different values of $F$ and $n$, under inbreeding model.

https://doi.org/10.1371/journal.pone.0077399.t004

Table 5 displays the mean SAD of the haplotype frequency estimates and the simulated power of the two HWE tests over 1000 replicates under the PS model, with the proportion of subpopulation I being 0.6 or 0.8 and the sample size being 500, 1000 or 1500. LRT is more powerful than LRT, irrespective of the proportion of subpopulation I or the sample size $n$. In estimating haplotype frequencies, the EM and IEM algorithms have similar SADs, both smaller than that of the ECM algorithm, which indicates that the EM and IEM algorithms are more robust to population stratification than the ECM algorithm.

Table 5. Mean and standard deviation (SD) of $\rho$ and $F$ estimates, mean of sum of absolute differences (SAD) of haplotype frequency estimates for EM, ECM and IEM algorithms, power comparison of two HWE tests under population stratification model, with the proportion of subpopulation I being taken as 0.6 and 0.8, and the sample size being fixed at 500, 1000 and 1500.

https://doi.org/10.1371/journal.pone.0077399.t005

Application to NARAC Data Set

We apply HAP-HWE to the Rheumatoid Arthritis (RA) data set from the North American Rheumatoid Arthritis Consortium (NARAC) [24], which was made available through Genetic Analysis Workshop 15 [25]. The data set contains 757 pedigrees comprising 8017 individuals (2481 founders and 5536 nonfounders), genotyped at 5407 SNP markers across the 22 autosomes. Each pedigree contains at least one nonfounder affected with RA.

Haplotype block information is needed before the HAP-HWE analysis. In this application, we use Haploview (version 4.2) [23] with all arguments set to their default values to define haplotype blocks. In total, 181 haplotype blocks are identified: 150 blocks with 2 SNPs, 19 with 3 SNPs, 7 with 4 SNPs, 1 with 5 SNPs, 2 with 6 SNPs, 1 with 9 SNPs and 1 with 13 SNPs.

As noted above, HAP-HWE uses only the founders and excludes nonfounders from the analysis. Moreover, a large proportion of genotypes are missing in the data set. Therefore, the reduced data set used for the HAP-HWE analysis contains only a subset of the founders: on average, about 295 pedigrees (about 367 unrelated individuals) are used for each haplotype block, ranging from 288 to 296 pedigrees (358 to 369 individuals).

Table 6 lists the results of the application to the NARAC data set. There are 13 haplotype blocks (out of 181) for which at least one of the P values of $\mathrm{LRT}_{\rho}$ and $\mathrm{LRT}_{F}$ is below the 5% significance level. However, after Bonferroni correction for multiple testing, only the seventh haplotype block, which includes 6 SNPs (rs347117, rs383902, rs395601, rs387812, rs347115 and rs610877) on chromosome 15, remains statistically significant (see Table 6 for its LRT P value). Figure 1 gives the Haploview LD display for this haplotype block. Moreover, Min et al. [26] reported a possible linkage peak for RA at rs347117 on chromosome 15p34 based on the nonparametric linkage score, which may support our finding.

Figure 1. Haplotype LD display for the seventh haplotype block on chromosome 15.

The red box denotes that the LOD value between any two loci is larger than or equal to 2.0. The numbers in the red boxes are the corresponding values of and the empty box denotes that .

https://doi.org/10.1371/journal.pone.0077399.g001

Table 6. Results of application to North American Rheumatoid Arthritis Consortium data set.

https://doi.org/10.1371/journal.pone.0077399.t006

Discussion

In this paper, we first wrote out the likelihood functions of the observed data under the NM and IM models. We then developed the ECM algorithm for the NM model to estimate $\rho$ and the haplotype frequencies, and suggested the IEM algorithm for the IM model to estimate the inbreeding coefficient $F$ and the haplotype frequencies. Because $\rho = 1$ or $F = 0$ means that HWE holds, we further proposed two LRT tests, $\mathrm{LRT}_{\rho}$ and $\mathrm{LRT}_{F}$, for haplotype-based HWE. We simulated the HWE, Niu's, inbreeding and population stratification models to assess the validity and compare the performance of these two tests. The simulation results showed that both tests are valid in testing for haplotype-based HWE. If Niu's model is true, then LRT is more powerful, whereas if the inbreeding model is true, then LRT has better power. Under the population stratification model, LRT is still more powerful. Therefore, if the population model is unknown in practice, LRT is generally recommended because of its good performance. Furthermore, we compared the performance of the EM, ECM and IEM algorithms in estimating haplotype frequencies. If the true model is Niu's model, the ECM algorithm gives more accurate estimates of haplotype frequencies than the EM and IEM algorithms. For all the other simulation settings, however, the EM algorithm is little affected by the departure from HWE, and the EM and IEM algorithms perform almost identically, with smaller SADs than the ECM algorithm. We also demonstrated the practical utility of the proposed methods by applying them to the Rheumatoid Arthritis (RA) data set from the North American Rheumatoid Arthritis Consortium (NARAC). Finally, because many abbreviations and notations are used in this paper, Tables S1 and S2 in Supporting Information list them for easy reference.

Supporting Information

Text S1.

Conditional-maximization steps of ECM algorithm.

https://doi.org/10.1371/journal.pone.0077399.s003

(PDF)

Author Contributions

Conceived and designed the experiments: WGM JYZ. Performed the experiments: WGM HQH JYZ. Analyzed the data: WGM HQH YX PYC JYZ. Contributed reagents/materials/analysis tools: WGM HQH JYZ. Wrote the paper: WGM JYZ. Designed the software used in analysis: WGM JYZ. Revised the manuscript: YX PYC.

References

  1. Gibbs RA, Belmont JW, Hardenbol P, Willis TD, Yu F, et al. (2003) The international HapMap project. Nature 426: 789–796.
  2. Gibbs RA, Belmont JW, Boudreau A, Leal SM, Hardenbol P, et al. (2005) A haplotype map of the human genome. Nature 437: 1299–1320.
  3. Zheng G, Yang Y, Zhu X, Elston RC (2012) Analysis of Genetic Association Studies. New York: Springer.
  4. Dawson E, Abecasis GR, Bumpstead S, Chen Y, Hunt S, et al. (2002) A first-generation linkage disequilibrium map of human chromosome 22. Nature 418: 544–548.
  5. Huang BE, Amos CI, Lin DY (2007) Detecting haplotype effects in genomewide association studies. Genet Epidemiol 31: 803–812.
  6. Niu T, Qin ZS, Xu X, Liu JS (2002) Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am J Hum Genet 70: 157–169.
  7. Yu Z, Schaid DJ (2007) Sequential haplotype scan methods for association analysis. Genet Epidemiol 31: 553–564.
  8. Zhang K, Zhao H (2006) A comparison of several methods for haplotype frequency estimation and haplotype reconstruction for tightly linked markers from general pedigrees. Genet Epidemiol 30: 423–437.
  9. Zhao H, Zhang S, Merikangas KR, Trixler M, Wildenauer DB, et al. (2000) Transmission/disequilibrium tests using multiple tightly linked markers. Am J Hum Genet 67: 936–946.
  10. Becker T, Knapp M (2004) Maximum-likelihood estimation of haplotype frequencies in nuclear families. Genet Epidemiol 27: 21–32.
  11. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B (Methodological) 39: 1–38.
  12. Excoffier L, Slatkin M (1995) Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 12: 921–927.
  13. Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA (2002) Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet 70: 425–434.
  14. Stram DO, Pearce L, Bretsky P, Freedman M, Hirschhorn JN, et al. (2003) Modeling and EM estimation of haplotype-specific relative risks from genotype data for a case-control study of unrelated individuals. Hum Hered 55: 179–190.
  15. Zaykin DV, Westfall PH, Young SS, Karnoub MA, Wagner MJ, et al. (2002) Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum Hered 53: 79–91.
  16. Zhao LP, Li SS, Khalid N (2003) A method for the assessment of disease associations with single-nucleotide polymorphism haplotypes and environmental variables in case-control studies. Am J Hum Genet 72: 1231–1250.
  17. Epstein MP, Satten GA (2003) Inference on haplotype effects in case-control studies using unphased genotype data. Am J Hum Genet 73: 1316–1329.
  18. Zeng D, Lin DY (2005) Estimating haplotype-disease associations with pooled genotype data. Genet Epidemiol 28: 70–82.
  19. Meng XL, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80: 267–278.
  20. Fallin D, Schork NJ (2000) Accuracy of haplotype frequency estimation for biallelic loci, via the expectation-maximization algorithm for unphased diploid genotype data. Am J Hum Genet 67: 947–959.
  21. Kuk AYC, Zhang H, Yang Y (2009) Computationally feasible estimation of haplotype frequencies from pooled DNA with and without Hardy-Weinberg equilibrium. Bioinformatics 25: 379–386.
  22. Fan S (1989) A new extracting formula and a new distinguishing means on the one variable cubic equation (in Chinese). Natural Science Journal of Hainan Teachers College (in China) 2: 91–98.
  23. Barrett JC, Fry B, Maller J, Daly MJ (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21: 263–265.
  24. Jawaheer D, Seldin MF, Amos CI, Chen WV, Shigeta R, et al. (2003) Screening the genome for rheumatoid arthritis susceptibility genes: a replication study and combined analysis of 512 multicase families. Arthritis Rheum 48: 906–916.
  25. Amos CI, Chen WV, Remmers E, Siminovitch K, Seldin MF, et al. (2007) Data for Genetic Analysis Workshop (GAW) 15 Problem 2, genetic causes of rheumatoid arthritis and associated traits. BMC Proc 1 (Suppl 1): S3.
  26. Min JY, Min KB, Sung J, Cho SI (2010) Linkage and association studies of joint morbidity from rheumatoid arthritis. J Rheumatol 37: 291–295.