Functional Mapping of Dynamic Traits with Robust t-Distribution

  • Cen Wu,

    Affiliation Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, United States of America

  • Gengxin Li,

    Affiliation Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, United States of America

  • Jun Zhu,

    Affiliation College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, Zhejiang, People's Republic of China

  • Yuehua Cui

    cui@stt.msu.edu

    Affiliation Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, United States of America

Abstract

Functional mapping has been a powerful tool in mapping quantitative trait loci (QTL) underlying dynamic traits of agricultural or biomedical interest. In functional mapping, multivariate normality is often assumed for the underlying data distribution, partially due to the ease of parameter estimation. The normality assumption, however, can easily be violated in real applications for reasons such as heavy tails or extreme observations. Departure from normality has a negative effect on testing power and inference for QTL identification. In this work, we relax the normality assumption and propose a robust multivariate t-distribution mapping framework for QTL identification in functional mapping. Simulation studies show increased mapping power and precision with the t-distribution compared with a normal distribution. The utility of the method is demonstrated through a real data analysis.

Introduction

Since the seminal work of interval mapping [1], quantitative trait loci (QTL) mapping with molecular markers has been a standard means of targeting genetic regions harboring potential genes underlying various traits of interest in biomedical and agricultural research. QTL mapping originated for single trait analysis and was later extended to multiple traits to improve mapping precision and power (e.g., [2]). When a trait is measured through many developmental stages, e.g., body height measured over many time points, the trait reveals the dynamic expression of the underlying genes that are associated with the trait. These traits, which can be expressed as a function of time, were termed “function-valued traits” by Pletcher and Geyer [3] or “infinite-dimensional characters” by Kirkpatrick and Heckman [4]. Mapping QTLs or genes underlying the dynamics of a developmental characteristic has been a longstanding challenge in genetic mapping. Recently, Wu and his colleagues (e.g., [4]–[6]) have developed a series of mapping approaches for dynamic traits by integrating mathematical functions into a QTL mapping framework, opening a new era for genetic mapping. The so-called functional mapping approach enables one to propose either parametric or non-parametric functions to model the developmental mean function of a dynamic trait. By testing mean differences for different QTL genotype categories in a genome-wide linkage scan, one can identify potential genes that govern the dynamics of a trait.

In general, functional mapping assumes a joint multivariate normal distribution of a developmental trait. The mean of the multivariate normal is modeled through functions of time, and trait correlations among different developmental stages are fully considered. These treatments make functional mapping more powerful than single trait analysis for a developmental trait [4]. The multivariate normality assumption is made by all the methods developed for functional mapping in the literature. In real data analysis, this assumption can easily be violated, as is the case for single trait analysis [8]. In a single trait analysis, von Rohr and Hoeschele [8] showed that deviations from normality may lead to false positive QTL detection. The authors proposed to replace the normality assumption with the t-distribution to allow for heavy tails and skewness of a trait distribution. In human linkage analysis with the variance components model, Peng and Siegmund [9] also showed that departure from multivariate normality for the trait vector could dramatically reduce the mapping power when multivariate normality is assumed. As an alternative, the authors proposed to substitute the multivariate normal with a multivariate t-distribution and showed great power improvement.

For a developmental trait, the multivariate normality assumption is often a concern, especially for a small sample size. For many applied problems, the tails of the data distribution are often longer than a normal distribution assumes. In the presence of extreme observations, statistical inference based on the normal distribution is less robust. This could lead to low power or false positives under a functional mapping framework. The lack of robustness with respect to outliers and heavy tails that results from using a Gaussian model makes the multivariate t-distribution a powerful alternative.

In this work, we relax the multivariate normality assumption in functional mapping and propose a robust multivariate t-distribution for the error terms. The proposed method is implemented in a mapping framework that is different from Peng and Siegmund's treatment [9]. A mixture multivariate t-distribution is proposed and an expectation-maximization (EM) algorithm is derived to estimate the parameters of interest. To make the method more flexible for any developmental trait, a non-parametric B-spline technique is incorporated to model the developmental mean function. An antedependence covariance model is applied to model the non-stationary covariance structure [10]. Extensive simulations are conducted to evaluate the model performance. The utility of the method is demonstrated by reanalyzing a real data set to identify genes underlying the variation of rice tiller numbers.

Methods

The mixture model and the multivariate likelihood function

Consider a backcross design initiated with two inbred lines with contrasting phenotypes. A genetic linkage map can be constructed with molecular markers. Suppose there is a putative segregating QTL, with alleles Q and q, that affects the trait of interest, but to different degrees. For a backcross population with $n$ observations, each one is measured over $m$ time points. The phenotypic vector $\mathbf{y}_i = (y_{i1}, \ldots, y_{im})^{T}$ follows a multivariate distribution with a density function $f(\mathbf{y}_i \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ denote location and scale parameters.

In a QTL mapping study, the location and genotype of a QTL are generally unobservable. Suppose the QTL genotypes contributing to the variation of a dynamic quantitative trait are Qq and qq. This missing data problem can be overcome by modeling the observed phenotypic data with a finite mixture model

$$\mathbf{y}_i \sim \sum_{j=0}^{1} \pi_{j|i}\, f_j(\mathbf{y}_i \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}),$$

where $f_j$ is the probability density function with the location parameters $\boldsymbol{\mu}_j$ corresponding to QTL genotype $j$ ($j = 1$ for Qq and $j = 0$ for qq); $\boldsymbol{\Sigma}$ contains the scale parameters common to all components; and $\pi_{j|i}$ is the mixture proportion of individual $i$ for QTL genotype $j$. For a backcross design, the mixture proportions can be obtained via the conditional probabilities of QTL genotypes given the flanking markers [11].
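The conditional probabilities that serve as mixture proportions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: genotypes are coded 0/1, the recombination fractions `r1` and `r2` (left marker to QTL, QTL to right marker) are supplied by the caller, and crossover interference is assumed absent. The function name `qtl_genotype_probs` is hypothetical.

```python
def qtl_genotype_probs(m_left, m_right, r1, r2):
    """Return (P(QTL genotype = 0), P(QTL genotype = 1)) for one backcross
    individual, given its two flanking-marker genotypes (each coded 0 or 1).

    r1 and r2 are the recombination fractions between the left marker and the
    QTL and between the QTL and the right marker. No crossover interference
    is assumed, so recombination events in the two intervals are independent.
    """
    def transition(a, b, r):
        # Probability that a locus has genotype b given a linked locus has a.
        return 1.0 - r if a == b else r

    weights = [transition(m_left, g, r1) * transition(g, m_right, r2)
               for g in (0, 1)]
    total = sum(weights)  # equals P(m_right | m_left)
    return tuple(w / total for w in weights)


# Non-recombinant and recombinant flanking-marker configurations.
probs_same = qtl_genotype_probs(1, 1, 0.05, 0.10)
probs_recomb = qtl_genotype_probs(1, 0, 0.05, 0.10)
```

When the two flanking markers agree, nearly all of the probability mass goes to the matching QTL genotype; when they disagree, the mass is split according to which interval is more likely to contain the crossover.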

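As mentioned in the Introduction, multivariate normality is a general concern in functional mapping when extreme observations or heavy tails are observed. To make functional mapping more flexible, we assume the multivariate t-distribution for $f_j$. The multivariate t density function for individual $i$ given genotype $j$ is given by

$$f_j(\mathbf{y}_i \mid \boldsymbol{\Omega}_j) = \frac{\Gamma\left(\frac{\nu+m}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)(\pi\nu)^{m/2}\,|\boldsymbol{\Sigma}|^{1/2}} \left[1 + \frac{\delta(\mathbf{y}_i, \boldsymbol{\mu}_j; \boldsymbol{\Sigma})}{\nu}\right]^{-\frac{\nu+m}{2}}, \qquad (1)$$

where, for genotype $j$ ($j$ = 0, 1), $\boldsymbol{\mu}_j$ denotes the mean vector, $\boldsymbol{\Sigma}$ is a positive definite covariance matrix, $\nu$ is the degree of freedom, and $\boldsymbol{\Omega}_j = (\boldsymbol{\mu}_j, \boldsymbol{\Sigma}, \nu)$ contains all the parameters of interest corresponding to genotype $j$. The (squared) Mahalanobis distance between $\mathbf{y}_i$ and $\boldsymbol{\mu}_j$ with respect to $\boldsymbol{\Sigma}$ is denoted as $\delta(\mathbf{y}_i, \boldsymbol{\mu}_j; \boldsymbol{\Sigma}) = (\mathbf{y}_i - \boldsymbol{\mu}_j)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \boldsymbol{\mu}_j)$.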
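A density of this kind can be evaluated on the log scale as sketched below. The sketch uses the standard form of the multivariate t density with a location vector, a scale matrix and a degree of freedom; the function names `cholesky` and `mvt_logpdf` are illustrative, and the pure-Python linear algebra is only meant for small examples.

```python
import math


def cholesky(S):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    m = len(S)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def mvt_logpdf(y, mu, S, nu):
    """Log-density of an m-variate t distribution with location mu,
    scale matrix S and nu degrees of freedom."""
    m = len(y)
    L = cholesky(S)
    # Solve L z = (y - mu) by forward substitution; then delta = z'z.
    z = []
    for i in range(m):
        r = y[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))
        z.append(r / L[i][i])
    delta = sum(v * v for v in z)  # squared Mahalanobis distance
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(m))
    return (math.lgamma((nu + m) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * m * math.log(math.pi * nu) - 0.5 * log_det
            - 0.5 * (nu + m) * math.log1p(delta / nu))


# Univariate Cauchy (nu = 1) at its center, and a bivariate t with nu = 3.
lp_cauchy = mvt_logpdf([0.0], [0.0], [[1.0]], 1.0)
lp_biv = mvt_logpdf([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 3.0)
```

Working with the Cholesky factor avoids forming an explicit matrix inverse, and the log scale avoids underflow when the number of time points is large.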

At a specific time point $t$, the relationship between the observation and the mean can be expressed by a linear model

$$y_i(t) = x_i\,\mu_1(t) + (1 - x_i)\,\mu_0(t) + e_i(t), \qquad (2)$$

where $x_i = 0$ or 1 if the QTL genotype is qq or Qq, respectively; and $e_i(t)$ is the error term following a t-distribution with mean zero and variance $\sigma^2(t)$. The errors at two different time points, $t_1$ and $t_2$, are correlated with correlation coefficient $\rho(t_1, t_2)$.

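Assuming independence among individuals, the joint likelihood function can be expressed as

$$L(\boldsymbol{\Theta} \mid \mathbf{y}) = \prod_{i=1}^{n} \sum_{j=0}^{1} \pi_{j|i}\, f_j(\mathbf{y}_i \mid \boldsymbol{\Omega}_j), \qquad (3)$$

where $\mathbf{y} = (\mathbf{y}_1, \ldots, \mathbf{y}_n)$ and $\pi_{0|i} + \pi_{1|i} = 1$. The unknown parameter vector $\boldsymbol{\Theta}$ consists of two sets of parameters. One set, denoted as $\boldsymbol{\Theta}_p$, determines the location of the QTL with respect to the markers; the other set, denoted as $\boldsymbol{\Theta}_q$, determines the multivariate t-distribution of the trait corresponding to each QTL genotype, where $\boldsymbol{\mu}_j$, $\boldsymbol{\Sigma}$ and $\nu$ define the mean vectors, the covariance matrix and the degree of freedom.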

Modeling the dynamic mean function

One of the challenges in functional mapping lies in the complexity of the developmental pattern as well as the intra-individual variation of a longitudinal trait. Rather than estimating the discrete means at time points, functional mapping treats a developmental trait as a dynamic process which is fitted by a continuous function [7]. For a typical growth trait, a parametric logistic function would fit most data well [12] and it has been broadly applied (e.g., [5], [13]). For other developmental characteristics, such as a process that experiences programmed cell death, it is infeasible to find a single mathematical function to describe the process, thus a joint modeling approach may be an option (e.g., [14]). Legendre polynomials have been shown to be useful in modeling irregular developmental processes (e.g., [15], [16]). With recent statistical advances in nonparametric regression, a natural and flexible way to model an irregular developmental process is in a nonparametric fashion in which the data specify the best fit [17].

Here we adopt a nonparametric B-spline technique to model the time-dependent mean function. As mentioned above, the phenotype values are recorded at $m$ time points, denoted as $t_1, \ldots, t_m$. At a particular time point $t$, we can fit the dynamic genotypic means corresponding to the QTL genotypes Qq and qq by using B-spline functions of different orders. Denote the B-spline basis functions in a matrix B, which is defined by the degree and the order of a piecewise polynomial. For the uniform quadratic B-spline, the basis matrix follows from the standard recursive definition of the basis functions. A column vector of the basis matrix is called a base function. For the two QTL genotypes Qq and qq (corresponding to $j = 1$ and 0, respectively), the base genotypic vector is expressed as $\boldsymbol{\beta}_j$. The vector $\boldsymbol{\beta}_j$ contains the coefficients to be estimated for genotype $j$. The B-spline function depends on the observed time points and on the number and relative positions of the knots. The criteria for choosing the knots are open to discussion [17]. For the real data analyzed in this study, equidistant inner knots are selected since the rice tiller numbers are observed at equal intervals (10 days). Yang et al. [17] give guidance on the number of inner knots. We choose 3 evenly distributed knots, and with this representation the dynamic genotypic mean at time $t$, $\mu_j(t)$, can be estimated by $\mathbf{B}(t)^{T}\boldsymbol{\beta}_j$. The simulation study reported later shows that the mean curves are estimated satisfactorily, which supports this choice. Further investigation also indicates that the estimates are not sensitive to the choice of spline basis.
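To make the spline representation concrete, the sketch below evaluates a quadratic B-spline basis with the Cox-de Boor recursion at nine time points spaced 10 days apart, with 3 evenly spaced inner knots. The clamped knot vector, the placement of the inner knots at days 30, 50 and 70, and the coefficient values in `beta` are assumptions chosen for illustration only; they are not taken from the paper.

```python
def bspline_basis(t, knots, degree):
    """Evaluate all B-spline basis functions at t by the Cox-de Boor recursion."""
    n_int = len(knots) - 1
    # Degree 0: indicator of the knot interval containing t.
    N = [1.0 if knots[k] <= t < knots[k + 1] else 0.0 for k in range(n_int)]
    if t == knots[-1]:
        # Close the last non-empty interval on the right.
        for k in range(n_int - 1, -1, -1):
            if knots[k] < knots[k + 1]:
                N[k] = 1.0
                break
    for d in range(1, degree + 1):
        nxt = []
        for k in range(len(knots) - d - 1):
            den1 = knots[k + d] - knots[k]
            den2 = knots[k + d + 1] - knots[k + 1]
            left = (t - knots[k]) / den1 * N[k] if den1 > 0 else 0.0
            right = (knots[k + d + 1] - t) / den2 * N[k + 1] if den2 > 0 else 0.0
            nxt.append(left + right)
        N = nxt
    return N


degree = 2
times = [10.0 * k for k in range(1, 10)]      # 9 measurements, 10 days apart
inner = [30.0, 50.0, 70.0]                    # 3 evenly spaced inner knots
knots = [times[0]] * (degree + 1) + inner + [times[-1]] * (degree + 1)
B = [bspline_basis(t, knots, degree) for t in times]   # 9 x 6 basis matrix

beta = [1.0, 4.0, 9.0, 12.0, 10.0, 7.0]       # hypothetical coefficients
mean_curve = [sum(b * c for b, c in zip(row, beta)) for row in B]
```

With this setup the mean curve for a genotype is a linear combination of six basis functions, so only six coefficients per genotype need to be estimated regardless of how irregular the trajectory is.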

Modeling the covariance function

Though nonparametric modeling of the time-dependent mean functions has been extensively studied, research on the modeling of covariance structures via nonparametric approaches is rarely reported due to various difficulties [18]. In the original functional mapping [5], a stationary covariance function such as the first-order autoregressive (AR(1)) model was applied. The structured antedependence (SAD) model was later adopted in functional mapping [19] to relax the stationarity assumption. The SAD model is a non-stationary model which has been applied in many studies [20]. The SAD model of order $r$ for the error term in Eq. (2) is denoted by

$$e_i(t) = \phi_1 e_i(t-1) + \cdots + \phi_r e_i(t-r) + \varepsilon_i(t), \qquad (4)$$

where $\varepsilon_i(t)$ is the “innovation” term, assumed to be independently distributed with mean zero; and $\phi_k$ ($k = 1, \ldots, r$) are the antedependence coefficients. Therefore, the variance-covariance matrix of a developmental process can be expressed as

$$\boldsymbol{\Sigma} = \mathbf{Q}\,\boldsymbol{\Sigma}_{\varepsilon}\,\mathbf{Q}^{T}, \qquad (5)$$

where $\boldsymbol{\Sigma}_{\varepsilon}$ is a diagonal matrix holding the innovation variances. For the first-order SAD or SAD(1) model, the matrix Q can be expressed as

$$\mathbf{Q} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \phi_1 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \phi_1^{m-1} & \phi_1^{m-2} & \cdots & 1 \end{pmatrix}.$$

In general, the SAD order can be selected through an information criterion (see [19]). Since the purpose of this study is not to compare the performance of various modeling approaches for the covariance structure, we simply adopt the SAD(1) function due to its non-stationarity and simplicity.
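The non-stationarity of a first-order SAD structure can be seen in a small numerical sketch. It assumes a single antedependence coefficient `phi`, a constant innovation variance `gamma2`, and a process that starts from zero before the first time point; these are simplifying assumptions for illustration, and the function name `sad1_covariance` is hypothetical.

```python
def sad1_covariance(m, phi, gamma2):
    """Covariance matrix of a first-order SAD process over m time points,
    with antedependence coefficient phi and innovation variance gamma2."""
    # e = Q * eps, where Q is lower triangular with Q[k][l] = phi**(k - l).
    Q = [[phi ** (k - l) if k >= l else 0.0 for l in range(m)] for k in range(m)]
    # Sigma = Q * (gamma2 * I) * Q^T
    return [[gamma2 * sum(Q[a][k] * Q[b][k] for k in range(m)) for b in range(m)]
            for a in range(m)]


Sigma = sad1_covariance(4, 0.5, 2.0)
variances = [Sigma[k][k] for k in range(4)]
```

The variances on the diagonal grow with time, and the covariance between two time points depends on their positions as well as their distance, which is what distinguishes this structure from a stationary AR(1) model.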

Parameter estimation

The Expectation-Maximization (EM) algorithm, originally proposed by Dempster et al. [21], was applied to obtain the maximum likelihood estimates (MLEs) of the unknown parameters. The detailed algorithm is given in Appendix S1. Note that the QTL position is generally considered an unknown parameter which can be estimated together with the mean and variance parameters. This, however, could dramatically increase the complexity of the estimation algorithm. As is common in QTL mapping studies, we do not directly estimate the QTL-segregating parameters. Instead, we use a grid search approach to estimate the QTL location by searching for a putative QTL at every 1 or 2 cM on an interval bracketed by two flanking markers. This linkage scan is done for the entire linkage map. The log-likelihood ratio test statistic for a QTL at each testing position is displayed graphically to generate a log-likelihood ratio plot called the LR profile plot. The genomic position corresponding to a peak of the profile is the MLE of the QTL location.
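The authors' algorithm is in Appendix S1 and is not reproduced here. As a rough illustration of what an E-step for a two-component t mixture usually involves, the sketch below follows the standard EM treatment of t mixtures (in the style of references [26] and [27]); it is an assumption that the paper's algorithm has this shape. The function name `e_step` and all input values are hypothetical.

```python
def e_step(mix_props, densities, deltas, nu, m):
    """One E-step for a single individual in a two-component t mixture.

    mix_props[j] : prior probability of QTL genotype j given the flanking markers
    densities[j] : t density of the individual's trait vector under genotype j
    deltas[j]    : squared Mahalanobis distance to the genotype-j mean curve
    Returns (posterior genotype probabilities, robustness weights).
    """
    joint = [p * f for p, f in zip(mix_props, densities)]
    total = sum(joint)
    posterior = [w / total for w in joint]
    # Conditional expectation of the latent gamma scale variable:
    # observations far from the mean (large delta) receive small weight.
    weights = [(nu + m) / (nu + d) for d in deltas]
    return posterior, weights


post, w = e_step([0.9, 0.1], [0.02, 0.08], [4.0, 40.0], nu=5.0, m=9)
```

The weights are what make the t model robust: in the M-step, each individual's contribution to the mean and covariance updates is scaled by its weight, so extreme observations pull the estimates less than they would under a normal model.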

Hypothesis testing

Once the MLEs of the parameters are obtained at each testing position, we are interested in testing whether there exists a QTL in a marker interval that governs the developmental process. With $\mu_j(t)$ denoting the mean curve of QTL genotype $j$, the hypotheses for such a test can be formulated as

$$H_0: \mu_1(t) = \mu_0(t) \ \text{for all } t \quad \text{vs.} \quad H_1: \mu_1(t) \neq \mu_0(t) \ \text{for some } t. \qquad (6)$$

The null hypothesis states that the data can be fitted by only one curve in the reduced model, while the alternative hypothesis states that two different curves are needed to fit the data in the full model. The likelihood ratio test (LRT) has been the standard test for the QTL effect. Denote $\tilde{\boldsymbol{\Theta}}$ and $\hat{\boldsymbol{\Theta}}$ as the MLEs of the unknown parameters under $H_0$ and $H_1$, respectively. The LRT statistic can be computed as the log-likelihood ratio of the reduced model to the full model, i.e., $LR = -2\log\left[L(\tilde{\boldsymbol{\Theta}} \mid \mathbf{y}) / L(\hat{\boldsymbol{\Theta}} \mid \mathbf{y})\right]$. The genome-wide significance threshold can be determined through an empirical approach based on permutation tests proposed by Churchill and Doerge [22].
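The permutation approach to a genome-wide threshold can be sketched generically: shuffle the phenotypes across individuals, rerun the scan, record the maximum statistic, and take an upper quantile of those maxima. In the sketch below, `permutation_threshold` and `toy_scan` are hypothetical names, and the toy statistic (a squared difference in group means at two made-up markers) merely stands in for the LR values of a real linkage scan.

```python
import random


def permutation_threshold(phenotypes, scan, n_perm=1000, alpha=0.05, seed=0):
    """Empirical genome-wide threshold: the (1 - alpha) quantile of the maximum
    test statistic over permuted data sets.

    scan(phenotypes) must return the test statistic at every genomic position,
    with the marker data held fixed inside it.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break the marker-trait association
        maxima.append(max(scan(shuffled)))
    maxima.sort()
    index = min(n_perm - 1, int((1.0 - alpha) * n_perm))
    return maxima[index]


# Toy stand-in for a linkage scan at two hypothetical markers.
genotypes = [[0, 1, 0, 1, 0, 1, 0, 1], [0, 0, 0, 0, 1, 1, 1, 1]]


def toy_scan(y):
    stats = []
    for g in genotypes:
        a = [v for v, k in zip(y, g) if k == 1]
        b = [v for v, k in zip(y, g) if k == 0]
        stats.append((sum(a) / len(a) - sum(b) / len(b)) ** 2)
    return stats


y = [1.0, 2.0, 1.5, 2.5, 3.0, 4.0, 3.5, 4.5]
threshold = permutation_threshold(y, toy_scan, n_perm=200)
```

For longitudinal data each entry of `phenotypes` would be a whole trait vector, so that the within-individual correlation across time points is preserved while the link to the markers is broken.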

Following the overall genetic test described above, we can further test whether a QTL triggers an effect over a certain time interval $[t_a, t_b]$ using a regional test based on the area under the curve (AUC). The hypotheses for such a test can be formulated as

$$H_0: \mathrm{AUC}_1 = \mathrm{AUC}_0 \quad \text{vs.} \quad H_1: \mathrm{AUC}_1 \neq \mathrm{AUC}_0, \qquad (7)$$

where the AUC for genotype $j$ is calculated as $\mathrm{AUC}_j = \int_{t_a}^{t_b} \mu_j(t)\,dt$. The significance of the test can be assessed through permutation tests [22].
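As a small worked example of the quantity being compared in the regional test, the sketch below approximates the area under two mean curves with the trapezoidal rule. The curves `mu1` and `mu0` are hypothetical values, and the trapezoidal rule is a stand-in for integrating the fitted spline curves exactly.

```python
def auc(times, values):
    """Area under a curve sampled at the given time points (trapezoidal rule)."""
    return sum(0.5 * (values[k] + values[k + 1]) * (times[k + 1] - times[k])
               for k in range(len(times) - 1))


times = [10.0, 20.0, 30.0, 40.0]
mu1 = [2.0, 5.0, 9.0, 12.0]    # hypothetical mean curve, genotype 1
mu0 = [2.0, 4.0, 6.0, 8.0]     # hypothetical mean curve, genotype 0
auc_diff = auc(times, mu1) - auc(times, mu0)
```

Restricting `times` to a sub-interval gives the regional version of the comparison, which is how a QTL effect confined to part of the developmental period could be examined.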

Results

Simulation

We simulated a backcross population with a 100 cM long linkage group, composed of 6 equidistant markers, under the assumption that one QTL governs the whole developmental process. A putative QTL that affects a developmental process was assumed to be located 48 cM away from the first marker on the linkage group, between the third and fourth markers. The Haldane map function was used to convert the map distance into the recombination fraction. A developmental trait with 9 equally spaced time points was generated under various combinations of heritability levels ($H^2$ = 0.1 vs 0.4) and sample sizes ($n$ = 100 vs 400). The covariance was simulated assuming a first-order SAD structure.
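The conversion from map distance to recombination fraction under the Haldane map function can be sketched for this simulated layout as follows. The sketch assumes distances in centimorgans and six markers evenly spaced from 0 to 100; the variable names are illustrative.

```python
import math


def haldane(d_cm):
    """Recombination fraction for a map distance in centimorgans (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


markers = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]   # 6 equidistant markers
qtl = 48.0
r_left = haldane(qtl - markers[2])     # third marker to QTL
r_right = haldane(markers[3] - qtl)    # QTL to fourth marker
r_interval = haldane(markers[3] - markers[2])
```

Because the Haldane function assumes no crossover interference, the three recombination fractions satisfy `r_interval = r_left + r_right - 2 * r_left * r_right`.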

In the simulation, we evaluated how well the parameters (including the QTL position as well as the mean and covariance parameters) can be estimated, how robust the multivariate t model is when data are generated from a multivariate normal distribution, and how poorly the multivariate normal mixture model performs when the model is misspecified. Several simulation scenarios were considered. Tables 1 and 2 list the results when the data generating and data analyzing models were the same. Tables 3 and 4 list the results when the data generating and analyzing models were not the same. In all simulation scenarios, we observed that increases in sample size and heritability always led to more accurate parameter estimates. For example, in Table 1, the standard error of a genotypic mean parameter decreases from 0.14 to 0.06 as the sample size increases from 100 to 400 under a heritability level of 0.1. Meanwhile, given a sample size of 400, the standard error decreases from 0.06 to 0.03 as $H^2$ increases from 0.1 to 0.4, a two-fold decrease.

Table 1. The MLEs and standard errors (in the parenthesis) of the model parameters and the QTL position derived from 100 simulation replicates.

https://doi.org/10.1371/journal.pone.0024902.t001

Table 2. The MLEs and standard errors (in the parenthesis) of the model parameters and the QTL position derived from 100 simulation replicates.

https://doi.org/10.1371/journal.pone.0024902.t002

Table 3. The MLEs and standard errors (in the parenthesis) of the model parameters and the QTL position derived from 100 simulation replicates.

https://doi.org/10.1371/journal.pone.0024902.t003

Table 4. The MLEs and standard errors (in the parenthesis) of the model parameters and the QTL position derived from 100 simulation replicates.

https://doi.org/10.1371/journal.pone.0024902.t004

For a multivariate t-distribution, the degree of freedom ($\nu$) controls the shape of the distribution. A small value of $\nu$ indicates that the normal assumption might be inappropriate for the data. For a given value of $\nu$, we simulated data from a multivariate t-distribution. Table 1 (denoted as MVTT) shows that the parameters can be estimated with good precision. When both sample size and heritability level increase, the precision of the QTL position estimate improves, with reduced standard error. The same simulated data were further analyzed assuming a multivariate normal distribution for the error term. The results are tabulated in Table 3 (denoted as MVTN). When the error distribution is misspecified, large standard errors are observed for all the parameters. In particular, the QTL position is poorly estimated, with large standard errors, under a small sample size and low heritability level. For example, under $n$ = 100 and $H^2$ = 0.1, the standard error increases from 2.94 to 5.17 when the data are analyzed with the multivariate normal model instead of the proposed t model. Under a small sample size, the multivariate t-distribution is more robust than the multivariate normal.

Next we simulated data under the multivariate normal assumption and analyzed the data with the corresponding data generating model (denoted as MVNN) and the proposed t-distribution model (denoted as MVNT). We used the results in Table 2 as a reference to compare the performance of the multivariate t model in Table 4, since the results in Table 2 were obtained with the true model. Under a small sample size ($n$ = 100) and low heritability level ($H^2$ = 0.1), the results with the multivariate t model are, perhaps surprisingly, better than those with the multivariate normal model. For example, the standard error of the QTL position estimate is 4.35 in MVNT, while it is 5.41 in MVNN. Moreover, the bias in MVNN is also larger (1.72 vs 0.3). This result demonstrates the robustness of the t modeling under small samples. As sample size and heritability level increase, the results become very comparable. In real applications, given the various sources of noise and the need for good estimation of the QTL position, a safe strategy is to apply the mixture multivariate t model in functional mapping.

In functional mapping, the likelihood ratio (LR) statistic is used as the indicator of a QTL signal. The larger the LR value at a genomic position, the stronger the evidence of a QTL at that position. The LR test statistics for the above four scenarios are also compared across the simulated genetic linkage group, averaged over 100 simulation replicates. Figure 1 displays the difference in LR values under the different combinations of sample size and heritability level. When data were generated assuming a multivariate normal distribution, the results obtained with the t model (dashed curve) are very similar to those obtained with the normal model (solid curve). However, when data were generated assuming the multivariate t-distribution, the t model (dotted curve) clearly outperforms the normal model (dash-dotted curve). This evidence indicates the superiority and robustness of the multivariate t mixture model in functional mapping.

Figure 1. The LR profile plots averaged over 100 simulation replicates under different sample sizes (100 and 400) and heritability levels (0.1 and 0.4).

The arrow sign indicates the simulated true QTL position.

https://doi.org/10.1371/journal.pone.0024902.g001

A case study

We applied the method to a real data set to identify QTLs governing the variation of rice tiller number development and to show the utility of the approach. A detailed description of the data can be found in Huang et al. [23] and Yan et al. [24]. In brief, semi-dwarf IR64 and tall Azucena, two inbred lines, were crossed to generate an F1 progeny population. A doubled haploid (DH) population of 123 lines was constructed by doubling the haploid chromosomes of the F1 gametes. For this population, 40 isozyme and RAPD markers and 135 RFLP markers were genotyped to construct a genetic linkage map of length 2005 cM covering 12 rice chromosomes. Tiller numbers were measured every 10 days from 10 days after transplanting until all lines had headed. Nine developmental measurements were recorded for each line. A plot of the original data can be found in Fig. 2 of Cui et al. [14].

Figure 2. The LR profile plot across the 12 rice chromosomes, fitted with the proposed multivariate t mixture model (solid curve) and a multivariate normal mixture model (dash-dotted curve).

The genomic position corresponding to the peak of the curve is the MLE of the QTL location (indicated by the arrows). The 5% genome-wide threshold value for claiming the existence of a QTL is given as the horizontal dotted and dash-dotted lines for the two models. The marker positions on the linkage groups are indicated as ticks [23].

https://doi.org/10.1371/journal.pone.0024902.g002

We performed a genome-wide linkage scan at every 2 cM to locate potential QTLs that trigger effects on the programmed cell death of rice tillers. Figure 2 shows the genome-wide log-likelihood ratio profile plots, where the results obtained with the multivariate t and the multivariate normal models are indicated by the solid and dashed curves, respectively, with the respective 5% genome-wide permutation thresholds indicated by the horizontal solid and dashed lines (obtained with 1,000 permutations). The plot indicates one QTL located on chromosome 3; its flanking marker interval is given in Table 5. The QTL was also reported in our previous analyses [14], [15]. The other peaks did not pass the genome-wide significance threshold. A test of multivariate normality for the phenotype data, without considering the marker data, shows evidence of departure from normality, indicating that a multivariate t model may be more appropriate for the data. The LR values for the two models across the 12 chromosomes are very comparable, with the multivariate t model generating slightly higher LR values at many positions.

The estimated QTL position on chromosome 3 and the corresponding marker interval, as well as the MLEs of the model parameters, are tabulated in Table 5. The tiller number developmental trajectories of the detected QTL are shown in Figure 3, with the tiller number trajectories of all individuals shown in the background. The gap between the two trajectories over the developmental stages is quite clear, indicating a developmental mean difference in tiller number between individuals carrying the two different genotypes. Individuals carrying the genotype with the upper trajectory have higher mean tiller numbers throughout the observed developmental stages and are hence preferable for selection in breeding.

Figure 3. Two dynamic variation curves of tiller numbers corresponding to the two QTL genotypes.

All tiller number trajectories under study are shown in grey in the background.

https://doi.org/10.1371/journal.pone.0024902.g003

Table 5. The QTL location and MLEs of the estimated parameters with the SAD(1) covariance structure.

https://doi.org/10.1371/journal.pone.0024902.t005

Discussion

Functional mapping has been shown to be a powerful approach and a standard means of mapping QTLs underlying the dynamics of quantitative traits [7]. However, most current methods in functional mapping assume a multivariate normal distribution for the time-course error term, which can easily be violated in reality. In this work, we extended the current functional mapping approach by assuming a robust multivariate t-distribution for the error term, built upon the maximum likelihood framework and implemented with a full EM algorithm to estimate the model parameters. Extensive simulations show that the proposed model outperforms the mixture multivariate normal model when the underlying distribution is a multivariate t-distribution. Even if the underlying distribution is normal, the proposed modeling approach performs as well as or even better than the normal model (especially under a small sample size). Given its robustness, the proposed model should be adopted in regular functional mapping studies, especially when the sample size is small.

In the original functional mapping study, a developmental mean process is generally modeled with a mathematical function such as the logistic function for a growth trait [5]. In this study, we modeled the developmental mean process using a nonparametric spline technique, given its flexibility in modeling patterns of data distribution which does not follow any particular mathematical form (e.g., [17], [25]). The correlation structure was modeled by the non-stationary SAD model, which was studied in Zhao et al. [19] for functional mapping. Since the focus of this work is not on the modeling of the mean and the correlation structure, we simply adopted these approaches and did not compare the impact of different modeling approaches on the power of QTL identification. This investigation will be considered in our future work.

In the real data analysis, there is little difference between the LR profile plots of the t mixture model and the normal mixture model. This is because the data distribution is quite close to the multivariate normal. The same data were analyzed before with different models to approximate the developmental mean process [14], [15]. The QTL showing genome-wide significance in this study is consistent with the one found in our previous work, while some other QTLs on chromosome 1 reported in Cui et al. [15] did not pass genome-wide significance in this analysis. This is largely due to differences in the modeling of the mean process. As previous investigations have shown, the power and precision of QTL identification are quite sensitive to the way the mean and covariance structures are modeled [14], [15], [19]. In reality, the true mean and covariance functions are generally unknown. This raises a very practical issue in functional mapping. What we can do to improve mapping power and precision is to model the error distribution with more robust approaches such as the one proposed in this work. We expect that the method developed here can enhance the power of functional mapping in understanding the genetic architecture of dynamic traits.

Supporting Information

Acknowledgments

We wish to thank Prof. J. Stapleton for his careful reading and comments on the manuscript and Dr. X. Tang and Q. Song for helpful discussions.

Author Contributions

Conceived and designed the experiments: YC. Performed the experiments: CW. Analyzed the data: CW. Contributed reagents/materials/analysis tools: GL JZ. Wrote the paper: CW YC.

References

  1. Lander ES, Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
  2. Jiang C, Zeng Z-B (1995) Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140: 1111–1127.
  3. Pletcher SD, Geyer CJ (1999) The genetic analysis of age-dependent traits: Modeling the character process. Genetics 153: 825–835.
  4. Kirkpatrick M, Heckman N (1989) A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. J Math Biol 27: 429–450.
  5. Ma C-X, Casella G, Wu R (2002) Functional mapping of quantitative trait loci underlying the character process: A theoretical framework. Genetics 161: 1751–1762.
  6. Wu RL, Ma C-X, Lin M, Casella G (2004) A general framework for analyzing the genetic architecture of developmental characteristics. Genetics 166: 1541–1551.
  7. Wu RL, Lin M (2006) Functional mapping – How to map and study the genetic architecture of dynamic complex traits. Nat Rev Genet 7: 229–237.
  8. von Rohr P, Hoeschele I (2002) Bayesian QTL mapping using skewed Student-t distributions. Genet Sel Evol 34: 1–21.
  9. Peng J, Siegmund D (2006) Mapping quantitative trait loci under the multivariate-t model. Unpublished manuscript.
  10. Gabriel KR (1962) Ante-dependence analysis of an ordered set of variables. Ann Math Statist 33: 201–212.
  11. Wu RL, Ma C-X, Casella G (2007) Statistical Genetics of Quantitative Traits: Linkage, Maps, and QTL. Springer-Verlag, New York.
  12. West GB, Brown JH, Enquist BJ (2001) A general model for ontogenetic growth. Nature 413: 628–631.
  13. Cui Y, Li S, Li G (2008) Functional mapping imprinted quantitative trait loci underlying developmental characteristics. Theo Biol Med Mod 6: 5.
  14. Cui Y, Zhu J, Wu R (2006) Functional mapping for genetic control of programmed cell death. Physiol Genomics 25: 458–469.
  15. Cui Y, Wu R, Casella G, Zhu J (2008) Nonparametric functional mapping quantitative trait loci underlying programmed cell death. Stat Appl Genet Mol Biol 7(1): Article 4.
  16. Lin M, Wu R (2006) A joint model for nonparametric functional mapping of longitudinal trajectory and time-to-event. BMC Bioinformatics 7: 138.
  17. Yang J, Wu RL, Casella G (2009) Nonparametric Functional Mapping of Quantitative Trait Loci Underlying the Character Process. Biometrics 65: 30–39.
  18. Yap JS, Fan J, Wu R (2009) Nonparametric modeling of longitudinal covariance structure in functional mapping of quantitative trait loci. Biometrics 65: 1068–1077.
  19. Zhao W, Chen YQ, Casella G, Cheverud JM, Wu RL (2005) A non-stationary model for functional mapping of complex traits. Bioinformatics 21: 2469–2477.
  20. Jaffrézic F, Thompson R, Hill WG (2003) Structural antedependence models for genetic analysis of repeated measures on multiple quantitative traits. Genet Research 82: 55–65.
  21. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J Roy Stat Soc B 39: 1–38.
  22. Churchill GA, Doerge RW (1994) Empirical threshold values for quantitative trait mapping. Genetics 138: 963–971.
  23. Huang N, Parco A, Mew T, Magpantay G, McCouch S, et al. (1997) RFLP mapping of isozymes, RAPD and QTL for grain shape, brown planthopper resistance in a doubled haploid rice population. Mol Breeding 3: 105–113.
  24. Yan JQ, Zhu J, He CX, Benmoussa M, Wu P (1998) Quantitative trait loci analysis for the developmental behavior of tiller number in rice. Theor Appl Genet 97: 267–274.
  25. Wu S, Yang J, Wu R (2007) Semiparametric functional mapping of quantitative trait loci governing long-term HIV dynamics. Bioinformatics 23: 569–576.
  26. Liu C, Rubin DB (1995) ML estimation of the t distribution using EM and its extensions, ECM and ECME. Statistica Sinica 5: 19–39.
  27. Shoham S (2002) Robust clustering by deterministic agglomeration EM of mixtures of multivariate t distribution. Pattern Recognition 35: 1127–1142.