Skip to main content
Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Homogeneity Test for Correlated Binary Data

  • Changxing Ma ,

    cxma@buffalo.edu

    Affiliation Department of Biostatistics, University at Buffalo, Buffalo, New York 14214, USA

  • Guogen Shan,

    Affiliation Department of Environmental and Occupational Health, School of Community Health Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA

  • Song Liu

    Affiliation Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, NY 14203, USA

Abstract

In ophthalmologic studies, measurements obtained from both eyes of an individual are often highly correlated. Ignoring the correlation could lead to incorrect inferences. An asymptotic method was proposed by Tang and others (2008) for testing equality of proportions between two groups under Rosner's model. In this article, we investigate three testing procedures for general g ≥ 2 groups. Our simulation results show the score testing procedure usually produces satisfactory type I error control and has reasonable power. The three test procedures get closer when sample size becomes larger. Examples from ophthalmologic studies are used to illustrate our proposed methods.

Introduction

In randomized clinical trials [2], patients are usually randomized into two or more treatment groups, and patients within each group receive the same treatment. Often a control group or a group with standard treatment is included for testing the efficiency of new treatments. After the randomization, all patients are followed up in exactly the same way as designed, and the only difference is the treatment assigned to each group. A randomized clinical trial is a good choice to eliminate many of the biases and to avoid ethical problems that may arise from comparing treatments [3] [4]. For example, in a double-blinded two-arm clinical trial for an ophthalmologic study, all patients are randomized into two treatment groups and the same treatment is applied to both eyes of patients from the same group. Such clustered data with a cluster size of two often arises from statistical and medical applications, for example, ophthalmologic studies, orthopaedic studies, otolaryngological studies and twin studies.

We wish to test if the outcomes are identical among the two or more treatment groups. Obviously, the information collected from two eyes of a single person tends to be highly correlated. Any statistical methods that ignore the feature of dependence, such as t tests, analyses of variance, or chi-square tests, could lead to incorrect inferences (see, [5] [6] [7] [8] [9]).

In this article, we consider the case of a dichotomous outcome, such as the presence of a disease or some other binary trait. Several statistical tests have been proposed. Rosner [5] proposed a parametric model and a test statistic for testing homogeneity of proportions among g groups. However, the maximum likelihood estimates (MLEs) and likelihood-based tests were not given. [1] [10] considered this problem for two groups only and proposed several asymptotic testing procedures, including the score test. It is difficult to extend the testing procedures from 2 groups to g groups (g > 2), due to the complexity of deriving the information matrix and maximum likelihood estimates which can be obtained only by numerical iterations. The score test statistic has demonstrated better type I error control and power than other testing procedures, when comparing two treatment groups [1]. We expect the score test, investigated for comparing multiple treatment groups in this article, to perform well as compared to other procedures.

In this article, we present the methods for comparing proportions among any g groups, with g ≥ 2. The maximum likelihood estimate under Rosner’s model and three different methods (Likelihood Ratio test, Wald-type test, Score test) are derived and investigated in Section 2. In Section 3, Monte Carlo simulation studies are conducted to compare the performance of various tests and comparisons are evaluated with respect to empirical type I error rates and powers. Examples from otolaryngological studies are illustrated to demonstrate our methodologies in Section 4. Finally, we give some concluding remarks in Section 5.

Methods

Suppose we wish to compare g groups of individuals from an ophthalmologic study with mi individuals in the ith group, i = 1, …, g; N = ∑mi total subjects (Table 1). Let Zijk = 1 if the kth eye of jth individual in the ith group has a response at the end of the study, and 0 otherwise, i = 1, …, g, j = 1, …, mi, k = 1, 2. Let mli denote the number of subjects who has exactly l responses in the ith group, and Sl be the number of subjects who has exactly l responses (e.g., affected eyes)

thumbnail
Table 1. Frequencies of the number of affected eyes for persons in g groups.

https://doi.org/10.1371/journal.pone.0124337.t001

A parametric model proposed by [5] is given as (1) i = 1, …, g, j = 0, …, mi, k = 1, 2 for some positive R. The constant R is a measure of dependence between two eyes of the same person. If R = 1, the two eyes from the same patient are completely independent. If i = 1, the eyes of each patient in the i-th group are completely dependent. The R satisfies 0 < R ≤ 1/a, if a ≤ 1/2; (2 − 1/a)/aR ≤ 1/a, if a > 1/2; where a = max{πi, i = 1, …, g}. From the conditional probability in Eq (1), it is easy to show that the correlation between two eyes is (2)

We wish to test whether the response rates of the g groups are identical. The hypotheses are given as against

Based on the observed data M˜=(m01,,m0g,m11,,m1g,m21,,m2g), the corresponding log-likelihood can be expressed as Differentiating l(π1, …, πg; R) with respect to parameters π1, …, πg and R yields (3) (4)

Under the null hypothesis H0 : π1 = ⋯ = πg = π, the maximum likelihood estimates of π and R satisfy A direct algebra calculation results in the MLEs of πis and R and

Denote π^i,i=1,,g and R^ as the maximum likelihood estimate of πi, i = 1, …, g and R, respectively. π^i,i=1,,g and R^ are the solution of the following equations There is no closed form solution and it has to be solved iteratively. We can simplify the formula in Eq (3) as the following 3rd order polynomial (for i = 1, …, g) The (t + 1)th update for πi can be directly obtained by the real root of the above equation, and R can be updated by the Fisher scoring method See next section for the formula of 2lR2.

Information matrix

Differentiating lπi,i=1,,g and lR with respect to πi, i = 1, …, g and R respectively yields Then we have The (g + 1) × (g + 1) information matrix is denoted as I(π1, …, πg; R) = (Iij).

Under the null hypothesis H0 : π1 = ⋯ = πg = π, it is straightforward but tedious to show that the inverse of the information matrix can be expressed as (5) where

With the MLEs and information matrix derived, we consider the following test statistics.

Likelihood ratio test (TLR)

The likelihood ratio (LR) test is given by Under the null hypothesis, TLR is asymptotically distributed as a chi-square distribution with g − 1 degrees of freedom.

Wald-type test (TW)

The null hypothesis H0 : π1 = ⋯ = πg can be alternatively expressed as C βT = 0 where β = (π1, ⋯, πg, R) and Wald-type test statistic (TW) for testing H0 can be expressed as where I is the Fisher information matrix for and TW is asymptotically distributed as a chi-square distribution with g − 1 degrees of freedom. TW can be simplified as where

Other multivariate tests of π’s can be similarly done by choosing the corresponding C matrix in the above statistic. Further, a Wald-type test statistic for testing H0a : πi = πj vs H1a : πiπj, ij can be given by where c = (0, …, 1, …, −1, …, 0) with 1 in the ith element and −1 in the jth element. TWa is asymptotically distributed as a chi-square distribution with 1 degree of freedom. TWa(i, j) can be simplified as

Score test (TSC)

The score test statistic TSC is given by where and see (Eq 5) for the formula of the inverse of the information matrix I(π, R)−1.

It can be simplified as (6) after lengthy algebra calculations. TSC2 is asymptotically distributed as a chi-square distribution with g − 1 degree of freedom.

Remark 1. One limitation of the score statistic is that it cannot be computed if S0 = 0 or S1 = 0. We dealt with this problem by adding 1/(2g) to mij for such situations.

Remark 2. [1] derived a score test TSC for g = 2 as which is equivalent to (Eq 6).

Monte Carlo simulation studies

We now investigate the performance of proposed statistics and testing procedures discussed in the previous section. First, we investigate the behavior of the type I error rates of various procedures for g = 2,3,4,5; sample size m1 = ⋯ = mg = 20, 40, 60, 80 and 100; π1 = ⋯ = πg = π0 = 0.5(0.1)0.8; and R = 1 + ρ(1 − π0)/π0 where ρ = 0.4(0.1)0.6. An imbalanced sample setting is also considered. In each configuration, 50,000 replications are generated based on the null hypothesis, and empirical type I error rates are computed as the number of rejections/50000. The results are presented in Table 2. Following [1], we calculated the ratio of empirical type I error rate to the nominal type I error rate. A test is said to be liberal if the ratio is greater than 1.2 (i.e., > 6% for α = 5%), conservative if the ratio is less than 0.8 (i.e., < 4%), and robust if the ratio is between 0.8 and 1.2.

thumbnail
Table 2. The type I error rates (percent) of various procedures under H0 : π1 = ⋯ = πg = π0 at α = 0.05 based on 50,000 replicates.

https://doi.org/10.1371/journal.pone.0124337.t002

Generally, score tests TSC2 produce satisfactory type I error controls for any configuration while LR tests and Wald tests are liberal, especially for small samples and larger numbers of groups (g). When g > 2, Wald tests are more liberal than LR tests and these tests get closer when sample size becomes larger.

LR tests and Wald-type tests are extremely liberal for a small sample size (i.e., m = 20), and their actual sizes inflate with the increase of the correlation coefficient (i.e., ρ).

Next, we evaluate the power performance of proposed methods. We consider the alternative hypotheses with H1 : π = (0.25, 0.4), (0.25, 0.325, 0,4), and (0.25, 0.3, 0.35, 0.4) for g = 2, 3, and 4, respectively. R is chosen as 1, 1.5, and 2.0 and sample size m1 = ⋯ = mg = 20, 40, 60, 80 and 100. Ronser’s statistic T is also considered in the simulation studies and the results are presented in Table 3.

thumbnail
Table 3. The power (percent) of various procedures at α = 0.05 based on 50,000 replicates.

https://doi.org/10.1371/journal.pone.0124337.t003

Based on the simulation results, LR and Wald tests are generally more powerful than score tests and Ronser’s T generally has less power. However, power of LR and Wald tests is often overestimated in small sample size due to the inflated type I error rates from these tests being observed (see Table 2). For moderate or large sample sizes, the powers of the three proposed methods are close. Overall, the score test is highly recommended as it has reasonable power with satisfactory type I error control.

Examples

We first reanalyze the data presented by [5] to illustrate the newly proposed methods. The data consists of 218 persons aged 20–39 with retinitis pigmentosa (RP) who were seen at Massachusetts Eye and Ear Infirmary. They were classified into four genetic types, namely, autosomal dominant RP (DOM), autosomal recessive RP (AR), sex-linked RP (SL), and isolate RP (ISO). The differences between these four groups on the Snellen visual acuity (VA) were assessed. An eye was considered affected if VA was 20/50 or worse, and normal if VA was 20/40 or better. The sample used for this analysis consists of 216 persons out of the sample of 218 persons each of whom had complete information for VA on both eyes (Table 4).

thumbnail
Table 4. Distribution of the number of affected eyes for persons in each genetic type [5].

https://doi.org/10.1371/journal.pone.0124337.t004

An overall significant difference between the proportions of affected eyes in the four groups is from 0.0769 to 0.1173 based on proposed methods and 0.010 on Rosner’s statistic T (Table 5).

thumbnail
Table 5. Statistic and p-value for comparing VA for different genetic types of RP.

https://doi.org/10.1371/journal.pone.0124337.t005

The maximum likelihood estimates and pairwise comparisons based on Wald-type test TWa(i, j) are shown in Table 6. It shows a significant difference between DOM and AR (p = 0.0207).

thumbnail
Table 6. Wald-type test results comparing VA for different genetic types of RP.

https://doi.org/10.1371/journal.pone.0124337.t006

Another example was a recent study from a cross-sectional, population-based sample in Iran assessing the prevalence of avoidable blindness [11]. Nearly 3000 persons were examined and blindness was assessed for seven age groups (Table 7). Test statistics TLR2=134.7, TW2=89.1, TSC2=161.1, and T = 202.0 consistently show the significant age differences (p-values < 0.0001), and MLE R^=3.35 (ρ^i=π^i1π^i(R^1) are from 0.05 to 0.59) shows positive correlations between eyes for the same person.

thumbnail
Table 7. Prevalence of avoidable blindness from a sample population in Iran [11].

https://doi.org/10.1371/journal.pone.0124337.t007

Concluding remarks

In this article, we investigated three procedures for testing the homogeneity of correlated data with a cluster size of two. We derived the maximum likelihood estimate algorithm by utilizing the root of third order polynomial equations. The Fisher scoring method is usually criticized for converging slowly, especially when the number of parameters is large (e.g. g is large). However, the algorithm derived in this paper is very efficient. This is because only R is updated by Fisher scoring iterations, and πi, i = 1, …, g are the roots of third order polynomials, a closed form solution.

Simulation results showed that the proposed approach (score test) has satisfactory type I error control and reasonable power, regardless of number of groups, sample size, or parameter configurations. On the other hand, the LR test and Wald test have inflated type I error in small sample size. When sample size becomes larger, the three test procedures get closer.

For binary correlated data, there are alternative ways to solve the MLE iteratively or perform hypothesis testing by model-based methods, e.g., GENMOD or GLIMMIX in SAS. However, neither iterative version of test statistics nor model-based method provides the explicit form of the test statistic. The explicit form of the test statistic in our proposed method is useful not only for its simplicity, but also for further development of the exact test. For example, in small sample situation, an exact test may overcome the inflated type I error rate and thus is highly desirable. To perform exact test, it will requires extensive calculations which makes it nearly impossible using iterative versions of test statistics or model-based methods.

To overcome inflated type I error control in asymptotic tests, [10] and [12] considered exact tests for g = 2. We consider the exact tests for g > 2 as interesting future work.

A user-friendly web-based calculator, including model estimations, hypothesis testings, and simulations, is available from the corresponding author upon request.

Acknowledgments

The authors thank the associate editor and three referees for constructive comments and suggestions which improve the presentation of the paper.

Author Contributions

Conceived and designed the experiments: CM GS. Performed the experiments: CM. Analyzed the data: CM. Contributed reagents/materials/analysis tools: CM GS. Wrote the paper: CM GS SL.

References

  1. 1. Tang NS, Tang ML, Qiu SF. Testing the equality of proportions for correlated otolaryngologic data. Computational Statistics & Data Analysis, 52(7):3719–3729, March 2008.
  2. 2. Simon R, Wittes RE, Ellenberg SS. Randomized phase II clinical trials. Cancer treatment reports, 69(12):1375–1381, December 1985. pmid:4075313
  3. 3. Wilding GE, Shan G, Hutson AD. Exact two-stage designs for phase II activity trials with rank-based endpoints. Contemporary Clinical Trials, 2011.
  4. 4. Donner A. Cluster randomization trials in epidemiology: theory and application. Journal of Statistical Planning and Inference, 42(1–2):37–56, November 1994.
  5. 5. Rosner B. Statistical methods in ophthalmology: an adjustment for the intraclass correlation between eyes. Biometrics, 38(1):105–114, March 1982. pmid:7082754
  6. 6. Dallal GE. Paired Bernoulli trials. Biometrics 44, 253–257, 1988. pmid:3358992
  7. 7. Rosner B, Milton RC. Significance testing for correlated binary outcome data. Biometrics 44, 505–512, 1988. pmid:3390508
  8. 8. Donner A. Statistical methods in ophthalmology: An adjusted chi-square approach. Biometrics 45, 605–611. pmid:2765640
  9. 9. Bodian CA. Intraclass correlation for two-by-two tables under three sampling designs. Biometrics 50, 183–193, 1994. pmid:8086601
  10. 10. Tang ML, Tang NS, Rosner B. Statistical inference for correlated data in ophthalmologic studies. Statistics in Medicine 25, 2771–2783, 2006. pmid:16381067
  11. 11. Rajavi Z, Katibeh M, Ziaei H, Fardesmaeilpour N, Sehat M, Ahmadieh H, et al. Rapid assessment of avoidable blindness in Iran, Ophthalmology 118, 1812–18, 2011. pmid:21571371
  12. 12. Shan G, Ma CX. Exact Methods for Testing the Equality of Proportions for Binary Clustered Data From Otolaryngologic Studies. Statistics in Biopharmaceutical Research, 6, 115–122, 2014.