Homogeneity Test for Correlated Binary Data

Changxing Ma; Guogen Shan; Song Liu

doi:10.1371/journal.pone.0124337

Abstract

In ophthalmologic studies, measurements obtained from both eyes of an individual are often highly correlated. Ignoring the correlation could lead to incorrect inferences. An asymptotic method was proposed by Tang and others (2008) for testing equality of proportions between two groups under Rosner's model. In this article, we investigate three testing procedures for general g ≥ 2 groups. Our simulation results show the score testing procedure usually produces satisfactory type I error control and has reasonable power. The three test procedures get closer when sample size becomes larger. Examples from ophthalmologic studies are used to illustrate our proposed methods.

Citation: Ma C, Shan G, Liu S (2015) Homogeneity Test for Correlated Binary Data. PLoS ONE 10(4): e0124337. https://doi.org/10.1371/journal.pone.0124337

Academic Editor: Song Wu, The State University of New York at Stony Brook, UNITED STATES

Received: January 2, 2015; Accepted: February 25, 2015; Published: April 21, 2015

Copyright: © 2015 Ma et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Data Availability: All relevant data are within the paper.

Funding: The authors have no support or funding to report.

Competing interests: The authors have declared that no competing interests exist.

Introduction

In randomized clinical trials [2], patients are usually randomized into two or more treatment groups, and patients within each group receive the same treatment. Often a control group or a group with standard treatment is included for testing the efficacy of new treatments. After randomization, all patients are followed up in exactly the same way, as designed, and the only difference is the treatment assigned to each group. A randomized clinical trial is a good choice for eliminating many biases and avoiding ethical problems that may arise from comparing treatments [3] [4]. For example, in a double-blinded two-arm clinical trial for an ophthalmologic study, all patients are randomized into two treatment groups and the same treatment is applied to both eyes of patients from the same group. Such clustered data with a cluster size of two often arise in statistical and medical applications, for example, ophthalmologic studies, orthopaedic studies, otolaryngological studies and twin studies.

We wish to test whether the outcomes are identical among the two or more treatment groups. The information collected from the two eyes of a single person tends to be highly correlated. Statistical methods that ignore this dependence, such as t tests, analyses of variance, or chi-square tests, could lead to incorrect inferences (see [5] [6] [7] [8] [9]).

In this article, we consider the case of a dichotomous outcome, such as the presence of a disease or some other binary trait. Several statistical tests have been proposed. Rosner [5] proposed a parametric model and a test statistic for testing homogeneity of proportions among g groups. However, the maximum likelihood estimates (MLEs) and likelihood-based tests were not given. Tang and others [1] [10] considered this problem for two groups only and proposed several asymptotic testing procedures, including the score test. It is difficult to extend the testing procedures from 2 groups to g groups (g > 2), because of the complexity of deriving the information matrix and because the maximum likelihood estimates can be obtained only by numerical iteration. When comparing two treatment groups, the score test statistic has demonstrated better type I error control and power than other testing procedures [1]. We expect the score test, investigated in this article for comparing multiple treatment groups, to perform well compared with other procedures.

In this article, we present methods for comparing proportions among any g groups, with g ≥ 2. The maximum likelihood estimates under Rosner’s model and three different tests (likelihood ratio test, Wald-type test, score test) are derived and investigated in the Methods section. Monte Carlo simulation studies then compare the performance of the tests with respect to empirical type I error rates and power. Examples from ophthalmologic studies illustrate our methodology in the Examples section. Finally, we give some concluding remarks.

Methods

Suppose we wish to compare g groups of individuals from an ophthalmologic study with m_i individuals in the ith group, i = 1, …, g, and N = ∑m_i total subjects (Table 1). Let Z_ijk = 1 if the kth eye of the jth individual in the ith group has a response at the end of the study, and 0 otherwise, i = 1, …, g, j = 1, …, m_i, k = 1, 2. Let m_li denote the number of subjects in the ith group who have exactly l responses, and let S_l = ∑_i m_li be the total number of subjects who have exactly l responses (e.g., affected eyes), l = 0, 1, 2.

Table 1. Frequencies of the number of affected eyes for persons in g groups.

https://doi.org/10.1371/journal.pone.0124337.t001

A parametric model proposed by [5] is given as

$$\Pr(Z_{ijk} = 1) = \pi_i, \qquad \Pr(Z_{ijk} = 1 \mid Z_{ij,3-k} = 1) = R\pi_i, \tag{1}$$

i = 1, …, g, j = 1, …, m_i, k = 1, 2, for some positive R. The constant R is a measure of dependence between the two eyes of the same person. If R = 1, the two eyes of the same patient are completely independent. If Rπ_i = 1, the eyes of each patient in the ith group are completely dependent. R satisfies 0 < R ≤ 1/a if a ≤ 1/2, and (2 − 1/a)/a ≤ R ≤ 1/a if a > 1/2, where a = max{π_i, i = 1, …, g}. From the conditional probability in Eq (1), it is easy to show that the correlation between the two eyes is

$$\rho_i = \frac{\pi_i}{1 - \pi_i}(R - 1), \quad i = 1, \dots, g. \tag{2}$$
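The quantities in this paragraph can be sketched in a few lines of Python. The sketch assumes Rosner's model in its usual form, Pr(Z_ijk = 1) = π_i and Pr(Z_ijk = 1 | Z_ij,3−k = 1) = Rπ_i; the function names are illustrative and do not come from the paper.

```python
def cell_probabilities(pi, R):
    """Return (p0, p1, p2), the probabilities of 0, 1 or 2 affected eyes."""
    p2 = R * pi * pi                 # both eyes respond: pi * (R * pi)
    p1 = 2.0 * pi * (1.0 - R * pi)   # exactly one of the two eyes responds
    p0 = 1.0 - p1 - p2               # equals R*pi^2 - 2*pi + 1
    return p0, p1, p2


def eye_correlation(pi, R):
    """Correlation between the two eyes of one subject: pi * (R - 1) / (1 - pi)."""
    return pi * (R - 1.0) / (1.0 - pi)


def r_range(a):
    """Bounds on R for a = max(pi_i); for a <= 1/2 the lower bound 0 is excluded."""
    lower = 0.0 if a <= 0.5 else (2.0 - 1.0 / a) / a
    return lower, 1.0 / a
```

With R = 1 the cell probabilities reduce to the binomial values (1 − π)², 2π(1 − π) and π², and the correlation is zero. At the lower bound of R for a > 1/2, the probability of no affected eye is zero.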

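We wish to test whether the response rates of the g groups are identical. The hypotheses are given as

$$H_0: \pi_1 = \pi_2 = \dots = \pi_g \quad \text{against} \quad H_1: \pi_i \neq \pi_j \ \text{for some} \ i \neq j.$$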

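Under Rosner's model, a subject in the ith group has 0, 1 or 2 responses with probabilities Rπ_i² − 2π_i + 1, 2π_i(1 − Rπ_i) and Rπ_i², respectively. Based on the observed data $\tilde{M} = (m_{01}, \dots, m_{0 g}, m_{11}, \dots, m_{1 g}, m_{21}, \dots, m_{2 g})$, the corresponding log-likelihood can be expressed as

$$l(\pi_1, \dots, \pi_g; R) = \sum_{i=1}^{g} \left[ m_{0i} \log(R\pi_i^2 - 2\pi_i + 1) + m_{1i} \log(2\pi_i(1 - R\pi_i)) + m_{2i} \log(R\pi_i^2) \right] + \log C,$$

where C is a constant that does not depend on the parameters. Differentiating l(π₁, …, π_g; R) with respect to the parameters π₁, …, π_g and R yields

$$\frac{\partial l}{\partial \pi_i} = \frac{2m_{2i} + m_{1i}}{\pi_i} - \frac{R m_{1i}}{1 - R\pi_i} + \frac{2(R\pi_i - 1) m_{0i}}{R\pi_i^2 - 2\pi_i + 1}, \quad i = 1, \dots, g, \tag{3}$$

$$\frac{\partial l}{\partial R} = \frac{S_2}{R} + \sum_{i=1}^{g} \left[ \frac{\pi_i^2 m_{0i}}{R\pi_i^2 - 2\pi_i + 1} - \frac{\pi_i m_{1i}}{1 - R\pi_i} \right]. \tag{4}$$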

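Under the null hypothesis H₀ : π₁ = ⋯ = π_g = π, the maximum likelihood estimates of π and R satisfy

$$\frac{2S_2 + S_1}{\pi} - \frac{R S_1}{1 - R\pi} + \frac{2(R\pi - 1) S_0}{R\pi^2 - 2\pi + 1} = 0, \qquad \frac{S_2}{R} + \frac{\pi^2 S_0}{R\pi^2 - 2\pi + 1} - \frac{\pi S_1}{1 - R\pi} = 0.$$

Direct algebra gives the MLEs of the common π and R in closed form:

$$\hat{\pi}_{H_0} = \frac{S_1 + 2S_2}{2N} \quad \text{and} \quad \hat{R}_{H_0} = \frac{4 N S_2}{(S_1 + 2S_2)^2}.$$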
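Under H₀ the pooled counts (S₀, S₁, S₂) follow a three-cell multinomial with two free parameters, so the estimates can be obtained by matching Rπ² = S₂/N and 2π(1 − Rπ) = S₁/N. A minimal sketch follows; the function name and the example counts are hypothetical and not taken from the paper.

```python
def null_mle(m0, m1, m2):
    """Closed-form MLEs of (pi, R) under H0: pi_1 = ... = pi_g.

    m0[i], m1[i], m2[i] = number of subjects in group i with 0, 1, 2 responses.
    """
    S1, S2 = sum(m1), sum(m2)
    N = sum(m0) + S1 + S2
    pi_hat = (S1 + 2 * S2) / (2 * N)          # pooled proportion of affected eyes
    R_hat = 4 * N * S2 / (S1 + 2 * S2) ** 2   # solves R * pi^2 = S2 / N
    return pi_hat, R_hat


# Hypothetical counts for two groups (not data from the paper).
pi0, R0 = null_mle([10, 12], [5, 3], [5, 5])
```

For these counts, S₁ = 8, S₂ = 10 and N = 40, so the pooled estimate of π is 28/80 = 0.35 and the fitted cell probabilities reproduce the pooled frequencies exactly.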

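Denote ${\hat{π}}_{i}, i = 1, \dots, g$ and $\hat{R}$ as the maximum likelihood estimates of π_i, i = 1, …, g and R, respectively. ${\hat{π}}_{i}, i = 1, \dots, g$ and $\hat{R}$ are the solution of the following equations:

$$\frac{\partial l}{\partial \pi_i} = 0, \quad i = 1, \dots, g, \qquad \frac{\partial l}{\partial R} = 0.$$

There is no closed-form solution, and the equations have to be solved iteratively. Setting Eq (3) to zero and clearing denominators gives the following third-order polynomial (for i = 1, …, g):

$$2R^2 m_i \pi_i^3 - R(4m_{0i} + 5m_{1i} + 6m_{2i})\pi_i^2 + 2\left[m_i + m_{2i} + R(m_{1i} + m_{2i})\right]\pi_i - (m_{1i} + 2m_{2i}) = 0.$$

The (t + 1)th update for π_i can be directly obtained as the real root of the above equation that lies in the admissible range of π_i, and R can be updated by the Fisher scoring method

$$R^{(t+1)} = R^{(t)} - \left[\frac{\partial^{2} l}{\partial R^{2}}\left(R^{(t)}\right)\right]^{-1} \frac{\partial l}{\partial R}\left(R^{(t)}\right).$$

See the next section for the formula of $\frac{\partial^{2} l}{\partial R^{2}}$.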
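The alternating scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the cubic is the score equation for π_i with denominators cleared (derived here from the log-likelihood of Rosner's model), its root is located by bisection instead of a closed-form cubic formula, all counts are assumed positive, and the cubic is assumed to change sign only once on the admissible interval. All function names and the example counts are hypothetical.

```python
import math


def admissible(R, pis):
    """True if every cell probability is strictly positive for this (pis, R)."""
    return R > 0 and all(R * p < 1 and R * p * p - 2 * p + 1 > 0 for p in pis)


def score_R(m0, m1, m2, pis, R):
    """First and second derivatives of the log-likelihood with respect to R."""
    d1 = sum(m2) / R
    d2 = -sum(m2) / R ** 2
    for a, b, p in zip(m0, m1, pis):
        D = R * p * p - 2 * p + 1            # Pr(no affected eye)
        d1 += a * p * p / D - b * p / (1 - R * p)
        d2 -= a * p ** 4 / D ** 2 + b * p * p / (1 - R * p) ** 2
    return d1, d2


def solve_pi(a, b, c, R):
    """Bisection for the cubic in pi_i (a, b, c = counts with 0, 1, 2 responses)."""
    m = a + b + c

    def cubic(p):
        return (2 * R * R * m * p ** 3 - R * (4 * a + 5 * b + 6 * c) * p * p
                + 2 * (m + c + R * (b + c)) * p - (b + 2 * c))

    lo = 0.0                                  # cubic(0) = -(b + 2c) < 0
    hi = 1 / R if R >= 1 else (1 - math.sqrt(1 - R)) / R   # cubic(hi) >= 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cubic(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit(m0, m1, m2, tol=1e-12, max_iter=1000):
    """Alternate exact pi-updates with one Newton-type step for R."""
    S1, S2 = sum(m1), sum(m2)
    N = sum(m0) + S1 + S2
    R = 4 * N * S2 / (S1 + 2 * S2) ** 2       # start from the H0 estimate of R
    pis = [solve_pi(a, b, c, R) for a, b, c in zip(m0, m1, m2)]
    for _ in range(max_iter):
        d1, d2 = score_R(m0, m1, m2, pis, R)
        step = d1 / d2
        while not admissible(R - step, pis):  # step-halving keeps R feasible
            step /= 2
        R_new = R - step
        pis = [solve_pi(a, b, c, R_new) for a, b, c in zip(m0, m1, m2)]
        converged = abs(R_new - R) < tol
        R = R_new
        if converged:
            break
    return pis, R


# Hypothetical counts for two groups (not data from the paper).
pis_hat, R_hat = fit([20, 12], [6, 8], [4, 10])
```

Only R is updated by a Newton-type step; each π_i is solved exactly given the current R, which is the design the concluding remarks credit for the efficiency of the algorithm.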

Information matrix

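Differentiating $\frac{\partial l}{\partial π_{i}}, i = 1, \dots, g$ and $\frac{\partial l}{\partial R}$ with respect to π_i, i = 1, …, g and R, respectively, yields

$$\frac{\partial^2 l}{\partial \pi_i^2} = -\frac{2m_{2i} + m_{1i}}{\pi_i^2} - \frac{R^2 m_{1i}}{(1 - R\pi_i)^2} - \frac{2(R^2\pi_i^2 - 2R\pi_i + 2 - R) m_{0i}}{(R\pi_i^2 - 2\pi_i + 1)^2}, \qquad \frac{\partial^2 l}{\partial \pi_i \partial \pi_j} = 0 \ (i \neq j),$$

$$\frac{\partial^2 l}{\partial \pi_i \partial R} = \frac{2\pi_i(1 - \pi_i) m_{0i}}{(R\pi_i^2 - 2\pi_i + 1)^2} - \frac{m_{1i}}{(1 - R\pi_i)^2},$$

$$\frac{\partial^2 l}{\partial R^2} = -\frac{S_2}{R^2} - \sum_{i=1}^{g} \left[ \frac{\pi_i^4 m_{0i}}{(R\pi_i^2 - 2\pi_i + 1)^2} + \frac{\pi_i^2 m_{1i}}{(1 - R\pi_i)^2} \right].$$

Taking expectations of the negative second derivatives, we have

$$I_{ii} = \frac{2 m_i (2R^2\pi_i^2 - R\pi_i^2 - 2R\pi_i + 1)}{\pi_i (1 - R\pi_i)(R\pi_i^2 - 2\pi_i + 1)}, \qquad I_{ij} = 0 \ (i \neq j; \ i, j \leq g),$$

$$I_{i,g+1} = I_{g+1,i} = \frac{2(R - 1) m_i \pi_i^2}{(1 - R\pi_i)(R\pi_i^2 - 2\pi_i + 1)}, \qquad I_{g+1,g+1} = \sum_{i=1}^{g} \frac{m_i \pi_i^2 \left[1 - (2 - R)\pi_i\right]}{R (1 - R\pi_i)(R\pi_i^2 - 2\pi_i + 1)}.$$

The (g + 1) × (g + 1) information matrix is denoted as I(π₁, …, π_g; R) = (I_ij).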

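Under the null hypothesis H₀ : π₁ = ⋯ = π_g = π, it is straightforward but tedious to show that the inverse of the information matrix can be expressed as

$$I^{-1}(\pi, R) = \begin{pmatrix} \operatorname{diag}\left(\dfrac{1}{\alpha m_1}, \dots, \dfrac{1}{\alpha m_g}\right) + \dfrac{\beta^2}{\alpha N \Delta} \mathbf{1}\mathbf{1}^T & -\dfrac{\beta}{N\Delta} \mathbf{1} \\ -\dfrac{\beta}{N\Delta} \mathbf{1}^T & \dfrac{\alpha}{N\Delta} \end{pmatrix}, \tag{5}$$

where $\mathbf{1}$ is the g-vector of ones, Δ = αγ − β², and

$$\alpha = \frac{2(2R^2\pi^2 - R\pi^2 - 2R\pi + 1)}{\pi(1 - R\pi)(R\pi^2 - 2\pi + 1)}, \qquad \beta = \frac{2(R - 1)\pi^2}{(1 - R\pi)(R\pi^2 - 2\pi + 1)}, \qquad \gamma = \frac{\pi^2\left[1 - (2 - R)\pi\right]}{R(1 - R\pi)(R\pi^2 - 2\pi + 1)}.$$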

With the MLEs and information matrix derived, we consider the following test statistics.

Likelihood ratio test (T_LR)

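The likelihood ratio (LR) test is given by

$$T_{LR}^{2} = 2\left[ l(\hat{\pi}_1, \dots, \hat{\pi}_g; \hat{R}) - l(\hat{\pi}_{H_0}, \dots, \hat{\pi}_{H_0}; \hat{R}_{H_0}) \right].$$

Under the null hypothesis, $T_{LR}^{2}$ is asymptotically distributed as a chi-square distribution with g − 1 degrees of freedom.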
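A minimal sketch of the LR computation follows, assuming the multinomial log-likelihood implied by Rosner's model (cell probabilities Rπ_i² − 2π_i + 1, 2π_i(1 − Rπ_i) and Rπ_i²) and the closed-form H₀ estimates. The unconstrained estimates are passed in by the caller, since they require an iterative fit; the function names and example counts are hypothetical.

```python
import math


def log_likelihood(m0, m1, m2, pis, R):
    """Log-likelihood up to an additive constant (one pi per group, common R)."""
    total = 0.0
    for a, b, c, p in zip(m0, m1, m2, pis):
        total += (a * math.log(R * p * p - 2 * p + 1)
                  + b * math.log(2 * p * (1 - R * p))
                  + c * math.log(R * p * p))
    return total


def lr_statistic(m0, m1, m2, pis_hat, R_hat):
    """2 * [l(unconstrained MLE) - l(H0 MLE)]; pis_hat and R_hat come from the caller."""
    S1, S2 = sum(m1), sum(m2)
    N = sum(m0) + S1 + S2
    pi0 = (S1 + 2 * S2) / (2 * N)             # closed-form H0 estimate of pi
    R0 = 4 * N * S2 / (S1 + 2 * S2) ** 2      # closed-form H0 estimate of R
    l_alt = log_likelihood(m0, m1, m2, pis_hat, R_hat)
    l_null = log_likelihood(m0, m1, m2, [pi0] * len(m0), R0)
    return 2 * (l_alt - l_null)
```

The statistic is compared with the upper quantile of the chi-square distribution with g − 1 degrees of freedom, for example 3.84 for g = 2 at α = 0.05.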

Wald-type test (T_W)

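The null hypothesis H₀ : π₁ = ⋯ = π_g can be alternatively expressed as C β^T = 0, where β = (π₁, ⋯, π_g, R) and C is a (g − 1) × (g + 1) contrast matrix of full row rank whose last column is zero, for example

$$C = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & -1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & -1 & 0 \end{pmatrix}.$$

The Wald-type test statistic (T_W) for testing H₀ can be expressed as

$$T_{W}^{2} = (\hat{\beta} C^T)(C I^{-1} C^T)^{-1}(C \hat{\beta}^T),$$

where I is the Fisher information matrix for β evaluated at $\hat{\beta} = (\hat{\pi}_1, \dots, \hat{\pi}_g, \hat{R})$, and $T_{W}^{2}$ is asymptotically distributed as a chi-square distribution with g − 1 degrees of freedom. Because the π-block of I is diagonal, $T_{W}^{2}$ can be simplified to a closed-form expression in the MLEs.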

Other multivariate tests of the π’s can be carried out similarly by choosing the corresponding C matrix in the above statistic. Further, a Wald-type test statistic for testing H_0a : π_i = π_j vs H_1a : π_i ≠ π_j, i ≠ j, can be given by

$$T_{Wa}^{2}(i, j) = \frac{(c \hat{\beta}^T)^2}{c I^{-1} c^T},$$

where c = (0, …, 1, …, −1, …, 0) with 1 in the ith element and −1 in the jth element. $T_{Wa}^{2}(i, j)$ is asymptotically distributed as a chi-square distribution with 1 degree of freedom, and it can likewise be simplified to a closed-form expression.
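The global and pairwise Wald-type statistics can be sketched without forming or inverting the full (g + 1) × (g + 1) matrix. The sketch rests on two assumptions that are not spelled out in the text above: the expected information entries are the ones obtained by differentiating the log-likelihood of Rosner's model twice and taking expectations, and the quadratic form is evaluated through the identity Cᵀ(C V Cᵀ)⁻¹C = W − W1(1ᵀW1)⁻¹1ᵀW with W = V⁻¹, which holds for any full-rank contrast matrix C. The MLEs are supplied by the caller, and all names are hypothetical.

```python
def info_entries(m, pis, R):
    """Diagonal A_i = I_ii, border b_i = I_{i,g+1} and corner c = I_{g+1,g+1}."""
    A, b, c = [], [], 0.0
    for mi, p in zip(m, pis):
        D = R * p * p - 2 * p + 1             # Pr(no affected eye)
        q = 1 - R * p
        A.append(2 * mi * (2 * R * R * p * p - R * p * p - 2 * R * p + 1) / (p * q * D))
        b.append(2 * (R - 1) * mi * p * p / (q * D))
        c += mi * p * p * (1 - (2 - R) * p) / (R * q * D)
    return A, b, c


def wald_statistic(m, pis, R):
    """Global Wald-type statistic for H0: pi_1 = ... = pi_g at the supplied MLEs."""
    A, b, c = info_entries(m, pis, R)
    g = len(m)
    # W = A - b b^T / c is the inverse of the covariance block of (pi_1, ..., pi_g).
    W = [[(A[i] if i == j else 0.0) - b[i] * b[j] / c for j in range(g)]
         for i in range(g)]
    Wp = [sum(W[i][j] * pis[j] for j in range(g)) for i in range(g)]
    pWp = sum(pis[i] * Wp[i] for i in range(g))
    oWp = sum(Wp)
    oWo = sum(sum(row) for row in W)
    return pWp - oWp ** 2 / oWo


def wald_pairwise(m, pis, R, i, j):
    """Pairwise Wald-type statistic for H0a: pi_i = pi_j (1 degree of freedom)."""
    A, b, c = info_entries(m, pis, R)
    s = c - sum(bk * bk / Ak for Ak, bk in zip(A, b))   # Schur complement for R
    var_diff = 1 / A[i] + 1 / A[j] + (b[i] / A[i] - b[j] / A[j]) ** 2 / s
    return (pis[i] - pis[j]) ** 2 / var_diff
```

As a check on the arithmetic, with R = 1 the information for π_i reduces to 2m_i/(π_i(1 − π_i)), so for two groups of 30 subjects with estimates 0.2 and 0.4 both functions return 0.04 / (1/375 + 1/250) = 6. For g = 2 the global and pairwise statistics coincide for any R.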

Score test (T_SC)

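The score test statistic T_SC is given by

$$T_{SC}^{2} = U I^{-1} U^T \Big|_{\pi_1 = \dots = \pi_g = \hat{\pi}_{H_0}, \ R = \hat{R}_{H_0}},$$

where

$$U = \left( \frac{\partial l}{\partial \pi_1}, \dots, \frac{\partial l}{\partial \pi_g}, 0 \right);$$

see Eq (5) for the formula of the inverse of the information matrix I(π, R)⁻¹.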

After lengthy algebra, it can be simplified to a closed-form expression, referred to as Eq (6). $T_{S C}^{2}$ is asymptotically distributed as a chi-square distribution with g − 1 degrees of freedom.

Remark 1. One limitation of the score statistic is that it cannot be computed if S₀ = 0 or S₁ = 0. We dealt with this problem by adding 1/(2g) to m_ij for such situations.

Remark 2. Tang and others [1] derived a score test T_SC for g = 2, which is equivalent to Eq (6).
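The score statistic can be sketched directly from the quadratic form U I⁻¹ Uᵀ. The sketch assumes the score and expected information implied by Rosner's model. At the H₀ estimates the R-component of U is zero and the π-components sum to zero, so the border of the inverse information drops out and the quadratic form reduces to the sum of U_i²/(α m_i), where α = I_ii/m_i. This reduction is derived here and is not necessarily written in the same form as Eq (6). The function names, the chi-square helper and the example counts are hypothetical.

```python
import math


def null_scores(m0, m1, m2):
    """Scores U_i = dl/dpi_i at the H0 MLEs, alpha = I_ii / m_i, and group sizes."""
    S1, S2 = sum(m1), sum(m2)
    N = sum(m0) + S1 + S2
    p = (S1 + 2 * S2) / (2 * N)               # H0 estimate of pi
    R = 4 * N * S2 / (S1 + 2 * S2) ** 2       # H0 estimate of R
    q, D = 1 - R * p, R * p * p - 2 * p + 1
    alpha = 2 * (2 * R * R * p * p - R * p * p - 2 * R * p + 1) / (p * q * D)
    U = [(2 * c + b) / p - R * b / q + 2 * (R * p - 1) * a / D
         for a, b, c in zip(m0, m1, m2)]
    m = [a + b + c for a, b, c in zip(m0, m1, m2)]
    return U, alpha, m


def score_statistic(m0, m1, m2):
    """Score statistic; applies the adjustment of Remark 1 when S0 or S1 is zero."""
    g = len(m0)
    if sum(m0) == 0 or sum(m1) == 0:
        m0, m1, m2 = [[x + 0.5 / g for x in v] for v in (m0, m1, m2)]
    U, alpha, m = null_scores(m0, m1, m2)
    return sum(u * u / (alpha * mi) for u, mi in zip(U, m))


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:                           # even df: finite Poisson-type sum
        term = total = 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))           # odd df: erfc plus a finite sum
    term = math.sqrt(2 * x / math.pi) * math.exp(-h)
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return total


# Hypothetical counts for two groups of 50 subjects (not data from the paper).
stat = score_statistic([30, 19], [17, 25], [3, 6])
p_value = chi2_sf(stat, df=1)
```

In this hypothetical example the H₀ estimate of R is exactly 1, and the statistic reduces to the ordinary Pearson chi-square on 2m_i eyes per group, 2 · 50 · 2 · 0.07² / 0.21 = 14/3.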

Monte Carlo simulation studies

We now investigate the performance of the proposed statistics and testing procedures discussed in the previous section. First, we investigate the behavior of the type I error rates of the various procedures for g = 2, 3, 4, 5; sample size m₁ = ⋯ = m_g = 20, 40, 60, 80 and 100; π₁ = ⋯ = π_g = π₀ = 0.5(0.1)0.8; and R = 1 + ρ(1 − π₀)/π₀ where ρ = 0.4(0.1)0.6. An imbalanced sample setting is also considered. In each configuration, 50,000 replications are generated under the null hypothesis, and the empirical type I error rate is computed as the number of rejections divided by 50,000. The results are presented in Table 2. Following [1], we calculated the ratio of the empirical type I error rate to the nominal type I error rate. A test is said to be liberal if the ratio is greater than 1.2 (i.e., > 6% for α = 5%), conservative if the ratio is less than 0.8 (i.e., < 4%), and robust if the ratio is between 0.8 and 1.2.
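One configuration of such a simulation can be sketched as follows for the score test only. The sketch assumes the same reduced form of the score statistic as the quadratic form U I⁻¹ Uᵀ evaluated at the H₀ estimates, uses far fewer replications than the 50,000 of the study, and hard-codes the chi-square critical value for g = 2. The function names, default settings and seed are hypothetical.

```python
import random


def score_stat(m0, m1, m2):
    """Score statistic at the H0 MLEs: sum of U_i^2 / (alpha * m_i)."""
    g = len(m0)
    if sum(m0) == 0 or sum(m1) == 0:          # adjustment described in Remark 1
        m0, m1, m2 = [[x + 0.5 / g for x in v] for v in (m0, m1, m2)]
    S1, S2 = sum(m1), sum(m2)
    N = sum(m0) + S1 + S2
    p = (S1 + 2 * S2) / (2 * N)
    R = 4 * N * S2 / (S1 + 2 * S2) ** 2
    q, D = 1 - R * p, R * p * p - 2 * p + 1
    alpha = 2 * (2 * R * R * p * p - R * p * p - 2 * R * p + 1) / (p * q * D)
    stat = 0.0
    for a, b, c in zip(m0, m1, m2):
        u = (2 * c + b) / p - R * b / q + 2 * (R * p - 1) * a / D
        stat += u * u / (alpha * (a + b + c))
    return stat


def simulate_type1(g=2, m=50, pi0=0.5, rho=0.5, reps=2000,
                   crit=3.841458820694124, seed=1):
    """Empirical type I error rate of the score test under H0.

    crit must be the upper-alpha chi-square quantile with g - 1 degrees of
    freedom; the default is the 5% value for g = 2.
    """
    rng = random.Random(seed)
    R = 1 + rho * (1 - pi0) / pi0             # R implied by the correlation rho
    p2 = R * pi0 * pi0                        # Pr(two affected eyes)
    p1 = 2 * pi0 * (1 - R * pi0)              # Pr(one affected eye)
    rejections = 0
    for _rep in range(reps):
        m0, m1, m2 = [0] * g, [0] * g, [0] * g
        for i in range(g):
            for _subject in range(m):
                u = rng.random()
                if u < p2:
                    m2[i] += 1
                elif u < p2 + p1:
                    m1[i] += 1
                else:
                    m0[i] += 1
        if score_stat(m0, m1, m2) > crit:
            rejections += 1
    return rejections / reps


rate = simulate_type1()
ratio = rate / 0.05   # liberal if > 1.2, conservative if < 0.8
```

With 2,000 replications the Monte Carlo standard error of a rate near 5% is about 0.5 percentage points, so a single run only gives a rough indication of whether the test is liberal, conservative or robust in the sense defined above.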

Table 2. The type I error rates (percent) of various procedures under H₀ : π₁ = ⋯ = π_g = π₀ at α = 0.05 based on 50,000 replicates.

https://doi.org/10.1371/journal.pone.0124337.t002

Generally, the score test $T_{S C}^{2}$ produces satisfactory type I error control for every configuration, while the LR and Wald tests are liberal, especially for small samples and larger numbers of groups (g). When g > 2, Wald tests are more liberal than LR tests, and the tests get closer as the sample size becomes larger.

LR tests and Wald-type tests are extremely liberal for a small sample size (i.e., m = 20), and their actual sizes inflate with the increase of the correlation coefficient (i.e., ρ).

Next, we evaluate the power performance of the proposed methods. We consider the alternative hypotheses H₁ : π = (0.25, 0.4), (0.25, 0.325, 0.4), and (0.25, 0.3, 0.35, 0.4) for g = 2, 3, and 4, respectively. R is chosen as 1, 1.5, and 2.0, and the sample size as m₁ = ⋯ = m_g = 20, 40, 60, 80 and 100. Rosner’s statistic T is also considered in the simulation studies, and the results are presented in Table 3.

Table 3. The power (percent) of various procedures at α = 0.05 based on 50,000 replicates.

https://doi.org/10.1371/journal.pone.0124337.t003

Based on the simulation results, LR and Wald tests are generally more powerful than score tests, and Rosner’s T generally has less power. However, the power of the LR and Wald tests is often overstated in small samples because of the inflated type I error rates observed for these tests (see Table 2). For moderate or large sample sizes, the powers of the three proposed methods are close. Overall, the score test is highly recommended, as it has reasonable power with satisfactory type I error control.

Examples

We first reanalyze the data presented by [5] to illustrate the newly proposed methods. The data consist of 218 persons aged 20–39 with retinitis pigmentosa (RP) who were seen at the Massachusetts Eye and Ear Infirmary. They were classified into four genetic types, namely, autosomal dominant RP (DOM), autosomal recessive RP (AR), sex-linked RP (SL), and isolate RP (ISO). The differences between these four groups in Snellen visual acuity (VA) were assessed. An eye was considered affected if VA was 20/50 or worse, and normal if VA was 20/40 or better. The sample used for this analysis consists of the 216 of these 218 persons who had complete VA information for both eyes (Table 4).

Table 4. Distribution of the number of affected eyes for persons in each genetic type [5].

https://doi.org/10.1371/journal.pone.0124337.t004

For the overall test of a difference between the proportions of affected eyes in the four groups, the p-values range from 0.0769 to 0.1173 for the proposed methods, compared with 0.010 for Rosner’s statistic T (Table 5).

Table 5. Statistic and p-value for comparing VA for different genetic types of RP.

https://doi.org/10.1371/journal.pone.0124337.t005

The maximum likelihood estimates and pairwise comparisons based on Wald-type test T_Wa(i, j) are shown in Table 6. It shows a significant difference between DOM and AR (p = 0.0207).

Table 6. Wald-type test results comparing VA for different genetic types of RP.

https://doi.org/10.1371/journal.pone.0124337.t006

Another example is a recent study of a cross-sectional, population-based sample in Iran assessing the prevalence of avoidable blindness [11]. Nearly 3000 persons were examined, and blindness was assessed for seven age groups (Table 7). The test statistics $T_{L R}^{2} = 134.7$, $T_{W}^{2} = 89.1$, $T_{S C}^{2} = 161.1$, and T = 202.0 consistently show significant differences among the age groups (p-values < 0.0001), and the MLE $\hat{R} = 3.35$ (with ${\hat{ρ}}_{i} = \frac{{\hat{π}}_{i}}{1 - {\hat{π}}_{i}} (\hat{R} - 1)$ ranging from 0.05 to 0.59) indicates positive correlation between the two eyes of the same person.

Table 7. Prevalence of avoidable blindness from a sample population in Iran [11].

https://doi.org/10.1371/journal.pone.0124337.t007

Concluding remarks

In this article, we investigated three procedures for testing the homogeneity of correlated binary data with a cluster size of two. We derived an algorithm for the maximum likelihood estimates that uses the roots of third-order polynomial equations. The Fisher scoring method is often criticized for converging slowly, especially when the number of parameters is large (e.g., when g is large). However, the algorithm derived in this paper is very efficient, because only R is updated by Fisher scoring iterations, while π_i, i = 1, …, g are roots of third-order polynomials, which have closed-form solutions.

Simulation results showed that the proposed approach (score test) has satisfactory type I error control and reasonable power, regardless of number of groups, sample size, or parameter configurations. On the other hand, the LR test and Wald test have inflated type I error in small sample size. When sample size becomes larger, the three test procedures get closer.

For correlated binary data, there are alternative ways to solve for the MLEs iteratively or to perform hypothesis tests by model-based methods, e.g., GENMOD or GLIMMIX in SAS. However, neither the iterative versions of the test statistics nor the model-based methods provide an explicit form of the test statistic. The explicit form of the test statistic in our proposed method is useful not only for its simplicity, but also for further development of an exact test. For example, in small samples, an exact test may overcome the inflated type I error rate and is thus highly desirable. An exact test requires extensive calculations, which makes it nearly impossible to perform with iterative versions of the test statistics or with model-based methods.

To overcome the inflated type I error rates of asymptotic tests, [10] and [12] considered exact tests for g = 2. We consider exact tests for g > 2 to be interesting future work.

A user-friendly web-based calculator, covering model estimation, hypothesis tests, and simulations, is available from the corresponding author upon request.

Acknowledgments

The authors thank the associate editor and three referees for constructive comments and suggestions that improved the presentation of the paper.

Author Contributions

Conceived and designed the experiments: CM GS. Performed the experiments: CM. Analyzed the data: CM. Contributed reagents/materials/analysis tools: CM GS. Wrote the paper: CM GS SL.

References

1. Tang NS, Tang ML, Qiu SF. Testing the equality of proportions for correlated otolaryngologic data. Computational Statistics & Data Analysis, 52(7):3719–3729, March 2008.
2. Simon R, Wittes RE, Ellenberg SS. Randomized phase II clinical trials. Cancer treatment reports, 69(12):1375–1381, December 1985. pmid:4075313
3. Wilding GE, Shan G, Hutson AD. Exact two-stage designs for phase II activity trials with rank-based endpoints. Contemporary Clinical Trials, 2011.
4. Donner A. Cluster randomization trials in epidemiology: theory and application. Journal of Statistical Planning and Inference, 42(1–2):37–56, November 1994.
5. Rosner B. Statistical methods in ophthalmology: an adjustment for the intraclass correlation between eyes. Biometrics, 38(1):105–114, March 1982. pmid:7082754
6. Dallal GE. Paired Bernoulli trials. Biometrics 44, 253–257, 1988. pmid:3358992
7. Rosner B, Milton RC. Significance testing for correlated binary outcome data. Biometrics 44, 505–512, 1988. pmid:3390508
8. Donner A. Statistical methods in ophthalmology: An adjusted chi-square approach. Biometrics 45, 605–611. pmid:2765640
9. Bodian CA. Intraclass correlation for two-by-two tables under three sampling designs. Biometrics 50, 183–193, 1994. pmid:8086601
10. Tang ML, Tang NS, Rosner B. Statistical inference for correlated data in ophthalmologic studies. Statistics in Medicine 25, 2771–2783, 2006. pmid:16381067
11. Rajavi Z, Katibeh M, Ziaei H, Fardesmaeilpour N, Sehat M, Ahmadieh H, et al. Rapid assessment of avoidable blindness in Iran, Ophthalmology 118, 1812–18, 2011. pmid:21571371
12. Shan G, Ma CX. Exact Methods for Testing the Equality of Proportions for Binary Clustered Data From Otolaryngologic Studies. Statistics in Biopharmaceutical Research, 6, 115–122, 2014.
