
Statistical Use in Clinical Studies: Is There Evidence of a Methodological Shift?

  • Dali Yi,

    Affiliation Department of Health Statistics, College of Preventive Medicine, Third Military Medical University, Chongqing, China

  • Dihui Ma,

    Affiliation Department of Health Statistics, College of Preventive Medicine, Third Military Medical University, Chongqing, China

  • Gaoming Li,

    Affiliation Department of Health Statistics, College of Preventive Medicine, Third Military Medical University, Chongqing, China

  • Liang Zhou,

    Affiliation Department of Health Statistics, College of Preventive Medicine, Third Military Medical University, Chongqing, China

  • Qin Xiao,

    Affiliation Department of Health Statistics, College of Preventive Medicine, Third Military Medical University, Chongqing, China

  • Yanqi Zhang,

    Affiliation Department of Health Statistics, College of Preventive Medicine, Third Military Medical University, Chongqing, China

  • Xiaoyu Liu,

    Affiliation Department of Health Statistics, College of Preventive Medicine, Third Military Medical University, Chongqing, China

  • Hongru Chen,

    Affiliation Department of Health Statistics, College of Preventive Medicine, Third Military Medical University, Chongqing, China

  • Julia Christine Pettigrew,

    Affiliations University of Washington, School of Arts and Sciences, Department of Biological Sciences, Seattle, Washington, United States of America, University of Washington, School of Arts and Sciences, Department of Asian Language and Literature, Seattle, Washington, United States of America

  • Dong Yi ,

    yd_house@hotmail.com (Dong Yi); asiawu5@sina.com (YW); liuling_505@sina.com (LL)

    Affiliation Department of Health Statistics, College of Preventive Medicine, Third Military Medical University, Chongqing, China

  • Ling Liu ,


    Affiliation Department of Health Statistics, College of Preventive Medicine, Third Military Medical University, Chongqing, China

  • Yazhou Wu


    Affiliation Department of Health Statistics, College of Preventive Medicine, Third Military Medical University, Chongqing, China

Abstract

Background

Several studies indicate that the statistical training provided in medical education fails to meet the demands of clinicians, especially when they try to understand published clinical research. We investigated how study designs and statistical methods in clinical studies have changed over the last twenty years and identified the current trends in study designs and statistical methods in clinical studies.

Methods

We reviewed 838 eligible clinical study articles that were published in 1990, 2000, and 2010 in four journals: the New England Journal of Medicine, The Lancet, the Journal of the American Medical Association and Nature Medicine. The study types, study designs, sample designs, data quality controls, statistical methods and statistical software were examined.

Results

Substantial changes occurred over the past twenty years. The majority of the studies were drug trials (61.6%, n = 516). From 1990 to 2000 to 2010, there was an incremental increase in RCT studies (74.4%, 82.8%, and 84.0%, respectively; p = 0.013). Over time, more attention was paid to the details of sample selection and bias control, and complex statistical methods were used with higher frequency. In 2010, the most common statistical methods were confidence intervals for superiority and non-inferiority comparisons (41.6%), survival analysis (28.5%), correction analysis for covariates (18.8%) and logistic regression (15.3%).

Conclusions

These findings indicate that statistical measures in clinical studies are continuously developing and that the credibility of clinical study results is increasing. These findings provide information for future changes in statistical training in medical education.

Introduction

Recently, the design and statistical analysis of clinical studies have become increasingly strict and elaborate as a result of evidence-based medicine (EBM). Many institutions have published instructions for the design and statistical analysis of clinical studies, e.g. the 1988 Food and Drug Administration (FDA) guideline for the format and content of the clinical and statistical sections of an application [1–4]. Several clinical research articles have indicated a trend toward increasingly sophisticated statistical techniques, which can reveal information hidden in the data more thoroughly and precisely [5]. These techniques include methods to compare patterns (superiority, non-inferiority and equality) and data sets, as well as multiple comparison and survival analysis.

However, these improvements also make articles more difficult to understand and grasp. A recent cross-sectional study found that fewer than half of the 277 sampled internal medicine residents had adequate statistical knowledge and understanding to follow the medical literature [6]. Several studies indicate that the statistical training in present medical education fails to meet the demands of clinicians, especially when they try to understand published clinical research [7–10].

Medical training should include training in complex statistics [11], but there is uncertainty about what should be added and enhanced in the medical curriculum. Educators should agree on the type and depth of statistical knowledge that should be imparted to future clinicians.

Therefore, the objective of this study was to assess how study designs and statistical methods have changed in the last twenty years and to determine the current trends in study designs and statistical methods in clinical studies.

Methods

Inclusion criteria

There are two main types of clinical studies: clinical trials (also called interventional studies) and observational studies (PubMed homepage, ClinicalTrials.gov). Accordingly, the inclusion criteria for this study were defined as follows:

Type of study: clinical trials and observational studies;

Participants (articles): articles from the New England Journal of Medicine (NEJM), The Lancet, the Journal of the American Medical Association (JAMA) and Nature Medicine;

Intervention (treatment or exposure factors): observational studies (descriptive, case-control and cohort studies), drug trials, medical apparatus and instruments, operation methods, health education, diet therapy, exercise therapy, stem cell therapy, etc.;

Control: exposure factors (observational studies), other interventions or placebo (clinical trials);

Outcome: the statistical methods of the included articles, such as descriptive statistics, t-test, ANOVA, survival analysis, and statistical software, etc.

Exclusion criteria

Comments, case reports, systematic reviews, meta-analyses, genome-wide analyses and articles that did not involve primary or secondary data analysis were excluded from the study. Articles were also excluded if the sample size was less than 10.

Selected Articles

To assess how the statistical methodology of clinical studies has changed over the last twenty years, articles were sampled from the New England Journal of Medicine (NEJM), The Lancet, the Journal of the American Medical Association (JAMA) and Nature Medicine at three time points: 1990, 2000, and 2010 (Fig 1).

Fig 1. Flow chart of the selection criteria for the content analysis.

https://doi.org/10.1371/journal.pone.0140159.g001

The sampling frame for articles included all issues of the four selected journals in 1990, 2000 and 2010, and the articles were searched only via the PubMed homepage. On PubMed, the clinical study article types were selected via ClinicalTrials.gov, a registry and results database of publicly and privately supported clinical studies (clinical trials and observational studies) of human participants conducted around the world. The "PubMed Advanced Search Builder" was then used; for example, the search builder query for clinical trials in The Lancet in 1990 was: ("Lancet (London, England)"[Journal]) AND ("1990"[Date - Publication] : "1990"[Date - Publication]) Filters: Clinical Trial. In this way, all clinical study articles published in the four selected journals in the three years could be retrieved for this content analysis.
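Because the queries follow a fixed template, one per journal-year cell of the sampling frame, they can be generated programmatically. The sketch below is illustrative only: the authors used the PubMed web interface, the field tags follow PubMed's [Journal] and [Date - Publication] conventions, the Clinical Trial filter is applied separately, and the journal name strings other than The Lancet's are assumptions.

```python
def build_pubmed_query(journal: str, year: int) -> str:
    """Build a PubMed Advanced Search query restricting a journal
    to a single publication year (the Clinical Trial filter is
    applied separately in the PubMed interface)."""
    return (f'("{journal}"[Journal]) AND '
            f'("{year}"[Date - Publication] : "{year}"[Date - Publication])')

# One query per journal-year cell of the sampling frame
# (journal strings other than The Lancet's are illustrative)
journals = ["Lancet (London, England)",
            "The New England journal of medicine",
            "JAMA",
            "Nature medicine"]
queries = [build_pubmed_query(j, y) for j in journals for y in (1990, 2000, 2010)]
print(queries[0])
```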

All articles retrieved from these searches were then evaluated for eligibility according to the inclusion/exclusion criteria. Eligible articles were those in which the authors implemented a study and analyzed primary or secondary data for clinical trials or observational studies. Specifically, original clinical trials and clinical investigations were eligible for inclusion, covering RCTs, case-control studies, cohort studies and descriptive studies. Commentaries, case reports, systematic reviews, meta-analyses, genome-wide analyses, and articles that did not involve primary or secondary data analysis were excluded from the study.

Data collection

A data collection schedule was discussed within the research group. The main questions of the discussion were: which aspects can reflect the statistical methodological shift in clinical studies, and what categorizations should be included in each aspect? In this study, the aspects examined were: study types, study designs, sample designs, data quality control, statistical methods and statistical software.

There are different possible categorizations of statistical methods. In this study, the pre-determined categorization of statistical methods followed that used previously by Arnold et al. [11]. Some articles did not clearly specify the statistical methods used. For example, if the authors calculated hazard ratios but did not specify the type of survival analysis, the article was coded as "survival analysis". If no specific correction analysis was mentioned but the word "adjusted" was used, the article was coded as using correction analysis for covariates.
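The fallback coding rules described above amount to keyword triggers. A minimal sketch, with the caveat that the keywords and category labels here are illustrative and the readers coded from the full text rather than by automated matching:

```python
# Illustrative keyword triggers for the fallback coding rules
# (the actual coding was done manually by two trained readers)
CODING_RULES = [
    ("hazard ratio", "Survival analysis"),
    ("adjusted", "Correction analysis for covariates"),
    ("kaplan-meier", "Kaplan-Meier"),
    ("cox", "Cox model"),
]

def fallback_codes(methods_text: str) -> list:
    """Return method categories triggered by keywords when an
    article does not name its methods explicitly."""
    text = methods_text.lower()
    return [label for keyword, label in CODING_RULES if keyword in text]

print(fallback_codes("Hazard ratios were adjusted for baseline age."))
# → ['Survival analysis', 'Correction analysis for covariates']
```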

Two readers with masters-level training in biostatistics independently abstracted data pertaining to study types, study designs, sample designs, data quality control, statistical methods and statistical software. After abstracting the data for each article, the two readers entered the data into independent files and then merged the entries into one file for data reconciliation using EpiData 3.0.

Apart from input errors, instances of discordant information were flagged (fewer than 10% of the 838 articles). When discrepancies were present, e.g. when the number of statistical methods recorded for an article differed between the two readers, the readers reconciled the data case by case by referencing the article. When discrepancies could not be resolved by referencing the article, the readers consulted statisticians (the corresponding authors) until agreement was reached.
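The flagging step can be sketched as a simple field-by-field comparison of the two readers' records. The field names below are hypothetical, and EpiData's actual double-entry validation is richer than this:

```python
def flag_discordant(entry_a: dict, entry_b: dict) -> list:
    """Compare two readers' records for one article and return the
    fields whose values differ (candidates for reconciliation)."""
    return [field for field in entry_a
            if entry_a[field] != entry_b.get(field)]

# Hypothetical double-entered records for a single article
reader1 = {"study_design": "RCT", "n_methods": 4, "software": "SAS"}
reader2 = {"study_design": "RCT", "n_methods": 5, "software": "SAS"}
print(flag_discordant(reader1, reader2))  # → ['n_methods']
```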

Data analysis

Descriptive statistics were generated for each data category (e.g. the number of statistical designs/methods), overall and by year of publication. Significant differences in variables (e.g. the prevalence of statistical designs/methods) over the three study years (1990, 2000, and 2010) were examined using the chi-square and Fisher's exact tests, and p-values of less than 0.05 were considered statistically significant. SPSS v.18 and EpiData 3.0 were used for all analyses.
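As a minimal illustration of the chi-square comparisons over the three years, the pure-Python sketch below computes the Pearson statistic for a 2×3 table. The counts are back-calculated from the RCT percentages reported in the Results (74.4%, 82.8% and 84.0% of 223, 314 and 301 articles), so they are approximate; with df = 2, the 5% critical value is 5.991.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# RCT vs non-RCT counts by year, back-calculated from reported percentages
table = [[166, 260, 253],   # RCT
         [57, 54, 48]]      # non-RCT
stat = chi_square_statistic(table)
# df = (2 - 1) * (3 - 1) = 2, so the 5% critical value is 5.991
print(f"chi-square = {stat:.2f}, significant at 0.05: {stat > 5.991}")
```

The statistic (about 8.7) is consistent with the p = 0.013 reported for the RCT trend in the Results.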

Results

Study types

After searching PubMed, 1,099 clinical study articles were retrieved from the four journals. After excluding 261 articles that met the exclusion criteria, a total of 838 eligible articles were included: 223 (26.6%) from 1990, 314 (37.5%) from 2000, and 301 (35.9%) from 2010. As shown in Table 1, the majority of the studies were drug trials (61.6%, n = 516). There were significant differences in three study types over the three years: drug trials (p = 0.004), operation methods (p = 0.028) and other types (p = 0.008). The "other types" include health education, diet therapy, exercise therapy, stem cell therapy, etc. There was no significant difference in the medical apparatus and instruments study type over the three years (3.1% in 1990, 6.1% in 2000, and 3.0% in 2010; p = 0.110).

Table 1. Study types of 838 articles published in 1990, 2000, and 2010.

https://doi.org/10.1371/journal.pone.0140159.t001

Study designs

As demonstrated in Table 2, the most common clinical study design was the randomized controlled trial (RCT) (81.1%, n = 679). The number of descriptive studies decreased (12.2% in 1990, 7.6% in 2000, and 5.3% in 2010; p = 0.017). The number of case-control studies decreased (11.2% in 1990, 5.1% in 2000, and 0.7% in 2010; p<0.001). The number of cohort studies increased (2.2% in 1990, 4.5% in 2000, and 10.0% in 2010; p<0.001). The number of published RCTs increased (74.4% in 1990, 82.8% in 2000, and 84.0% in 2010; p = 0.013).

Table 2. Study designs of 838 articles published in 1990, 2000, and 2010.

https://doi.org/10.1371/journal.pone.0140159.t002

The majority of the control designs were parallel (82.0%, n = 687). There were no significant differences in three control designs over the three years: factorial (p = 0.243), sequential (p = 0.628) and other controls (p = 0.374). The number of parallel controls increased (76.2% in 1990, 83.1% in 2000, and 85.0% in 2010; p = 0.028). The number of crossover controls decreased (7.6% in 1990, 4.1% in 2000, and 1.7% in 2010; p = 0.003).

The majority of the comparison designs focused on difference tests (55.3%, n = 463). The number of difference-test articles increased (57.8% in 1990, 84.1% in 2000, and 93.7% in 2010; p<0.001). The number of superiority studies increased (14.8% in 1990, 27.1% in 2000, and 30.6% in 2010; p<0.001), and the number of non-inferiority studies increased (57.8% in 1990, 84.1% in 2000, and 93.7% in 2010; p<0.001). The number of studies using primary endpoints also increased (57.8% in 1990, 84.1% in 2000, and 93.7% in 2010; p<0.001).

Sample designs

As shown in Table 3, the number of studies that used multiple centers increased (26.9% in 1990, 63.7% in 2000, and 81.4% in 2010; p<0.001). More studies reported the use of two groups or three or more groups (89.2% in 1990, 92.4% in 2000, and 96.3% in 2010; p = 0.006). Over the three years, there was a significant increase in the reporting of sample estimation methods (21.5% in 1990, 48.4% in 2000, and 79.4% in 2010; p<0.001) and power estimation (17.0% in 1990, 45.5% in 2000, and 77.1% in 2010; p<0.001).

Table 3. Sample designs of 838 articles published in 1990, 2000, and 2010.

https://doi.org/10.1371/journal.pone.0140159.t003
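To illustrate the sample-size estimation increasingly reported in these articles, the standard normal-approximation formula for comparing two proportions can be sketched as follows. This is a generic textbook formula, not a method attributed to any of the reviewed studies, and the example rates are hypothetical.

```python
import math

def sample_size_two_proportions(p1, p2, alpha_z=1.96, power_z=0.84):
    """Approximate per-group sample size for detecting a difference
    between proportions p1 and p2, using the normal approximation.
    Defaults: two-sided alpha = 0.05 (z = 1.96), 80% power (z = 0.84)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (alpha_z + power_z) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)

# Hypothetical design: detect an improvement from 60% to 75% response
print(sample_size_two_proportions(0.60, 0.75))  # → 149 per group
```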

Data quality controls

Four indexes of data quality are shown in Table 4. The first clinical trial registry (CTR) was established in February 2002 by the American National Institutes of Health (NIH), the National Library of Medicine (NLM) and the FDA [12], so clinical study articles published in 1990 and 2000 were not registered in a CTR. In 2010, however, the proportion of registered studies was 58.1%.

Table 4. Data quality control of 838 articles published in 1990, 2000, and 2010.

https://doi.org/10.1371/journal.pone.0140159.t004

Although there was no significant difference in the use of blinding over the three years (p = 0.117 for open, p = 0.170 for single-blind, p = 0.219 for double-blind), there were significant differences in the form of data entry over the three years (all p<0.001). An increasing number of studies reported on data sets (DS) (13.9% in 1990, 68.5% in 2000, and 76.4% in 2010; p<0.001).

Statistical methods

As demonstrated in Table 5, the most commonly reported statistics in the reviewed articles were descriptive statistics (100.0%), ANOVA (47.2%) and the t-test (36.3%). Between 1990 and 2010, there were no significant differences in the following statistics: descriptive statistics, chi-square, Fisher's exact, Mantel-Haenszel, t-test and ANOVA (p>0.05).

Table 5. Statistical methods published in 1990, 2000, and 2010.

https://doi.org/10.1371/journal.pone.0140159.t005

From 1990 to 2010, there was an increase in the following statistics: logistic regression (12.3% in 1990, 15.6% in 2000, and 17.3% in 2010; p = 0.021), multiple comparison (5.6% in 1990, 6.3% in 2000, and 10.2% in 2010; p = 0.047), Cox models (7.7% in 1990, 13.6% in 2000, and 24.6% in 2010; p = 0.031), Kaplan-Meier tests (3.6% in 1990, 11.7% in 2000, and 23.9% in 2010; p = 0.031), sensitivity analysis (1.3% in 1990, 3.3% in 2000, and 5.3% in 2010; p = 0.046) and correction analysis for covariates (8.3% in 1990, 20.3% in 2000, and 25.1% in 2010; p<0.001).
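To illustrate what the increasingly common survival methods compute, here is a minimal Kaplan-Meier product-limit estimator in pure Python; the follow-up data are hypothetical, and a real analysis would use dedicated survival software.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit survival estimates.
    times: follow-up times; events: 1 = event observed, 0 = censored.
    Returns (time, survival) pairs at each event time."""
    # Sort by time; at tied times, process events before censorings
    order = sorted(zip(times, events), key=lambda te: (te[0], -te[1]))
    at_risk = len(order)
    surv, estimates = 1.0, []
    for t, event in order:
        if event:
            surv *= (at_risk - 1) / at_risk
            estimates.append((t, round(surv, 3)))
        at_risk -= 1
    return estimates

# Hypothetical follow-up times in months; 0 marks a censored patient
print(kaplan_meier([2, 3, 3, 5, 8, 12], [1, 1, 0, 1, 0, 1]))
# → [(2, 0.833), (3, 0.667), (5, 0.444), (12, 0.0)]
```

Note how the censored patients at 3 and 8 months drop out of the risk set without forcing the survival curve down, which is exactly the feature that makes these methods attractive for trial data.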

From 1990 to 2010, there was a significant increase in the reporting of confidence intervals, specifically for superiority (14.8% in 1990, 27.1% in 2000, and 30.6% in 2010; p<0.001), and a significant difference in the reporting of non-inferiority (8.1% in 1990, 20.7% in 2000, and 18.3% in 2010; p<0.001). In contrast, there was a significant decrease in the reporting of difference (45.3% in 1990, 26.8% in 2000, and 21.3% in 2010; p<0.001).
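To illustrate how such confidence intervals support non-inferiority claims, the sketch below uses a Wald interval for a difference in proportions. The cure rates and the 10-percentage-point margin are hypothetical, and real trials often use more refined interval methods.

```python
import math

def wald_ci_diff(p1, n1, p2, n2, z=1.96):
    """95% Wald confidence interval for a difference in proportions
    (new treatment minus control)."""
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z * se, diff + z * se

# Hypothetical cure rates: 171/200 (new) vs 175/200 (control),
# with a pre-specified non-inferiority margin of -0.10
low, high = wald_ci_diff(171 / 200, 200, 175 / 200, 200)
margin = -0.10
print(f"95% CI: ({low:.3f}, {high:.3f})")
# Non-inferiority is concluded when the whole CI lies above the margin
print("non-inferior" if low > margin else "inconclusive")
```

Here the lower bound (about -0.087) stays above the -0.10 margin, so the new treatment would be declared non-inferior even though the point estimate slightly favors the control.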

Interim analysis was reported infrequently overall, but with significant differences over time (2.5% in 1990, 6.2% in 2000, and 9.3% in 2010; p = 0.009).

Statistical software

As recorded in Table 6, there was a significant increase over time in reporting of SAS (13.5% in 1990, 41.7% in 2000, and 46.8% in 2010; p<0.001) and STATA (3.1% in 1990, 11.5% in 2000, and 10.6% in 2010; p = 0.002). There was no significant difference over time in reporting of SPSS (p = 0.104) and R software (p = 0.082).

Table 6. Statistical software published in 1990, 2000, and 2010.

https://doi.org/10.1371/journal.pone.0140159.t006

The number of studies that used a database to manage data increased (21.1% in 1990, 42.0% in 2000, and 69.4% in 2010; p<0.001).

Discussion

The choice of these four general medicine journals is a strength of this study, as they are leading medical journals with an extremely broad readership. They are widely read by clinicians in a variety of specialties and publish across a range of clinically related issues, so they are broadly representative of published clinical research. To support the generalizability of the findings, the choice of journals was discussed with PLOS ONE Academic Editors several times.

Although a large number of eligible articles (838) were included in this study, the focus on these four general medicine journals is a limitation of this content analysis, as it restricts the generalizability of the findings and does not account for variation by specialty. For example, the preferred study designs and data analysis expectations in surgical fields may differ from those in psychiatry or pediatrics. Thus, the trends in study design and analytic techniques presented here may differ from those in journals with more narrowly targeted audiences and areas of focus. To assess differences in the use of statistical methods between general medicine journals and specialized journals, we identified reviews of statistical methods used in specialized journals. A 1995 study comparing the prevalence and use of statistical analyses found that rheumatology journals [13] tended to use fewer and simpler statistics than general medicine journals. Thus, this study still provides important guidance for statistics education. Meanwhile, if this content analysis were extended to include articles from other general journals, we anticipate that individual findings would vary but that the overall trend of increasing statistical complexity over the decades would be similar.

What do this content analysis and its individual findings show about the overall trends of increasing statistical complexity over the decades? Regarding study types, drug trials decreased over time, while other types (e.g. newer approaches such as health education, diet therapy, exercise therapy and stem cell therapy) occurred with more frequency. Regarding study designs, descriptive and case-control studies occurred with less frequency over time, while cohort studies and RCTs occurred with more frequency; this suggests that study design has become increasingly rigorous in the last twenty years. Meanwhile, the tendency in statistical hypothesis testing was a decrease in studies that compared difference and an increase in studies that tested superiority and non-inferiority, especially in 2010; this suggests that statistical hypothesis testing has become more precise than before.

Sample design and data quality control are two key aspects of clinical study quality, as appropriate sample design and rigorous data quality control can improve the reliability and credibility of clinical study results [14–17]. The results of the present study show that the use of multiple centers, sample estimation methods, power estimation and data set comparisons has increased over time. Some of these methods, such as sample estimation methods, power estimation and FAS-PPS-SS, were used in earlier studies but were less likely to be included in studies published before the clinical trial guidelines were published by the International Conference on Harmonisation (ICH-E9) in 1998 [4]. These trends show that EBM and the journals have become increasingly strict about trial quality.

Regarding trends in statistical methods, the proportion of papers that reported using multiple comparison, survival analysis (Cox models and Kaplan-Meier), sensitivity analysis, interim analysis, confidence intervals (superiority and non-inferiority) and correction analysis increased significantly from 1990 to 2010. These complex statistical methods require strong statistical understanding to interpret their application and results. Some of these techniques were less likely to be included before the statistical analysis guidelines for clinical trials were published in 1992 and 1993 [2,3], because those guidelines clearly specify that many complex statistical methods, e.g. confidence intervals (superiority and non-inferiority), must be included. The more frequent use of complex statistical methods indicates that journals have become stricter regarding the accuracy and type of statistical analyses reported in articles [7,11,18].

In 2010, only 0.7% of the surveyed articles did not mention the type of statistical software used for the analyses, and nearly 90.0% of the articles used SAS, SPSS, STATA or R. These data show that professional statistical software is used with increasing frequency and that journal editors demand more precise details of statistical methods.

From 1990 to 2010, there was little change in the content of medical education [9]. Even where the statistical content of training has been revised and updated, the depth of coverage may be limited; e.g. confidence intervals (superiority and non-inferiority), sensitivity analysis, interim analysis, and correction analysis are not even covered in most textbooks. This contrasts with the substantial increases in the frequency and complexity of statistical reporting.

While our findings do not directly suggest that medical education necessarily needs to be modified, the statistical reporting trends described may have implications for medical education. Similarly, while this study does not provide data to suggest that improved statistical knowledge could translate to more effective use of the literature, we do propose that physicians’ familiarity with certain (complex) statistical approaches may assist them in critically evaluating and weighing the literature.

To this end, medical educators may wish to be aware of the benefits and limitations of different and more complex statistical strategies as they try to teach certain topical content or critical evaluation skills. Moreover, as future and current clinicians engage in a life-long learning process, findings from this study may be used as part of the discussion about statistical training across the continuum of medical education.

Supporting Information

Author Contributions

Conceived and designed the experiments: Dong Yi LL Dali Yi. Performed the experiments: DM GL LZ. Analyzed the data: LL DM HC. Contributed reagents/materials/analysis tools: LZ QX YZ XL YW HC. Wrote the paper: Dong Yi LL Dali Yi JCP.

References

  1. FDA. Guideline for format and content of the clinical and statistical sections of an application. 1988.
  2. MHLW. Guideline for the statistical analysis of clinical trials. 1992.
  3. EMEA. Biostatistical methodology in clinical trials. 1993.
  4. ICH-E9. Statistical principles for clinical trials. 1998.
  5. Horton J, Switzer S. Statistical methods in the journal. N Engl J Med. 2005;353:1977–1979. pmid:16267336
  6. Windish M, Huot J, Green L. Medicine residents' understanding of the biostatistics and results in the medical literature. JAMA. 2007;298:1010–1022. pmid:17785646
  7. Welch E, Gabbe S. Statistics usage in the American Journal of Obstetrics and Gynecology: Has anything changed? Am J Obstet Gynecol. 2002;186:584–586. pmid:11904628
  8. Rao G, Kanter L. Physician numeracy as the basis for an evidence-based medicine curriculum. Acad Med. 2010;85:1794–1799. pmid:20671540
  9. Lambert R, Lurie J, Lyness M. Standardizing and personalizing science in medical education. Acad Med. 2010;85:356–362. pmid:20107368
  10. Sackett L, Rosenberg M, Gray A. Evidence based medicine: what it is and what it isn't. BMJ. 1996;312:71–72. pmid:8555924
  11. Arnold D, Braganza M, Salih R. Statistical trends in the Journal of the American Medical Association and implications for training across the continuum of medical education. PLoS One. 2013;8(10):e77301. pmid:24204794
  12. ClinicalTrials.gov. A service of the U.S. National Institutes of Health. Available: http://www.clinicaltrials.gov/ct2/about-site/background.
  13. Arya R, Antonisamy B, Kumar S. Sample size estimation in prevalence studies. Indian J Pediatr. 2012;79:1482–1488. pmid:22562262
  14. Olsen O, Middleton P, Ezzo J. Quality of Cochrane reviews: assessment of sample from 1998. BMJ. 2001;323:829–832. pmid:11597965
  15. Higgins P, Altman G, Gotzsche C. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ. 2011;343:d5928. pmid:22008217
  16. Armijo-Olivo S, Stiles R, Hagen A. Assessment of study quality for systematic reviews: a comparison of the Cochrane Collaboration Risk of Bias Tool and the Effective Public Health Practice Project Quality Assessment Tool: methodological research. J Eval Clin Pract. 2012;18:12–18. pmid:20698919
  17. Turner L, Boutron I, Hrobjartsson A. The evolution of assessing bias in Cochrane systematic reviews of interventions: celebrating methodological contributions of the Cochrane Collaboration. Syst Rev. 2013;2:79. pmid:24059942
  18. Hellems A, Gurka J, Hayden F. Statistical literacy for readers of Pediatrics: a moving target. Pediatrics. 2007;119:1083–1088. pmid:17545374