The biomedical literature is increasingly enriched with literature reviews and meta-analyses. We sought to assess researchers' understanding of statistical terms routinely used in such studies.
An online survey posing 4 clinically-oriented multiple-choice questions was conducted in an international sample of randomly selected corresponding authors of articles indexed by PubMed.
A total of 315 unique complete forms were analyzed (participation rate 39.4%), mostly from Europe (48%), North America (31%), and Asia/Pacific (17%). Only 10.5% of the participants answered correctly all 4 “interpretation” questions while 9.2% answered all questions incorrectly. Regarding each question, 51.1%, 71.4%, and 40.6% of the participants correctly interpreted statistical significance of a given odds ratio, risk ratio, and weighted mean difference with 95% confidence intervals respectively, while 43.5% correctly replied that no statistical model can adjust for clinical heterogeneity. Clinicians had more correct answers than non-clinicians (mean score ± standard deviation: 2.27±1.06 versus 1.83±1.14, p<0.001); among clinicians, there was a trend towards a higher score in medical specialists (2.37±1.07 versus 2.04±1.04, p = 0.06) and a lower score in clinical laboratory specialists (1.7±0.95 versus 2.3±1.06, p = 0.08). No association was observed between the respondents' region or questionnaire completion time and participants' score.
A considerable proportion of researchers, randomly selected from a diverse international sample of biomedical scientists, misinterpreted statistical terms commonly reported in meta-analyses. Authors could be prompted to explicitly interpret their findings to prevent misunderstandings and readers are encouraged to keep up with basic biostatistics.
Citation: Mavros MN, Alexiou VG, Vardakas KZ, Falagas ME (2013) Understanding of Statistical Terms Routinely Used in Meta-Analyses: An International Survey among Researchers. PLoS ONE 8(1): e47229. doi:10.1371/journal.pone.0047229
Editor: German Malaga, Universidad Peruana Cayetano Heredia, Peru
Received: May 23, 2012; Accepted: September 11, 2012; Published: January 11, 2013
Copyright: © 2013 Mavros et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The authors have no support or funding to report.
Competing interests: The authors have declared that no competing interests exist.
Literature reviews, including systematic reviews and meta-analyses, are critical components of evidence-based medicine. Such studies are commonly regarded as valuable sources of evidence and influence both clinical practice and public health policy. Following the expansion of published biomedical original research, the publication of literature reviews has also greatly increased. Systematic reviews and meta-analyses are expected to accumulate and synthesize the total body of evidence regarding a topic and present it in a way that is comprehensible to busy health practitioners.
Statistical terms commonly used in meta-analyses, but also original research studies, include effect estimate measures such as the odds ratio (OR), risk ratio (RR), and weighted mean difference (WMD). Another important component of evidence synthesis studies is heterogeneity, which can be classified as clinical or statistical heterogeneity. Previous studies have implied a suboptimal understanding of such statistical terms among readers and/or researchers, but no study to our knowledge has assessed the understanding of plain effect estimates, provided in a commonly-encountered, clinical context. In this regard, we sought to investigate the current level of comprehension of statistical terms commonly used in meta-analyses.
Survey design and participants
An online survey was conducted from December 2011 to January 2012, based on the methodology of previously published electronic surveys. Briefly, we selected a random sample of PubMed unique identifiers (PMID) between 10,000,000 and 22,000,000 (mostly referring to articles published during the last 15 years), using a random number generator. We established communication with the corresponding authors who had an e-mail address available at the indexed affiliation and asked them to voluntarily complete an open, web-based questionnaire; this study was not announced or advertised, and access to the questionnaire by non-invited individuals was unlikely. By using this approach, we tried to survey a random, representative, and diverse international sample of researchers. In case of duplicate responses (posted from the same IP address within 24 hours), only the first response was analyzed.
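The sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the range bounds come from the text, but the sample size, the seed, and the function name `draw_pmids` are assumptions, and Python's pseudo-random generator stands in for whichever random number generator was actually used.

```python
import random
from typing import List, Optional

# Bounds taken from the text; sample size and seed below are illustrative assumptions.
PMID_MIN = 10_000_000
PMID_MAX = 22_000_000


def draw_pmids(sample_size: int, seed: Optional[int] = None) -> List[int]:
    """Draw distinct PMIDs uniformly at random from the stated range."""
    rng = random.Random(seed)
    # range() is lazy, so sampling without replacement needs no large list.
    return rng.sample(range(PMID_MIN, PMID_MAX + 1), sample_size)


pmids = draw_pmids(sample_size=5, seed=2011)
print(pmids)
```

Not every integer in this range corresponds to an indexed article with a usable corresponding-author e-mail address, so in practice each drawn identifier would still need to be looked up and screened.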
Participants were informed about the aims of the study, the length of time of the survey, and the primary investigator (MEF). The questionnaire was a structured, web-based, multiple-choice form comprising 5 single-answer questions. Four mandatory questions evaluated the understanding of simple statistical terms commonly used in meta-analyses (OR, RR, WMD, and heterogeneity) in a clinical context, and the last, optional question inquired about the specialty of the respondent (Table 1). We also recorded the questionnaire completion time and the participants' country of origin as derived from their Internet Protocol (IP) address; no other personal information was collected. Answers were submitted electronically to ensure anonymity of the participants. The survey and study protocol were approved by the Ethics Committee of the Alfa Institute of Biomedical Sciences (AIBS), Athens, Greece. Informed consent of the participants was implied by the completion and electronic submission of the questionnaire. The study has been described in concordance with the CHERRIES (Checklist for Reporting Results of Internet E-Surveys) guidelines.
Table 1. Our questionnaire. doi:10.1371/journal.pone.0047229.t001
Data analysis and statistical methods
Respondents' answers were pooled and graphically presented. A score was calculated for each participant, representing the number of correct answers (1 point was awarded for each correct answer). Univariate comparisons were performed to examine the potential effect of respondents' specialty, region, and questionnaire completion time on their score. We used Pearson correlation, Student's t-test, and analysis of variance for normally distributed variables, and Spearman correlation, the Mann-Whitney test, and the Kruskal-Wallis test for non-normally distributed variables, as appropriate. The normality of the distribution of the variables was assessed with the Shapiro-Wilk test. All analyses were performed with the STATA 11.2 (Stata Corp., College Station, TX, USA) statistical software package. A p<0.05 was considered to denote statistical significance.
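The scoring rule described above (1 point per correct answer, for a score between 0 and 4) can be sketched as follows. The answer key and the three responses are hypothetical placeholders; the actual questionnaire items and their correct options (Table 1) are not reproduced here.

```python
from statistics import mean, stdev

# Hypothetical answer key; the real correct options are those marked in Table 1.
ANSWER_KEY = {"Q1": "b", "Q2": "a", "Q3": "c", "Q4": "d"}


def score(responses: dict) -> int:
    """Return the number of correct answers (1 point each, range 0 to 4)."""
    return sum(responses.get(q) == correct for q, correct in ANSWER_KEY.items())


# Hypothetical respondents, for illustration only.
respondents = [
    {"Q1": "b", "Q2": "a", "Q3": "c", "Q4": "d"},  # all four correct
    {"Q1": "b", "Q2": "a", "Q3": "a", "Q4": "a"},  # two correct
    {"Q1": "a", "Q2": "b", "Q3": "a", "Q4": "a"},  # none correct
]
scores = [score(r) for r in respondents]
print(scores)  # [4, 2, 0]
print(f"mean ± SD: {mean(scores):.2f} ± {stdev(scores):.2f}")
```

The per-participant scores produced this way are the quantity compared across specialty, region, and completion time in the univariate analyses.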
The online questionnaire was accessed 800 times and, after exclusion of 1 duplicate report, a total of 315 complete forms were analyzed (participation rate 39.4%). The median questionnaire completion time was 202 seconds (interquartile range: 143 to 362 seconds). Most participants completed the questionnaire from Europe (151/315, 48%) and North America (99, 31%), and fewer from Asia/Pacific (52, 17%) and Central & South America or Africa (13, 4%). Most of the participating physicians (n = 169; 16/315 respondents did not provide relevant data) had a medical specialty (69%, 116/169; including psychiatry), while 25% (43/169) had a surgical specialty (including anesthesiology) and few (6%, 10/169) had a clinical laboratory specialty (including radiology). A total of 130 respondents were non-clinicians (non-physicians or physicians without specialty).
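The participation rate and regional shares reported above follow directly from the stated counts. The short check below reproduces the arithmetic; the helper name `pct` is ours, and participation is taken, as in the text, to be complete forms divided by the number of times the questionnaire was accessed.

```python
# Counts as reported in the text.
accessed = 800
analyzed = 315
by_region = {
    "Europe": 151,
    "North America": 99,
    "Asia/Pacific": 52,
    "Central & South America or Africa": 13,
}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


participation_rate = pct(analyzed, accessed)
region_share = {name: pct(n, analyzed) for name, n in by_region.items()}

# The regional counts should account for every analyzed form.
assert sum(by_region.values()) == analyzed

print(participation_rate)  # 39.4
print(region_share)
```

Rounded to whole numbers, the regional shares match the 48%, 31%, 17%, and 4% given in the text.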
Responses to our questions are presented in Figure 1. Overall, just over half of the ‘meta-analysis interpretation’ questions were answered correctly (51.7%, 651/1260). Thirty-three (10.5%) respondents answered all 4 questions correctly, while 29 (9.2%) answered all 4 questions incorrectly. Just over one third of the respondents (111, 35.2%) answered at least 3 of 4 questions correctly. Regarding each question (Figure 1), 51.1% (161/315), 71.4% (225/315), and 40.6% (128/315) of the participants correctly interpreted statistical significance (or lack of statistical significance) for a given OR, RR, and WMD estimate (with 95% confidence intervals), respectively. Less than half (43.5%, 137/315) of the participants correctly responded that no statistical model can adjust for clinical heterogeneity in meta-analyses.
Figure 1. The responses of the participating researchers to each question.
Correct answers are marked with an asterisk; the questionnaire is presented in Table 1. [Q = Question; OR = Odds Ratio; RR = Risk Ratio; WMD = Weighted Mean Difference]. doi:10.1371/journal.pone.0047229.g001
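The first three questions asked whether an effect estimate with a 95% confidence interval was statistically significant. The general rule is that a ratio measure (OR, RR) is significant at the 0.05 level when its 95% confidence interval excludes 1, while a difference measure (WMD) is significant when its interval excludes 0. The sketch below encodes that rule. The estimates are hypothetical and are not those used in the questionnaire, and the equivalence with p<0.05 assumes that the interval and the test come from the same method.

```python
from dataclasses import dataclass

# Null (no-effect) value for each measure: 1 for ratios, 0 for differences.
NULL_VALUE = {"OR": 1.0, "RR": 1.0, "WMD": 0.0}


@dataclass
class Estimate:
    measure: str    # "OR", "RR", or "WMD"
    point: float    # point estimate
    ci_low: float   # lower bound of the 95% confidence interval
    ci_high: float  # upper bound of the 95% confidence interval

    def is_significant(self) -> bool:
        """True when the 95% CI excludes the null value for this measure."""
        null = NULL_VALUE[self.measure]
        return not (self.ci_low <= null <= self.ci_high)


# Hypothetical estimates, for illustration only.
examples = [
    Estimate("OR", 0.75, 0.60, 0.94),    # excludes 1: significant
    Estimate("RR", 1.20, 0.95, 1.52),    # includes 1: not significant
    Estimate("WMD", 0.80, -0.10, 1.70),  # includes 0: not significant
    Estimate("WMD", 1.40, 0.90, 1.90),   # includes 1 but excludes 0: significant
]
for e in examples:
    print(e.measure, e.point, (e.ci_low, e.ci_high), e.is_significant())
```

The last example shows a likely source of confusion: for a mean difference the reference value is 0, so an interval that contains 1 can still indicate a statistically significant difference.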
The percentages of correct responses to each question among the respondents' groups are presented in Figure 2. Clinicians had a higher score than non-clinicians (mean score ± standard deviation: 2.27±1.06 versus 1.83±1.14, p<0.001). Among clinicians, there was a trend towards a higher score in medical specialists versus the others (2.37±1.07 versus 2.04±1.04, p = 0.06) and towards a lower score in clinical laboratory specialists versus the others (1.7±0.95 versus 2.3±1.06, p = 0.08). No statistically significant difference was observed between surgeons and other specialists (2.12±1.1 versus 2.32±1.05, p = 0.28). There was no difference in the score with regard to the respondents' region (Europe 2.12±1.09, North America 2.08±1.1, Asia/Pacific 1.98±1.24, and Central & South America and Africa 1.85±0.99; p = 0.62). There was no correlation between questionnaire completion time and participants' score (p = 0.25).
Figure 2. Percentage of correct responses to each question, stratified by specialty.
Clinicians had more correct answers than non-clinicians (mean score ± standard deviation: 2.27±1.06 versus 1.83±1.14, p<0.001). [Q = Question; OR = Odds Ratio; RR = Risk Ratio; WMD = Weighted Mean Difference]. doi:10.1371/journal.pone.0047229.g002
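As a rough consistency check on the clinician versus non-clinician comparison, a pooled-variance Student's t statistic can be computed from the reported means, standard deviations, and group sizes (169 clinicians, 130 non-clinicians). This is only a sketch: the text does not state which test produced the reported p-value for this comparison (a non-parametric test may have been used for a 0 to 4 score), and the p-value below relies on a normal approximation to the t distribution, which is reasonable with about 300 degrees of freedom.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Means, SDs, and group sizes as reported in the Results.
t_stat, df = pooled_t(2.27, 1.06, 169, 1.83, 1.14, 130)

# Two-sided p-value under a standard normal approximation (an assumption;
# an exact value would use the t distribution with df degrees of freedom).
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"t = {t_stat:.2f}, df = {df}, approximate p = {p_approx:.4f}")
```

Under these assumptions the approximate p-value falls below 0.001, which agrees with the reported result.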
The main finding of our survey is that, even among researchers, there is incomplete understanding of statistical terms commonly reported in meta-analyses. This finding was more pronounced in non-clinicians; among clinicians, those with a medical specialty tended to have a slightly better understanding of statistical terms than the others. Although the questions were clinically oriented and commonly encountered in the biomedical literature, overall, almost half (48.3%) were answered incorrectly; 10.5% of the respondents answered correctly all questions, while 9.2% answered all questions incorrectly.
Few studies have addressed the level of comprehension of commonly used statistical terms among the providers and the recipients of biomedical research (authors and readers). Previous studies noted an incomplete understanding of the difference between odds ratio and risk ratio, in terms of both calculation and interpretation, even among researchers. Others reported that the use of relative (e.g. OR, RR) instead of absolute (e.g. number needed to treat) estimate measures led to an overestimation of the effect by the readers. Although limited published data have suggested an incomplete understanding of basic biostatistics, such as the difference between odds ratio and risk ratio, this is, to the best of our knowledge, the first study to assess the interpretation of plainly given effect estimates. Surprisingly, almost half of the given estimates (OR, RR, WMD) were misinterpreted by corresponding authors of articles indexed in PubMed.
Our findings suggest a better understanding of the tested statistical terms among clinicians, compared with non-clinicians. Clinicians with a medical specialty tended to score higher than the rest. Interestingly, the groups that tended to score higher were the ones that were most represented in our analysis (169 clinicians versus 130 non-clinicians, 116 medical specialists versus 53 surgical/clinical laboratory specialists). This may indicate a higher degree of understanding among clinicians who publish more (as derived from our analysis). Of note, in the United States, medical graduates entering a surgical specialty have higher medical licensing examination scores than their medical and clinical laboratory counterparts; no such trend was observed in our sample.
Our study has significant implications. It has already been argued that a large part of published biomedical research is inaccurate. Given that commonly used statistical terms are also misinterpreted by readers, the conclusion could be particularly troublesome. Hopefully, most of the misunderstandings are resolved through each article's own interpretation of its results. In this regard, it is of paramount importance that readers have the ability to interpret published research findings themselves, especially since some medical journals currently ask the authors to present “appropriate indicators of measurement error or uncertainty (such as confidence intervals) [and] avoid relying solely on statistical hypothesis testing, such as the use of P values”.
Although through this study we cannot identify the source of the problem, nor suggest a practical solution, the first step in the problem solving process remains the definition and identification of the problem. Our study also serves as a call for careful consideration of published research by journal editors, article authors, and readers. At the end of the day, in this era of rapidly evolving evidence-based medicine, physicians would rather be able to properly interpret current research findings than memorize a large amount of potentially outdated information.
One might argue that our findings should not be generalized to the majority of physicians or biomedical scientists. However, the participants in our survey were corresponding authors of articles indexed by PubMed, who in general are expected to be more statistically knowledgeable than ordinary readers; in addition, the participants represented a random, international sample of scientists and physicians of various specialties. Another potential explanation for our findings would be that the participants did not pay adequate attention to the questions; this is unlikely, considering that those not interested in our survey would not complete and submit it (only complete responses were assessed), and that the median completion time was around 3 minutes (for 4 “interpretation” questions). It should nevertheless be acknowledged that the participation rate was relatively low (39.4%), which is not unusual for this type of research. Last, our study suffers from the inherent limitations of online surveys, including self-selection bias and concerns about the accuracy and reproducibility of the responses. In this regard, specific details as to how many publications were screened, how many emails were sent, and how many email addresses were invalid were not available; therefore, we could not exclude the possibility that some regions were under-represented due to self-selection bias. However, the representation of each region in our survey was similar to the global relative biomedical research productivity.
In conclusion, a large proportion of biomedical researchers misinterpreted simple effect estimates commonly used in meta-analyses. Journal editors and article authors may embrace a more comprehensive interpretation of each study's findings, while readers are encouraged to keep up with basic biostatistics.
Conceived and designed the experiments: MM MEF. Performed the experiments: MM VGA MEF. Analyzed the data: MM VGA KZV MEF. Contributed reagents/materials/analysis tools: MM VGA KZV MEF. Wrote the paper: MM VGA KZV MEF.
- 1. Bero LA, Jadad AR (1997) How consumers and policymakers can use systematic reviews for decision making. Ann Intern Med 127: 37–42. doi: 10.7326/0003-4819-127-1-199707010-00007
- 2. Mulrow CD, Cook DJ, Davidoff F (1997) Systematic reviews: critical links in the great chain of evidence. Ann Intern Med 126: 389–91. doi: 10.7326/0003-4819-126-5-199703010-00008
- 3. Shojania KG, Bero LA (2001) Taking advantage of the explosion of systematic reviews: an efficient MEDLINE search strategy. Eff Clin Pract 4: 157–62.
- 4. Mavros MN, Alexiou VG, Vardakas KZ, Tsokali K, Sardi TA, et al. (2012) Underestimation of Clostridium difficile infection among clinicians: an international survey. Eur J Clin Microbiol Infect Dis 31: 2439–44. doi: 10.1007/s10096-012-1587-9
- 5. Alexiou VG, Ierodiakonou V, Peppas G, Falagas ME (2010) Antimicrobial prophylaxis in surgery: an international survey. Surg Infect (Larchmt) 11: 343–8. doi: 10.1089/sur.2009.023
- 6. Falagas ME, Makris GC, Karageorgopoulos DE, Batsiou M, Alexiou VG (2009) How well do clinical researchers understand risk estimates? Epidemiology 20: 930–1. doi: 10.1097/ede.0b013e3181ba40eb
- 7. Falagas ME, Ierodiakonou V, Alexiou VG (2009) Clinical practice of obtaining blood cultures from patients with a central venous catheter in place: an international survey. Clin Microbiol Infect 15: 683–6. doi: 10.1111/j.1469-0691.2009.02784.x
- 8. http://www.random.org. Accessed 2 September 2012.
- 9. http://www3.formassembly.com. Accessed 2 September 2012.
- 10. Eysenbach G (2004) Improving the quality of Web surveys: the Checklist for Reporting Results of Internet E-Surveys (CHERRIES). J Med Internet Res 6: e34. doi: 10.2196/jmir.6.3.e34
- 11. Katz KA (2006) The (relative) risks of using odds ratios. Arch Dermatol 142: 761–4. doi: 10.1001/archderm.142.6.761
- 12. Holcomb WL Jr, Chaiworapongsa T, Luke DA, Burgdorf KD (2001) An odd measure of risk: use and misuse of the odds ratio. Obstet Gynecol 98: 685–8. doi: 10.1016/s0029-7844(01)01488-0
- 13. Naylor CD, Chen E, Strauss B (1992) Measured enthusiasm: does the method of reporting trial results alter perceptions of therapeutic effectiveness? Ann Intern Med 117: 916–21. doi: 10.7326/0003-4819-117-11-916
- 14. Forrow L, Taylor WC, Arnold RM (1992) Absolutely relative: how research results are summarized can affect treatment decisions. Am J Med 92: 121–4. doi: 10.1016/0002-9343(92)90100-p
- 15. National Resident Matching Program, Results and Data: 2011 Main Residency Match. (2011) National Resident Matching Program, Washington, DC.
- 16. Ioannidis JP (2005) Why most published research findings are false. PLoS Med 2: e124. doi: 10.1371/journal.pmed.0020124
- 17. Ioannidis JP (2005) Contradicted and initially stronger effects in highly cited clinical research. JAMA 294: 218–28. doi: 10.1001/jama.294.2.218
- 18. Author Instructions, Archives of Internal Medicine. Available: http://archinte.ama-assn.org/misc/ifora.dtl#Statistics. Accessed 2 September 2012.
- 19. Schmidt WC (1997) World-Wide Web survey research: Benefits, potential problems, and solutions. Behavior Res Methods Instruments Computers 29: 274–9. doi: 10.3758/bf03204826
- 20. Wright KB (2005) Researching Internet-based populations: Advantages and disadvantages of online survey research, online questionnaire authoring software packages, and Web survey services. Journal of Computer-Mediated Communication 10: article 11. doi: 10.1111/j.1083-6101.2005.tb00259.x
- 21. Falagas ME, Michalopoulos AS, Bliziotis IA, Soteriades ES (2006) A bibliometric analysis by geographic area of published research in several biomedical fields, 1995–2003. CMAJ 175: 1389–90. doi: 10.1503/cmaj.060361
- 22. Soteriades ES, Falagas ME (2005) Comparison of amount of biomedical research originating from the European Union and the United States. BMJ 331: 192–4. doi: 10.1136/bmj.331.7510.192
- 23. Rahman M, Fukui T (2003) Biomedical publication–global profile and trend. Public Health 117: 274–80. doi: 10.1016/s0033-3506(03)00068-4
- 24. Benzer A, Pomaroli A, Hauffe H, Schmutzhard E (1993) Geographical analysis of medical publications in 1990. Lancet 341: 247. doi: 10.1016/0140-6736(93)90116-x