Research Article

Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results

  • Jelte M. Wicherts,

    Affiliation: Psychology Department, Faculty of Social and Behavioral Sciences, University of Amsterdam, Amsterdam, The Netherlands

  • Marjan Bakker,

    Affiliation: Psychology Department, Faculty of Social and Behavioral Sciences, University of Amsterdam, Amsterdam, The Netherlands

  • Dylan Molenaar

    Affiliation: Psychology Department, Faculty of Social and Behavioral Sciences, University of Amsterdam, Amsterdam, The Netherlands

  • Published: November 02, 2011
  • DOI: 10.1371/journal.pone.0026828

Response to Dr. Tractenberg's commentary

Posted by wicherts on 03 Nov 2011 at 21:17 GMT

I thank Dr. Tractenberg for starting the discussion on the implications of our findings in such a thoughtful way. I agree with Dr. Tractenberg’s assessment that mandatory archiving of data is not a one-size-fits-all solution to the many problems associated with the conduct of statistical analyses in psychology and related fields. Besides the errors we document, researchers are given too many “degrees of freedom” (Simmons et al., 2011, to appear in Psychological Science) in how they select variables, transform the data, determine the specifics of the analysis, and exclude data points. Many researchers appear to exploit this freedom to their benefit (John et al., 2011, to appear in Psychological Science). Researchers who go to great lengths to arrive at a significant result are unlikely to put all their cards on the table if they are one day forced to archive their data upon publication. For instance, they may choose to delete the “excluded cases” from the uploaded file, thereby preventing others from checking whether including these cases would affect the substantive conclusions. Other rules concerning unpublished work (Schooler, 2011, in Nature), sample size planning, analytic specifics, and the selective reporting of variables (Simmons et al., 2011) are clearly needed.

Dr. Tractenberg and I agree that no researcher should be allowed to be secretive about the data from his or her published research. Because it is impolite to “go ethical” on researchers who fail to share their data upon request, a procedure for dealing with reluctance to share data is clearly needed. Letting journals or funding organizations openly document whether researchers abide by the rules of data sharing is an interesting option. However, I remain convinced that psychology and related fields should consider obligatory archiving of data.

In the cost-benefit analysis of implementing mandatory archiving, the costs include, but are not limited to: (1) research time spent on documenting the data properly, (2) the technical aspects of the archives, (3) giving researchers the first opportunity to follow up on their own work with the data they collected, and (4) ensuring the confidentiality of research participants. In my view, (1) is just good science, (2) is already mostly solved by existing archives (or is at least doable), (3) can be dealt with by an embargo on the release of the data (or by exemptions if needed), and (4) is relevant for only a small portion of psychological studies and can be dealt with by exemptions and/or codes of conduct that are already in place for primary researchers.

The benefits of mandatory archiving include, among other things: (1) living by the scientific principle of openness, (2) working towards a novel view of scientific publishing, (3) improving the quality of (the reporting of) statistical results, and (4) the prevention and detection of misconduct.

(1) Data need to be available as a matter of principle. Openness and the ability to check other people’s results (and the analyses that are prone to human error) are core tenets of science. This means neither that the data from all studies are interesting (as Dr. Tractenberg rightly notes), nor that all analyses will be checked by nitpicking statisticians. It does mean, however, that it should always be possible to do so. Just as we always require that research methods be described in enough detail to allow independent replication, it would simply be good practice to enable independent replication of the statistical analyses of the data.

(2) The practice of reporting statistical results only in highly condensed form (e.g., two means and a t-test) strikes me as archaic (just like the anachronism of reporting significance levels rather than exact p-values; who on earth still uses those tables in the appendices of old statistical textbooks?). The days when considerations of journal space limited the information put into papers are long gone. I envision an alternative model of scientific publishing in which the data are published alongside the researchers’ specific choice of statistical analyses and their chosen summary of the results. The data are the real treasure of research, and they tend to get lost at an unacceptable rate if we let researchers keep them solely on their current computer, in formats that even they no longer understand after a year or two. Another advantage of archiving is that the data can be included in meta-analyses. In a recent meta-analysis, I could not include 10% of the studies because of how the researchers reported their findings. Finally, some data sets may be used in future research, which increases the impact of the work (and its citation counts).

(3) Another advantage of archiving data is that it forces researchers to be more careful and honest about their analyses. Knowing that your statistical results may one day be checked will hopefully lead to higher quality in statistical results and their reporting. It may also change the current culture in which it is acceptable for a single researcher to hold all the data and conduct all of the analyses in a black box, i.e., without even sharing the data with co-authors. This is a suboptimal practice, given the likelihood that human factors in statistics may bias reported statistical outcomes.

(4) Openness with respect to data may act as a deterrent against misconduct. Falsification of data can hardly be proven without access to the raw data. Fabricating data that appear genuine is not as easy as it seems (Al-Marzouki et al., 2005, in BMJ). In fact, in the recent major case of misconduct by social psychologist Diederik Stapel, one of the clearest signs that led to his exposure was that he had supposedly used “copy-paste” to make up data for different studies. It is not far-fetched to think that the fraud would have been exposed years ago had the data been available to nitpick.

To conclude, Dr. Tractenberg and I agree on many issues, and we both acknowledge the need to put an end to secrecy with respect to research data. Obligatory archiving of data is one way to do so, but other solutions are certainly possible. It is now up to journals, professional organizations, and funding organizations to weigh all costs and benefits and find a workable solution to improve science.

