Reader Comments


Lack of statistical power undermines confidence in these findings

Posted by deevybee on 15 Jun 2013 at 17:12 GMT

PLOS One has built a reputation on the idea that it publishes work that is well-motivated and methodologically strong, regardless of the results. Unfortunately, we are increasingly seeing papers published here that do not meet this criterion. A good way of testing whether the PLOS One criteria are met is to read just the introduction and methods of a paper, without looking at the results. If that approach had been applied here, it is unlikely the paper would have been published, because it is seriously underpowered. Unfortunately, when an underpowered study yields significant and apparently exciting findings, methodological concerns are often forgotten, but this is dangerous, because it leads to the proliferation of non-replicable false positive findings (Button et al., 2013). Please note, this is not to suggest any malpractice by the authors; rather, they do not seem to appreciate the need for adequately powered studies if the field is to progress.
For instance, although the authors are interested in how the grammar-learning measure relates to tests of procedural and declarative learning, their sample size is inadequate to test this: the reported correlations, although numerically different, have overlapping confidence intervals, so we cannot draw any conclusions about differential relationships (a sketch of the interval calculation follows the list):
Procedural with concatenative: r = .68 (95% CI .36 to .86)
Declarative with concatenative: r = .13 (95% CI -.31 to .52)
Procedural with analogical: r = .34 (95% CI -.10 to .67)
Declarative with analogical: r = .51 (95% CI .11 to .77)
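For anyone who wants to check the overlap, here is a minimal sketch of how such intervals can be obtained via the Fisher z-transformation. The sample size of 22 is my own illustrative assumption (it gives intervals of roughly the reported width), not a figure taken from the paper.

```python
import math

def correlation_ci(r, n, alpha_crit=1.959963984540054):
    """Approximate 95% CI for a Pearson correlation via the Fisher z-transformation."""
    z = math.atanh(r)                    # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)          # standard error of z
    lo, hi = z - alpha_crit * se, z + alpha_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to the r scale

# Illustrative n = 22 (an assumption, not the paper's reported sample size).
for label, r in [("Procedural with concatenative", 0.68),
                 ("Declarative with concatenative", 0.13),
                 ("Procedural with analogical", 0.34),
                 ("Declarative with analogical", 0.51)]:
    print(label, correlation_ci(r, n=22))
```

Because each interval overlaps the others, the apparent dissociation between procedural and declarative correlations cannot be claimed at this sample size.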
The main reason for lack of confidence in the results, however, is that the effects are so enormous. Extracting the means and SEs from Figure 4, I computed that the effect size associated with genotype for the TOL score was 1.74 and that for Concatenative Grammar learning was 1.04. Although you can get a large effect on a cognitive phenotype from a rare mutation, it is unprecedented to get effects of this magnitude associated with a common polymorphism. It seems more likely these findings are just false positives.
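For readers who want to see how such an effect size can be recovered from a figure, here is a minimal sketch of a between-group Cohen's d computed from group means and standard errors (using SD = SE × √n). The means, standard errors and group sizes below are made up purely for illustration; they are not the paper's values.

```python
import math

def cohens_d_from_summary(m1, se1, n1, m2, se2, n2):
    """Cohen's d from two group means and standard errors, via SD = SE * sqrt(n)."""
    sd1, sd2 = se1 * math.sqrt(n1), se2 * math.sqrt(n2)
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Entirely hypothetical genotype groups, just to show the arithmetic.
print(cohens_d_from_summary(m1=0.5, se1=0.10, n1=11, m2=0.1, se2=0.12, n2=11))
```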
We are not told who the participants were, but it seems likely they were recruited from the University, given that their scores on the learning task were in a similar range to those observed by Ettlinger et al. (2012), whose participants were described as students. This makes the result even more implausible, as it would mean that a substantial effect was found on a learning phenotype even though the range of the phenotype was restricted to well-educated individuals.
If DRD2 was exerting such a massive effect on procedural aspects of grammar learning, we might have expected it to pop up in genome-wide association studies of specific language impairment, but, as far as I am aware, it has not.
One other minor point: the Tower of London scores are reported as "Normalized TOL score" in Figure 4, but these cannot be the z-scores described in the text, because the mean value is closer to 0.3 than to zero.
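As a minimal numerical illustration of that point, with entirely made-up scores: z-scores standardised within a sample necessarily average to zero, so a mean near 0.3 suggests some other normalisation was applied.

```python
import statistics

scores = [12, 15, 9, 14, 11, 13]                 # hypothetical raw TOL scores
mu, sd = statistics.mean(scores), statistics.stdev(scores)
z = [(x - mu) / sd for x in scores]
print(round(statistics.mean(z), 10))             # 0.0: within-sample z-scores have mean zero
```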
We are at a point in scientific progress where it has become possible to link together findings from genetics, psychology and neuroscience, and are seeing increasing numbers of studies that attempt to make such associations. I know many geneticists are concerned that non-geneticists are forging ahead with no appreciation of the order of magnitude of sample size that is needed to get replicable findings, and I think the field would benefit from more discussion of these issues, with a view to developing methodological standards.
Dr Ettlinger and I have had some exchanges about these issues on Twitter, but 140 characters rather limits the debate! He kindly stated he would welcome discussion in the comments here, and I await with interest his thoughts on these issues.

References

Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, advance online publication. doi: 10.1038/nrn3475
Ettlinger, M., Bradlow, A. R., & Wong, P. C. M. (2012). Variability in the learning of complex morphophonology. Applied Psycholinguistics, http://dx.doi.org/10.1017....


No competing interests declared.

This paper says it better than I can!

deevybee replied to deevybee on 17 Jun 2013 at 11:26 GMT

Munafò, M. R., & Gage, S. H. (2013). Improving the reliability and reporting of genetic association studies. Drug and Alcohol Dependence(0). doi: http://dx.doi.org/10.1016...

No competing interests declared.