Reader Comments
Limited test power and clash of interest?
Posted by krachmaninoff on 19 Dec 2011 at 10:58 GMT
I think this study contains some points that warrant critical discussion:
1. The paper does not give any information about the quantity of microtiming deviations in either of the two versions. The audio files alone suggest a much higher degree of timing deviation in the WN version than in the 1/f version; the WN versions sound extremely "warped". In accordance with the rules of experimental design, the amount of deviation should be kept constant in both versions, and differences should apply only to the distribution of deviations, not to the average quantity. As far as we can see, the authors did not do this, which would mean a serious confounding of variables.
2. The method used for evaluation is a forced choice paradigm. This is problematic because subjects only had a choice between "the devil and the deep blue sea". Even the preferred 1/f version is of poor aesthetic quality, and a rating paradigm (instead of forced choice) would have given more insight into the relative evaluations.
3. It remains questionable as to why no control condition (quantized version) was presented.
4. The first author's homepage (http://www.nld.ds.mpg.de/...) contains information that the author holds 2 patents on the same algorithm (1/f) identified in this study as the preferred procedure for humanizing music:
*H. Hennig, R. Fleischmann, F. Theis, T. Geisel. Method and Device for Humanizing Musical Sequences. U.S. Patent no. 7,777,123 (2010).
http://www.nld.ds.mpg.de/...
*H. Hennig, R. Fleischmann, F. Theis, T. Geisel. Method and Device for Humanizing Music Sequences. E.U. Patent 2043089A1, filed Sept. 28 (2007), published April 1 (2009).
Is this conflict of interest a good prerequisite for an independent and objective evaluation of two competing methods? No information on this clash of interest is given in the paper.
5. Evaluations of the two versions show only small differences. In terms of effect sizes we should look not for significance but for relevance of findings (see Ellis, 2010). A recalculation of the frequencies for "preference judgements" displayed in Figure 3 (1/f = 64.1%, N = 39) yields an effect size of w = 0.27. According to the benchmarks given by Ellis (2010), this is a small (w = 0.1) to medium (w = 0.3) effect.
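The effect-size arithmetic above can be checked with a short script. This assumes the standard Cohen's w formula for a goodness-of-fit comparison, w = sqrt(sum((p_obs - p_exp)^2 / p_exp)); with the rounded percentages from Figure 3 it gives a value close to (not exactly) the reported w = 0.27, the small gap presumably reflecting rounding or a continuity correction:

```python
import math

def cohens_w(p_obs, p_exp):
    """Cohen's w for a goodness-of-fit comparison of proportions."""
    return math.sqrt(sum((o - e) ** 2 / e for o, e in zip(p_obs, p_exp)))

# 64.1% of N = 39 listeners preferred the 1/f version (Figure 3);
# the null hypothesis is no preference (50/50).
p_obs = (0.641, 0.359)
p_exp = (0.5, 0.5)
w = cohens_w(p_obs, p_exp)
print(round(w, 2))   # ≈ 0.28, close to the reported w = 0.27
```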
This is fine; however, a prospective analysis of test power (a measure of the sensitivity of the experimental setup) by means of the software G*Power reveals that the observed effect size is based on a limited power of only 1-β = 0.39 (df = 1; α = 0.05). This is below chance level. To obtain sufficient test power (according to Cohen's benchmark) of 1-β = 0.80, the number of subjects would have to be N = 110 (instead of 39). This was not the case, and the study appears to be seriously underpowered (for calculations of effect size and test power based on a chi-square analysis, see Sedlmeier & Renkewitz, 2008, pp. 560-562).
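The quoted power figure can be reproduced from first principles without a statistics package: for df = 1 the noncentral chi-square variate is (Z + sqrt(λ))² with Z standard normal and λ = N·w², so the power has a closed form in terms of the normal CDF (3.841 is the df = 1, α = .05 critical value). A sketch, using w = 0.27; the required-N scan lands near 108 rather than exactly 110, the difference presumably coming from rounding of the effect size:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def chi2_df1_power(w, n, crit=3.841):
    """Power of a df=1 chi-square test at alpha = .05.

    For df=1 the noncentral chi-square is (Z + sqrt(lambda))**2 with
    Z ~ N(0, 1) and lambda = n * w**2, so power is computed exactly
    from the normal CDF."""
    nc = math.sqrt(n) * w
    c = math.sqrt(crit)
    return (1.0 - norm_cdf(c - nc)) + norm_cdf(-c - nc)

power = chi2_df1_power(0.27, 39)
print(round(power, 2))   # ≈ 0.39, matching the quoted 1-beta

# Smallest N reaching 80% power at w = 0.27 (simple upward scan):
n = 39
while chi2_df1_power(0.27, n) < 0.80:
    n += 1
print(n)                 # ≈ 108, in the vicinity of G*Power's 110
```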
References
*Ellis, P. D. (2010). The essential guide to effect sizes. Statistical power, meta-analysis, and the interpretation of research results. Cambridge: Cambridge University Press.
*Sedlmeier, P. & Renkewitz, F. (2008). Forschungsmethoden und Statistik in der Psychologie [Research methods and statistics in psychology]. München: Pearson Studium.
RE: Limited test power and clash of interest?
holgerh replied to krachmaninoff on 17 Apr 2012 at 23:52 GMT
We would like to reply to the comments of Krachmaninoff.
Reply to comment #1. The `amount of deviations' was indeed kept constant in both versions as measured by the standard deviation. As mentioned in section `Humanizing music' (2nd paragraph), the standard deviation for both the white noise humanized version and the 1/f humanized version was 15 ms. To compare the two methods, differences must not `apply to the distribution of deviations', but only to the autocorrelation function of the time series of deviations.
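The distinction drawn here, identical standard deviations but different autocorrelation functions, can be illustrated with a small self-contained sketch. This is emphatically not the patented algorithm from the paper; the Voss-McCartney scheme below is a generic stand-in for a 1/f-type generator, and the series length and seeds are arbitrary:

```python
import math
import random

def voss_pink_noise(n_steps, n_sources=8, seed=0):
    """Approximate 1/f (pink) noise via the Voss-McCartney scheme:
    several white-noise sources, source k resampled every 2**k steps,
    summed at each step."""
    rng = random.Random(seed)
    sources = [rng.gauss(0.0, 1.0) for _ in range(n_sources)]
    series = []
    for i in range(n_steps):
        for k in range(n_sources):
            if i % (1 << k) == 0:
                sources[k] = rng.gauss(0.0, 1.0)
        series.append(sum(sources))
    return series

def sdev(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def rescale(xs, target_sd):
    """Center the series and scale it to the target standard deviation."""
    m = sum(xs) / len(xs)
    s = sdev(xs)
    return [(x - m) * target_sd / s for x in xs]

def lag1_autocorr(xs):
    m = sum(xs) / len(xs)
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(len(xs) - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

rng = random.Random(1)
white = rescale([rng.gauss(0.0, 1.0) for _ in range(2000)], 15.0)
pink = rescale(voss_pink_noise(2000), 15.0)

# Both deviation series now have SD = 15 ms, as in the paper; only
# their temporal correlations differ.
print(lag1_autocorr(white))   # close to 0
print(lag1_autocorr(pink))    # strongly positive
```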
Reply to comment #3. A study on the perception of the exact version compared to the 1/f humanized version was carried out in great detail in the thesis of coauthor Anneke Fredebohm. Readers interested in these aspects may obtain a copy upon request.
Reply to comment #4. We did mention the patent on the algorithm employed in our article and cited it as reference 40. In view of the journal's requirements, this information is now also included in the Competing Interests section of the article. An EU-patent on this algorithm is still pending.
Reply to comment #5. The study is not underpowered, as can be demonstrated using the G*Power software (1-beta = .97 and 1-beta = .999).
We started with the assumption that we would obtain an effect in the range of d = .4 (small to medium effect). An a priori analysis for a one-sided t-test for differences against a constant with alpha = .05 and 1-beta = .80 revealed a required sample size of 41, which we almost achieved. The findings we obtained for participants' preferences resulted in an effect size of d = .564. A post-hoc analysis indicates that we achieved a power of 1-beta = .97, which is clearly sufficient. The same is true for the precision judgments made by participants, as we obtained an effect size of d = 1.13 and an achieved power of 1-beta = .999.
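The post-hoc figures in this reply can be approximated without G*Power. A minimal sketch using the normal approximation to the noncentral t distribution (adequate around n = 40; 1.686 is the one-sided 5% critical value of t with df = 38, an assumption matching N = 39):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def one_sample_t_power_approx(d, n, t_crit=1.686):
    """Approximate power of a one-sided one-sample t-test via the
    normal approximation to the noncentral t distribution."""
    delta = d * math.sqrt(n)   # noncentrality parameter
    return 1.0 - norm_cdf(t_crit - delta)

power_pref = one_sample_t_power_approx(0.564, 39)
print(round(power_pref, 2))    # ≈ 0.97, matching the reported value

# For d = 1.13 the approximation gives power indistinguishable from 1,
# consistent with the reported 1-beta = .999 for precision judgments.
```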