Reader Comments


Peer Review Comments and Response

Posted by foxcroft on 12 Oct 2012 at 12:06 GMT

I have copied here, for information, peer reviewer comments and our response. I hope this is useful or interesting.
David Foxcroft



Dear Editor

Thank you for asking us to revise our paper. Below we have copied the requested revisions / comments from the referees and added our response. Our response to each comment is framed by @@....@@@@.

All good wishes,
David Foxcroft PhD




PONE-D-12-04276
Personalised normative feedback for preventing alcohol misuse in university students: Solomon three-group randomised controlled trial
PLoS ONE

Dear Dr. Foxcroft,

Thank you for submitting your manuscript to PLoS ONE. We have now received two reviews of your manuscript and I have read and reread it myself. Consistent with the two reviewers, I agree that the manuscript has a potentially important contribution to the literature but that it should be reconsidered after major revision. In addition to the concerns and considerations that the reviewers raise, I most fundamentally would like to see addressed the fact that the follow-up rates were only 50% and 40% at 6 and 12 months. I agree with one of the reviewers that these rates stand in sharp contrast to what others have obtained. If you can identify examples of studies obtaining findings that are more consistent with yours, I encourage you to do so, but the discrepancies that one of the reviewers identified also need addressing.
@@
R1. We agree that the low follow-up rates are a potential issue. As we wrote in our original submission, these rates are better than in some published studies but worse than in others. The two European studies that we referenced had similar designs and poorer retention. The study by Ekman et al. had 32% retention at 6-month follow-up of those randomised and assessed at baseline (158/496). The study by Bewick et al. had 34% retention at 24-week follow-up of those randomised and assessed at baseline (374/1112). We have also identified a further unpublished trial with comparable retention rates: McCambridge et al. have presented initial results from a trial of a similar intervention in Sweden, reporting 45% retention (2336 of 5227 randomised) at two-month follow-up.

In the original manuscript we acknowledged that attrition was worse than in some studies, referencing as examples the work of Kypri but also an American study by Lovecchio. As pointed out by referee 2, Kypri has consistently achieved higher follow-up rates. However, Kypri’s studies differ in focus and recruitment, in that they recruited students in face-to-face interviews in medical centres (as pointed out by referee 2) and only randomised those who scored above a screening threshold for alcohol use / problems. In Lovecchio’s study attrition was comparable to Kypri’s, but there the college alcohol programme was mandated for students. We contend that a more appropriate comparison of retention / attrition rates is with the two published and one unpublished similar European studies, where attrition rates were higher over shorter follow-up periods. Our study retention rates compare favourably with these other studies.

Moreover, given that in our study, and in the studies by Ekman et al., Bewick et al. and McCambridge et al., students were asked to complete an electronic survey (albeit with randomisation to conditions) at different time points, it is also worth comparing response rates with those typically found in surveys, especially given the minimalist / brief nature of the intervention. A recent Cochrane review by Edwards et al. included many meta-analyses of interventions to increase response rates in surveys. The two meta-analyses of most relevance to our study are analysis 4.2 (effectiveness of non-monetary incentives) and analysis 12.4 (effectiveness of shorter vs. longer questionnaires). In our study we provided non-monetary incentives to try to increase response rates, but we also had a fairly lengthy questionnaire because of the information we needed to capture. The meta-analysis results from the Edwards et al. Cochrane review indicated that non-monetary incentives improved response rates (52% vs. 49% in the studies meta-analysed) but longer questionnaires reduced response rates (18% vs. 26% in the studies meta-analysed). Our follow-up response rates, of 50% and 40% at 6 and 12 months, seem reasonable given the Cochrane evidence of variable response rates in survey research.

NB: we are not arguing that our response rates are unproblematic or at low risk of bias. In fact we acknowledge that there is potentially a high risk of bias with regard to generalisability (although we agree with referee 2 that our analyses show no evidence of selective attrition). Rather, we are simply pointing out that our follow-up rates compare favourably with, or are consistent with, other similar published research.

Finally, it is important not to lose sight of a fundamental point in this exchange about retention rates. In our study and the three other European studies, the application of a brief social normative feedback intervention has been evaluated in large, pragmatic, randomised controlled trials that would resemble a typical application / delivery of this intervention in a real world scenario in European University settings. That these studies found no substantive impact of the intervention, notwithstanding potential external validity issues with low follow-up rates, is an important finding in the field. Indeed, the lack of responsiveness may indicate shortcomings in a more general application of the intervention, which is important for policy makers to be aware of. Although Kypri’s recruitment to the intervention is somewhat different, it is also worth considering that Kypri’s results, from several studies, might not be transferable to other settings; Ioannidis’ paper on why most published research findings are false provides an important basis for this consideration.

As the Editor and one referee have requested further discussion of the follow-up rates obtained in this study, we have expanded the paragraph on attrition in the Discussion section of the revised paper as follows:

“Attrition in this study was lower than in other European research on email- or web-based social normative feedback interventions with university students [Ekman ref., Bewick ref., McCambridge ref.] though higher than in some Australian or U.S. studies [Kypri ref., Lovecchio ref.]. There is a potential risk to internal validity from the low follow-up rates achieved in our study, although students unavailable for follow-up were similar across groups with regard to sex, age and baseline drinking status, and the multiple imputation sensitivity analysis did not produce any marked or systematic changes in treatment effect sizes or significance. But, due to the low follow-up rates, we cannot absolutely rule out the possibility that differential attrition in relation to unmeasured and uncontrolled confounders could have affected the results.

A more serious issue is in relation to external validity, specifically generalizability [Fernandez-Hermida ref.]. It is hard to see how the sub-set of students who were retained in the study are representative of the whole student population, so inferences from the sample to the pre-specified study population are problematic. In fact, the low follow-up rates add to the problem of understanding how generalizable these study results are, given the recruitment methods used: only those students who saw the study adverts (via email, student information systems or Facebook) were able to participate and, as the most effective recruitment strategy was Facebook, with 78% of all participants, it is unclear how representative these participants are of the general student body. So, one possible explanation for our null results in this study is that we obtained results on a different group of students than other studies that have found significant effects.

On the other hand, this study was a large pragmatic randomised trial with design, sampling, recruitment and follow-up characteristics that are similar to other large European trials [Ekman ref., Bewick ref., McCambridge ref.]. All these other European trials have had low follow-up rates among those randomised and assessed at baseline, with similar non-significant effects to our study. An important conclusion to draw from our study, alongside these other European studies, is that the acceptability and viability of recruiting students into a brief personalised feedback intervention, outside of a university medical centre screening programme [Kypri ref.] or a mandated student alcohol education programme [Lovecchio ref.], seems to be low.”


Bewick BM, West R, Gill J, O'May F, Mulhern B, Barkham M, et al. Providing web-based feedback and social norms information to reduce student alcohol intake: a multisite investigation. Journal of Medical Internet Research. 2010;12(5):e59.
Edwards PJ, Roberts I, Clarke MJ, DiGuiseppi C, Wentz R, Kwan I, Cooper R, Felix LM, Pratap S. Methods to increase response to postal and electronic questionnaires. Cochrane Database of Systematic Reviews 2009, Issue 3. Art. No.: MR000008. DOI: 10.1002/14651858.MR000008.pub4.
Ekman DS, Andersson A, Nilsen P, Stahlbrandt H, Johansson AL, Bendtsen P. Electronic screening and brief intervention for risky drinking in Swedish university students: randomized controlled trial. Addictive Behaviors. 2011;36(6):654-9.
Fernandez-Hermida, J. R., Calafat, A., Becoña, E., Tsertsvadze, A. and Foxcroft, D. R. (2012) Assessment of generalizability, applicability and predictability (GAP) for evaluating external validity in studies of universal family-based prevention of alcohol misuse in young people: systematic methodological review of randomized controlled trials. Addiction. doi: 10.1111/j.1360-0443.2012.03867.x [Epub ahead of print]
Ioannidis JPA. Why most published research findings are false. PLoS Medicine. 2005;2(8):e124.
McCambridge J, Bendtsen P, Bendtsen M, Karlsson N, Nilsen P (2011b) RCT of the effectiveness of electronic mail based alcohol intervention with university students: dismantling the assessment and feedback components. Paper presented to the Society for the Study of Addiction Annual Meeting, York 2011. URL:http://www.addiction-ssa.... Accessed: 2012-05-24. (Archived by WebCite® at http://www.webcitation.or...)
@@@@

You set out to conduct a randomized controlled trial of personalized normative feedback to correct misperceptions of college students about comparisons between their drinking and that of others. However, a preemptive result is that you were able to retain such a low proportion of participants that it becomes a bit risky to make substantive inferences about the overall efficacy of the intervention. It becomes a more newsworthy and distinctive contribution for your work to explain why these low rates were obtained. What "lessons learned" are there for future studies so that such difficulties can be anticipated and addressed ahead of time? Surely there are important lessons here and you are in an ideal situation to interpret them and document the basis for your interpretations. You don't have to be at all defensive about the low rates, but you should offer an adequate explanation of them.

So, I recommend that you preserve the write-up as a randomized trial, but qualify conclusions based on the high attrition and discuss your results in a way that gives appropriate attention to these low rates of retention.
@@
R2. Please see above (R1) response
@@@@


Additionally, I encourage you to pay attention to the reviewers' comments. Namely,

Reviewer 1

Please address this reviewer's comments about the need for further justification of the three-group Solomon design, given its stringent assumption of freedom from extraneous influences, and given the seeming advantage of the classic four-group design.
@@
R3. In 2006, when we were planning the design for this study and how to address the issue of test reactivity highlighted by Kypri et al. as a potential problem, it was suggested to us by a colleague at Leeds University (Prof. Andy Hill) that a Solomon four-group design would be worth considering (such designs have recently been reviewed by McCambridge et al. (2011a) as useful for assessing test reactivity, as pointed out by referee 2). However, it quickly became apparent that the conventional four-group design was not feasible in our study, because one of the four arms was not possible: the arm where the intervention is given without any baseline assessment. This arm was not feasible because of the nature of the intervention, which requires a baseline assessment so that the intervention can be personalised to the current drinking levels of the participant. A further study (currently unpublished) evaluating brief personalised normative feedback is also using a similar three-group design to assess any test-reactivity effects (McCambridge 2011b). We have added the following sentence in the Methods, Design section to clarify:

“A more conventional Solomon four group design was not possible given that the intervention required baseline information in order to personalise feedback to each participant.”
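
For readers unfamiliar with the design, the three arms can be sketched as below. The labels are our own illustrative shorthand for this response, not the paper's terminology:

```python
# Sketch of the three-group Solomon design described above
# (arm labels are illustrative shorthand, not the paper's terminology).
ARMS = {
    "A": "baseline assessment + personalised normative feedback (intervention)",
    "B": "baseline assessment only (assessed control)",
    "C": "no baseline assessment (delayed control)",
    # "D": "feedback without baseline assessment" would complete the classic
    # Solomon four-group design, but is infeasible here: the feedback must be
    # personalised using each participant's baseline drinking data.
}
```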


McCambridge J, Butor-Bhavsar K, Witton J, Elbourne D (2011a) Can Research Assessments Themselves Cause Bias in Behaviour Change Trials? A Systematic Review of Evidence from Solomon 4-Group Studies. PLoS ONE 6(10): e25223. doi:10.1371/journal.pone.0025223
McCambridge J, Bendtsen P, Bendtsen M, Karlsson N, Nilsen P (2011b) RCT of the effectiveness of electronic mail based alcohol intervention with university students: dismantling the assessment and feedback components. Paper presented to the Society for the Study of Addiction Annual Meeting, York 2011. URL:http://www.addiction-ssa.... Accessed: 2012-05-24. (Archived by WebCite® at http://www.webcitation.or...)

@@@@


Specify how allocation was done and whether there was a stratification or block size.

@@
R4. We have revised the “Randomization – sequence generation” section to make it clear that there was no stratification or blocking in the study. This section also specifies how allocation was done.
@@@@

Indicate any role the University may have had and whether its involvement might conceivably have affected follow-up.

@@
R5. Many UK universities have a policy that students cannot be emailed directly with requests to participate in research. We initially approached five UK universities and four agreed to participate. One agreed to email students directly, and the others allowed the posting of messages and web-links on student personal information systems. However, recruitment was poor using these approaches. Given this, we decided to extend our recruitment strategy by using Facebook. At the time, Facebook had a system which allowed targeted advertising to UK university students (since discontinued). In our study, 78% of the sample was recruited via Facebook. Clearly, our original recruitment strategy, which relied on support from university administrators, was not successful. The alternative, Facebook, proved more successful, but it is not clear whether this would have influenced follow-up. However, we regard this as unlikely given similar or poorer follow-up rates in other studies where Facebook was not used for recruitment (Ekman et al., Bewick et al., McCambridge et al.). We have added a statement to the Discussion, Generalisability section to draw out the external validity limitations of the recruitment approach taken in the study, as follows:

“A more serious issue is in relation to external validity, specifically generalizability [Fernandez-Hermida ref]. It is hard to see how the sub-set of students who were retained in the study are representative of the whole student population, so inferences from the sample to the pre-specified study population are problematic. In fact, the low follow-up rates add to the problem of understanding how generalizable these study results are, given the recruitment methods used: only those students who saw the study adverts (via email, student information systems or Facebook) were able to participate and, as the most effective recruitment strategy was Facebook, with 78% of all participants, it is unclear how representative these participants are of the general student body. So, one possible explanation for our null results in this study is that we obtained results on a different group of students than other studies that have found significant effects.”


Fernandez-Hermida, J. R., Calafat, A., Becoña, E., Tsertsvadze, A. and Foxcroft, D. R. (2012) Assessment of generalizability, applicability and predictability (GAP) for evaluating external validity in studies of universal family-based prevention of alcohol misuse in young people: systematic methodological review of randomized controlled trials. Addiction. doi: 10.1111/j.1360-0443.2012.03867.x [Epub ahead of print]

@@@@

Explain why only one summary result of imputation is presented, when it is indicated that there was imputation for sensitivity analyses, which presumably should have produced multiple results for comparative consideration.

@@
R6. The referee is right: we generated 15 different imputations and analysed them accordingly. However, for simplicity we decided to present only the analysis of the average of the 15 different imputations in the results tables. The 15 imputed results were very similar and led to the same inference with slightly different p-values.
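
For illustration, the standard way of combining estimates across imputed datasets is Rubin's rules. The minimal sketch below (Python, with made-up numbers) shows only the pooling arithmetic, and is not necessarily the exact procedure used in the analysis:

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and variances via Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()              # pooled point estimate
    w = np.mean(variances)                # within-imputation variance
    b = estimates.var(ddof=1)             # between-imputation variance
    total_var = w + (1 + 1 / m) * b       # total variance
    return q_bar, np.sqrt(total_var)      # pooled estimate and its SE

# Illustrative: 15 treatment-effect estimates with squared standard errors
est, se = pool_rubin([0.21, 0.18, 0.25] * 5, [0.04] * 15)
print(f"pooled estimate = {est:.3f}, pooled SE = {se:.3f}")
```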
@@@@



Reviewer 2

This reviewer provided exceptionally detailed comments and questions and I think all of them should be given attention in a revision, but please pay particular attention to the following:

I agree with reviewer 2 that some designation of one of the six outcomes as primary is necessary. Granted that this should have been done before analysis of the data, but even given that it was not, could you specify one outcome as primary and give an appropriate rationale?

@@
R7. We have specified that AUDIT score is the primary outcome, and provided a rationale based on work by Foxcroft, Kypri and Simonite (2009) which indicates that a small change in AUDIT score could have an important impact on population levels of alcohol disorders. The Outcome measures section has been revised as follows:

“Respondents completed the Alcohol Use Disorders Identification Test (AUDIT), which is a 10-item scale with good validity that is designed to identify hazardous and harmful drinking [Babor ref]. The AUDIT score was specified as the primary outcome variable. In another study we have shown that a small change in AUDIT score can have an important impact on population levels of alcohol disorders [Foxcroft ref].”


Foxcroft, D. R., Kypri, K. and Simonite, V. (2009), Bayes' Theorem to estimate population prevalence from Alcohol Use Disorders Identification Test (AUDIT) scores. Addiction, 104: 1132–1137. doi: 10.1111/j.1360-0443.2009.02574.x

@@@@



As noted above, I think the reviewer raised a quite important point that the attrition rate observed in the present trial is discrepant from at least four other trials. Some explanation should be offered, and if you could identify other trials more within the same range as yours, that would be quite good also.
@@
R8. Please see response R1 above.
@@@@

Why was PNF provided a few weeks after baseline assessment? Are there any advantages or disadvantages, and could it conceivably have influenced your retention rates?

@@
R9. We needed a few weeks so that all baseline data could be collected and analysed to provide a normative comparison dataset for personalised feedback to each participant. It is conceivable that this could have influenced retention rates, but we regard this as unlikely given more immediate feedback was provided in similar studies that had similar or poorer retention rates (Ekman et al., Bewick et al., McCambridge et al.) and similar findings of limited / no effectiveness. We have added a paragraph in the Discussion – Generalizability section to this effect:

“One possible reason for the low retention rates in our study is that the personalised feedback given in our study was delayed by a few weeks, as we needed to collect and analyse all baseline data to provide a normative comparison dataset for personalised feedback to each participant. It is conceivable that this could have influenced retention rates, but we regard this as unlikely given more immediate feedback was provided in similar studies that had similar or poorer retention rates [Ekman ref., Bewick ref., McCambridge ref.] and similar findings of limited / no effectiveness.”


@@@@

Why was the particular recruitment method chosen, and what are its advantages or disadvantages over alternatives?

@@
R10. See response (R5) above. We acknowledge that recruiting students through face-to-face interviews in primary health care clinics, as per the Kypri studies, may provide a different context for participation and impact of the intervention. In our study it was not practical to recruit through primary medical centres, as students are not routinely enrolled at or seen by a university or local medical centre. The next best option was to try to recruit students via email, personal information systems and social networking sites. This is the approach taken in other European studies (Ekman et al., Bewick et al., McCambridge et al.) and we argue that this provides a pragmatic context for the delivery of a personalised normative feedback intervention in European universities.
@@@@


Can any explanation be offered for the capping of the outcome assessment at "seven or more" drinks?

@@
R11. Please see the response (R25) to the more detailed comment from referee 2, below.
@@@@




We encourage you to submit your revision within sixty days of the date of this decision.

When your files are ready, please submit your revision by logging on to http://pone.edmgr.com/ and following the Submissions Needing Revision link. Do not submit a revised manuscript as a new submission.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

Please also include a rebuttal letter that responds to each point brought up by the academic editor and reviewer(s). This letter should be uploaded as a Response to Reviewers file.

In addition, please provide a marked-up copy of the changes made from the previous article file as a Manuscript with Tracked Changes file. This can be done using 'track changes' in programs such as MS Word and/or highlighting any changes in the new document.

If you choose not to submit a revision, please notify us.

Yours sincerely,

James Coyne
Academic Editor
PLoS ONE

Reviewers' comments:

Reviewer #1: This trial uses a three group Solomon design, which is intriguing given the points made by Ross and Smith (Amer Sociol Rev, 1965), where it is pointed out that the design is meant really for laboratory studies free from extraneous influences, which are assumed missing. I really think this assumption, as in the case of the field studies on the American Soldier, cannot be justified. The authors need to justify why the classical four group design was not used, given the wide range of possible external influences and interactions in this setting.

@@
R12. Please see response above (R3) for an explanation of why a classical four-group design was not possible in this study. Thanks for pointing us to the Ross and Smith paper, which we were not previously aware of. The Ross and Smith paper critiques all experimental designs outside the laboratory, whether they are two-, three-, or four-group designs, because of the potential for interaction between the pre-test, the intervention (or both) and uncontrolled events, which are more likely in uncontrolled non-laboratory (i.e. real-world) settings. Ross and Smith (p.79) correctly state: “Undercontrolled events (U), or any one of the three interactions involving U, may be outrageously complex. U might represent the upshot of all main effects and interactions produced by personal aging, community strife, national events, and family chit chat”. This is a challenge for many studies undertaken in real-world settings, and researchers, as Ross and Smith point out, must make assumptions regarding the impact of uncontrolled events. Even with randomisation to distribute potential confounders between groups, including exposure to uncontrolled events, any interaction between testing, intervention (or both) and these uncontrolled events could create a new set of confounders that leads to a misinterpretation of effects. In our study we have assumed that there is no interaction between testing, intervention (or both) and uncontrolled events, and acknowledge that this assumption may not be true. Nevertheless, we would argue that our study should be considered a pragmatic evaluation of effectiveness, where real-world extraneous influences are welcomed, rather than an assessment of efficacy, where such extraneous influences (and the interactions they may produce) are regarded as nuisance factors.
@@@@

Additionally the methodology here is opaque: Tables 3 and 4 presumably show the effect of the intervention on 7 outcome measures. Which one is the primary outcome? And why is the effect of intervention tested at baseline? If randomisation is in place any difference will be due to random error, and in any case the intervention has not started. So the correct analysis is to adjust for baseline (regression to the mean).

@@
R13. See response above (R7) regarding the specification of the primary outcome. In Tables 3 and 4 the effect of the intervention is not tested at baseline. Rather, the difference in outcomes is reported at all three time points. This gives an indication of any baseline imbalances between intervention and control groups, and as such is an indication of effective randomisation. We argue that this is informative and would prefer to leave the information in the tables.
@@@@

We do not know how allocation was performed - any stratification, block size?

@@
R14. Simple randomisation was employed, without any stratification or blocked allocation. See response (R4) above.
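
A minimal sketch of simple (unrestricted) randomisation of this kind, with hypothetical arm labels (the trial itself used an automated IT-based system):

```python
import random

ARMS = ["intervention", "control", "delayed_control"]  # hypothetical labels

def allocate(participant_ids, seed=42):
    """Simple randomisation: each participant is assigned independently,
    with no stratification and no blocking."""
    rng = random.Random(seed)
    return {pid: rng.choice(ARMS) for pid in participant_ids}

allocation = allocate(range(1, 11))  # e.g. the first ten participants
print(allocation)
```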
@@@@

The power calculation gives us alpha and beta but no effect size.

@@
R15. The effect size used in the power calculation was based on a Swedish university study by Johnsson and Berglund, with 5% Type I and 10% Type II error rates. In this Swedish study the smallest effect size was for females, with a mean difference in AUDIT score of 1.9 (from 9.7 to 7.8). With 90% power and 5% alpha, a sample size calculation showed that detecting this effect size (conservatively assuming a standard deviation of 6 in intervention and control conditions) would require 105 students per condition for pairwise comparisons. (NB: as the female group had the lowest effect size in the Swedish study, n=105 for female comparisons will have sufficient power, and as the effect size for males (a reduction in mean AUDIT score of 3.1) was much bigger, n=105 for male comparisons will have more than enough statistical power.) Therefore we aimed to achieve a sample for analysis of 105 females per condition, knowing that 105 males per condition was more than enough. We assumed attrition would be 30% over 1 year, so 105 / 0.7 = 150 hazardous drinking students per gender per condition would need to be recruited.

We have revised the text on sample size as follows:

“We needed an achieved sample size of 900 hazardous drinkers (150 per gender per group), based on effect size estimates of a mean difference in AUDIT score of 1.9 (s.d. = 6) for females [Johnsson and Berglund ref], and power = .9 and α = .05 (2-tailed tests). Taking account of known prevalence rates for hazardous drinkers in this population, and with cautious estimates for participation and attrition rates, we aimed to recruit 4000 students.”


Johnsson, K. O. and Berglund, M. (2006) Comparison between a cognitive behavioural alcohol programme and post-mailed minimal intervention in high-risk drinking university freshmen: results from a randomized controlled trial. Alcohol and Alcoholism 41, 174–180.
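
Taking the figure of 105 analysed students per gender per condition as given from the calculation above, the attrition adjustment works out as follows. This is a sketch of the arithmetic only; it does not re-derive the underlying power calculation:

```python
analysed_per_cell = 105      # per gender per condition, from the power calculation
assumed_attrition = 0.30     # assumed loss over one year

randomised_per_cell = round(analysed_per_cell / (1 - assumed_attrition))
total_randomised = randomised_per_cell * 2 * 3   # 2 genders x 3 conditions

print(randomised_per_cell)   # 150 hazardous drinkers per gender per condition
print(total_randomised)      # 900 hazardous drinkers in total
```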

@@@@

What role does university play in this - as randomisation is by the individual presumably it is included in a multilevel model. Is follow-up affected by university?
@@
R16. We assumed that students from the same university, who share the same environment and situation, might have correlated observations. Hence a mixed (multilevel) model was used to analyse the data. The same model was employed at follow-up. The results show that the university effect is present but is not statistically significant.
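
A minimal sketch of such a model, using statsmodels with synthetic data and hypothetical variable names (the actual model specification in the paper may differ):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "university": rng.integers(0, 4, n),   # four clusters
    "group": rng.integers(0, 2, n),        # 0 = control, 1 = intervention
    "audit_base": rng.normal(10.0, 4.0, n),
})
df["audit_12m"] = df["audit_base"] + rng.normal(0.0, 3.0, n)

# A random intercept per university allows for correlated observations
# among students who share the same environment.
fit = smf.mixedlm("audit_12m ~ group + audit_base", df,
                  groups=df["university"]).fit()
print(fit.summary())
```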
@@@@


Multiple imputation generally implies multiple methods of imputation for sensitivity analyses - hence presumably multiple results. Please explain why only one is given.
@@
R17. Please see Response 6 (R6) above.
@@@@


Fundamentally this paper needs to be self-contained and sending people off to the protocol for basic design features is not good practice. At present the paper is not a pleasant read - it is largely disjointed with a number of grammatical errors (e.g. past tense of fit is fitted)

@@
R18. We have removed sign-posting to the protocol for basic design features as requested, so that the paper is now self-contained. The structure of the paper is prescribed by the journal guidance and CONSORT requirements, so there is little room for manoeuvre, though we agree that this does not help with readability. We have carefully proof-read the paper and corrected any grammatical errors found.
@@@@

Reviewer #2: Review 2 April 2012

Personalised normative feedback for preventing alcohol misuse in university students: Solomon, three-group, randomised controlled trial
Manuscript# - PONE-D-12-04276

I have previously reviewed this paper for another journal and recommended it be accepted subject to revision. In the interest of fairness to the authors I encourage you to include this disclosure to them. Some of my comments below are duplicated from the previous review. There may be good reasons that the authors did not address them in their revision of the paper (e.g., they may have disagreed with them) but it is not clear to me why and I still consider them to be points worthy of their attention. I have added some new comments (marked *) reflecting my reading of the manuscript submitted to PLoS ONE.

The paper reports a RCT of personalised feedback as an intervention to reduce alcohol consumption among college students in the UK.

The rationale for the trial is sound and the design, involving control for possible assessment effects (Solomon 3-group), is innovative and appropriate. Attrition is high but analyses appear to show that it is unlikely to have biased effect estimates. The reporting of the trial is generally clear, though I have some questions about the analysis. The paper could make a valuable contribution on an important subject.

1. Was any of the six specified outcomes designated as primary?
@@
R19. The sample size calculation for the study was based on a mean change (difference) in AUDIT score for hazardous drinkers, so we have designated this as the primary outcome (as requested by the Editor) in the revised paper. The other drinking behaviour outcomes are also relevant to policy, depending on the context in which the intervention is considered. Many studies of drinking behaviour report a range of drinking behaviour measures, and one would expect an effective intervention to impact on most, if not all, measures. We have reported all measures from the study in the paper to avoid any selective reporting bias.
@@@@

2. It is not the case that the attrition rate (50-60%) is similar to those in other trials. There are at least four trials (two of which are cited: #37 and #38) in which attrition was <35%. In a third trial, not reported in the manuscript (Kypri, K., Langley, J. D., Saunders, J. B., Cashell-Smith, M. L. and Herbison, P. (2008) Randomized controlled trial of web-based alcohol screening and brief intervention in primary care. Arch Intern Med 168, 530-6.), the attrition rate was 10% at 12 months.

@@
R20. See response (R1) above.
@@@@


3. Methods. Presenting PNF a few weeks after baseline assessment is unusual in feedback studies. This is worthy of comment in the discussion. Why was this approach used and how might it have affected the efficacy of the intervention?

@@
R21. See response (R9) above.
@@@@


4. The recruitment method (via social networking and bulletin boards) is also not standard practice. In the three studies referred to in point 2, above, recruitment occurred face-to-face in the Student Health clinic or via e-mail. In the approaches cited in references 16 and 30, recruitment is face-to-face. In the studies by Neighbors et al, recruitment occurred via e-mail. It would be worth noting the difference in the approach used and considering whether it might partly explain differences in findings. For example, were the participants in this trial more self-selected than in studies in which everyone in a health clinic or on the student enrolment list is invited to participate? How might the selection affect response to the intervention?

@@
R22. Thanks for this useful suggestion. It is a good point, and in the revised paper we have extended the Discussion - Generalisability section to cover this issue, as follows (see also response (R5) above):

“A more serious issue is in relation to external validity, specifically generalizability [Fernandez-Hermida ref]. It is hard to see how the sub-set of students who were retained in the study are representative of the whole student population, so inferences from the sample to the pre-specified study population are problematic. In fact, the low follow-up rates add to the problem of understanding how generalizable these study results are, given the recruitment methods used: only those students who saw the study adverts (via email, student information systems or Facebook) were able to participate and, as the most effective recruitment strategy was Facebook, with 78% of all participants, it is unclear how representative these participants are of the general student body. So, one possible explanation for our null results in this study is that we obtained results on a different group of students than other studies that have found significant effects.

On the other hand, this study was a large pragmatic randomised trial with design, sampling, recruitment and follow-up characteristics that are similar to other large European trials [Ekman ref., Bewick ref., McCambridge ref.]. All these other European trials have had low follow-up rates among those randomised and assessed at baseline, with similar non-significant effects to our study. An important conclusion to draw from our study, alongside these other European studies, is that the acceptability and viability of recruiting students into a brief personalised feedback intervention, outside of a university medical centre screening programme [Kypri ref] or a mandated student alcohol education programme [Lovecchio ref.], seems to be low.”

Please note that it is beyond the scope of this paper to provide a more systematic assessment of factors that may moderate effectiveness across studies. We are currently updating our Cochrane review on social normative feedback interventions and will be considering variability in effectiveness more systematically in that work.

@@@@


5. P7 Comment on non-blinding of post-grad researcher to allocation. It would have been possible to blind (as in many of the studies referred to above). Perhaps the authors could comment on the likely effect of this non-blinding.

@@
R23. In this study it was not feasible to blind the post-grad researcher to the allocation. Given the automated, IT-based randomisation, allocation and administration of questionnaires, it was not possible for the researcher to have influenced these aspects of the study. The one area where the researcher could have exerted an influence, because she was not blinded, was in sending out the personalised feedback via email to participants in the intervention condition. This could not be automated because of technical and resource constraints. However, we regard the risk of any performance bias as unlikely, because any such bias typically serves to inflate effect sizes and in this study no significant effects were found.
@@@@

6. P4 Intervention. Were the medical guidelines presented with direct comparison to the participant's drinking (e.g., in a bar chart), as in previous studies? If this was merely text, is it possible that the participants failed to notice them? On that note, it would be helpful for readers to be able to see the instrument - can a URL be provided in the paper?

@@
R24. We have archived a copy of the online questionnaire and an example of the personalised feedback provided and referenced these in the revised paper. We used a mixture of graphical and text based feedback. It is possible that the participants did not notice or pay attention to the feedback, and this is possibly an area for further work. We have alluded to this in the Discussion – Generalizability section, with the following statement:

“Alternatively, the intervention we used, based on studies from other countries, may not have been sufficiently developed for the UK population.”


@@@@

7. Outcome measurement. Given how much students drink per occasion, I was surprised that the usual consumption item topped out at "7 or more". Even the AUDIT item 2 has a higher maximum value (10 or more) and concerns have been expressed that it can produce a ceiling effect. In the web format, some researchers have made it continuous up to 30 drinks, with more than 30 as the top option, to avoid giving respondents the impression that the researchers think more than 7 is a lot, and to permit greater sensitivity in detecting change.

@@
R25. We were concerned to present outcomes that were independent of each other, i.e. outcomes that did not rely on other items for part of their score. So, although we collected data on AUDIT item 2 with a higher maximum value (10 or more), as indicated by the referee, because we reported an analysis of AUDIT scores separately we chose to present results on usual consumption using a different questionnaire item. This item asked how many drinks participants usually consumed on social occasions or at parties, with a range of 0 to 7+. In hindsight, we agree it would probably have been better to have more responses available at the higher end, to avoid a ceiling effect. It is possible that a ceiling effect could have limited the sensitivity to detect change, but we think this unlikely given the consistency of findings of no effect across the range of outcome measures.

The closest measure we have to the one suggested by the referee is overall weekly consumption, which represents quantity consumed in a week (rather than per occasion, though the two are closely related in a student population) and avoids a ceiling effect. In an initial analysis of this variable, undertaken but not included in the submitted paper (and without controlling for intra-class correlations at the university level, as in the analyses in the submitted paper), the following result was obtained: the geometric marginal mean score for weekly alcohol consumption at 12 months in the intervention group was 2.80 (95% CI 2.69 to 2.94), compared with 2.94 (95% CI 2.80 to 3.09) in the control group. With age and gender as fixed factors, and baseline weekly alcohol consumption and social desirability as covariates, there was not a statistically significant group effect (F = 1.87, p=0.17). In the submitted paper we analysed this variable according to risky drinking status, in other words whether individuals were drinking over recommended guidelines.

@@@@


8. *P5 The first two outcomes (drinking frequency and typical occasion quantity) are commonly multiplied to produce a volume per week or month measure. In this trial the authors use a different type of question to measure weekly drinking. I wonder why they did this, particularly given the finding of a large treatment effect on this measure alone. I'm curious to know whether the effect is present if the usual F*Q approach (as described above) is used to compute the weekly drinking variable.

@@
R26. The measure obtained by multiplication, as suggested by the referee, was not one of our pre-specified outcomes. We did measure drinking per week using a question about consumption in a typical week, scored as units / week. In a preliminary analysis, without adjusting for multi-level effects, the volume consumed per week was not significantly impacted by the intervention compared with controls: the geometric marginal mean score for weekly alcohol consumption at 12 months in the intervention group was 2.80 (95% CI 2.69 to 2.94), compared with 2.94 (95% CI 2.80 to 3.09) in the control group. With age and gender as fixed factors, and baseline weekly alcohol consumption and social desirability as covariates, there was not a statistically significant group effect (F = 1.87, df = 1,610, p=0.17). In our multi-level analysis, we included this variable but dichotomised it into weekly drinking levels (above or below recommended limits).
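
For clarity, geometric means of this kind are obtained by analysing on the log scale and back-transforming. A minimal unadjusted sketch follows; the figures reported above were marginal means adjusted for covariates, which this does not reproduce:

```python
import numpy as np

def geometric_mean_ci(x, z=1.96):
    """Geometric mean with an approximate 95% CI via the log scale.
    Assumes strictly positive values (zero drinks would need an offset)."""
    logs = np.log(np.asarray(x, dtype=float))
    m = logs.mean()
    se = logs.std(ddof=1) / np.sqrt(len(logs))
    return np.exp(m), np.exp(m - z * se), np.exp(m + z * se)

gm, lo, hi = geometric_mean_ci([8, 12, 5, 20, 9, 15])  # illustrative units/week
print(f"geometric mean = {gm:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```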
@@@@


9. *P6 Reference is made to use of AUDIT to assess eligibility. Given the Solomon design it should be noted that effects have been shown for exposure to the AUDIT alone: McCambridge, J., & Day, M. (2008). Randomized controlled trial of the effects of completing the Alcohol Use Disorders Identification Test questionnaire on self-reported hazardous drinking. Addiction, 103(2), 241-248.

The exposure of all participants to this instrument at baseline should be discussed as a possible explanation for the null findings.


@@
R27. We assessed the possibility of test reactivity (or effect due to AUDIT exposure alone) in the Solomon three group design. As we stated in the original submission (p9-10), there was no effect of exposure to baseline assessment (including AUDIT):

“We have not included the delayed control group in attrition assessment or statistical analysis of effects because there were no statistical differences between the main control and the delayed control groups at 12-month follow-up for any drinking behaviour measures (frequency of drinking, χ2=6.29, df=6, p=0.39; usual quantity of alcohol, t=0.075, df=699, p=0.94; AUDIT, t=0.63, df=699, p=0.53; alcohol-related problems, t=-0.181, df=699, p=0.86; perceived drinking norms, t=-0.609, df=699, p=0.54). This analysis indicates that there was no effect on drinking behaviour of the baseline measures and questions about alcohol use and problems.”

We therefore think it unlikely that exposure to AUDIT alone accounted for the null findings. Such an explanation would require us to disregard our findings of no difference between the normal control (with baseline AUDIT) and delayed control (no baseline AUDIT). In fact, our findings suggest that the intervention was ineffective regardless of whether any effects would have operated through the assessment, through the personalised feedback provided, or through both.
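
The comparisons quoted above are standard two-group tests. A minimal sketch with synthetic data, purely to illustrate the form of the analysis (not the trial data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(10.0, 5.0, 350)   # AUDIT at 12 months, assessed control
delayed = rng.normal(10.0, 5.0, 351)   # delayed control (no baseline AUDIT)

# Continuous measures (AUDIT, problems, perceived norms): two-sample t-test
t, p = stats.ttest_ind(control, delayed)

# Categorical measures (e.g. drinking-frequency bands): chi-square test
table = np.array([[40, 60, 80],        # assessed control counts per band
                  [45, 55, 85]])       # delayed control counts per band
chi2, p2, dof, _ = stats.chi2_contingency(table)
print(p, p2)
```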

@@@@


10. P6 A typo - the I in AUDIT stands for "Identification".

@@
R28. Corrected.
@@@@

11. P6 AUDIT is described as "designed to assess risky, dangerous and abusive drinking". The manual (Babor et al 1992) describes AUDIT in terms of identifying individuals with "hazardous and harmful drinking".

@@
R29. Corrected.
@@@@

12. P6 The alcohol problems score reflects a checklist of items of widely varying frequency and severity (e.g., been embarrassed vs been in a fight). What are the psychometric properties of this measure?

@@
R30. The brief alcohol problems measure was developed specifically for this study, informed by other alcohol problems scales, and we do not have any information from other studies about its psychometric properties. In our study the internal reliability of this scale was calculated as alpha = 0.71. If the item about embarrassment was excluded, alpha dropped to 0.66. Factor analysis suggests a single-factor solution is appropriate. We are currently collecting more data on this scale in other studies and will report on its psychometric properties in due course. We have revised the information on this scale as follows:

“Young people who engage in one problem behaviour (e.g. alcohol misuse) are also more likely to engage in other problem behaviours [Jessor ref]. Therefore, other problems were measured with a newly developed self-report scale listing nine possible problems, each answered yes/no: 1. Blackout or memory lapse; 2. Been embarrassed by your actions; 3. Been in a fight; 4. Engaged in unprotected sex; 5. Missed a lecture/class; 6. Required emergency medical treatment; 7. Sustained an injury; 8. Trouble with local or campus authorities; 9. Received unwanted sexual advances. Responses were summed to provide an alcohol-related problems score. The internal consistency for this scale was alpha = 0.71.”
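
For reference, the internal consistency figure quoted is Cronbach's alpha. A minimal sketch of the computation on synthetic yes/no data (random data will not reproduce the study's value):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) 0/1 response matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

rng = np.random.default_rng(2)
responses = rng.integers(0, 2, size=(200, 9))   # 9 yes/no problem items
print(cronbach_alpha(responses))                # the real study data gave 0.71
```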


@@@@


13. P7 Analysis. Why were standard errors inflated due to clustering given that this was an individually randomised trial and not a cluster randomised trial?

@@
R31. This was an individually randomised trial, but participants were drawn from a range of clusters (universities), with the possibility that students from the same university, who share the same environment and situation, might have correlated observations. Please also see response (R16) above.
@@@@

14. P7 Sensitivity analysis. It is unclear from the description what assumptions guided the multiple imputation for missing values. White et al provide good advice on this: White, I. R., Horton, N. J., Carpenter, J. and Pocock, S. J. (2011) Strategy for intention to treat analysis in randomised trials with missing outcome data. BMJ 342, d40.

@@
R32. The following assumptions, based on the advice of White et al. on strategy for intention-to-treat analysis, guided our multiple imputation:

(1) all randomised participants were followed up even if they did not participate in the intervention (i.e. they withdrew from the allocated treatment); (2) we performed a main analysis of all observed data that is valid under a plausible assumption about the missing data; and (3) we performed sensitivity analyses to explore the effect of departures from the assumption made in the main analysis.

The above information has been added to the “Statistical Methods” section in the revised paper.
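
A minimal sketch of this kind of imputation-plus-pooling workflow, using statsmodels' MICE with synthetic data and hypothetical variable names (not the actual software, imputation model or data used in the trial):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation import mice

rng = np.random.default_rng(3)
n = 500
df = pd.DataFrame({
    "group": rng.integers(0, 2, n).astype(float),
    "audit_base": rng.normal(10.0, 4.0, n),
})
df["audit_12m"] = df["audit_base"] - 0.5 * df["group"] + rng.normal(0.0, 3.0, n)
df.loc[rng.random(n) < 0.5, "audit_12m"] = np.nan   # ~50% lost to follow-up

imp = mice.MICEData(df)                              # chained-equations imputation
analysis = mice.MICE("audit_12m ~ group + audit_base", sm.OLS, imp)
results = analysis.fit(n_burnin=10, n_imputations=15)  # 15 imputations, pooled
print(results.summary())
```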

@@@@


15. *P8 The references (29,30) for the random effects models appear to be incorrect.

@@
R33. Our apologies. These references were left in from our initial drafting and should not have been included. They have now been removed.
@@@@

16. Results. The lack of assessment effect is noteworthy given a recent systematic review on this: McCambridge, J. and Kypri, K. (2011) Can simply answering research questions change behaviour? Systematic review and meta analyses of brief alcohol intervention trials. PLoS ONE 6, e23748.

@@
R34. We agree, and have added this reference to the Introduction section of the paper, and have noted the lack of assessment effect in the Results – Outcomes and estimation section (see response (R27) above).
@@@@

17. Some of the effect estimates (e.g., 0.71 for weekly drinking at 12 months) are large compared with what has been shown previously, but non-significant. The confidence intervals are very wide (e.g., 95% CI 0.43 to 1.16) and I wonder if the inflation for clustering (point 13, above), which I suspect is inappropriate, might contribute to that, and if the study was under-powered.

@@
R35. The effect estimate for weekly drinking is potentially important though non-significant. However, our primary outcome variable and other variables did not show a consistent pattern of effects regardless of significance level, and it is this lack of a consistent pattern of effect across all our outcome measures which, we suggest, is the more important finding. We have already commented on this in the Outcomes and estimation section of the paper.
@@@@

18. P9 Discussion. Several studies discussed on p11 are described as "brief, web-based, social normative feedback" but they are more than that. They, like many of the interventions examined in this literature, are complex interventions, including tailored criterion feedback, tailored bio-feedback (on blood alcohol level), financial feedback, health information, and advice on where help can be obtained. This study is described as a trial of PNF but in fact it too, is more than that, including elements other than normative feedback. Clues as to the differences in effect shown here versus other trials might lie in the differences in intervention content and delivery.

@@
R36. Thank you for this important point. We have added a paragraph to the Discussion – Generalisability section to reflect this point, as follows:

“We should also point out that, like many other studies that have reported the effects of brief personalised normative feedback, the intervention contained more than just the normative feedback. We also provided financial feedback, health information and advice on where help can be obtained. The interaction between these different components, and how they are presented to participants, may be important in determining effectiveness. Clues as to the differences in effect across different studies might lie in the differences in intervention content and delivery.”

Please note that it is beyond the scope of this paper to provide a more systematic assessment of factors that may moderate effectiveness across studies. We are currently updating our Cochrane review on social normative feedback interventions and will be considering variability in effectiveness more systematically in that work.

@@@@

19. *It would be helpful (and it is recommended in CONSORT) to present summary statistics for the outcomes at the two follow-up timepoints, and not simply the effect estimates.

@@
R37. Agreed. We have added two new Tables in the revised paper (Tables 3a and 3b) to present these summary statistics, and have also added further information to the measures section to aid understanding / interpretation of the Tables.
@@@@


20. *Looking at Table 3, it is unclear what the effect estimates at Baseline refer to. CONSORT recommends against testing for differences at baseline.

@@
R38. We have now removed the baseline effect estimates from Tables 3b and 4b.
@@@@


21. *The funding source was listed in the previous submission but it is not listed here and in my view it should be for all scientific papers. I don't recall PLoS ONE's policy on this.

@@
R39. We followed the PLoS ONE guidance, which states: “Funding sources should not be included in the acknowledgments, or anywhere in the manuscript file. You will provide this information during the manuscript submission process.”
@@@@

No competing interests declared.