Reader Comments

Post a new comment on this article

Fundamental misconceptions of safety and of statistics

Posted by MKary on 01 Dec 2013 at 22:43 GMT

This re-analysis of the Walker data has two main positive aspects. First, it confirms that Walker did his own analysis correctly, when he found a statistically significant decrease in passing distance for the helmet wearing condition. Second, it emphasizes the importance of discriminating between differences that are statistically discernible and differences that make a relevant difference (statistical significance versus health or other significance). The latter positive aspect is marred by a deep misunderstanding of these ideas, not just as they apply to bicycling, but even in purely statistical terms.

1. Statistical power and Type I errors.
To begin with the latter. Olivier and Walter state that "Whether [the] power level was calculated post hoc or a priori, this is an overpowered study far above the usual convention of 80% power being adequate to detect an effect of any size [27] which increases the risk of type I errors, i.e., the detection of a statistically significant difference in the sample when there is no true difference in the population [28]." (Citation numbers as in the original.)

This statement is nonsensical. First, it is false that any superabundance of statistical power coming from increasing the sample size-- which is what Walker did, and what Olivier and Walter are objecting to-- increases the risk of type I errors. The risk of a Type I error is exactly what is specified by the p-value, and the largest p-value reported by Walker, related to helmet wearing, and described by him as significant, was p = 0.007. This is an exceedingly small risk of a Type I error by any conventional standard, and thus there is nothing for Olivier and Walter to complain about or investigate, rather the contrary.
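To see this concretely, here is a minimal simulation (my own illustration, in Python, not anything from either paper): draw two groups from the same population, so that the null hypothesis is true, and the proportion of falsely "significant" t-tests stays at the chosen alpha no matter how large the sample grows.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha, trials = 0.05, 2000
    for n in (50, 500, 5000):
        false_positives = 0
        for _ in range(trials):
            a = rng.normal(0.0, 1.0, n)  # both groups drawn from the
            b = rng.normal(0.0, 1.0, n)  # same population: null is true
            if stats.ttest_ind(a, b).pvalue < alpha:
                false_positives += 1
        print(n, false_positives / trials)  # hovers near 0.05 at every n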

Moreover, the reference cited [1] to support that false claim does no such thing. Instead, it asserts only and correctly that such a superabundance of power is uneconomical.

Reference [1] does explain that, of course, for a fixed sample size, power can be increased by using a less stringent criterion for significance. But that is not what Walker did, it is not what Olivier and Walter are objecting to, and the remedy they propose is not to use a more stringent criterion for significance, but to reduce the sample size post hoc via their resampling scheme.

The authors' misunderstanding of these basic statistical concepts is key to the strategy of their re-analysis, and is stated as such in their study objectives: "to assess the extent to which the [large] sample size in the original analysis may have contributed to spurious results." In fact, large sample sizes never contribute to spurious results. They can only confuse those who do not understand the difference between statistical and other types of significance. Olivier and Walter have confused the finding of a difference that putatively does not make a difference, with a Type I error. In fact, such a finding is not in itself a statistical error of any kind, nor, as a finding, any kind of error. The correct procedure is not to devise some way of worsening significance levels post hoc, as Olivier and Walter do, but simply to argue on a scientific, not statistical, basis that indeed the difference does not make a difference.

This misstatement and misunderstanding are fundamental to Olivier and Walter's re-analysis, governing many of the procedures used. While the authors begin by correctly distinguishing between statistical significance and health significance, in later stages they blur the two together, such that the worsened levels of statistical significance achieved by their post hoc strategy are then taken as evidence against the validity of the idea that small differences might indeed matter.

The authors' statement is further misleading because it omits that Walker did explain why he wanted such a large sample size, and why he was interested in small effect sizes. First, Walker understood that there were many other, stronger factors affecting passing distance (e.g. lane width, traffic volume, traffic speed, road condition) that were impractical for him to deal with. Power calculations do not address unmeasured confounders; instead, based on his understanding of the routes he took and his sampling procedure, Walker expected that a large sample size would blur them into the background. Second, one reason for his interest in small effect sizes is that they might reveal some psychological mechanism itself of interest to psychology (a safety reason is covered further on). Thus in the context of their criticisms, it is misleading for Olivier and Walter to have bypassed the explanation. Moreover, Olivier and Walter's statement misleads again because it spuriously raises doubts about whether Walker was upfront about conducting his power analysis a priori or post hoc. In fact, Walker clearly stated it was done a priori.

2. Identifying consequential differences.
Next, the question of differences that exist not due to chance versus differences that make a difference. The authors invoke the concept of minimally important difference (MID), "the smallest effect size that makes an impact on patient outcomes". The problem with applying this concept in this context is that neither the original study nor the re-analysis includes any data on patient outcomes. (Save for one of the two instances (helmeted) where Walker was actually hit. Note that Olivier and Walter's Figure 1 shows no data with a zero or negative overtaking distance. Walker deleted these from the data set because their proximity figures were unreliable, but it would seem better to have coded them as zeroes. The various calculations are correspondingly affected.)

Thus the MID concept invoked by Olivier and Walter is of no use in this context. Instead, the data concern a variable hypothesized to be related to clinical (or other) outcomes, although in an as yet undetermined way.

Consider three ways a "motor vehicle passes bicycle" manoeuvre can affect the outcome, for bicyclist, motorist, or other person or property in the vicinity:

(1) The motor vehicle contacts the bicycle or bicyclist. This may or may not lead to injury: thus Walker, though struck twice, sustained (minor) injury only once.
(2) The pass generates winds that topple, misdirect, or otherwise harm the bicyclist or their property.
(3) The pass startles, frightens or otherwise concerns the bicyclist, to the point of causing injury or property damage, of intimidating them from further pursuit of the activity, or of reducing the desirability thereof.

Now suppose for the moment the pass is not fast enough for wind forces to be a factor, and ignore for the moment any psychological effects.

Literally speaking, in this context any above-zero passing distance is "safe", because by definition there is contact only when the passing distance is zero or less. From another point of view, no practical passing distance is safe, because some other event may impinge upon the process, and divert one vehicle into the path of the other, or otherwise result in a collision. For example: the motor vehicle may pass slowly at 1 m, then suddenly stop and disgorge an occupant on the passenger side, provoking either a collision with the cyclist, or a swerve leading to toppling. Or, an oncoming vehicle may strike the passing vehicle, and divert it even a full lane width or more into the path of the cyclist. Or, driver or rider may suffer a health event, such as a stroke, which leads them to a large excursion from their normal path. Or, the cyclist may swerve to get around an obstacle, such as a pothole or broken glass. Or, simple inattention or distraction may lead either driver or rider to a large excursion after the initial set-up, for which see further on.

Some of these events are improbable, but then so too are actual collisions. As the authors correctly point out in the comments, rare events in themselves can be unanalyzable by statistical means. That is why, some five decades ago, engineers developed the technique of conflict analysis, of which passing distance analysis is one of several subsequent variations: it studies common events or factors that do not produce injury or property damage, but whose characteristics are supposed to indicate either tendencies or potential to do so.

With regard to collisions between motor vehicles there is a body of work that attempts to understand the complex relations between such indicators and actual collisions. There is no comparable body of work for motor vehicles passing bicyclists.

Olivier and Walter's re-analysis depends on dichotomizing passing distances a priori between "safe" and "unsafe", or "close" and "far" (usages they inappropriately mix). But apart from idealized, minimally contextual calculations of the direct effects of aerodynamic forces, there is no scientific basis for any such dichotomization: neither for the idea that there is a dichotomy-- an idea that goes against the grain of the above-mentioned body of work-- nor for any particular choice of boundary.

3. Inappropriate use of citations.
Instead, Olivier and Walter simply cite a number of non-peer reviewed sources to support the idea that "safe" passes are those at greater than 1 m, and "unsafe" ones those at less. These non-peer reviewed sources consist of advocacy documents, the existence of some legislation-- and, oddly, a 2010 letter from the Queensland Minister of Transport opposing the 1 m rule on the grounds that safe passing is contextual-- thus actually in opposition to Olivier and Walter's central claim that there is an a priori dichotomy between "safe" and "unsafe" passing. The Minister even asserts that 1 m is dangerously close at higher speeds. Another citation, though in a journal that does include peer-reviewed work, is instead a contributed article by the CEO of an advocacy group, presenting its activities [2].

Olivier and Walter also cite two peer-reviewed works [3, 4], in the context of aerodynamic side forces: "These authors have given minimum overtaking distance recommendations of three feet (91.44 cm) and 1.5 m for heavy vehicles passing at 64 km/h and 100 km/h respectively."

In fact, reference [3] does not even address the matter, but concerns only whether Baltimore's 3-foot (0.91 m) law is being obeyed. Its only connection to the topic is that it cites [4], for a figure of a 13 N side force on a bicycle passed at 3 ft by a heavy vehicle at 64 km/h. Reference [3] does not state that this is a safe combination; rather, it suggests that it is unsafe: it says that this force in particular "may divert that cyclist from his or her course, increasing risk for a collision with traffic or parked vehicles."

Although reference [3] cites [4] for that 13 N figure, no such figure actually appears in that work. In fact, reference [4] does not deal with that or any similar speed, and the only mention it makes of the 3 ft or 0.91 m distance is that it is the smallest width of designated bikeways marked on highway shoulders that they found, via inquiry into then-current practice. Thus Olivier and Walter merely repeat the same citation error as reference [3].

They then add their own further citation errors. Reference [3] does not, as Olivier and Walter claim, make any recommendation for speeds of 100 km/h or any similar speed. Meanwhile, reference [4] does not calculate any wind forces for any speed or any separation. Instead, it just reproduces a bar graph of side wind forces for heavy vehicles passing bicycles at speeds of 100 or 115 km/h, at distances of 1, 2, or 3 m. That bar graph was reproduced from a work published by Velo Quebec, an advocacy group. Reference [4] describes Velo Quebec's source for this graph as some further study, unnamed and uncited.

As to Olivier and Walter's claim that [4] recommends a separation distance of 1.5 m for heavy vehicles passing at 100 km/h, again they do not get it right. Reference [4] in fact recommends a marked bicycle lane width of 1.5 m, within a highway shoulder of 3 m, with the first 0.5 m consisting of a rumble strip within the bicycle lane, for well-designed (standard lane width, proper sight lines, and so on) low-trafficked, low-speed highways. This is not the situation studied by Walker, who appears to have ridden on city roadways adjacent to a curb, not in a designated bikeway on a highway shoulder with no curb, much less with a rumble strip.

Thus the works cited by Olivier and Walter to justify their position are either non-peer reviewed advocacy documents, irrelevant, or in opposition to the idea of an a priori blanket dichotomy for the Walker data, which do not include overtaking vehicle speeds, traffic density, lane widths, sight lines, and so on. The later sensitivity analysis conducted by Olivier and Walter offers little remedy, because it still relies on the idea of an applicable a priori blanket dichotomy somewhere. The fact is that for any passing distance recorded by Walker, whether at 0.5 m or nearly 4 m or anywhere in between, we do not know whether it represents a safe or precarious condition, because we do not know the passing vehicle's speed, the condition of the road, or various other circumstances. A pass at 0.5 m is not necessarily problematic on a well-paved road at the crawling speed of congested traffic; but one at 4 m may be very problematic, because it suggests a complete lane change, which may very well have been done without checking, or across a centre line. Bicyclists may appreciate the latter when done by a courteous and aware motorist, but not when done on blind curves and hills by flustered or baffled ones.

As for legislation, Olivier and Walter neglect that elsewhere the legal requirement is 4 ft (1.2 m) combined with "prudent reduced speed"; that other states also specify that speed must be reduced while passing, in some states to low levels; or that in New Hampshire the legal requirement is 3 ft at 30 mph, and 1 ft for every 10 mph faster, and so 1.83 m, not 1.5 m, at 60 mph (96 km/h) [5].

Olivier and Walter have confused a political compromise leading to a minimum legal requirement with a meaningful safety boundary. But Olivier and Walter should already know that, since one of the advocacy documents they cite [6] tells them so: "Why not ask for 1.5m? It's simple really. Road infrastructure here in Brisbane is unable to comfortably provide cyclists with such a buffer zone without inconveniencing drivers. Until driver attitude's change, we will have a difficult time asking for more than a metre without tension increasing on our roads, and the other major cities are in a similar position."

Indeed, the 3 ft/1 m rule, first established by law in Wisconsin in 1973, comes from the facts that (a) a yard or a metre is a familiar distance; and (b) typical lane and vehicle widths and bicycle spacing from curb or seam require the motor vehicle operator to encroach into the next lane at any greater distance-- as noted by Walker.

Based on their misunderstanding of the safety implications, Olivier and Walter assert, with regard to Walker's use of data-driven quartiles, that "each quartile in the data was well above one metre (range: 1.17-1.47 m). Hence, Walker's analysis did not consider distances of practical importance in the categorisation of passing distance."

Have Olivier or Walter ever ridden a bicycle and been passed by a motor vehicle at any interesting speed, especially a truck or bus, at a distance of 1 m? They might try it, and consider whether the experience still seems to them a safe branch of some dichotomy. Of course the vast majority of passes recorded by Walker were above 1 m: many motor vehicle operators are themselves alarmed by such a spacing.

4. Inapplicable caveats.
Olivier and Walter criticize Walker's use of data-driven quartiles to distinguish between "near" and "far", saying "There are caveats associated with the use of data-driven quartiles [25]" (citation number as in the original).

But what are the caveats they cite? Never mind that the citation given [7] is of a small simulation (20 runs of 100 patients) of diagnostic test validity, which is not what is under investigation here. Consider instead that the author's conclusions [7] are that the bias that may be introduced is less in studies with bigger samples (but n = 2355 here), and that "The post hoc derivation of a diagnostic threshold can introduce a small bias into diagnostic test validity studies if the number of cases is smaller than about 50. [...] It could be avoided by the use of cut points derived from previous work if good quality prior studies are available."

Olivier and Walter would have us believe that the "good quality prior studies" are the political compromises of advocacy groups.
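For readers who want to see the cited caveat behave, here is a toy re-creation in Python (my own construction, not Ewald's actual design): choose the cut point that maximizes accuracy in-sample, then evaluate it on fresh data; the post hoc optimism shrinks as the sample grows.

    import numpy as np

    rng = np.random.default_rng(1)

    def optimism(n, runs=300):
        gaps = []
        for _ in range(runs):
            disease = rng.random(n) < 0.5
            marker = rng.normal(disease.astype(float), 1.0)  # true best cut: 0.5
            cuts = np.linspace(-1.0, 2.0, 61)
            acc = [((marker > c) == disease).mean() for c in cuts]
            best = cuts[int(np.argmax(acc))]
            d2 = rng.random(20000) < 0.5                     # fresh validation data
            m2 = rng.normal(d2.astype(float), 1.0)
            gaps.append(max(acc) - ((m2 > best) == d2).mean())
        return float(np.mean(gaps))

    for n in (30, 100, 1000):
        print(n, round(optimism(n), 3))  # in-sample optimism shrinks with n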

5. The consequences of small differences.
This returns us to the question of how non-zero passing distances relate to safety, and what to make of a small but non-zero effect size for helmet wearing (or anything else). Unlike Olivier and Walter, Walker recognized that his dependent variable was not a patient outcome. Instead, he hypothesized that even a small effect size for it, such as the ones found by both Walker, and Olivier and Walter, could result in an accumulation of health (or, one should add, property damage) outcomes, given the large number of times such events are repeated around the world every day. This hypothesis is not addressed by either Walker's analysis or Olivier and Walter's re-analysis: ironically, given Olivier and Walter's erroneous complaint that the study's statistical power needed reducing, the study has far too small a sample size for this purpose. In fact Walker obtained only two relevant data points, from the two occasions (helmeted) when he was actually struck.

The discipline of statistics is impotent to analyze such rare events, but the disciplines of engineering and medicine are not. In particular, because Walker was uninjured in one instance and only slightly injured in the other, it would seem that the impacts were only glancing. Whether or not that was the case in these instances, there are occasions where glancing blows do occur. This would mean that a favourable effect of basically any non-zero size would be sufficient to convert such collisions into otherwise clear, albeit alarming, passes.

In other words, the relevant scale against which to evaluate the effect size is not that of some conventional or mean passing distance, but of the distances that convert a clear if alarming pass, or an otherwise uneventful glancing blow or a brushing contact, into one with mechanical purchase sufficient to cause harm, or else the reverse.

Thus one family of trajectory comparisons that Walker's data invite us to consider, but that Olivier and Walter miss completely, can be exemplified as follows. Consider an overtaking motor vehicle about to make a pass at, say, 2 m, against a rider in the unhelmeted condition. Through some mishap-- e.g. the driver receiving a phone call or text message, or the rider swerving to avoid an obstacle and toppling-- the motor vehicle and bicycle move 1.99 m closer during overtaking. Because of the wider set-up, the pass is still completed without further incident. Now consider the same overtaking manoeuvre, except with the rider in the helmeted condition, with the same mishap and same change in proximity, but this time with the overtaking vehicle's set-up reduced by the mean 5.8 cm adjusted or 8.5 cm unadjusted effect size found by Olivier and Walter. The result is a collision-- and if done at speed, especially a speed requiring the 2 m clearance, likely a fatal one.

Likewise, consider a rider in the unhelmeted condition, with an initial overtaking set-up of 2 m clearance, and then some mishap causing a 2 m loss of separation, such that the handlebars get brushed without causing major injury. For downturned handlebars, the diameter of the tubing plus bar tape is approximately 2.5 cm; thus offsetting the contact point inward by half this distance, or potentially less, would change the brushing into a disaster.

6. Relevance to risk compensation or lack thereof.
Olivier and Walter assert that "Walker's argument that helmet wearing affects the behaviour of motor vehicle drivers does not support risk compensation theory upon re-analysis." They further claim there is "little to no evidence" in favour of risk compensation theory, when actually there is unequivocal experimental demonstration of risk compensation when wearing safety equipment, including bicycle helmets [8-10].

In fact Walker proposed two possible mechanisms behind a helmet wearing effect, and discounted what he termed the "risk compensation" one. But Walker explained in a footnote that his use of that term was not standard. Over and above Walker's clarification, this is certainly not a standard risk compensation situation, because the risk is mostly to the cyclist but the compensation mostly or entirely by the motorist.

7. Relevance to vehicular cycling or lack thereof.
There is a result common to both analyses that needs clarification. Both found that over a range of 0.25 to 1.25 m, maintaining a narrower spacing from bicycle to curb typically resulted in a greater mean passing distance, the overtaking vehicles maintaining much the same line in each case. I note that the contrary rule of thumb refuted by Walker (presented in the original in a largely satirical context [11])-- that cyclists should leave as much space to the curb as that by which they wish to be passed-- is no standard practice, has nothing to do with vehicular cycling, Cyclecraft, or similarly themed safe cycling principles, and is likewise counter to their practice. The distances tested by Walker all seem well shy of lane centre for the roads described, and thus the results have no bearing on vehicular cycling-type practices such as taking the lane or control and release. These are used precisely to avoid the inadequate passing distances, or lapses in passing awareness, that otherwise occur when safe bicycle-to-edge distances are coupled with insufficiently wide lanes.

8. On citing non-peer reviewed work.
Finally, I note from the comments that Linda Ward, in support of Olivier and Walter and to chastise their critics, complains that non-peer reviewed work should not be cited even in this comment forum-- let alone in a journal article. Olivier and Walter complain likewise. As noted above, the actual article by Olivier and Walter cites a number of non-peer reviewed sources, and even relies upon them for its central understanding of what makes for safety. I expect Ward to complain correspondingly more vociferously, now that this has been brought to her attention; while Olivier and Walter should similarly chastise themselves. Nevertheless, in general it is a mistake to exclude non-peer reviewed work from transportation studies, if only because much of it (not in this case) is scientific or engineering work done by or for government agencies or others, and not published in journals.

Besides, in bicycle helmet studies, peer review as such is no great badge of honour. The quality of it is nothing to boast about.


References

1. Case LD, Ambrosius WT. Power and sample size. Methods Mol Biol 2007;404:377-408. doi: 10.1007/978-1-59745-530-5_19

2. Fox T (2010) The Amy Gillett Foundation 'A metre matters' campaign and other initiatives. Journal of the Australasian College of Road Safety 21: 22-23. http://acrs.org.au/wp-con.... Accessed Nov 2013.

3. Love DC, Breaud A, Burns S, Margulies J, Roman M, Lawrence R. Is the three-foot bicycle passing law working in Baltimore, Maryland? Accident Analysis and Prevention 2012;48:451-456. doi: 10.1016/j.aap.2012.03.002

4. Khan A, Bacchus A (1995) Bicycle use of highway shoulders. Transportation Research Record 1502:8-21.

5. National Conference of State Legislatures. Safely Passing Bicyclists. October 2013. http://www.ncsl.org/resea.... Accessed Nov 2013.

6. Safe Cycling Australia. The Campaign. http://www.safecyclingaus.... Accessed Nov 2013.

7. Ewald B. Post hoc choice of cut points introduced bias to diagnostic research. J Clin Epid 2006;59:798-801. doi: 10.1016/j.jclinepi.2005.11.025

8. Morrongiello BA, Walpole B, Lasenby J. Understanding children's injury-risk behavior: Wearing safety gear can lead to increased risk taking. Accident Analysis and Prevention 2007;39:618-623. doi: 10.1016/j.aap.2006.10.006

9. Lasenby-Lessard J, Morrongiello BA. Understanding risk compensation in children: experience with the activity and level of sensation seeking play a role. Accident Analysis and Prevention 2011;43:1341-1347. doi: 10.1016/j.aap.2011.02.006

10. Phillips RO, Fyhri A, Sagberg F. Risk compensation and bicycle helmets. Risk Analysis 2011;31:1187-1195. doi: 10.1111/j.1539-6924.2011.01589.x

11. Martin D. The theory of BIG. http://www.kapiticyclingc... =63&Itemid=1. Accessed Nov 2013.

No competing interests declared.

RE: Fundamental misconceptions of safety and of statistics

jakeolivier replied to MKary on 20 Aug 2014 at 05:04 GMT

M Kary,

I have sat on this response for quite some time, as much of it seems like an attack with most, if not all, points lacking in merit. However, I have noticed you have cited your response to justify the claim “a recent and elaborate statistical reanalysis is constructed around the false claim that increasing the sample size increases the risk of type I errors”. I find this odd considering one of your criticisms is that we cite non-peer reviewed work.

http://injuryprevention.b...

Firstly, you assert our study “confirms that Walker did his own analysis correctly”. This is not true. In our paper, we argue that piecemeal analysis, as Walker did, does not adequately address confounding. In fact, we found a significant difference in overtaking distance between observations taken in Salisbury and Bristol (the adjusted effect for CITY was larger than that for HELMET, 6.4cm vs. 5.8cm). When adjusted for other variables, the HELMET effect diminishes by over 30% (from 8.5cm to 5.8cm), which is an indication of confounding. Walker didn’t find that.

Taking your other comments in order:

1. You don’t seem to have a grasp of the fundamentals of hypothesis testing. There are two main errors one can make in a hypothesis test: a type I and a type II error. A type I error occurs when the null hypothesis is rejected when it is true, and a type II error occurs when the null hypothesis is not rejected when it is false. The truth of the null hypothesis is not known in reality. In statistics, we usually let alpha and beta, respectively, be the probabilities that they occur. These two probabilities are inherently linked – as one increases, the other decreases. It is possible to minimise one but not the other when testing hypotheses (we can try to account for type II errors by choosing an appropriate sample size). Due to the link between hypothesis testing and the presumption of innocence in a trial, alpha is usually minimised.
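To illustrate that trade-off numerically (the effect size and sample size below are arbitrary, chosen only for demonstration, using the statsmodels power routines):

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    for alpha in (0.10, 0.05, 0.01, 0.001):
        power = analysis.power(effect_size=0.2, nobs1=500, alpha=alpha)
        print(alpha, round(power, 3), round(1 - power, 3))  # alpha, power, beta

As alpha shrinks, power falls and beta rises, with the design held fixed.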

The p-value is neither alpha nor beta. It is sometimes interpreted as the smallest alpha at which the hypothesis test would be significant. However, alpha must be chosen before any analysis is performed and is usually fixed at 5%. This is a fundamental concept taught in any introductory statistics course.

You state “In fact, large sample sizes never contribute to spurious results.” Here are a few references that state otherwise.

http://pareonline.net/get...
http://blog.minitab.com/b...

A simple example is to consider the t-test. The test statistic is t=m*sqrt(n)/s where n is the sample size, m is the sample mean and s is the sample standard deviation. The p-value is computed as 2*P(T>|t|). For fixed values of m and s, when n increases, the test statistic t increases and the p-value decreases. A consequence of this result is that any non-zero observed difference can be made statistically significant at any alpha level, given a large enough sample. It is therefore important to compute sample size with this in mind, neither overly small nor overly large.
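That arithmetic can be run directly. A minimal sketch, with m and s held fixed at arbitrary values while n grows:

    import numpy as np
    from scipy import stats

    m, s = 0.05, 1.0  # a small, fixed observed mean difference and SD
    for n in (100, 1000, 10000, 100000):
        t = m * np.sqrt(n) / s
        p = 2 * stats.t.sf(t, df=n - 1)  # two-sided p-value
        print(n, round(t, 2), p)         # p falls toward zero as n grows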

Walker’s most recent paper on this topic supports the notion that the statistically significant helmet wearing effect was spurious. Cyclists wore seven different types of outfit, six with helmets and one without. The “casual” cyclist had a mean overtaking distance of 117.61cm, while the range of the means for all types was 114.05 to 122.12cm. The best result was to dress like a police officer (with a helmet on).

http://www.sciencedirect....

You indicate we do not understand the difference between “statistical and other types of significance”. Walker did use Cohen’s recommendations for effect size in his sample size calculations (citation 26 in our paper). This corresponds to f=0.1 for an analysis of variance. However, he relied on statistical significance to identify “important” results. The results from our adjusted model can be used to compute an effect size for helmet use of d=0.16 (Cohen’s d is the usual effect size for comparing two means). Cohen gives recommendations of 0.2, 0.5 and 0.8 for small, medium and large effect sizes respectively. He considered any effect size less than small to be “trivial”. Note that when you convert Walker’s F statistic to Cohen’s d you get the smaller d=0.12. In other words, Walker’s helmet wearing effect is unimportant by the same criteria on which he based his sample size calculation.
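For readers wanting to check the conversion: with a two-level factor and a roughly even split, Cohen’s d can be recovered from the F statistic as d = 2*sqrt(F/N). The F value below is illustrative only, chosen to be consistent with the d=0.12 quoted above rather than copied from Walker’s paper:

    import math

    N = 2355            # overtaking events in Walker's data set
    F = 8.5             # illustrative F statistic for the helmet factor
    d = 2 * math.sqrt(F / N)
    print(round(d, 2))  # about 0.12, below Cohen's 0.2 cut-off for "small"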

I do believe that a minimally important difference is vitally important, but neither Walker nor you (or anyone else) has provided any evidence of one. But, if you are going to rely on Cohen’s definitions of small effect sizes, then the helmet effect in Walker’s data is smaller than that.

I have never come across a study that computed sample size powered at 98%. You say our statement about sample size calculation “misleads again because it spuriously raises doubts about whether Walker was upfront…” We followed Walker’s explanation for computing sample size and got n=2251 (not the reported n=2259 in Walker’s paper, and certainly not the n=2355 observations in his data set). I do get 98% if I work backwards from a sample size of n=2259, though. Note the G*Power settings I used were “F tests”, “ANOVA: Fixed effects, special, main effects and interactions”, f=0.1, alpha=0.05, power=0.98, Numerator df=4 and Number of groups=10. He may have used other settings, but this is the closest I could get.
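For anyone without G*Power, the same search can be sketched with the noncentral F distribution (for a fixed-effects ANOVA term the noncentrality parameter is lambda = f^2 * N; settings as listed above):

    from scipy.stats import f as f_dist, ncf

    f_es, alpha, target = 0.1, 0.05, 0.98
    df_num, cells = 4, 10
    n = cells + 2
    while True:
        df_den = n - cells                        # error df for n observations
        crit = f_dist.ppf(1 - alpha, df_num, df_den)
        power = ncf.sf(crit, df_num, df_den, f_es**2 * n)
        if power >= target:
            break
        n += 1
    print(n)  # should land near the n=2251 figure above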

Walker also removed 35 observations because SPSS said they were extreme outliers. I also couldn’t reproduce this result.

2. As reported by Walker,

“The author was struck twice by overtaking vehicles, once by a bus and once by a heavy goods vehicle—the latter inflicting minor injury. On both occasions a helmet was being worn.”

You state that, if included, the “various calculations are correspondingly affected.” This information can be added to the data set and reanalysed. Note the only information we have here is vehicle type, passing distance (set to 0) and helmet use (set to 1), so we cannot construct a model similar to the published one. With passing distance as the dependent variable, the estimated helmet effects are -8.2cm and -8.5cm with and without those two observations. When passing distance is categorised by the one metre rule, the odds ratios are 1.24 and 1.30 with and without the additional observations. In each case, the estimated effect moved closer to a null effect. This may seem counterintuitive, but it’s because vehicle type is confounded with helmet wearing for those two observations and vehicle type is a better predictor of overtaking distance.

You give three hypothetical situations in which injury can occur. Yet, Walker’s primary outcome, and ours, was passing distance. It was not injury (or a clinical outcome) as you put it elsewhere. The intention of separation of motorised traffic from cyclists is to eliminate or mitigate injury.

You clearly don’t like the idea of the one metre rule here, yet you don’t seem bothered that Walker also dichotomised his passing distance data using data-driven values – this is problematic, in part, because the cut points will differ from sample to sample, which makes it difficult to generalise.

You should know that Walker was quite comfortable with the one metre dichotomisation. On his website, he stated

“For a recent US TV interview, I reanalyzed the data from this experiment to look at the numbers of vehicles coming within 1m (a measure used by the then TRRL back in 1979 in what was, I believe, the very first study of cycle overtaking). Doing this, I found there were 23% more vehicles coming within 1m of the bicycle when a helmet was being worn. I think this is perhaps the clearest way to illustrate the effect of helmet wearing seen in the data.”

This comment was removed from his website sometime after we pointed out the results were insignificant if analysed this way; however, you can find an archived copy here.

http://web.archive.org/we...

3. You claim we use inappropriate citations. The one metre and similar rules are road safety policy points. Therefore, much of the discussion is not occurring in the peer-reviewed literature. We included the Queensland reference because it demonstrates there is serious discussion around this topic, either for or against. Note that since our paper was published Queensland has adopted minimum passing rules for cyclists – 1 metre for speeds less than 60kph and 1.5 metres above 60kph. Walker does not give posted speed limits in his paper, so it is unknown whether passing manoeuvres occurred in zones below or above 60kph.

http://www.qld.gov.au/tra...

There is unfortunately scant peer-reviewed research, one way or the other, on this topic. The two citations listed were the only ones I could find. I contacted Khan and Bacchus for a copy of their paper and more information about trying to establish a “safe” passing distance. They provided the paper, but would not answer my queries.

Again, you clearly don’t like the one metre rule, but this is part of the road safety policy discussion around cycling and, therefore, is a very legitimate method for dichotomising passing distance. I find it quite strange you call our citations “irrelevant”, even though those references come from those who actually shape road safety policy, yet dismiss peer review as “no great badge of honour”.

You further criticise us for neglecting other, more conservative, passing rules. You seem to have missed a crucial outcome of our paper. The differences in passing distance are quite similar whether wearing a helmet or not until after 2m. It’s the differences in overtaking manoeuvres greater than 2m that are driving the statistical significance. Can you provide evidence, not hypotheticals, that a difference of 7.2cm when the vehicle is already beyond 2m is a problem?

Interval (m) | Difference (m) | 95% CI
0-0.75 | -0.052 | -0.224, 0.121
0.75-1 | 0.003 | -0.061, 0.067
1-1.5 | 0.007 | -0.012, 0.027
1.5-2 | 0.017 | -0.003, 0.037
2-inf | 0.072 | 0.034, 0.109

4. I’ve addressed much of this above. Note that increasing sample size cannot “correct” for bias. Collecting more and more biased data does not result in a representative sample.

I’m confused by your quotes “good quality prior studies”. We never stated that anywhere and I find it troubling that you attribute that comment to us.

5. I’ve addressed much of this above. The rest is just hypotheticals. Can you provide evidence any of this is true? None of that actually happened in Walker’s study.

6. You state “there is unequivocal experimental demonstration of risk compensation when wearing safety equipment, including bicycle helmets” followed by three citations. The first two don’t seem to be relevant here as the participants were not cycling. The third makes the logical fallacy of affirming the consequent. They found usual helmet wearers cycled more slowly when not wearing a helmet. Risk compensation is directional – it’s about behaviour change when putting safety equipment on and not taking it off. We discuss the problems with this and other studies here.

http://acrs.org.au/files/...

In 2001, the Thompsons and Rivara called for a systematic review of the evidence around risk compensation. Thirteen years on, no such review exists. Why?

7. This comment doesn’t seem to have anything to do with our study. Perhaps you should address this issue with Walker.

8. I find it curious and disappointing that you’ve used the PLOS ONE reader comments to attack me and the peer-review system. Certainly peer review is flawed, but how does the individual sift through the rubbish that anyone, including you, can post on the internet? Is posting commentary on anti-helmet websites really something to boast about?

http://cyclehelmets.org/1...

No competing interests declared.

RE: RE: Fundamental misconceptions of safety and of statistics

jakeolivier replied to jakeolivier on 09 Dec 2014 at 00:33 GMT

M Kary,

I’d like to make a few more comments about sample size and effect size as they relate to Walker’s results.

As I mentioned previously, Cohen’s d can be used as an effect size measure for the effect of helmet wearing. Using the reported F statistic in Walker’s paper for the effect of helmet wearing, Cohen’s d is 0.12. The distribution of Cohen’s d is asymptotically normal with variance 1/n1 + 1/n2. These results tell us that if the helmet effect were a “small” effect (i.e., d=0.2), the probability of observing d=0.12 or smaller is 0.026.
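That probability is easy to reproduce under the normal approximation, assuming (my assumption, for illustration) the 2355 observations split roughly evenly between the helmeted and unhelmeted conditions:

    import math
    from scipy.stats import norm

    n1, n2 = 1177, 1178              # assumed near-even split of 2355 events
    se = math.sqrt(1 / n1 + 1 / n2)  # asymptotic SE of Cohen's d
    z = (0.12 - 0.2) / se            # observed d under a true "small" effect
    print(round(norm.cdf(z), 3))     # about 0.026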

With regards to how Walker computed sample size, he states

“a priori analysis with G*Power (Buchner et al., 1997) had identified 2259 as the number necessary reliably to identify a ‘small’ effect size of f = 0.1 in this design with α = .05 and β = .02.”

As noted previously, I computed n=2251 for a 2x5 ANOVA design. What I didn’t mention is that this is for the interaction between helmet wearing and distance to the kerb. This was not for the effect of helmet wearing alone and instead is testing whether the association between helmet wearing and passing distance differs for levels of kerb distance.

A similar sample size calculation for helmet wearing only comes to n=1614 overtaking events (assuming d=0.2 for a small effect size and leaving all other parameters as before). These results are identical to those from an ANOVA with only helmet wearing as a predictor. We can turn this computation around and use n=2355 to compute achieved power. This comes to a power of 0.998. This result implies virtually any non-zero estimate of the difference in passing distance for helmeted versus unhelmeted cyclists would be statistically significant.
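Both figures can be checked with a standard two-sample power routine, again assuming an even split between helmet conditions (the actual group sizes may differ somewhat):

    from statsmodels.stats.power import TTestIndPower

    tt = TTestIndPower()
    # total sample size for d = 0.2, alpha = 0.05, power = 0.98:
    per_group = tt.solve_power(effect_size=0.2, alpha=0.05, power=0.98)
    print(2 * round(per_group))  # about 1614 overtaking events
    # achieved power with all n = 2355 events:
    print(round(tt.power(effect_size=0.2, nobs1=2355 / 2, alpha=0.05), 3))  # ~0.998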

No competing interests declared.