This posting is just a place to make my critique publicly available to those locally who've requested it. It probably won't be of general interest.

Mike,

If you have a chance could you take a look at these two reports and let me know if
the statistical methodology looks right. I don’t need a full review or anything
like that. I just don’t have any way of knowing if they really know what they
are talking about. I believe that it is all the same data that both reports are
working from.

When I inquired of ActiveTrans [Chicago-based Active Transportation Alliance, a bicycling / mass transit / pedestrian advocacy group] if there were peer-reviewed studies supporting the
safety of protected bike lanes, they sent me a link to the People for Bikes web
page:

http://www.peopleforbikes.org/statistics

When I looked at what they had on the safety of protected bike lanes I found lots of
stuff about people “feeling” safer, and puff pieces and memos by advocates, but
this single Canadian study seemed to be the only thing that approached rigor, I
think.

####

Full citations for this study:

Harris, M. A., Reynolds, C. C. O., Winters, M., Cripton, P. A., Shen, H., Chipman, M. L., … Teschke, K. (2013). Comparing the effects of infrastructure on bicycling injury at intersections and non-intersections using a case–crossover design. Injury Prevention, 19(5), 303–310. http://doi.org/10.1136/injuryprev-2012-040561

Teschke, K., Harris, M. A., Reynolds, C. C. O., Winters, M., Babul, S., Chipman, M., … Cripton, P. A. (2012). Route Infrastructure and the Risk of Injuries to Bicyclists: A Case-Crossover Study. American Journal of Public Health, 102(12), 2336–2343. http://doi.org/10.2105/AJPH.2012.300762

Here are some comments. Note that while I am a statistician, I
work in marketing research, not transportation analysis.

This is really one study, with different parts of the analysis
published in AJPH (2012) and in the BMJ journal Injury Prevention (2013).

Case crossover is a very reasonable study design, although the
relative risk factors in this case are less stable than they seem. A logistic
regression model (equation 1) seems reasonable.

This is an exploratory study; note in table 4 of AJPH they
report 14 significance tests, two ways (unadjusted, and adjusted) at the 5%
level. Five of the 14 confidence intervals show significance
(unadjusted). The results are about the same for the adjusted (which is
good), and to simplify the discussion I’m just going to consider the
unadjusted.

The finding that jumps out at you is the .12 odds ratio (OR) for cycle tracks, an 88% reduction. That seems huge, but we need to look a bit more carefully at
this.

1. First of all, it’s NOT an 88% reduction. It’s an 88% reduction in the OR relative to the reference
condition (major street, parked cars, no bike infrastructure). But that
particular condition is a relatively dangerous condition. It’s appropriate to
run the study that way (you usually pick the largest condition as the reference
condition), but it’s easy to say “88% reduction” while forgetting it’s NOT an
88% reduction overall, just relative to pretty much the most dangerous condition in
their data.

*For example, I might be only 10% more polite than the average person, but
I’m 88% more polite than Donald Trump.*

2. The confidence intervals for all of the infrastructure options below overlap. There’s no
statistically significant difference between any of these:

a. Local street, no bike infrastructure

b. Local street, designated bike route

c. Local street, designated bike route with traffic calming

d. Off street, sidewalk

e. Off street, multiuse path paved

f. Off street, multiuse path unpaved

g. Bike path

h. Cycle track (i.e. protected bike lane)

This is because the confidence
intervals overlap. The tests show that some of these are different than the
reference condition (major street, parked cars, no bike infrastructure), and
some are not. It is true that this overlapping intervals method I’m using
is only approximate, but the overlap is pretty large. Interpreting these
non-differences as differences is a common statistical reasoning error. See,
for example,

Gelman, A., and Stern, H. (2006), “The Difference Between ‘Significant’ and ‘Not Significant’ is Not Itself Statistically Significant,” *The American Statistician*, 60, 328–331.

3. The cycle track difference is pretty frail. From table 4, there are 2
accidents on cycle tracks, 10 non-accidents on cycle tracks (the control
observations). While they fit a logistic model using the overall data, we can
best see why this is a frail result by considering this as a binomial, like a
coin flip. Because of the way the case crossover design works, we could
expect the same number of accidents and non-accidents on cycle tracks, e.g. a
50-50 split.

a. With 12 observations and a 50% expectation, we would expect 6
and 6, but just like a coin flip we would probably see a result that
varied. A 2-10 split is (as reported) statistically reliable, but just
barely so. 3-9 would not be (one more accident). 2-9 would not be (one fewer
control in a cycle track). So, if we change ONE OBSERVATION in either
direction, we have NO STATISTICALLY SIGNIFICANT EFFECT AT ALL for cycle
tracks.

b. Since the control segment reflects a random choice by the
investigators, it’s just luck that they picked 10, rather than 9, control
segments on cycle tracks. (In fact, doing a rough calculation there’s
about a 46% chance of picking 9 or fewer controls and getting no significant
effect at all.)

c. In short, there’s a good chance this 88% reduction has some type
M error in it (actual magnitude, if we were to do a bunch of similar studies,
would be far less than 88%). Again, I want to emphasize that this does NOT
mean the investigators did anything wrong in their reporting or analysis.
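A minimal sketch of the coin-flip check in point 3, assuming a two-sided exact binomial test against a 50-50 split. The text above does not say exactly which test was used, so this is one plausible reading rather than the actual calculation, and `two_sided_binomial_p` is a hypothetical helper name.

```python
from math import comb


def two_sided_binomial_p(accidents: int, controls: int) -> float:
    """Two-sided exact binomial p-value against a 50-50 split.

    Assumption: with p = 0.5 the distribution is symmetric, so the
    two-sided p-value is twice the smaller tail (capped at 1).
    """
    n = accidents + controls
    k = min(accidents, controls)
    # P(X <= k) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# The observed split, then the two "change one observation" scenarios.
for accidents, controls in [(2, 10), (3, 9), (2, 9)]:
    p = two_sided_binomial_p(accidents, controls)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{accidents}-{controls}: p = {p:.3f} ({verdict} at the 5% level)")
```

Under that assumption, the 2-10 split comes in just under 0.05, while 3-9 and 2-9 do not.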

4. There are a couple of other quirks in this study. The accident risk was NOT *higher* at
intersections (OR = .96, nonsignificantly *lower*), which I don’t think is a normal
finding. I’m used to thinking intersections are much, much more dangerous than
non-intersections; I’m pretty sure that was John Forester’s analysis. But
I no longer have a copy of Effective Cycling.

Note in the BMJ article they analyze intersections and
non-intersections separately.

The BMJ article uses the same data, but different controls – in
fact, for some cases they use multiple controls for the same case; it’s not
clear to me how they adjusted for this non-independence, and it’s clearly
harder for me to do the approximate binomial calculations (not saying they did
anything odd, just that it’s not clear how they handled the cross-case
dependence). In the BMJ article, cycle track is statistically significant
for non-intersections, but not for intersections. But there aren’t any cycle
track accidents at intersections here, so we’re out of data.

**Overall: it’s a pretty good study, and seems to have paid
careful attention to definitions, etc. But it’s only one study, and the
cycle track result seems to depend on the choice of a single, random control
case. Not much of a platform to spend millions of infrastructure money
on, if that’s all we’ve got.**

And, given the small data size, they obviously couldn’t
distinguish the type of barrier used on the cycle track (curb, bollards, or parked
cars).

SLIGHT ADDENDUM: Instead of looking at the binomial as
2-10 versus 6-6 expectation, we could compare the observed proportion of
accidents (2 out of 690) with the observed proportion of controls (10 out of
690). In this case, 2-9 is still significant, 2-8 is not, so this depends on a
change in the random picking of 2 controls, not one. That’s about a 33% chance, not a 46%
chance. But the general notion that this is a fragile result is still
true – because it depends on a carefully constructed, but still small, data
set.
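A rough sketch of the addendum's comparison, under two assumptions that are not spelled out above: that "significant" means a two-sided pooled z-test on the two proportions, and that the number of controls landing on cycle tracks behaves like a binomial draw with n = 690 and p = 10/690. The function names are made up for illustration.

```python
from math import comb, erf, sqrt

N = 690  # accident sites, and matched control sites, per the counts above


def two_proportion_p(x1: int, x2: int, n: int = N) -> float:
    """Two-sided pooled z-test for x1/n versus x2/n (equal group sizes)."""
    pooled = (x1 + x2) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * (2 / n))
    z = abs(x1 - x2) / n / se
    # Two-sided tail area of the standard normal distribution
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))


def prob_at_most(k: int, n: int = N, p: float = 10 / N) -> float:
    """P(X <= k) for X ~ Binomial(n, p); assumed model for control counts."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


for controls in (10, 9, 8):
    print(f"2 vs {controls}: p = {two_proportion_p(2, controls):.3f}")

print(f"P(9 or fewer controls on cycle tracks) = {prob_at_most(9):.2f}")
print(f"P(8 or fewer controls on cycle tracks) = {prob_at_most(8):.2f}")
```

Under these assumptions, 2 versus 9 stays below the 5% level and 2 versus 8 does not, and the two probabilities land close to the 46% and 33% figures mentioned above.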

I think part of the impetus for asking me to look at this was the current controversy over a protected bike lane on Dodge in Evanston, IL.

In this case, a protected bike lane (with parking on the left, curb on the right) has replaced a regular bike lane (with traffic on the left, parking on the right). I have not been on Dodge since the change, and can't comment on this particular case.