
Wednesday, November 28, 2007

Bad Graph Mars Suicide Barrier Argument

UPDATE: In the comments, Glasgow provides a link to an improved graph that does not have the problems noted here, but supports the point of his article.

Garrett Glasgow wants to fight the installation of a suicide barrier on the Cold Spring Harbor Bridge in the worst way, and in this graph he seems to find the worst way.

On the X axis, we have the number of high bridges in each state. On the Y axis we have the suicide RATE for each of several years. So, there are probably about 800 points on this graph. These will be collinear, but that's a minor error compared to others shown here.

First, he's comparing a NUMBER on the X axis with a RATE on the Y axis. That's not really kosher in most cases. The rate is already scaled by population, so the average rate won't vary with the size of the state. The number of bridges does vary with size, though. To see this more clearly, suppose we divided California into two states, NoCal and SoCal. Each new state would have roughly half the previous number of bridges, but the suicide rate wouldn't change much. So, if we had 10 bridges and a suicide rate of 15, we might see 5 bridges and a suicide rate of 14 in one half, and 5 bridges with a suicide rate of 16 in the other half.
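Here is a small sketch of that split in Python. The populations and death counts are invented purely so the arithmetic reproduces the 14, 15 and 16 above; they are not real California figures, and the helper name `rate_per_100k` is my own.

```python
def rate_per_100k(deaths, population):
    """Deaths per 100,000 residents."""
    return deaths * 100_000 / population

# Hypothetical halves of one state; every number here is made up.
nocal = {"bridges": 5, "population": 2_000_000, "deaths": 280}
socal = {"bridges": 5, "population": 2_000_000, "deaths": 320}

# Counts add when the halves are merged back into one state...
whole_bridges = nocal["bridges"] + socal["bridges"]

# ...but the rate of the whole is a population-weighted average of the halves.
whole_rate = rate_per_100k(
    nocal["deaths"] + socal["deaths"],
    nocal["population"] + socal["population"],
)
nocal_rate = rate_per_100k(nocal["deaths"], nocal["population"])
socal_rate = rate_per_100k(socal["deaths"], socal["population"])

print(whole_bridges, whole_rate)        # one point for the undivided state
print(nocal["bridges"], nocal_rate)     # two points after the split
print(socal["bridges"], socal_rate)
```

Drawing the state line moves the points from X = 10 to X = 5 without changing anything about suicides, which is why a count on one axis and a rate on the other don't belong on the same graph.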

Second, he's done an unconvincing regression analysis, which "reveals there is a negative relationship between the overall suicide rate and the number of bridges in a state, exactly the opposite of the relationship we would expect to see if bridges helped cause suicides."

That's not what I see here. I see a lot of variation in the suicide rate. I see a bunch of states with few big bridges, and I see all the data from 6 bridges on out to 35 essentially showing no trend at all. Whatever state has about 12 bridges might be a bit lower than average on the suicide rate, but there could be many causes for that. The data are very skewed, but no attempt seems to have been made to deal with that.

So, what I see here is (a) a bogus comparison being made on the graph, and (b) a regression analysis doomed to failure because of the incompatibility of the X-axis variable with the Y-axis variable -- nevertheless overinterpreted as showing an important contrary result.

I'm not sure how I would change the number of high bridges into a rate that could then be compared to the suicide rate. Should this be the number of high bridges per 100,000 population in a state? That's probably the most reasonable thing, but some of these bridges might be in remote areas with little local population.
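As a rough sketch of the per-100,000 idea, the conversion is a one-liner in Python. The two states and their numbers below are hypothetical, chosen only to show that the ordering can flip once population is taken into account; this is not the calculation used in the original article.

```python
def bridges_per_100k(bridges, population):
    """High bridges per 100,000 residents."""
    return bridges * 100_000 / population

# Hypothetical states: (name, number of high bridges, population).
states = [
    ("Big State", 30, 36_000_000),
    ("Small State", 3, 600_000),
]

bridge_rates = {
    name: bridges_per_100k(bridges, population)
    for name, bridges, population in states
}

for name, rate in bridge_rates.items():
    print(f"{name}: {rate:.2f} high bridges per 100,000 people")
```

The state with ten times as many bridges ends up with the lower bridge rate. The conversion does nothing about the remote-bridge problem, since a bridge far from any town still counts the same as one in the middle of a city.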