Thursday, October 25, 2007

Abolish the 103rd Congress -- and reform statistical jargon!

Andrew Gelman blogs a good suggestion:

"Could the entire subfield of American politics please stop talking about the 77th Congress or the 103rd Congress and start talking about the 1941-42 Congress and the 1993-94 Congress and so forth? This would just make everybody's life easier."

But at least "103rd Congress" always refers to the same thing whether you are reading the NYT, WSJ, Time or Newsweek.

In statistics, the symbology changes more often than sheets at a cheap motel.

Consider, for example, this incomplete listing of the symbols used for the two parameters of the NBD (negative binomial distribution) model:

Johnson, Kotz and Kemp: P, k
they also offer q, k
Greenwood and Yule: alpha, beta
Jeffreys: beta, rho
Anscombe: alpha, lambda
Evans: a, m
Evans, Hastings et al.: x, p
Wikipedia: r, p
Wikipedia also offers omega, p
Ehrenberg: m, k
Ehrenberg also: m, a
Hardie: r, alpha
Guenther: p, k (NOT the same thing as P, k)
Systat: P, K (IS the same thing as p, k)

and, of course, I'm not done yet -- just tired of typing. For this particular function, I have a big cross-reference table to keep the literature straight.
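For what it's worth, a cross-reference table like that can be kept as code instead of on paper. The following is only a minimal sketch: the function names are made up for illustration, and the three conventions shown (shape/rate for a gamma-Poisson mixture, shape/probability counting failures before the r-th success as SciPy's `nbinom` does, and a P, k form with mean kP) are my assumptions about what the symbols mean, not a transcription of any particular author's notation. Everything is mapped to one canonical pair, mean m and shape k.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NBD:
    """Canonical form (an arbitrary choice): mean m and shape k."""
    m: float
    k: float

    @property
    def variance(self) -> float:
        # For the negative binomial, variance = m + m^2 / k.
        return self.m + self.m ** 2 / self.k


def from_shape_rate(r: float, alpha: float) -> NBD:
    # Assumed convention: Poisson rate ~ Gamma(shape r, rate alpha), so mean = r / alpha.
    return NBD(m=r / alpha, k=r)


def from_shape_prob(r: float, p: float) -> NBD:
    # Assumed convention: failures before the r-th success, success probability p.
    return NBD(m=r * (1.0 - p) / p, k=r)


def from_P_k(P: float, k: float) -> NBD:
    # Assumed convention: mean = k * P, variance = k * P * (1 + P).
    return NBD(m=k * P, k=k)


def agree(a: NBD, b: NBD) -> bool:
    # True when two parameter sets describe the same distribution (up to rounding).
    return math.isclose(a.m, b.m) and math.isclose(a.k, b.k)


# The same distribution written three ways, under the assumptions above:
a = from_shape_rate(r=2.0, alpha=0.5)
b = from_shape_prob(r=2.0, p=0.5 / (1.0 + 0.5))   # p = alpha / (1 + alpha)
c = from_P_k(P=1.0 / 0.5, k=2.0)                  # P = 1 / alpha
print(agree(a, b), agree(a, c))
```

Adding another author's notation then means adding one more small function, after checking that author's definitions rather than trusting the letters.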

It would be a mistake to require full standardization, but this rather meaningless arbitrariness doesn't serve anyone well.

Why does this persist? I would naively have thought that, with a relatively small number of software packages (SAS, SPSS, R/S) dominating the field, terminology would have been standardized, just as Microsoft inadvertently standardized the rules for Hearts.