
Saturday, April 02, 2011

Two approaches to statistics

“CrossValidated” is a Q&A site for statistical questions; it’s part of StackExchange. It’s an interesting place to see what sort of questions people are asking, and to get answers that are often shorter and clearer than the professional literature.
Sometimes the answers illustrate one of the great divides in statistics: on one side are the scientists, engineers, and programmers, who are interested in using statistics as a tool and moving on; on the other are the mathematical types, who appreciate the beauty of a good estimation algorithm and the occasional derivation.
A question I saw yesterday illustrates this nicely:
There’s an additional note that “This question came from our [StackOverflow] site for professional and enthusiast programmers.”
First, a couple of scientist/engineer/programmer answers:
These answers are similar. If you need a number in there that’s a correlation (and a mean absolute difference is a different type of measure), just put in zero. The programming logic is simple: if std_dev(x) = 0 or std_dev(y) = 0, then let r = 0.
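For the programmers in the audience, here is a minimal sketch of that rule in Python. It is one illustrative way to write it, not code from either answer: the function name `pearson_r_or_zero` is made up for this post, and the exact comparison with zero is an assumption that works for constant integer data but may call for a small tolerance with floating-point input.

```python
import math

def pearson_r_or_zero(x, y):
    """Pearson correlation of x and y, returning 0.0 when either has no variation.

    Hypothetical helper for illustration; the zero fallback is the convention
    suggested in the answers, not a mathematically defined value.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and the same length")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squared deviations; zero exactly when the standard deviation is zero.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)

    # Assumption: exact comparison with zero. A tolerance may be safer for floats.
    if ss_x == 0 or ss_y == 0:
        return 0.0

    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / math.sqrt(ss_x * ss_y)
```

Calling `pearson_r_or_zero([1, 1, 1], [1, 2, 3])` takes the fallback branch, because the first list never varies, while two lists that both vary go through the usual formula.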
Second, a mathematical answer.
It’s not so much that this answer is longer and clearly a challenge to code; it’s that there’s this kicker in the first sentence below:
The intent here is not to poke fun at the last responder (well, maybe a little). “Probabilityislogic” has a lot of points on this site, which means not only that they have contributed a lot, but also that their answers have been rated highly by other people. The point here is that these answers come from different sides of the great cultural divide in statistics.