
Saturday, January 05, 2013

More from the Statistical Priests

Sean J. Taylor goes on a rant about statistical software:

Stata and SPSS [etc.] I have found them to be correlated with bad quality science….

When you don’t have to code your own estimators, you probably won’t understand what you’re doing.

When operating software doesn’t require a lot of training, users of that software are likely to be poorly trained. …  trust results produced using R, not because it is better software, but because it is difficult to learn…

When you use proprietary software, you are sending the message that you don’t care about whether people can replicate your analyses or verify that the code was correct. Most commercial software is closed source and expensive.  We can never know if the statisticians at Stata have a bug in their code unless we trust them to tell us. 

There’s more, but you get the idea.

What are we to make of this rant?

This is the statistical priesthood screening out the unwashed.

When I first taught statistics, we did it by hand (sometimes with the aid of hand-crank calculators: to multiply 4 times 3, enter the 4 and turn the crank 3 times). I learned factor analysis using Benjamin Fruchter’s book, which has you do rotations using graph paper. There’s some learning value to this, but not much, and not even the most Luddite of statisticians does it this way anymore.
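For readers who never turned one of those cranks, here is a minimal Python sketch of how a pinwheel-style calculator multiplies. It assumes the usual design: each forward turn adds the entered setting once, and the carriage shifts one decimal place for each digit of the multiplier. The function name `crank_multiply` is hypothetical, for illustration only.

```python
from typing import Tuple


def crank_multiply(setting: int, multiplier: int) -> Tuple[int, int]:
    """Multiply by repeated addition, the way a hand-crank calculator does.

    Returns the product and the total number of crank turns used.
    Assumes non-negative integers.
    """
    accumulator = 0
    turns = 0
    place = 1  # carriage position: 1, 10, 100, ...
    while multiplier > 0:
        digit = multiplier % 10
        for _ in range(digit):
            accumulator += setting * place  # one forward turn adds the (shifted) setting
            turns += 1
        multiplier //= 10
        place *= 10  # shift the carriage one decimal place
    return accumulator, turns


print(crank_multiply(4, 3))   # (12, 3): enter 4, turn the crank 3 times
print(crank_multiply(4, 23))  # (92, 5): 3 turns, shift the carriage, then 2 turns
```

The carriage shift is what kept larger multipliers tolerable: 4 times 23 takes five turns rather than twenty-three.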

There’s no bloody reason to code your own estimators if your data meet the assumptions of a standard analysis. THAT’s where the education is: learning which assumptions you can make, which you can’t, and which you can make if you test them. It isn’t in writing code when prepackaged, tested commercial software will do the trick.
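A minimal Python sketch of the point, using the sample variance as a stand-in estimator (our choice of example, not the author’s). The hand-coded version reproduces what the standard library’s `statistics.variance` already does, so writing it teaches the formula but says nothing about whether a variance is the right summary for these data. That judgment is the part no package makes for you. The data values are made up for illustration, and `hand_coded_variance` is a hypothetical name.

```python
import math
import statistics


def hand_coded_variance(xs):
    """Unbiased sample variance, coded 'by hand' with the two-pass formula."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values only

mine = hand_coded_variance(data)
packaged = statistics.variance(data)  # tested standard-library implementation
print(mine, packaged, math.isclose(mine, packaged))
```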

That last point about replication?

The last point is particularly curious since, given the data and a model specification, you are just as likely to be able to replicate a SAS PROC MIXED run as an R lmer run. There is also literature showing how to replicate analyses across packages (e.g., Brady West et al., Linear Mixed Models: A Practical Guide Using Statistical Software, which compares SAS, SPSS, Stata, R, and HLM). In addition, the key problem in replicating an analysis is clearly getting the researchers to share their data, something that has nothing to do with software. Finally, if there is an error in an analysis, it is more likely to be caught by replicators using different software than by those using the same software. Subtle errors in data preparation (I’d wager these are the most common unintentional errors) are more likely to be caught by recoding than by rerunning.
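A toy Python sketch of that last claim, with hypothetical data and function names: the original analyst’s age-grouping step has an off-by-one boundary error. Rerunning the same code reproduces the same wrong counts every time, while an independent recoding written from the codebook (assumed here to define the groups as under 30, 30 to 49, and 50 and over) disagrees and points straight at the problem.

```python
from collections import Counter

ages = [22, 29, 30, 31, 45, 49, 50, 63]  # hypothetical respondents


def original_recode(age):
    # Original analyst's data prep: subtle bug, age 30 lands in "under 30"
    if age <= 30:
        return "under 30"
    elif age < 50:
        return "30-49"
    return "50+"


def independent_recode(age):
    # Replicator's version, written separately from the same codebook
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"


run1 = Counter(original_recode(a) for a in ages)
run2 = Counter(original_recode(a) for a in ages)  # a rerun: same code, same output
recode = Counter(independent_recode(a) for a in ages)

print("rerun agrees:", run1 == run2)     # the bug survives rerunning
print("recode agrees:", run1 == recode)  # the independent recoding exposes it
mismatches = [a for a in ages if original_recode(a) != independent_recode(a)]
print("disagreements at ages:", mismatches)
```

Nothing here depends on which package fits the model afterward. The discrepancy shows up before any estimation happens, which is why it takes a second pair of hands rather than a second run.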