Tuesday, February 2, 2010

Hypothesis Testing Does Not Exist

I just got out of stat class, which was mostly spent recapping the last bit of first semester. Toward the end the professor, who has what I think is a thick Greek accent, talked about how if you phrase a hypothesis test with the null hypothesis as "X = a" and the alternative as "X = b", you get very different results than if you phrase it with the null as "X = b" and the alternative as "X = a". Now, I'm not entirely certain I'm right about this, but I think that part of the reason this very confusing fact happens is that hypothesis testing does not exist. In reality, the thing called "hypothesis testing" is really just a somewhat muddled and overly discontinuous* rearrangement of confidence values.

That hypothesis testing and confidence values are the same, I am certain, is true. Confidence values, or p-values, work like this: you do some experiment and get an estimator for your parameter; call it y. Now, for any other possible value of the parameter, near or far from y (call it x), there is a confidence value that is, essentially, "the chance that, if the true value of the parameter is x, you would get a result as far from x as y by pure chance." As it happens, in many cases these p-values are mathematically symmetric, which makes the English descriptions a lot easier.
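Here's a quick sketch of that definition in Python, assuming a normally distributed estimator with a known standard error; the function names and numbers are made up purely for illustration.

from math import erf, sqrt

def normal_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def p_value(estimate, hypothesized, std_error):
    # two-sided p-value: the chance of landing at least this far from
    # the hypothesized value by pure chance, if that value is the truth
    z = abs(estimate - hypothesized) / std_error
    return 2.0 * (1.0 - normal_cdf(z))

# estimator y = 5.3, candidate parameter value x = 4.0, standard error 0.8
print(p_value(5.3, 4.0, 0.8))  # about 0.10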

A hypothesis test works basically like this: you define a null hypothesis and an alternative hypothesis, then you collect your data. If the data have a p-value below some predefined standard, often .05, with respect to the null hypothesis and in the direction of the alternative hypothesis, you reject the null and accept the alternative. Otherwise, you accept the null. There's virtually no categorical difference between this and a confidence value; in fact, you're using confidence values.
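To make the point concrete, here's a sketch of everything the "test" adds on top of the p-value, which is nothing but a cutoff (alpha = .05 here; the names are mine).

def hypothesis_test(p, alpha=0.05):
    # the whole "test": push the p-value through a cutoff,
    # collapsing a continuous number into a binary verdict
    return "reject the null" if p < alpha else "accept the null"

# nearly identical evidence, opposite verdicts
print(hypothesis_test(0.049))  # reject the null
print(hypothesis_test(0.051))  # accept the null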

Now, here's the thing, and this is why I think this asymmetry creeps in with hypothesis testing. It's also why I don't like hypothesis tests very much. You've established, essentially, a burden of proof. Beyond such-and-such a level of statistical confidence that the data contradict the null hypothesis, you reject it. Below that threshold, you do not. So the "null" hypothesis carries real meaning: it is your default assumption, just as (in fact, exactly as) "innocent" is the default assumption a jury is supposed to hold about a defendant.
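My professor's swapped-null example falls right out of this. Here's a rough sketch with made-up numbers, using a one-sided z-test on a proportion: the same data end up endorsing whichever value you happened to label the null, because the null gets the benefit of the doubt.

from math import erf, sqrt

def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def one_sided_test(p_hat, n, null_p, alt_p, alpha=0.05):
    # one-sided z-test of H0: p = null_p against the alternative p = alt_p
    se = sqrt(null_p * (1.0 - null_p) / n)
    z = (p_hat - null_p) / se
    # p-value measured in the direction of the alternative
    p_val = 1.0 - normal_cdf(z) if alt_p > null_p else normal_cdf(z)
    if p_val < alpha:
        return "reject the null, accept p = %s" % alt_p
    return "accept the null, p = %s" % null_p

# made-up data: 56 successes in 100 trials, so the estimate is 0.56
print(one_sided_test(0.56, 100, null_p=0.5, alt_p=0.6))  # accept the null, p = 0.5
print(one_sided_test(0.56, 100, null_p=0.6, alt_p=0.5))  # accept the null, p = 0.6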

What you're doing here, essentially, is collapsing a continuous, linear universe of confidence values onto a discrete, binary universe. In doing so, you lose a lot of information, and of course you also create this "burden of proof." For decision-making purposes, that's fine, because at some point you do need to collapse the continuous world of analysis onto a discrete universe of actions, but for informational purposes it is by definition lacking.

If a poll shows Obama leading Palin by 8 points for a 2012 matchup, but the margin of error is 4.1 points, strictly speaking the lead is not "statistically significant." So what? Does that mean this poll doesn't indicate Obama's ahead? No. It means the data suggest approximately 92% or 93% or 94% confidence that he is, but not quite 95%, which is some kind of magic number in statistics. That we should be better than 90% sure Obama's leading is valuable information; I see no reason to obscure this information with a threshold we don't yet need to have. In fact, in polling you never need to have it, since you can in theory continue to hedge your predictive bets about an election all the way until election day.
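For what it's worth, here's the back-of-envelope arithmetic behind those percentages, under assumptions I'm supplying myself: the 4.1-point figure is a 95% margin of error on each candidate's share, the two shares are roughly complementary, and so the margin of error on the lead itself is about twice the reported one.

from math import erf, sqrt

def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

lead = 8.0                    # Obama's lead over Palin, in points
moe_share = 4.1               # reported 95% margin of error on each share
se_share = moe_share / 1.96   # implied standard error of one candidate's share
se_lead = 2.0 * se_share      # rough standard error of the lead itself

z = lead / se_lead
confidence = 1.0 - 2.0 * (1.0 - normal_cdf(z))
print(round(confidence, 3))   # about 0.944: real information, just shy of .95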


*A reference to Richard Dawkins' notion of the "tyranny of the discontinuous mind," the human impulse to categorize everything with bright-line boundaries.
