Tuesday, March 22, 2011

Going 1-for-4

So, as of writing, Jose Reyes is 1-4 in today's spring training game. I was just thinking about whether going 1-4 tells you anything, and my intuitive conclusion is that it doesn't. After all, while it's true that you hit to a .250 average that day, which isn't great but isn't atrocious, it's also true that if you get four at-bats, one hit is the closest whole number to your expected total if your "true" batting average is anywhere from .125 to .375. So I think it tells you very little. But I decided to check and see if there's a statistical basis for that: how does the chance of going 1-4 change as your batting average goes from, say, .200 to .300? The answer: very little.
There are, obviously, five options for a day in which you have four at-bats: zero, one, two, three, or four hits. Note that the main trend as "true" batting average rises from .200 to .350, basically from the worst in the league to the best in the league, is that the odds of going 0-4 fall drastically, from about 41% at the Mendoza line to around 18% for an Ichiro type. The vast majority of that drop goes to your chances of going 2-4, which rise from 15% to 31%. The likelihood of 3-4 goes up rather considerably, too, from about 3% to more like 11%. Your odds of having a 4-4 day, though, rise only marginally, from "negligible" to, maybe, 2%. But look at what happens to your odds of going 1-4. Almost nothing. From .200 to .300, the odds of getting exactly one hit go from 41.0% to 41.2%. Big whoop. Admittedly it rises, imperceptibly, to 42.2% right around a .250 average itself, and falls a wee bit to about 38% up around .350. But what we're seeing here is that a 1-4 day at the plate really does tell you almost nothing about how good a hitter you are (except that you're major-league caliber).
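For anyone who wants to check those percentages, here is a minimal sketch. It assumes each at-bat is an independent trial with a fixed hit probability equal to the "true" average (a plain binomial model, which is my reading of how the numbers were produced). The function name hit_distribution is just something I made up for illustration.

```python
from math import comb

def hit_distribution(avg, at_bats=4):
    """Return [P(0 hits), ..., P(at_bats hits)] for a given true average.

    Assumption: at-bats are independent and each is a hit with
    probability `avg` (a binomial model).
    """
    return [comb(at_bats, k) * avg**k * (1 - avg)**(at_bats - k)
            for k in range(at_bats + 1)]

if __name__ == "__main__":
    # One row per true average, one column per possible day at the plate.
    print("avg     0-4    1-4    2-4    3-4    4-4")
    for avg in (0.200, 0.250, 0.300, 0.350):
        row = "  ".join(f"{p:5.1%}" for p in hit_distribution(avg))
        print(f"{avg:.3f}  {row}")
```

The 1-4 column is the one to watch: it barely moves while every other column shifts a lot.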

To get a little hyper-technical here, there are two primary modes of analyzing the truth of claims, hypothesis testing and Bayesianism. Under the hypothesis testing model, I can say a few things based on these stats. First of all, if you go 3-4 then I'm 95% confident you're not a .200 hitter. Hell, if you go 4-4 then I'm 95% confident you're not a .200 hitter, a .300 hitter, or a .350 hitter. (This is part of why I think hypothesis testing is silly.) Moreover, if you go 2-4 I can still be about 85% confident you're not a .200 hitter, and if you go 0-4 I'm 80% confident you're not a .350 hitter. But if you tell me that you went 1-4, then all I can be is, maybe, 60% confident you're not a .350 hitter. And, at the same time, 55% confident you're not a .250 hitter, either, even though that is the most likely thing for you to be. (This is the other part of why I think hypothesis testing is silly.) In other words, if you go 1-4 I really can't say anything with any confidence about what kind of hitter you are.
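The "confidence" figures in that paragraph seem to come from taking one minus the probability of the exact outcome at a given average. That is my reading of the numbers, not a textbook p-value (a standard test would use the probability of a result at least that extreme). Under that assumption, and the same binomial model, a rough sketch looks like this; confidence_not is a hypothetical helper name.

```python
from math import comb

def prob_hits(hits, avg, at_bats=4):
    """Binomial probability of exactly `hits` hits in `at_bats` at-bats."""
    return comb(at_bats, hits) * avg**hits * (1 - avg)**(at_bats - hits)

def confidence_not(hits, avg, at_bats=4):
    """Assumed reading of the post's 'confidence you're not an `avg` hitter':
    one minus the chance of seeing exactly this line at that average.
    This is NOT a standard tail-probability p-value."""
    return 1 - prob_hits(hits, avg, at_bats)

if __name__ == "__main__":
    cases = [(3, 0.200), (4, 0.350), (2, 0.200),
             (0, 0.350), (1, 0.350), (1, 0.250)]
    for hits, avg in cases:
        print(f"{hits}-4 vs a {avg:.3f} hitter: "
              f"{confidence_not(hits, avg):.1%}")
```

The 1-4 rows come out far below any conventional threshold, which is the point of the paragraph.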

In the Bayesian model we see the same thing, only more logically laid out. The way I conceptualize the Bayesian method of incorporating new information is that you have a pre-existing assumption or model of the probability function of the various parameters and you also have a graph like the one above, of the probabilities of the various outcomes based on the values of the parameters. Then you basically multiply those two curves together, i.e. multiply them at each point in the parameter-space, and then normalize everything so it still sums to one. So, for instance, if I have a starting assumption that all true batting averages between .200 and .350 are equally likely (which would be an odd assumption, but whatever...), and then I observe that my player goes 2-4, I'm going to end up thinking that batting averages in the low .200's are actually about half as likely as averages in the mid-.300's. On the other hand, if I thought that poor averages were twice as likely as outstanding averages to start with, then after a 2-4 day I'm going to think that all averages are about as likely as one another (if I'm weighting the one night's play equally with my a priori assumptions).
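The multiply-and-normalize step can be sketched on a grid of candidate averages. Everything here is an illustration under assumptions: the grid spacing of .010, the binomial likelihood, and the exact shape of the "poor averages twice as likely" prior (a straight line from weight 2 at .200 down to weight 1 at .350) are my choices, not anything specified above.

```python
from math import comb

def likelihood(hits, avg, at_bats=4):
    """Binomial chance of exactly `hits` hits at true average `avg`."""
    return comb(at_bats, hits) * avg**hits * (1 - avg)**(at_bats - hits)

def posterior(prior, grid, hits, at_bats=4):
    """Multiply prior by likelihood pointwise, then renormalize to sum to 1."""
    unnorm = [w * likelihood(hits, a, at_bats) for w, a in zip(prior, grid)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Candidate true averages .200, .210, ..., .350 (assumed grid).
grid = [0.200 + 0.010 * i for i in range(16)]

# Flat prior: every average on the grid equally likely.
flat = [1 / len(grid)] * len(grid)

# Sloped prior (assumed shape): .200 is twice as likely as .350.
raw = [2 - i / 15 for i in range(16)]
sloped = [r / sum(raw) for r in raw]

if __name__ == "__main__":
    for name, prior in (("flat", flat), ("sloped", sloped)):
        post = posterior(prior, grid, hits=2)
        # Ratio of posterior weight at .200 to posterior weight at .350.
        print(f"{name} prior, after 2-4: {post[0] / post[-1]:.2f}")
```

With the flat prior the ratio lands near one half, and with the sloped prior it lands near one, matching the two cases described above.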

But what if what I see is a 1-4 performance? Well, look, the total deviation in the probability of 1-4 over this range is from about 38% to 42%. Multiplying my a priori probability function by the chance of 1-4 at each point is quite simply not going to change things very much. I'm maybe slightly less likely to think I'm dealing with Albert Pujols or with Jeff Mathis than with someone like Rey Ordonez (career average = .246). But it's really a trivial difference. So the truth is that going 1-4 just doesn't change my pre-existing idea of what kind of a hitter I think someone is to any noticeable degree. And I just demonstrated that using fancy statistical notions, not just my impulse that, hey, it's closest to the average result over the entire range of Major League-level hitting.
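One way to put a number on "not going to change things very much" is to ask, for each possible day at the plate, how far apart the best-supported and worst-supported averages end up. With a flat prior the posterior is just the likelihood rescaled, so the ratio of the largest to the smallest likelihood over the range is the most the update can tilt things. This is a sketch under the same assumptions as before (binomial model, an assumed grid from .200 to .350 in steps of .010); spread is a made-up name.

```python
from math import comb

def likelihood(hits, avg, at_bats=4):
    """Binomial chance of exactly `hits` hits at true average `avg`."""
    return comb(at_bats, hits) * avg**hits * (1 - avg)**(at_bats - hits)

def spread(hits, grid, at_bats=4):
    """Largest likelihood over the grid divided by the smallest.

    Under a flat prior this equals the ratio between the most and least
    probable averages after the update."""
    values = [likelihood(hits, a, at_bats) for a in grid]
    return max(values) / min(values)

# Assumed grid of candidate true averages: .200, .210, ..., .350.
grid = [0.200 + 0.010 * i for i in range(16)]

if __name__ == "__main__":
    for hits in range(5):
        print(f"{hits}-4: max/min likelihood = {spread(hits, grid):.2f}")
```

A ratio close to 1 means the day tells you next to nothing, and 1-4 is the only line that comes out close to 1.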
