Sunday, February 23, 2014

Baseball Is The Best Sport: Statistically Proved

(Disclaimer: only four sports are included, so at most I'm claiming that baseball is the best of the four. A game like golf, where you're not really counting up from zero and trying to get as high as possible, is really impossible to analyze using the technique of this post.)

A few weeks ago, a friend of mine noticed that a basketball team was losing by about sixty points. Well, okay, he appears to have misplaced the tens column by one; he said they were losing by 66 points, but it was actually 56 points. It was the Philadelphia 76ers, losing very badly to the Los Angeles Clippers, on February 9th. They ended up losing 123-78, a margin of a meager 45 points. He was shocked, though, that a professional basketball team could be losing by that kind of margin. (Note that this particular friend is really into sports, all sports actually, and is very knowledgeable about every sport, so his shock and disbelief are pretty meaningful.)

Responding to those comments, I began a statistical analysis of the relationship between average score and variance of score in the four major American team sports: baseball, football, basketball, and hockey. I completed that analysis a few minutes ago. (In between I was doing other things.) There are a number of interesting findings, but I shan't bury the lede: my initial intuition was very much correct that basketball scores vary a lot less in comparison to their average value than those of the other sports. Especially baseball. Baseball has the highest ratio of variance to average. Thus the title. You can see if you think I'm right.



So, the numbers. These are taken from the most recently completed regular season of each sport; as basketball and hockey are in season presently, that means they're using the 2012-13 seasons' data.

In the 2013 MLB season, each team scored an average of 4.166 runs per game. Team scores had a standard deviation of 2.952 runs. This standard deviation was 70.86% of the average score.

In the 2013-14 NFL season, each team scored an average of 23.408 points per game. Team scores had a standard deviation of 10.217 points. This standard deviation was 43.65% of the average score.

In the 2012-13 NBA season, each team scored an average of 98.138 points per game. Team scores had a standard deviation of 11.631 points. This standard deviation was 11.85% of the average score.

In the 2012-13 NHL season, each team scored an average of 2.767 goals per game. Team scores had a standard deviation of 1.634 goals. This standard deviation was 59.04% of the average score.
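
Each percentage above is just the standard deviation divided by the mean. Here is a minimal sketch that recomputes them from the rounded figures quoted above. The dictionary and function names are illustrative only, and because the inputs are already rounded, the last digit can differ slightly from the percentages in the text.

```python
# Mean and standard deviation of team scores, as quoted above (rounded).
SCORES = {
    "MLB": (4.166, 2.952),
    "NFL": (23.408, 10.217),
    "NBA": (98.138, 11.631),
    "NHL": (2.767, 1.634),
}


def sd_to_mean_pct(mean, sd):
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * sd / mean


score_cv = {league: sd_to_mean_pct(m, s) for league, (m, s) in SCORES.items()}

# Leagues ordered from most to least variable relative to the average score.
ranking = sorted(score_cv, key=score_cv.get, reverse=True)

for league in ranking:
    print(f"{league}: {score_cv[league]:.2f}%")
```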

So, right off the bat, impression confirmed. Basketball scores don't vary a lot compared to the average score. This is in a sense obvious, as it is the only one of the four sports where a shutout is properly unheard of. If your range doesn't go all the way to zero, there's a whole block of your average that just doesn't factor into the variance. Interestingly, shutouts are rare in the NFL but do happen on occasion, and sure enough football has the second-lowest variance-to-average ratio.* Of course, shutouts happen all the time in hockey (it's practically a miracle whenever a goal's scored), whereas in baseball they happen a fair amount but are a notable occurrence, and yet it's baseball, not hockey, that has the highest ratio. So the Shutouts Hypothesis doesn't explain all of what we're seeing here.

A second level of analysis concerned the margins of each game. What my friend noticed, after all, was a blowout margin, not something particularly remarkable about one team's score. (Neither score was in fact especially remarkable in the end; Philadelphia put up a z-score of -1.73 while LA's z-score was 2.14.) So here are similar numbers for the margin instead of team scores:
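
For reference, those z-scores are each final score's distance from the league-average team score, measured in standard deviations. A minimal sketch using the NBA figures quoted earlier (the names are illustrative):

```python
NBA_MEAN = 98.138  # average team score, as quoted above
NBA_SD = 11.631    # standard deviation of team scores, as quoted above


def z_score(value, mean, sd):
    """How many standard deviations `value` sits from `mean`."""
    return (value - mean) / sd


sixers_z = z_score(78, NBA_MEAN, NBA_SD)     # Philadelphia scored 78
clippers_z = z_score(123, NBA_MEAN, NBA_SD)  # Los Angeles scored 123

print(f"Philadelphia: {sixers_z:.2f}, Los Angeles: {clippers_z:.2f}")
```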

In the 2013 MLB season, games were decided by an average of 3.285 runs. Margins of victory featured a 2.538 run standard deviation. This standard deviation was 76.93% of the average margin.

In the 2013-14 NFL season, games were decided by an average of 11.293 points. Margins of victory featured a 9.067 point standard deviation. This standard deviation was 80.29% of the average margin.

In the 2012-13 NBA season, games were decided by an average of 10.990 points. Margins of victory featured a 7.938 point standard deviation. This standard deviation was 72.21% of the average margin.

In the 2012-13 NHL season, games were decided by an average of 2.049 goals. Margins of victory featured a 1.274 goal standard deviation. This standard deviation was 62.17% of the average margin.

A few observations. First, the ratios here are a lot more clustered than they were before. That makes sense: because we're comparing two numbers drawn from the same distribution to each other, we've removed the element that's about the positioning of that distribution on the number line, i.e. not about the variance. And it turns out that the sports behave a lot like each other in how margins of victory are distributed. Interestingly, measuring the 76ers' loss using these margin numbers doesn't yield very different results from the not-as-valid measurement of each team's score separately. The 45-point margin had a z-score of 4.28, not that different from the 3.87 z-score gap between the two teams' scores seen above. The two methods differed a bit more when the lead got up to 56 points briefly: then, the two teams' scores were 4.81 standard deviations of team score apart, but a 56-point margin would have a z-score of 5.67. (Of course, perhaps one should inflate both numbers a bit, as the game was still ongoing at that point.)
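
The two ways of sizing up a blowout can be put side by side in a short sketch. It assumes the rounded NBA figures quoted above, and the function names are made up for illustration. One measure is the z-score of the margin against the distribution of margins; the other is the gap between the two scores expressed in standard deviations of team score.

```python
NBA_SCORE_SD = 11.631     # SD of team scores, as quoted above
NBA_MARGIN_MEAN = 10.990  # average margin of victory, as quoted above
NBA_MARGIN_SD = 7.938     # SD of margins, as quoted above


def margin_z(margin):
    """z-score of a margin of victory against the distribution of margins."""
    return (margin - NBA_MARGIN_MEAN) / NBA_MARGIN_SD


def gap_in_score_sds(margin):
    """Gap between the two scores, in units of the team-score SD."""
    return margin / NBA_SCORE_SD


for margin in (45, 56):  # the final margin, and the largest lead during the game
    print(margin, round(margin_z(margin), 2), round(gap_in_score_sds(margin), 2))
```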

Second, comparing the four sports here isn't necessarily completely valid, as I'm not sure they all have the same policy with regard to ties. Actually looking through the NHL data I got the sense that they may have instituted some newfangled system for eliminating ties. I think they happen sometimes in football, and that they go years and years without happening in baseball. Anyway, just a factor to consider. (Actually, in the relevant seasons there were two ties in the NFL and none in any of the other leagues. So I guess this isn't much of a concern. But it used to be when ties were common in hockey.)

But it turns out that the really interesting thing, for evaluating the sports at least, comes from comparing these margin numbers to the score numbers above. First a couple of minor points of interest, followed by what I think is the best single number to come out of my analysis.

One thing we can do is compare the standard deviation of the margin to the standard deviation of each side's score. Why would we do this? Well, in principle it should tell us something about the way the two scores in each game are interacting with one another. I'm not sure what it tells us, and I also think one season's worth of data isn't remotely enough to draw any firm conclusions given the narrow range we see (the same concern applies to the variance-to-mean ratios for margins seen above), but it's information I calculated, so I'm gonna include it. This number was 85.62% for MLB, 88.74% for the NFL, 68.23% for the NBA, and 77.98% for the NHL.

We can also compare the average margin to the standard deviation of the individual scores. This also could be telling us something about how the two scores are relating to one another, but again, I don't know what it is. Anyway, this number was 111.30% for MLB, 110.53% for the NFL, 94.49% for the NBA, and 125.42% for the NHL. The main lesson I draw from these two numbers is that again we're seeing the NBA as the outlier. Not only does the NBA feature far, far less variance in its scores relative to their average level than the other sports, but less of that variance translates into actual variation in margins of victory.
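
One possible benchmark for these two ratios, offered as a sketch rather than anything the data above establishes: if the two scores in a game were independent draws from the same roughly normal distribution (an assumption, and clearly a rough one for low-scoring sports), the margin would be the absolute value of a normal variable, and both ratios would be fixed constants. The code below computes those constants from the standard half-normal formulas; the variable names are illustrative.

```python
import math

# ASSUMPTION: both scores are independent and normal with the same SD (sigma).
# Then (A - B) is normal with SD sigma * sqrt(2), and the margin |A - B| is
# half-normal with that scale.
scale = math.sqrt(2.0)  # SD of (A - B), in units of sigma

# Mean and SD of a half-normal variable, in units of sigma.
mean_margin_over_score_sd = scale * math.sqrt(2.0 / math.pi)
sd_margin_over_score_sd = scale * math.sqrt(1.0 - 2.0 / math.pi)

print(f"average margin / score SD: {100 * mean_margin_over_score_sd:.1f}%")
print(f"margin SD / score SD:      {100 * sd_margin_over_score_sd:.1f}%")
```

Under that assumption the two ratios would be about 113% and 85%. The MLB and NFL figures above land near both values, while the NBA figures fall below both, which at least fits the reading that the NBA is the outlier.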

Combining everything I've said so far, we get the most meaningful, and also perhaps the simplest, metric for analyzing all of this data: the ratio between the average margin of victory and the average team score. You can see that this is the thing I was originally getting at. The results here are pretty similar to the original variance-to-mean ratio for scores, though basketball and hockey each move a bit relative to the other two. For baseball, this ratio was 78.86%. For football it was 48.24%. For basketball, 11.20%. And for hockey, 74.05%. So the gap between baseball and hockey has gone down a lot. And it's not very visible, because the gap is so huge to begin with, but basketball is now lagging even farther behind the other three.
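
This last ratio is simple enough to recompute from the averages quoted earlier. A minimal sketch follows; the names are illustrative, and since the inputs are rounded the final digit may differ slightly from the figures above.

```python
# (average margin of victory, average team score), as quoted above.
AVERAGES = {
    "MLB": (3.285, 4.166),
    "NFL": (11.293, 23.408),
    "NBA": (10.990, 98.138),
    "NHL": (2.049, 2.767),
}

# Average margin as a percentage of the average team score.
margin_share = {
    league: 100.0 * margin / score for league, (margin, score) in AVERAGES.items()
}

# Leagues ordered from highest to lowest ratio.
margin_ranking = sorted(margin_share, key=margin_share.get, reverse=True)

for league in margin_ranking:
    print(f"{league}: {margin_share[league]:.2f}%")
```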

So if you've ever watched a basketball game and just had the sense that it was pretty monotonous, that it's basically just the two teams running the ball across the court and scoring in more-or-less alternation and then at the end of the day one of them did slightly worse and the other team wins, well, this data bears that out. Eighty-nine percent of the scoring in basketball games is just to get the score up to the area where basketball scores live, clustered around 100. The other eleven percent goes into determining who wins. That's perhaps a slightly too fanciful interpretation, but it's not wholly without merit, I think.

Baseball is, on the other hand, the best sport, measured this way. It's the game where the largest share of the action is actually meaningful in determining the result. That makes intuitive sense to me: it's the only one where both a truly low-scoring game, i.e. with both scores near zero, and a truly high-scoring game, i.e. with both scores relatively far from zero, happen relatively frequently. You see 1-0 games and 10-9 games, in other words. Hockey teams posted 2050 scores in the relevant season, of which only 305 were a 5 or higher. That's a little under 15%. Of that 305, only 25 were against an opponent who also scored at least a 5. (That, uh, can't actually be right: that number should be even, as each game should show up twice. I don't know why that's going wrong, but I don't care enough to investigate; it makes little difference.) That's barely over 1% of the games as true "slugfests," where both teams put up a high score. A little over 19% of baseball scores, on the other hand, were a 7 or higher, and nearly 4% of all games saw both teams score at least 7 runs. Football and basketball, meanwhile, barely use the part of the distribution that's actually near zero; only 46 of 512 football scores were in the single digits, while the lowest NBA score was 58.
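
A sketch of one way to do the "slugfest" count, using a handful of made-up final scores (hypothetical, not real data) and an illustrative function name. Counting whole games first and then doubling is what guarantees that the number of team scores posted against an equally high-scoring opponent comes out even.

```python
# HYPOTHETICAL final scores, one (team_a, team_b) tuple per game; not real data.
games = [(1, 0), (10, 9), (4, 2), (7, 3), (8, 7), (2, 5)]


def high_score_counts(games, threshold):
    """Return (team scores >= threshold,
               team scores posted in games where BOTH sides were >= threshold)."""
    high_scores = sum(score >= threshold for game in games for score in game)
    # A game is a slugfest when even the lower of its two scores clears the bar.
    slugfest_games = sum(min(game) >= threshold for game in games)
    # Each slugfest contributes exactly two team scores.
    return high_scores, 2 * slugfest_games


high, high_vs_high = high_score_counts(games, threshold=7)
print(high, high_vs_high)
```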

(Actually, here's an interesting metric: percent of all scores within one score-standard-deviation of zero. For baseball, this was 33%, although since the standard deviation was 2.95 it's a little unfair not to count the 3s, which would boost it to almost 48%. For football this was about 12%. For hockey it was 46%, though if, following baseball, we include the 3s, that jumps to an incredible 69%. For basketball it is, of course, zero.)
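
A minimal sketch of that metric on a made-up list of scores (hypothetical, not real data). It assumes the population standard deviation, which may or may not match the calculation behind the numbers above, and it also shows what rounding the cutoff up to the next whole score does, which is the "count the 3s" adjustment.

```python
import math
import statistics

# HYPOTHETICAL team scores for illustration only; not real data.
scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


def pct_at_or_below(scores, cutoff):
    """Percentage of scores that are no greater than `cutoff`."""
    return 100.0 * sum(score <= cutoff for score in scores) / len(scores)


# ASSUMPTION: population standard deviation of the scores.
sd = statistics.pstdev(scores)

strict_pct = pct_at_or_below(scores, sd)              # within one SD of zero
generous_pct = pct_at_or_below(scores, math.ceil(sd))  # cutoff rounded up

print(f"SD = {sd:.2f}; strict {strict_pct:.0f}%, rounded-up cutoff {generous_pct:.0f}%")
```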

So, those are the results of my investigation of the statistical behavior of scores in the four major professional team sports leagues. The conclusion is that basketball sucks and baseball is awesome, I think. (Although I may be reading a bit much into the data there...) Certainly the four behave quite differently from one another, not surprising since one can almost always tell from a final score which sport the game was, the overlap between pitchers' duels in baseball and any hockey game being the only exception.

*Yes, I know, the standard deviation isn't the variance. It's a lot easier to just say "variance" sometimes, though, and it's a measure of variance, so shut up, okay?

1 comment:

  1. We can ask about the average and standard deviation for how many times the defense is able to stop its opponent from scoring during a basketball game. The ratio of standard deviation to average might be quite high for this measure. Measured in this way, I bet basketball joins the other three in terms of being "best"!
