Thursday, March 10, 2011

Length and Difficulty

I'm in a graphical statistical analytical mood today, so it's time to turn my attention to golf. A few years back I compiled a list of some 829 golf holes used on the PGA Tour or in major championships. Specifically I gathered data on the lengths and stroke averages of those 829 holes in the most recent tournament that had been played on them. Just now I noticed that, though it's early going yet, the par-5 1st hole at Doral, 530 yards, is playing to a 4.2 average while the par-4 3rd, 440 yards, is playing around 4.33. So naturally I got the idea into my head to check the statistical relationship between length and difficulty, using my handy data set.


It's a pretty strong relationship. Each extra yard of length adds 0.0042 strokes to the average, and this relationship accounts for about 92% of the variance in difficulty. That is one hell of a strong correlation. Obviously, this relationship also predicts that the stroke average for a hole one inch long would be approximately 2.3, which is wrong. The stroke average for such a hole would be exactly 1. I suspect that 2.3 is a decent approximation of where the stroke average of a non-putt hole would go as the length got as close to 0 as is reasonable. I also think I see a bit of a breakdown in very short par-3s, which I think are harder than they "should" be. That's basically a point of pride for them, so it makes sense.
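
For a sense of what that line actually predicts, here is a minimal sketch that plugs the two Doral holes into it. It assumes the intercept is the roughly 2.3 quoted above and the slope is 0.0042 strokes per yard; the function name is made up for illustration, and the rounded coefficients mean the outputs are only approximate.

```python
# Coefficients as quoted in the post (rounded, so treat results as approximate).
SLOPE = 0.0042      # extra strokes per extra yard
INTERCEPT = 2.3     # predicted average as length approaches zero

def predicted_average(yards):
    """Stroke average the all-holes line predicts for a hole of this length."""
    return INTERCEPT + SLOPE * yards

# The two Doral holes mentioned at the top of the post.
doral_1st = predicted_average(530)  # par-5, playing to about 4.2
doral_3rd = predicted_average(440)  # par-4, playing to about 4.33
print(round(doral_1st, 2), round(doral_3rd, 2))
```

On length alone the line guesses about 4.53 for the 1st and 4.15 for the 3rd, so the 1st is playing easier than its length suggests and the 3rd harder.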

Anyway, the next logical step: make the same graph for each par separately. Here's par-3s:

The correlation is still there, but it's a hell of a lot weaker. Among par-3s, length is really not the most important factor in determining difficulty, accounting for just 21% of variation. This makes a whole lot of sense to me. It's not necessarily so much that the difference between a pitching wedge and a 3-iron is so little, it's more that when they're letting you hit a pitching wedge into the green they'll usually make the green damn nasty. The outlier in terms of length is Oakmont's 8th, which is just mean at 288 yards. There's a par-4 only eight yards longer.

Speaking of which, here are the par-4s:

Now length is accounting for 40% of the variance in difficulty, about twice as much as for par-3s. That seems kind of fitting, actually, since there are twice as many shots to hit.

Now here's par-5s:

My theory from par-4s breaks down here, though. Length is less important for par-5s, accounting for just 31% of differences in difficulty.

This is interesting, in fact: at first it looked like distance was almost the only factor in determining the difficulty of a hole. Now we see that, once you know the par of a hole, distance accounts for less than half of the remaining variation. So really what we saw initially was just the very strong correlation between par and difficulty, which is good, I think, in that it suggests par is a real thing.
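
To see how a pooled fit can look so much stronger than the fits within each par, here is a rough sketch on made-up data. None of these numbers come from the 829-hole data set: the yardage ranges, typical averages, within-par slope and noise level are all assumptions picked only to mimic the general shape, and the helper names are hypothetical.

```python
import random

def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a one-variable linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)

# Made-up parameters, NOT the real data:
# par -> (shortest yards, longest yards, typical stroke average)
MADE_UP_PARS = {3: (150, 250, 3.05), 4: (380, 500, 4.05), 5: (520, 620, 4.85)}

def make_fake_holes(holes_per_par=200, seed=0):
    """Generate (par, yards, stroke average) tuples from the assumed model."""
    rng = random.Random(seed)
    holes = []
    for par, (lo, hi, typical) in MADE_UP_PARS.items():
        mid = (lo + hi) / 2
        for _ in range(holes_per_par):
            yards = rng.uniform(lo, hi)
            # Small length effect within a par, plus noise for everything else
            # (greens, water, bunkers and so on).
            average = typical + 0.002 * (yards - mid) + rng.gauss(0, 0.1)
            holes.append((par, yards, average))
    return holes

holes = make_fake_holes()
pooled = r_squared([h[1] for h in holes], [h[2] for h in holes])
by_par = {
    par: r_squared([h[1] for h in holes if h[0] == par],
                   [h[2] for h in holes if h[0] == par])
    for par in MADE_UP_PARS
}
print("all holes:", round(pooled, 2))
for par, value in by_par.items():
    print("par", par, ":", round(value, 2))
```

With these assumptions the all-holes figure comes out far higher than any of the per-par figures, because most of the pooled variation is just the jump from one par to the next.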

As a related matter, though, I've always been interested in the idea of designing a golf course without paying much attention to the notion of par. Just have eighteen holes spaced roughly evenly between 120-ish and 600-ish yards, and give some of them tough features (water, lots of bunkers, a tight green) and others easy features (wide fairway, open green, etc.), and see what happened. After all, "par" is just an artificial concept. One of my favorite anecdotes on this subject is that when the women had their British Open at St. Andrews two years after the men played it, in 2007 that is, they played the 17th from the same tees as the men (whose tee was later moved back 40 yards for 2010), and called it a par-5. And it played to an easier stroke average for the women than it had for the men. Could be random, or it could be that the "par-4" label makes the men play it in a sub-optimal, overly aggressive way. Anyway, interesting.
