Tuesday, March 2, 2010

Predictions 101

You hear people say they don’t care about the weather – they care about the climate. You hear people say they don’t care about the business cycle – they care about economic growth. Well, that’s how I feel about coaches. You find a good coach that can build an efficient team and wait for things to go your way. If you have a good coach, live with the random noise.

But for some reason people are fascinated by what’s happening today. They are fascinated by how good their team is now, and how good their team is going to be next year. And nothing in my coaching series really covered that.

I made some predictions in the Basketball Prospectus book, but today I want to introduce a more general numerical model that mimics how people make predictions from year to year. Tomorrow, I’ll share the 2010 predictions for the top five conferences and evaluate which teams have stumbled and which teams have exceeded expectations this season.

The variable I’m going to focus on is the yearly change in adjusted offensive and defensive efficiency from 2007 to 2008 and 2008 to 2009. What caused teams’ efficiency ratings to change? Here’s one model:

Change in Offensive Efficiency = f (Improvement in Juniors, Improvement in Sophomores, Improvement in Freshmen, Value of Departing Players, Value of Incoming Players)

Departing players include transfers, early entrants, and graduating players. For returning coaches the impact of the coach should mostly difference out. This is why there is no coaching term. For new coaches, I add one term to the equation:

Change in Offensive Efficiency = f (all of the terms above, Adaptation of Players to New Coaching System)

Most predictive models use returning minutes. But thanks to Ken Pomeroy’s hard work tabulating player data, we can take the numbers one step further. In general, players that shoot more are more important than players that don’t shoot very often, at least on offense. So instead of using departing and returning minutes, I weight by percentage of possessions used. Thus my primary variables are percentage of departing and returning possessions for a given team.
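To make the possession weighting concrete, here is a minimal Python sketch of how a roster could be turned into departing and returning shares. The dictionary keys (`poss_used`, `cls`, `returning`) and the four-player roster are hypothetical and made up for illustration; they are not taken from the actual data set.

```python
from collections import defaultdict

def possession_shares(players):
    """Split last season's used possessions into a departing share
    and returning shares by class.

    Each player is a dict with hypothetical keys:
      'poss_used' - possessions the player used last season
      'cls'       - class last season: 'Fr', 'So', 'Jr' or 'Sr'
      'returning' - True if the player is back next season
    """
    total = sum(p["poss_used"] for p in players)
    if total <= 0:
        raise ValueError("roster has no used possessions")
    shares = defaultdict(float)
    for p in players:
        key = f"returning_{p['cls']}" if p["returning"] else "departing"
        shares[key] += p["poss_used"] / total
    return dict(shares)

# Made-up roster, for illustration only.
roster = [
    {"poss_used": 400, "cls": "Sr", "returning": False},  # graduating
    {"poss_used": 300, "cls": "Jr", "returning": True},
    {"poss_used": 200, "cls": "Fr", "returning": True},
    {"poss_used": 100, "cls": "So", "returning": False},  # early entrant
]
shares = possession_shares(roster)
```

Swapping `poss_used` for minutes played would give the more common returning-minutes version of the same variables.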

I also break apart returning possessions by class. Based on a mounting volume of evidence, including what I put in the aforementioned Basketball Prospectus book and what Big Ten Geeks did this spring, I’m convinced that returning freshmen have a bigger impact than other returning players.

I don’t think there is anything systematic about players with high or low efficiency ratings improving more often. So I am not including interaction terms for the efficiency of the returning players.

But I do want to include an interaction term for the departing players. If a bunch of seniors with 124.1 ratings graduate, that’s a lot worse than if a bunch of seniors with 88.2 ratings graduate.

Next, as I documented in the Basketball Prospectus book, freshmen have huge variance. So the incoming players are going to be mostly a random error component. But as Luke Winn articulated so beautifully for Sports Illustrated, if there’s any predictable impact, it comes from Top 10 recruits. My model includes dummy variables for Top 10 recruits and Top 100 recruits.

The results are very similar whether I use all schools or just the BCS schools. So I use all schools for 677 data points total in these two years. Here is a description of the results:

As expected, teams with the most freshman possessions are the most likely to improve. This isn’t so much a “freshmen work hard in the off-season” effect as a “boy, last year sucked because we had to give the freshmen so many shots” effect. Returning juniors have a slightly bigger effect than sophomores, but the difference is not statistically significant.

For departing players, the higher the individual efficiency or ORtg of the departing player, the more the team’s offensive efficiency is expected to drop. In fact, losing highly inefficient players is not harmful at all. If a departing player’s efficiency rating is below 91.2, his departure benefits the team’s offense. But the departure of players above this level hurts the offense, and the departure of highly efficient players is very costly.
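The break-even behavior falls out of the interaction term. Here is a minimal sketch, assuming the departing-player effect is linear in ORtg and scaled by the share of possessions leaving. The 91.2 break-even comes from the results above; the slope is a hypothetical placeholder, not an estimated coefficient, so only the signs of the outputs mean anything.

```python
BREAK_EVEN_ORTG = 91.2     # break-even rating reported in the post
HYPOTHETICAL_SLOPE = -0.3  # made-up slope; the post does not report it

def departure_effect(dep_share, dep_ortg,
                     slope=HYPOTHETICAL_SLOPE,
                     break_even=BREAK_EVEN_ORTG):
    """Predicted change in offensive efficiency from departing players.

    dep_share - fraction of last season's possessions that are leaving
    dep_ortg  - possession-weighted ORtg of the departing players
    A negative slope means efficient departures (ORtg above break-even)
    lower the prediction and inefficient departures raise it.
    """
    return dep_share * slope * (dep_ortg - break_even)

# Same share of possessions leaving, very different consequences.
loss_of_stars = departure_effect(0.30, 124.1)      # negative: offense drops
loss_of_strugglers = departure_effect(0.30, 88.2)  # positive: offense improves
```

Written this way, the two departing-player coefficients collapse into a slope and a break-even point, which is why a single cutoff like 91.2 can be read straight off the regression.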

Top 10 and Top 100 Freshmen have a small impact. For every Kentucky this year, there is a North Carolina this year. Each Top 10 Freshman recruit increases team efficiency by about 1.13 points. Each Top 100 Freshman recruit increases offensive efficiency by about 0.26 points. I’ve been playing around with recruiting data off and on for over three years and I have never gotten a huge effect. I’m to the point now where I really believe the average effect is minimal. For every John Wall this year, there is a Lance Stephenson this year.
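To see how small these effects are, here is a quick arithmetic sketch using the two coefficients quoted above. It assumes the Top 10 and Top 100 counts do not overlap (a Top 10 recruit is not also counted as a Top 100 recruit) and that the effects simply add up; both are assumptions for illustration, and the function name is hypothetical.

```python
TOP10_EFFECT = 1.13   # points of efficiency per Top 10 recruit (from the post)
TOP100_EFFECT = 0.26  # points of efficiency per Top 100 recruit (from the post)

def recruiting_bump(n_top10, n_other_top100):
    """Expected efficiency gain from a recruiting class.

    Assumes the two groups are counted separately and the effects
    are additive; the post does not spell out either detail.
    """
    return n_top10 * TOP10_EFFECT + n_other_top100 * TOP100_EFFECT

# A hypothetical monster class: two Top 10 recruits plus three more Top 100s.
bump = recruiting_bump(2, 3)  # 2 * 1.13 + 3 * 0.26
```

Even that kind of class adds up to only about three points of expected efficiency, which is the sense in which the average effect is minimal.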

New coaches (first-time or school-changers) have a negative impact, on average. While there are obviously many turnaround stories, an equal number of coaches inherit disasters where simply treading water can be viewed as a success in year one.

On defense, I use returning minutes, not returning possessions, since possessions don’t directly impact defense. Returning freshmen, sophomores, and juniors all have about the same effect on defense. For new coaches, previous coaching experience has a noticeable impact on defense: the better the defense at the previous school, the bigger the impact on defense at the new school. The cutoff in this case is an adjusted defense of 98.8. If the coach’s previous defense beat this (that is, came in lower), he’s going to have a positive impact on defense. The further below this level, the bigger the impact.

Two key factors that my model misses: First, I don’t account for transfers. Second, I don’t account for height on defense, which is clearly critical. These adjustments may be made someday in the future.

OK, so what happens when I take this model, estimated on 2008 and 2009 data, and predict the 2010 season? What changes would it have predicted? These are shown in the next table.

-When adjusted defense goes down, that’s a good thing.
-Adjusted defense has been falling on average, so the baseline here is slightly negative.
-I’m cheating a bit by knowing Renardo Sidney and Rodney White and others don’t play this year. So maybe this is a late November model instead of an October predictive model.

What does the model predict? It predicts a worse season for Pittsburgh and Marquette. And that’s exactly what the humans predicted. The model also loves Kansas and so did many pre-season publications.

The model also expected a big drop-off for North Carolina. The Tar Heels lost a lot of key players, and unlike foolish people like me who loved the Tar Heels’ recruits this year, the model appreciates that even elite recruits are no guarantee.

But these are just the predicted changes. What would the actual predicted standings have looked like with this model? Tune in for the next post.