Future Tweaks to the Model
I developed my predictive model last spring, and it is still in the experimental stage. Similarly, Ken Pomeroy’s predictive model in the Basketball Prospectus book is humbly referred to as Projections, Version 1.0. There are still a lot of things we can and will improve in future years. Today I want to talk about some of the things I have in mind for the future.
Can we learn anything from this year’s rankings?
Looking back at all my conference predictions, there are a few teams that stand out as surprises. Teams like Michigan St. and Villanova are a little lower in my rankings than where most experts have them pegged. But that is largely because neither of these teams had dominating efficiency numbers last year.
But the ranking that bothers me the most is North Carolina’s. And I think Seth Davis’ recent review of North Carolina points out the biggest current problem with my model: it does not adequately account for injuries. My model looks at Tyler Zeller and sees a player who could barely crack the rotation last year. But that is clearly wrong. Zeller was not being held out because he was not good enough; he was not playing because he was injured on multiple occasions.
Now, this adjustment is not as simple as you might think. The model may under-rate Zeller’s return, but it is important to realize that not all players successfully recover from injuries and return to a dominant level of play. Are players with ACL tears more likely to suffer future ACL tears? Are players with foot problems (see Zeller) more likely to continue to miss games in the future? Ideally we would have a database of injuries and could project how likely players are to recover from each type of injury. But because that database does not exist, my future project will look at how well players recover from a “general” injury.
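To make the idea concrete, here is a toy sketch of what a “general” injury adjustment might look like. The recovery rate and replacement-level numbers are placeholders, not estimates from my actual model:

```python
# Sketch of a "general injury" adjustment: blend a returning player's
# projected efficiency toward a replacement-level baseline, weighted by
# an estimated probability of full recovery. All names and numbers here
# are illustrative, not the model's actual parameters.

REPLACEMENT_LEVEL = 88.0   # hypothetical replacement-level offensive rating
RECOVERY_RATE = 0.75       # hypothetical P(full recovery) from a "general" injury

def injury_adjusted_projection(healthy_projection, was_injured):
    """Discount an injured player's projection toward replacement level."""
    if not was_injured:
        return healthy_projection
    return (RECOVERY_RATE * healthy_projection
            + (1 - RECOVERY_RATE) * REPLACEMENT_LEVEL)

# A healthy player keeps his projection; an injured one is pulled down.
print(injury_adjusted_projection(110.0, was_injured=False))  # 110.0
print(injury_adjusted_projection(110.0, was_injured=True))   # 0.75*110 + 0.25*88 = 104.5
```

With a real injury database, the single recovery rate would be replaced by injury-specific estimates (ACL tears, foot problems, and so on).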
Second, we need to do more to account for transfers. Simply plugging a transfer into the new lineup is probably not sufficient. And despite the occasional Wesley Johnson type player, we need to do more to understand how often players succeed in their new environments. For every Wesley Johnson, how many Alex Legions are out there?
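One way to formalize this, sketched below with made-up numbers, would be to estimate an average retention factor from a historical database of transfers (a database that, again, would need to be built):

```python
# Sketch: estimate how much production transfers retain at their new school,
# then apply that factor instead of plugging the old numbers in unchanged.
# The sample data is invented; a real estimate needs historical transfer data.

transfers = [
    # (efficiency at old school, efficiency at new school)
    (105.0, 112.0),   # the rare Wesley Johnson-style breakout
    (100.0, 94.0),
    (98.0, 90.0),
    (102.0, 95.0),
]

# Average retention ratio across historical transfers.
retention = sum(new / old for old, new in transfers) / len(transfers)

def transfer_projection(old_efficiency):
    """Project a transfer's efficiency at his new school."""
    return retention * old_efficiency
```

If most transfers retain less than all of their old production, the factor comes out below 1, and the occasional breakout only partially offsets that.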
Third, there are some problems that will be solved when we have a larger sample size. Right now, I am estimating my model based on three years of returning tempo free player data. That is not a lot of data to draw conclusions about unusual situations. For a team that loses 2 starters, and 3 rotation players, the model probably does a very good job. But for a team that returns almost no players (Kentucky), there simply are not a lot of historical examples. I would like to have a statistical reason to treat Kentucky differently, but for now I am mostly making an out-of-sample projection.
Similarly, the three year data set causes problems because of some recent trends. In particular, teams without elite recruits have been having more and more success from 2007 to 2010. Let’s throw out Memphis, Gonzaga, and Xavier, because all three teams have been recruiting at a different level than most non-BCS teams. Look at the non-BCS teams in the top 35 of the Pomeroy Rankings in 2007 vs 2010:
2007:
22nd Air Force
28th Southern Illinois

2010:
25th Utah St.
29th Northern Iowa
34th Old Dominion
There are a lot more teams without elite recruits performing at a high level recently. (BYU’s Jimmer Fredette was not an elite prospect coming out of high school. Butler’s Matt Howard and Dayton’s Chris Wright were top 100 recruits, but they were the only elite recruits on these teams.) Thus the recent data tends to have more confidence in non-BCS teams than may be warranted.
But I also suspect this is somewhat cyclical. While the SEC fell off the map a couple of years ago, and the Pac-10 fell apart last year, I do not believe those leagues are permanently down-trodden. And as those leagues improve again, I think the recruiting data will start to have a little more predictive power, and I’ll start to rank a team like Wake Forest a little higher than I do this year.
To deal with this cyclicality, I currently make an adjustment that moves non-BCS leagues downward. But I would like to have the data to determine the proper size of this adjustment. Right now, it is rather ad hoc.
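As a toy illustration of what a data-driven version might look like, the adjustment could be chosen to minimize historical prediction error rather than set by hand. The numbers below are made up:

```python
# Sketch: instead of picking the non-BCS adjustment by hand, choose the
# constant that minimizes mean squared prediction error on past seasons.
# The (projected, actual) efficiency margins below are invented.

history = [(12.0, 9.5), (8.0, 6.0), (15.0, 13.0), (10.0, 8.5)]

def mean_squared_error(adjustment):
    """MSE of projections after subtracting a candidate adjustment."""
    return sum((proj - adjustment - actual) ** 2
               for proj, actual in history) / len(history)

# Grid search over candidate adjustments from 0.0 to 5.0 in 0.1 steps.
best = min((a / 10 for a in range(0, 51)), key=mean_squared_error)
print(best)  # 2.0 -- the mean over-projection in this invented sample
```

With real historical projections, the same search would pin down how far non-BCS leagues actually need to be moved, and whether the right answer drifts from year to year.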
Finally, I want additional data so we can do a better job modeling how different coaches respond to different situations. We know Mike Brey has a special ability to teach offense; we know Bruce Weber has a special ability to teach defense; and we know Jamie Dixon has a special ability to bring young players along quickly. But modeling the interaction between coach and returning player effects will take more data.
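For what it’s worth, the eventual model could express this as a simple interaction term in a regression. Here is a toy version with fabricated data and a hypothetical coach dummy variable:

```python
# Sketch: a coach-by-returning-players interaction term in a linear model.
# With more seasons of data, the interaction coefficient could capture
# coaches who develop young rosters quickly. All data below is fabricated.

import numpy as np

# Columns: intercept, returning-minutes share, coach dummy, interaction
returning = np.array([0.3, 0.5, 0.7, 0.3, 0.5, 0.7])
coach_a = np.array([1, 1, 1, 0, 0, 0])  # 1 = hypothetical "develops youth" coach
X = np.column_stack([np.ones(6), returning, coach_a, coach_a * returning])
y = np.array([8.0, 10.0, 12.0, 2.0, 8.0, 14.0])  # efficiency margins

coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
# A negative interaction coefficient (coefs[3]) means coach A's teams depend
# less on returning minutes -- i.e., he brings young players along quickly.
```

The problem, as noted above, is that estimating one extra coefficient per coach eats degrees of freedom quickly, which is why this has to wait for more data.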
Correction: If you have been following my blog closely, you may have noticed that Rhode Island showed up in the Biggest Departures category in a recent post. That had me scratching my head. I knew Rhode Island lost Lamonte Ulmer, but the ranking seemed wrong. I recently went back and checked my code and found the problem.

Even though I have the full rosters of eligible returning players, for some reason I included a line of code that classified all of last year’s “seniors” as departing players. Not only was this line of code redundant, it was also wrong. Many schools list players as seniors who are not really in their final year of eligibility. And Rhode Island had just this problem. Delroy James and Ben Eaves were both listed as seniors on kenpom.com last year, but both are on Rhode Island’s roster again this year.

I have now re-run the numbers for all conferences and fixed the previous conference predictions. This only makes a meaningful difference for two teams whose numbers I presented previously. First, I had mistakenly coded Notre Dame’s Ben Hansbrough as departing. With Hansbrough, Notre Dame is projected to be in the hunt for an NCAA bid. And given the way Notre Dame played without Luke Harangody last year, I think this is a very reasonable prediction. Second, Miami’s Adrian Thomas was also listed as a senior on kenpom.com last year. After fixing the code, Miami is now projected as an NCAA bubble team. I apologize for any confusion.
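For readers curious about the bug itself, here is a toy reconstruction of the logic (not my actual code):

```python
# Sketch of the bug: departures should be derived from the rosters alone.
# Listed class year is unreliable, since schools list players as "seniors"
# who still have eligibility remaining.

last_year = {"Delroy James": "Sr", "Ben Eaves": "Sr", "Lamonte Ulmer": "Sr"}
this_year = {"Delroy James", "Ben Eaves"}  # both back despite being "seniors"

# Buggy version: anyone listed as a senior is treated as departing.
buggy_departures = {p for p, cls in last_year.items() if cls == "Sr"}

# Correct version: a player departs only if he is absent from the new roster.
departures = {p for p in last_year if p not in this_year}

print(sorted(buggy_departures))  # wrongly includes James and Eaves
print(sorted(departures))        # ['Lamonte Ulmer']
```

Since the roster comparison already identifies every true departure, the class-year check was pure downside: redundant when right, damaging when wrong.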
The first table shows the expected changes for the A10. Fordham has performed at such a hideously low level the last two years that it almost seems unsustainable for an A-10 team. Almost every player Fordham lost was among the worst in the conference, hence the positive number in the “players lost” column. Even a team like Fordham should be able to replace players with efficiency ratings in the 70s with better options. Fordham will continue to be horrible this season, but with a batch of recruits who look too good for the MEAC, you have to expect at least modest improvement. Fordham will still be the worst team in the A10, but I suspect they will win more than two games this year.
Among the contenders, Temple is the most likely to improve. St. Louis would have been the most improved team this year. They played a lot of young players last year and were peaking at the end of the season. But Willie Reed and Kwamain Mitchell are not enrolled in school due to a recent legal issue. I have heard some speculation that at least one of them will return for the second semester, but for now I am assuming neither player comes back. And instead of being a fifth NCAA contender in the conference, St. Louis is another team that should slip back this year.
Thanks to the recent season-ending ACL injury to Brad Redford, Xavier is now expected to take the biggest fall in the A10.
The next table shows the expected changes in offense and defense. Xavier loses its two most prolific offensive options in Jordan Crawford and Jason Love, and both were very efficient as well. Plus they lose the great three-point shooting of Brad Redford. While they return some other players who can rebound and defend, the model thinks Xavier’s offense will take a step back.
The next table shows the conference prediction. Temple is a logical favorite. They had one of the top defenses in the country last year, and while they lose a tough scorer in Ryan Brooks, the departing Luis Guzman was hardly an efficient player. But I am a little concerned that Temple may not have the depth to really get better. They gave a number of young players minutes last year, and outside the starting rotation, no one really stepped forward. In expectation, Fran Dunphy should be able to replace Brooks and Guzman’s production, but in practice I’m not sure where those replacements are going to come from. My model views a Villanova – Temple game as a toss-up, and I’m not quite as comfortable drawing that conclusion. But assuming Temple’s defense is better than Villanova’s defense, as it was last year, the teams should have similar efficiency margins once again.
Dayton loses a ton of players from their rotation. But their two most efficient and critical players, Chris Wright and Chris Johnson, are back. And some of the role players who are returning are also very efficient. (See Luke Fabrizius.) Dayton will depend on a solid recruiting class, led by Juwan Staten, to fill in the missing playing time. I’m a little concerned about integrating so many new faces given that Brian Gregory likes to play a deep rotation. But by leaning on the team’s two stars, Wright and Johnson, Dayton should be able to stay near the top of the A10 standings.
The A10 looks like a three-bid league, but Richmond is clearly in the hunt. At one time, St. Louis was also in the discussion for one of the top spots in the league, but the loss of two of their key players is devastating to their chances of becoming an elite team.
Sunday, October 31, 2010