This last football season was great, but I didn't blog once during the season. Between starting a new job (outside of football; maybe it was too painful to write about?) and not having internet access at home during the season, I fell a little behind. Instead of blogging, I decided to use this season, the first since I started trying to rate DIII football, to refine and revise my model.
Turns out my original model was pretty good, but it had three main flaws:
1. Overfitting results for teams on the DIII island (notably in the NWC, SCIAC, ASC, & SCAC) and for conferences with only one non-conference game (OAC, UMAC)
2. Excessive variability in week-to-week ratings
3. Reliance on the Pythagorean Expectation, which isn't the most predictive metric in DIII football
For teams in the West and South, the model over-valued the significance of their games against teams outside of the island (because there were so few). As a result, the rated strength of these conferences was all over the place from year to year and week to week, which then resulted in poor predictions for their representatives in the playoffs.
The variability in ratings from week to week was also due to overfitting, but the way I had set up my model gave me no efficient way to check exactly how accurate I was over the course of a season. For my newest revision, being able to track my accuracy throughout the season was a must.
Fair warning: the rest of this is pretty nerdy.
If you don't know what the Pythagorean Expectation is, here's a quick run-down: it takes (roughly) the square of a team's total points scored and divides that by the sum of the squares of points scored and points allowed. The Pythagorean model is the one used by kenpom.com, the site that inspired me to start trying to predict DIII football. The biggest difference between kenpom.com (a college basketball site) and my blog is the sport we're trying to model (that, and he has 66,600+ followers on twitter). In football, because fewer total points are scored per game, a dominant defense can have a much larger impact on a team's rating than a dominant offense can. This was also a detriment to my score predictions, which consistently came in lower than the actual totals. The effect was especially apparent in the playoffs, where almost every team has an outstanding defense. As a former defender myself, I badly wanted this to be the way things actually were (and after watching Von Miller in the Super Bowl, it's hard for me not to). As it turns out, a pure point-differential method, which weights offense and defense equally, produces much better results.
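For the curious, here's the classic formulation in Python. The exponent of 2 is the textbook version; the best exponent for football is itself a tuning question, so treat it as a parameter:

```python
def pythagorean_expectation(points_for: float, points_against: float,
                            exponent: float = 2.0) -> float:
    """Estimated win fraction: PF^x / (PF^x + PA^x)."""
    pf, pa = points_for ** exponent, points_against ** exponent
    return pf / (pf + pa)

# A team that scores 300 and allows 150 over a season:
# 300^2 / (300^2 + 150^2) = 90000 / 112500 = 0.8
print(pythagorean_expectation(300, 150))  # 0.8
```

Note how the squaring is what over-rewards lopsided scoring margins: in a low-scoring sport, a stingy defense shrinks the denominator much faster than a good offense grows the numerator.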
After identifying my model's flaws over the course of the season, my new goal became a more stable and predictive model that could assess its own accuracy. What I ended up with is a sequential model. The two main differences from the old model are how the ratings are updated and how the rating itself is calculated.
Each team enters a game with an AdjO & AdjD rating, from which an estimated score is calculated. Based on how each team's offense and defense performs on the field relative to expectations, the ratings are updated using the following formulas:
AdjO' = AdjO + k (ActO - EstO)
AdjD' = AdjD + k (EstD - ActD)
where k was chosen to minimize estimation error over the last eleven seasons.
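In code, one game's worth of updates looks something like the sketch below. The k value and the example numbers here are placeholders for illustration, not the fitted values from my model:

```python
def update_ratings(adj_o: float, adj_d: float,
                   est_o: float, act_o: float,
                   est_d: float, act_d: float,
                   k: float = 0.2) -> tuple[float, float]:
    """One post-game update of a team's offensive and defensive ratings.

    est_o / est_d are the points the model expected this team to score /
    allow; act_o / act_d are what actually happened. k = 0.2 is just a
    placeholder -- the real value was fit to minimize estimation error.
    """
    new_o = adj_o + k * (act_o - est_o)  # scored more than expected -> offense up
    new_d = adj_d + k * (est_d - act_d)  # allowed fewer than expected -> defense up
    return new_o, new_d

# A team expected to win 28-21 that actually wins 35-14:
# offense and defense ratings both move up toward the surprise.
print(update_ratings(adj_o=28.0, adj_d=21.0, est_o=28.0, act_o=35.0,
                     est_d=21.0, act_d=14.0, k=0.2))
```

The structure is essentially an Elo-style exponential update: each new result nudges the rating by a fraction k of the surprise, so older games fade geometrically rather than being refit.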
In contrast to my earlier system (and practically every other system I've found on the internet), this method only considers results from the most recent game to update its ratings. To understand what this means, I'll run you through a quick example.
This last season, UW-Platteville played Buena Vista in the first game of the year. BVU was coming off a string of relatively successful seasons, so they started the season with a rating of 0.664; they finished it rated 0.208. My model predicted a 36-13 Week 1 win for the Pioneers, and they ended up winning 49-13, improving their rating from 0.954 to 0.966. Had these teams played in Week 11 instead of Week 1, the predicted score would have been closer to 56-15, and a 49-13 result would have actually lowered Platteville's rating. It would make sense, then, that as the season progresses and we learn more about each team, the impact of that first game should change accordingly, right? Well, that's what I thought too, which is why I also made a version of the model that did exactly that. The weird thing was, my standard error increased by nearly 15% in that version.
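Comparing model variants like that (or tuning k in the first place) comes down to replaying past games in order and measuring prediction error. Here's a rough sketch of the idea; the game-data layout, the starting ratings, and the simple averaging score model are all illustrative assumptions, not how my actual spreadsheet is wired:

```python
def replay_error(games, start_ratings, k):
    """Replay (team, opp, team_pts, opp_pts) games in order, updating
    ratings after each one; return mean squared error of the predictions."""
    ratings = dict(start_ratings)  # (AdjO, AdjD) per team; don't mutate caller's copy
    sq_err, n = 0.0, 0
    for team, opp, team_pts, opp_pts in games:
        # Illustrative score model: expected points = average of my
        # offensive rating and my opponent's defensive rating.
        est_team = (ratings[team][0] + ratings[opp][1]) / 2
        est_opp = (ratings[opp][0] + ratings[team][1]) / 2
        sq_err += (team_pts - est_team) ** 2 + (opp_pts - est_opp) ** 2
        n += 2
        # Apply the k-weighted update to both teams.
        ratings[team] = (ratings[team][0] + k * (team_pts - est_team),
                         ratings[team][1] + k * (est_opp - opp_pts))
        ratings[opp] = (ratings[opp][0] + k * (opp_pts - est_opp),
                        ratings[opp][1] + k * (est_team - team_pts))
    return sq_err / n

def best_k(games, start_ratings, candidates):
    """Grid search: keep whichever k replays history with the least error."""
    return min(candidates, key=lambda k: replay_error(games, start_ratings, k))

# Tiny made-up season where the opening prediction is already perfect:
start = {"A": (28.0, 21.0), "B": (21.0, 28.0)}
season = [("A", "B", 28, 21)]
print(replay_error(season, start, 0.1))  # 0.0
```

Running the same replay for each variant is also what makes the 15% standard-error comparison above possible: same games, same starting ratings, different update rule.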
As it is now, my model has a standard error just a hair better than most rating systems' accuracy for Division I and NFL games (via Prediction Tracker). I haven't yet found a prediction tracker for Division III results, so I can't be certain whether the difference in error is due to my model's accuracy or whether Division III is by nature more predictable than Division I. Obviously I want to prove that I'm good at this, so I will be building such a tracker for next season.
This last year was definitely a year of learning. I learned that my first model wasn't very good, that defense may not be that much more important than offense (I'm still holding out on that one), and that the idiom "They were a different team at the beginning of the year" may actually be true. My hope for 2016 is to learn that I have vastly improved my ability to predict DIII football games.