The holy grail for every sports analytics community is a single player metric for value. The baseball SABR community has wins above replacement, basketball has plus-minus, and hockey has combinations of both. Football has... yards per play?
Defining Player Quality
"Quality" or "value" on a football field is notoriously difficult to quantify, and even more difficult to assign to individual players. Topher Doll (@Topher_Doll) tracks the predictive accuracy of different metrics for the NFL, and one of the best efficiency metrics for predicting future games (and thus, a true sign of quality or skill) is Total Adjusted Yards per Play (TAY/P). One of the best things about TAY/P is its relative ease of calculation:
TAY/P = ([Yards] + 9*[First Downs] + 11*[Touchdowns] - 45*[Interceptions]) ÷ [Total Plays]
Total Adjusted Yards per Play is meant for analyzing quarterback play, but it doesn't require any conversion (other than removing interceptions) to apply to running backs and receivers. Analysis done at The GridFe and Football Perspective has determined that a first down is worth the equivalent of nine yards, and a touchdown is worth twenty yards (touchdowns are also counted as first downs in stat sheets, which is why the equation uses a coefficient of 11 for touchdowns).
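As a rough sketch, the formula translates to a few lines of Python. The function name and both stat lines below are made up for illustration; only the coefficients come from the equation above.

```python
def tay_per_play(yards, first_downs, touchdowns, interceptions, plays):
    """Total Adjusted Yards per Play.

    first_downs is expected to include touchdowns, since stat sheets count
    a touchdown as a first down (hence 9 + 11 = 20 yards per touchdown).
    """
    if plays <= 0:
        raise ValueError("plays must be positive")
    adjusted_yards = (
        yards + 9 * first_downs + 11 * touchdowns - 45 * interceptions
    )
    return adjusted_yards / plays


# Hypothetical quarterback: interceptions count against him.
qb_tayp = tay_per_play(yards=2500, first_downs=120, touchdowns=20,
                       interceptions=8, plays=350)

# Hypothetical running back: same formula with interceptions set to zero.
rb_tayp = tay_per_play(yards=1200, first_downs=60, touchdowns=12,
                       interceptions=0, plays=200)

print(round(qb_tayp, 2), round(rb_tayp, 2))
```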
One problem with using TAY/P to analyze players in the NCAA is that first downs aren't tracked for individual players, only for teams. To work around this issue, I made the assumption that a player's rate of first downs is proportional to his share of his team's total plays and yards. To help explain this, here's an example:
The University of Okoboji has 40 non-touchdown rushing first downs on the season. As a team, Okoboji has rushed for 2,000 yards on 400 attempts. The leading rusher for Okoboji had 1,200 yards on 200 attempts, so he had 60% of the team's yards, and 50% of the team's attempts. The average of his ratio of plays and yards is 55%, so he earned 0.55 x 40 = 22 non-touchdown first downs. Add in his individual touchdowns, and we should have a reasonable estimate for his total number of first downs, and can calculate his TAY/P.
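The Okoboji example can be reproduced with a short Python sketch. The function and parameter names are hypothetical; the inputs are the numbers from the example, with touchdowns left at zero so the result is the non-touchdown estimate.

```python
def estimate_first_downs(team_non_td_first_downs, team_yards, team_plays,
                         player_yards, player_plays, player_touchdowns=0):
    """Estimate a player's first downs from his share of team production."""
    yard_share = player_yards / team_yards
    play_share = player_plays / team_plays
    # Average the two shares, as in the worked example.
    share = (yard_share + play_share) / 2
    # Touchdowns are added on top, since they also count as first downs.
    return share * team_non_td_first_downs + player_touchdowns


okoboji_rusher = estimate_first_downs(
    team_non_td_first_downs=40, team_yards=2000, team_plays=400,
    player_yards=1200, player_plays=200,
)
print(round(okoboji_rusher, 1))  # about 22 non-touchdown first downs
```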
Setting a Replacement Level
Now that we know a player's TAY/P, the next step is calculating his value above replacement. For baseball, replacement level is a fairly concrete distinction--it's the average level of performance for a player freely available on the open market, basically an average high Triple-A player, or about the 40th-best player at any given position (30 teams, plus bench players). For college football, though, there is no open market, and the talent disparity between the best and worst teams is much larger than in any professional league (especially in Division III), so any definition of replacement level is going to be necessarily arbitrary.
With about 250 teams playing Division III, a "replacement level" player at any position should be about the 250th-best player at that position. Because the talent disparity in DIII is so large, there should be a decent number of starters at lower-tier schools that perform below replacement level. I am drawing all of my statistics from the NCAA Stats site, which only lists the Top 200 players for any given statistic (such as rushing, passing, or receiving yards per game), so it's impossible for the "best" 250 players in TAY/P to be in my sample. It's also unlikely that all of the 200 running backs, receivers, or quarterbacks in yards/game would perform above replacement level, but a decent majority (say, 150 of 200?) should be above replacement level. A decent proxy to achieve this result is to set the replacement level of efficiency at 1/2 a standard deviation below average.
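Here is a minimal Python sketch of that replacement-level rule. The sample values are invented, and using the population standard deviation (rather than the sample version) is an assumption on my part, since the text doesn't say which one applies.

```python
from statistics import mean, pstdev


def replacement_level(tayp_values, sd_fraction=0.5):
    """Replacement level = average minus a fraction of a standard deviation."""
    return mean(tayp_values) - sd_fraction * pstdev(tayp_values)


# Made-up TAY/P values for eight players, for illustration only.
sample = [5.5, 6.5, 7.0, 7.5, 8.0, 8.5, 9.5, 11.5]

level = replacement_level(sample)
above = sum(1 for value in sample if value > level)
print(round(level, 3), f"{above} of {len(sample)} above replacement")
```

With a roughly bell-shaped distribution, about 69% of players land above a cutoff half a standard deviation below the mean, which is in the same neighborhood as the "150 of 200" target.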
Just like how baseball uses positional adjustments for WAR, I'll be making similar adjustments based on the type of play. For passing plays, the average TAY/P is almost exactly 8.0, and the replacement level is 6.8. For rushing plays, the average and replacement levels of efficiency are 7.3 and 6.6, respectively. The highest passing TAY/P in the country (De'Angelo Fulford) is around 16.0, and the highest rushing TAY/P (Hunter Belzo) is around 13.2. Yards Above Replacement Player (YARP) for a quarterback and running back are pretty simple:
[YARP] = [Total Plays] * ([Player TAY/P] - [Replacement Level TAY/P])
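In Python, the YARP formula might look like the sketch below. The replacement levels are the 6.8 and 6.6 figures quoted above; the dictionary layout, function name, and sample player are hypothetical.

```python
# Replacement-level TAY/P by play type, from the figures quoted above.
REPLACEMENT_TAYP = {"pass": 6.8, "rush": 6.6}


def yarp(total_plays, player_tayp, play_type):
    """Yards Above Replacement Player for a passer or rusher."""
    return total_plays * (player_tayp - REPLACEMENT_TAYP[play_type])


# Hypothetical rusher: 200 carries at 8.1 TAY/P.
rusher_yarp = yarp(total_plays=200, player_tayp=8.1, play_type="rush")
print(round(rusher_yarp, 1))
```

Because the efficiency gap is multiplied by volume, a below-replacement player racks up negative YARP faster the more he plays.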
For receivers, I didn't want to use yards per reception to determine player efficiency, because it drastically underestimates the value of high-quantity/low-yardage receivers--the Wes Welkers, Tavon Austins, and Quinn Buschbachers of the world. In fantasy football, the best metric to predict future results (which, again, is a good indicator of quality or skill) is yards per route run, so that's what I'll be using for my determination of receiver value. Just like with first downs, the NCAA doesn't track the number of routes run by a receiver.
This is where my methodology gets pretty deep in the weeds, and where I would welcome any outside input for ways to improve the calculation. I basically just threw my hands in the air and decided that a wide receiver runs a route on three-quarters of his team's pass attempts. On average, a team's top receiver catches around 43% of his team's completions, so I set a receiver's replacement level efficiency at 43% of the replacement level for quarterback efficiency, or 2.9 TAY/RR.
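To make those receiver assumptions concrete, here is a small Python sketch. The constants come from the paragraph above, while the function names and the sample receiver are hypothetical. Applying the same 9 and 11 coefficients to a receiver's adjusted yards is my reading of the earlier TAY/P definition, not something the text spells out.

```python
ROUTE_SHARE = 0.75               # routes run per team pass attempt (assumed above)
TOP_RECEIVER_CATCH_SHARE = 0.43  # top receiver's share of team completions
QB_REPLACEMENT_TAYP = 6.8        # replacement-level passing TAY/P


def receiver_routes(team_pass_attempts):
    """Estimated routes run by a wide receiver."""
    return ROUTE_SHARE * team_pass_attempts


def replacement_tay_per_route():
    """Replacement-level total adjusted yards per route run (TAY/RR)."""
    return TOP_RECEIVER_CATCH_SHARE * QB_REPLACEMENT_TAYP


def tay_per_route(yards, first_downs, touchdowns, team_pass_attempts):
    """A receiver's TAY/RR, reusing the TAY/P coefficients without interceptions."""
    adjusted_yards = yards + 9 * first_downs + 11 * touchdowns
    return adjusted_yards / receiver_routes(team_pass_attempts)


# Hypothetical receiver on a team with 400 pass attempts.
wr = tay_per_route(yards=900, first_downs=45, touchdowns=8, team_pass_attempts=400)
print(round(replacement_tay_per_route(), 1), round(wr, 2))
```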
My estimate for the number of routes run by a running back is a bit more complex, but no less arbitrary:
[RB Routes Run] = 1/3*([Team Pass Attempts]/4) + 2/3*([RB Receptions]*2)
A running back's route share is heavily dependent on his offense's style of play, but he's probably not catching a pass on much more than half of his routes, and probably not running routes significantly more often than on a quarter of the team's pass attempts. The replacement level efficiency for running backs on pass plays is the same as that for receivers. Right or wrong, I don't know, but my results end up aligning very well with how coaches and SIDs voted for All-Conference and All-Region teams (which isn't by design, but I think gives credibility to my methodology).
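The running back formula is a weighted blend of two rough estimates, which a short Python sketch makes easier to see. The function name and sample inputs are hypothetical.

```python
def rb_routes_run(team_pass_attempts, rb_receptions):
    """Estimated routes run by a running back.

    Blends two guesses: routes on a quarter of the team's pass attempts
    (weight 1/3), and two routes per reception (weight 2/3).
    """
    by_team_volume = team_pass_attempts / 4
    by_receptions = rb_receptions * 2
    return (1 / 3) * by_team_volume + (2 / 3) * by_receptions


# Hypothetical back: 30 catches on a team with 360 pass attempts.
routes = rb_routes_run(team_pass_attempts=360, rb_receptions=30)
print(round(routes, 1))
```

Weighting the reception-based estimate more heavily lets a pass-catching back earn more routes than a pure runner in the same offense.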
Adjusting for Opponents
The last step to this puzzle is adjusting player efficiency for the quality of opponents faced. My Y1P+ ratings are basically a more advanced form of TAY/P, just on a scale of points instead of yards. Using a linear transformation, I can convert a team's opponents' defensive rushing and passing Y1P+ ratings to TAY/P. The equation for adjusting a player's TAY/P for their opponents is then:
[Opponent-Adjusted TAY/P] = [TAY/P] + ([National Avg TAY/P] - [Opponent TAY/P])
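A minimal Python sketch of the adjustment follows. The conversion from Y1P+ to TAY/P isn't shown here, so the sketch starts from opponent values already on the TAY/P scale, and averaging them with equal weight is an assumption. All names and numbers are hypothetical.

```python
def opponent_adjusted_tayp(tayp, national_avg_tayp, opponent_tayps):
    """Shift a player's TAY/P by how far his opponents sit from average.

    opponent_tayps holds the TAY/P each opposing defense allows, already
    converted to the TAY/P scale. They are averaged with equal weight,
    which is an illustrative assumption.
    """
    opponent_tayp = sum(opponent_tayps) / len(opponent_tayps)
    return tayp + (national_avg_tayp - opponent_tayp)


# Hypothetical rusher who faced softer-than-average run defenses.
adjusted = opponent_adjusted_tayp(
    tayp=8.0, national_avg_tayp=7.3, opponent_tayps=[7.9, 7.6, 7.9]
)
print(round(adjusted, 2))
```

Opponents who allow more than the national average pull the adjusted number down, and stingy opponents push it up.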
From there, a player's opponent-adjusted TAY/P (or TAY/P+ for short) is used to determine their value above replacement (YARP+). For my final determination of player value, I used YARP+ per game instead of just raw YARP+, because I didn't want to completely screw over SCIAC teams that were supposed to play Oxy this year. All of the stats used were from after the playoff quarterfinals, but before the semis, because I had to draw a cutoff somewhere, and it's taken me a few weeks to pull all of this data and set up the Excel documents to do the calculations.
In a change from my usual MO, I'm going to be including NESCAC teams and players in my analysis. Because I don't have any way to adjust for opponents, since the NESCAC doesn't play out of conference, I'm not even going to try--I'll just assume every NESCAC team played average opponents.
What About the Offensive Line & Tight Ends?
Here comes another cop-out. I didn't know how to assign value to TEs/H-Backs, so I didn't even try. At Pro Football Reference, tight ends and fullbacks are assigned a portion of the offensive line's share of their Approximate Value. If/when I eventually include the position in my analysis, I'll probably use some sort of the same idea.
As for assigning value to offensive lines, I decided to look at a team's overall sack rate (expressed as adjusted yards per pass, or AY/P) and rushing success. I used sack rate instead of total sacks to account for teams like Geneva, who gave up 11 total sacks (around the 25th-best number in the country), but had a sack rate over 20%, the worst in the country. I also entered this analysis assuming that the offensive line is mostly responsible for the quantity of a team's sacks (which is at least partially erroneous), but not the depth of the sacks (probably not as erroneous). The average loss of yards from a sack is just slightly over 6 yards, so an offensive line's AY/P is:
(-6.1 * [Sacks]) / ([Pass Attempts] + [Sacks])
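As a quick Python sketch of that formula (the function name and the sample team are made up; the 6.1-yard average sack loss is from the text):

```python
AVG_SACK_YARDS = 6.1  # average yards lost per sack, per the text


def ol_ay_per_pass(sacks, pass_attempts):
    """Offensive line adjusted yards per pass play (always zero or negative)."""
    dropbacks = pass_attempts + sacks
    if dropbacks <= 0:
        raise ValueError("team has no pass plays")
    return -AVG_SACK_YARDS * sacks / dropbacks


# Hypothetical team: 20 sacks allowed on 380 pass attempts (5% sack rate).
ayp = ol_ay_per_pass(sacks=20, pass_attempts=380)
print(round(ayp, 3))
```

Since every sack is charged the same 6.1 yards, AY/P is just the sack rate rescaled into yards.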
Just like with everything else, I then need to adjust for opponents. My Y1P+ ratings don't keep track of teams' sack rates, so I used a different methodology to determine opponent pass rush quality. Using the same methodology used by Ken Pomeroy to adjust for opponents, the equation for adjusted AY/P (or AY/P+ for short) becomes:
[OL AY/P+] = [OL AY/P] - ([Opponent DL AY/P] - [National Avg AY/P])
Using the same replacement level standard as for skill players (1/2 standard deviation below average, which works out to about -0.5 AY/P, or a sack rate of 9%), the offensive line's yards above replacement is:
[YARP+] = ([Pass Attempts] + [Sacks]) * ([OL AY/P+] - [Replacement Level AY/P])
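The opponent adjustment and the pass-protection YARP+ can be chained together in a short Python sketch. The function names and sample numbers are hypothetical; the -0.5 replacement level is the approximate figure quoted above.

```python
REPLACEMENT_AYP = -0.5  # approximate replacement-level AY/P from the text


def adjusted_ayp(ol_ayp, opponent_dl_ayp, national_avg_ayp):
    """Opponent-adjusted AY/P (AY/P+)."""
    return ol_ayp - (opponent_dl_ayp - national_avg_ayp)


def ol_pass_yarp_plus(pass_attempts, sacks, ol_ayp_plus,
                      replacement_ayp=REPLACEMENT_AYP):
    """Pass-protection yards above replacement, weighted by pass plays."""
    return (pass_attempts + sacks) * (ol_ayp_plus - replacement_ayp)


# Hypothetical line: -0.305 raw AY/P against pass rushes that force -0.40
# AY/P on average, in a country where the average is -0.35.
ayp_plus = adjusted_ayp(ol_ayp=-0.305, opponent_dl_ayp=-0.40,
                        national_avg_ayp=-0.35)
pass_yarp = ol_pass_yarp_plus(pass_attempts=380, sacks=20, ol_ayp_plus=ayp_plus)
print(round(ayp_plus, 3), round(pass_yarp, 1))
```

Facing better-than-average pass rushes (a more negative opponent AY/P) nudges the line's adjusted number upward.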
I weight the offensive line's YARP+ by their total number of plays for a couple of reasons. The more important one is that an offensive line's quality of pass pro is obviously impacted by their style of play. I mentioned Geneva above, a team that runs a triple option offense. Despite having the worst sack rate in the country (by a significant margin), the fact that they only passed the ball 50 times all year means they were "only" the 17th-lowest OL in terms of yards above replacement.
Determining an OL's rushing value is a little more "math-y," so buckle up. At Football Outsiders, they use a stat called Adjusted Line Yards to measure an offensive line's rushing effectiveness. On each play, the offensive line is assumed to be responsible for:
120% of yards lost
100% of yards from 0-4
50% of yards from 5-10
0% of yards from 11+
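For a single carry, those weights can be sketched in Python as below. This covers only the raw credit schedule listed above, not any further adjustments Football Outsiders makes, and it reads the bands cumulatively (a 7-yard run earns full credit for the first 4 yards and half credit for the next 3), which is my interpretation. The function name is hypothetical.

```python
def line_yards(gain):
    """Yards credited to the offensive line on one carry."""
    if gain < 0:
        return 1.2 * gain  # losses count 120% against the line
    credit = min(gain, 4)  # 100% of yards 0-4
    credit += 0.5 * max(0, min(gain, 10) - 4)  # 50% of yards 5-10
    return credit  # yards 11+ add nothing


for carry in (-2, 3, 7, 15):
    print(carry, line_yards(carry))
```

Credit tops out at 7 line yards per carry, so a 15-yard run and an 80-yard run look identical from the line's point of view.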
I don't have play-by-play data to do the same sort of analysis, but I want to use a similar concept to adjust a team's rushing TAY/P. In a context-neutral scenario, most teams I know view rushing plays as "successful" when they go for 4 yards or greater. Since the replacement level efficiency for rushing TAY/P is 6.6, it seems reasonable that anything above 6.6 TAY/P is successful, and anything below is unsuccessful. Just like how Football Outsiders awards diminishing returns to OLs for explosive runs, I too want to award diminishing returns for teams with a significantly higher than replacement level TAY/P. To do this, I use an arctangent function centered at 6.6 to transform Team Rushing TAY/P into an Offensive Line Rushing TAY/P. The resulting curve has a 1:1 relationship for efficiencies near 6.6 TAY/P, with upper and lower limits at about 8.0 and 5.2, respectively.
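The exact arctangent scaling isn't given, so the Python sketch below is one curve that satisfies the stated properties (centered at 6.6, slope of 1 at the center, limits of about 8.0 and 5.2), not necessarily the one actually used. The names are hypothetical.

```python
import math

CENTER = 6.6      # replacement-level rushing TAY/P
HALF_RANGE = 1.4  # distance from the center to each limit (8.0 and 5.2)


def ol_rushing_tayp(team_tayp):
    """Compress team rushing TAY/P into an offensive-line rushing TAY/P."""
    # atan tends to +/- pi/2, so this scale puts the limits at
    # CENTER +/- HALF_RANGE; dividing the input by the same scale makes
    # the slope exactly 1 at the center.
    scale = 2 * HALF_RANGE / math.pi
    return CENTER + scale * math.atan((team_tayp - CENTER) / scale)


for team_tayp in (4.0, 6.0, 6.6, 7.2, 10.0):
    print(team_tayp, round(ol_rushing_tayp(team_tayp), 2))
```

Teams near 6.6 pass through almost unchanged, while very efficient or very inefficient rushing teams get pulled back toward the limits.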
The next steps for assigning rushing value to OLs are adjusting for opponents and weighting the resultant OL TAY/P+ by the total number of rushing attempts. Add together the offensive line's passing YARP+ and its rushing YARP+, and you get their total YARP+.
And just like that, I've explained to you how I calculated Yards Above Replacement for offensive players in just under 2,000 words. Tune in later this week/weekend for Part 2, where I explain the methodology for defensive players.