__I recently added a Composite Rating page to the site. Aggregate analysis is often more useful and accurate than any one method on its own, so I figured I would use this post to explain some of the methods and discoveries of my analysis.__

__How the ratings are determined__


__Every rating system uses a different scale to rate its teams, so the first challenge in making a composite ranking is determining how to compare these different systems. What I've seen in the past is the geometric mean (similar to the average you're used to) of each team's rank. That method is perfectly acceptable for ranking the teams, but I wanted to determine a composite rating on the same scale as my system. The difference between averaging ratings and averaging rankings is subtle, but meaningful: when several teams are rated at nearly the same level, the difference between their ratings is small, but the difference between their rankings could be large.__
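To make that rating-versus-rank distinction concrete, here's a tiny Python sketch (the team names and numbers are invented): teams four full ranks apart can be separated by almost nothing on the rating scale.

```python
# Hypothetical ratings for five closely bunched teams on a 0-1 scale.
ratings = {"A": 0.802, "B": 0.801, "C": 0.800, "D": 0.799, "E": 0.798}
ranked = sorted(ratings, key=ratings.get, reverse=True)

# A and E sit 4 ranks apart but only 0.004 apart in rating, so averaging
# ranks would exaggerate a difference that averaging ratings treats as tiny.
rank_gap = ranked.index("E") - ranked.index("A")
rating_gap = ratings["A"] - ratings["E"]
print(rank_gap, round(rating_gap, 3))  # 4 0.004
```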


__In order to average teams' ratings, I had to perform a transformation to place each system's ratings on a 0-1 scale. Every system I was analyzing distributed its ratings approximately normally, so I was able to use Excel's NORMDIST() function to place each team on a 0-1 scale. From there I averaged each team's rating across every system to find its composite rating, excluding its maximum and minimum ratings (also known as an olympic average). By using the olympic average, I hopefully was able to minimize any outlier values. The math was pretty simple, but cleaning up the team names was a nightmare. Everybody uses different names for nearly half of the teams in the set (Wis. Whitewater/Wisc.-Whitewater/UW-Whitewater, for example). Can't we all just use the same team names that I use?__
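A minimal Python version of that pipeline, using the standard library's NormalDist in place of Excel's NORMDIST; the four systems and all the numbers below are invented for illustration:

```python
from statistics import NormalDist, mean, stdev

def normalize(ratings):
    """Map one system's ratings onto a 0-1 scale with the normal CDF,
    the same transformation as Excel's NORMDIST(x, mean, sd, TRUE)."""
    dist = NormalDist(mean(ratings.values()), stdev(ratings.values()))
    return {team: dist.cdf(r) for team, r in ratings.items()}

def olympic_average(values):
    """Average after dropping the single highest and lowest value."""
    vals = sorted(values)
    return mean(vals[1:-1])

# Four hypothetical systems, each on its own native scale.
systems = [
    {"UW-Whitewater": 30.1, "Mount Union": 28.5, "Linfield": 25.0, "Wesley": 21.2},
    {"UW-Whitewater": 95.0, "Mount Union": 93.5, "Linfield": 88.0, "Wesley": 80.1},
    {"UW-Whitewater": 1.90, "Mount Union": 1.75, "Linfield": 1.40, "Wesley": 1.10},
    {"UW-Whitewater": 12.4, "Mount Union": 11.8, "Linfield": 10.2, "Wesley": 8.90},
]

normalized = [normalize(s) for s in systems]
composite = {team: olympic_average([n[team] for n in normalized])
             for team in systems[0]}
```

With only four systems, the olympic average keeps just the middle two values for each team, which is exactly what damps a single system's outlier opinion.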

__Similarity Scores__


__Next, I decided to check which systems were most similar to the aggregate and to each other. First I found the absolute difference between each system and the system aggregate, and then I averaged those values. The reciprocal of that average is the similarity score. A quick primer on interpreting these values: a similarity score of infinity means the systems produce exactly the same ratings, while a score of two is as dissimilar as possible on this scale (because every system spreads its ratings roughly evenly across the 0-1 range). As the value increases, the two systems become more similar. A score of 10 means that the two systems have an average difference between teams of 0.100 (or essentially one win per season). Here's how each system compared to the average:__
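As a sketch of how such a score can be computed (my own reconstruction from the description above, not the actual spreadsheet), and of why 2 is the practical floor: an even 0-1 spread compared against its exact reverse has a mean absolute difference of about 0.5.

```python
from statistics import mean

def similarity(a, b):
    """Reciprocal of the mean absolute rating difference between two
    systems over the same set of teams (identical systems -> infinity)."""
    d = mean(abs(a[t] - b[t]) for t in a)
    return float("inf") if d == 0 else 1 / d

# Worst case on this scale: 100 evenly spread 0-1 ratings against their
# exact reverse, giving a mean absolute difference of roughly 0.5 and a
# similarity score that bottoms out near 2.
n = 100
fwd = {t: t / (n - 1) for t in range(n)}
rev = {t: 1 - t / (n - 1) for t in range(n)}
print(round(similarity(fwd, rev), 2))  # 1.98
```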


__To illustrate just how big a difference there is between a score of 39.9 and 10.5, here's a side-by-side scatter plot of the Laz Index and CSL Ratings against the system average.__

__Next I decided to compare the individual rating models to each other to see which models produced the most similar results. What I discovered was that the Laz Index and Maas Ranking are the two systems most similar to each other, with a similarity score of 67.1. For some context, the next-highest similarity score was 39.6, meaning these two systems are nearly twice as similar as the next-closest pair. That can't be strictly coincidence. So I graphed these two systems against each other like I did above:__

__Notice how in the first graph the points are actually scattered above and below the x=y line, but here the points follow a smooth trajectory. This suggests that the Laz Index and Maas Ranking are probably using the same method to rank teams. Martien Maas has a detailed description of his methodology on his site, but there's no such description on the Laz Index site. If I were a betting man, I would wager that they're using the same formula, with some slightly different input methodology.__


__Next I found the two systems that are most dissimilar from each other, and they happened to be the CSL Ratings and the Nutshell Retrodictive, with a similarity score of 6.4. Here they are plotted against each other:__

__It would be hard for these two systems to be any more different. As I mentioned earlier, the most dissimilar any two systems could be expected to be is a similarity score of 2. Both Ray Waits (Nutshell) and Craig Loest provide descriptions of their methodology, so it should be possible to determine why they produce such different results.__


__The Nutshell Retrodictive description is as follows (emphasis mine):__


The Nutshell Ratings are based on two components:

1. Margin of Victory

2. Upsets

Margin of victory is the number of points one team beats another team by. An upset is when the underdog beats the favorite. Each of these factors is treated in a different manner. Upsets award the underdog, but not as much as if the winner had been the favorite. There is no limit on the margin of victory since it only counts as a small portion of a team's rating. When an underdog wins, it does not gain a greater rating than the team it beat. An underdog is expected to earn its higher rating either by margin of victory or by beating more favorites. Once the team is the favorite it will gain points more quickly.
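The description is loose enough that any implementation is guesswork, but a toy update rule can capture its stated behavior. Every constant below is invented; only the three qualitative properties come from the quote: unlimited but lightly weighted margin of victory, a smaller reward for upset wins, and the cap that keeps an underdog from vaulting past the team it beat.

```python
# All weights are invented for illustration; only the qualitative behavior
# follows the quoted Nutshell description.
MOV_WEIGHT = 0.05      # MOV is unlimited but "only counts as a small portion"
FAVORITE_REWARD = 1.0
UPSET_REWARD = 0.5     # an upset rewards the underdog, "but not as much"

def apply_result(ratings, winner, loser, margin):
    was_favorite = ratings[winner] >= ratings[loser]
    gain = margin * MOV_WEIGHT + (FAVORITE_REWARD if was_favorite else UPSET_REWARD)
    new_rating = ratings[winner] + gain
    if not was_favorite:
        # "it does not gain a greater rating than the team it beat"
        new_rating = min(new_rating, ratings[loser])
    ratings[winner] = new_rating

ratings = {"Favorite U": 10.0, "Underdog St": 5.0}
apply_result(ratings, "Underdog St", "Favorite U", 21)  # a 21-point upset
```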


__Short and sweet; I like it. The CSL description is slightly longer, so I won't post the whole thing here, but here are the important points:__


__Factors:__

__A team's wins minus that team's losses__

__The sum of the wins of the teams I beat minus the sum of the losses of the teams that beat me__

__The sum of the wins of the teams beaten by the teams I beat, minus the sum of the losses of the teams that were beaten by the teams that beat me__


__Clearly, a win against a team with a "good" record will supply more of a reward in the second and third factors than a win over a team with a "poor" record. Similarly, a team is not penalized very much for losing to a good team, but is penalized more severely for losing to a poor team.__


__Each factor is weighted evenly and then summed.__
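A literal Python reading of those three factors (my own reconstruction; the function name and the unit weights are assumptions, and margin of victory is ignored because the factors only use wins and losses):

```python
from collections import defaultdict

def csl_scores(games):
    """games: list of (winner, loser) pairs. Returns each team's score as
    the even-weighted sum of the three quoted factors."""
    wins, losses = defaultdict(int), defaultdict(int)
    beat, lost_to = defaultdict(list), defaultdict(list)
    for w, l in games:
        wins[w] += 1
        losses[l] += 1
        beat[w].append(l)
        lost_to[l].append(w)

    teams = set(wins) | set(losses)
    scores = {}
    for t in teams:
        f1 = wins[t] - losses[t]                                  # my wins minus my losses
        f2 = (sum(wins[o] for o in beat[t])                       # wins of teams I beat
              - sum(losses[o] for o in lost_to[t]))               # losses of teams that beat me
        f3 = (sum(wins[o2] for o in beat[t] for o2 in beat[o])    # wins of teams beaten by teams I beat
              - sum(losses[o2] for o in lost_to[t] for o2 in beat[o]))  # losses of teams beaten by teams that beat me
        scores[t] = f1 + f2 + f3
    return scores

# A three-team round robin: A beats B and C, B beats C.
print(sorted(csl_scores([("A", "B"), ("A", "C"), ("B", "C")]).items()))
# [('A', 3), ('B', -3), ('C', -8)]
```

Even in this tiny schedule the second and third factors dominate: winless C is punished far more through its opponents' records than through its own 0-2 mark.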


__In reading the descriptions, it becomes clear why they produce such vastly different results. Nutshell includes margin of victory, while CSL doesn't, and both descriptions explicitly mention how an underdog win would affect their ratings, in completely opposite ways.__


__When I read the CSL description, I was struck by how similar it is to both the NCAA's Strength of Schedule calculation and DI college basketball's RPI, which makes sense considering the teams that are the biggest outliers in the system. The same quirks that I pointed out in my critique of the NCAA's SoS are likely to plague the CSL Ratings. Look back at the first plot of CSL versus the system average. The teams CSL overrates the most come from conferences like the UMAC, MWC, and ECFC (bad conferences), while the teams it underrates the most come from conferences like the WIAC, OAC, MIAC, and E8 (good conferences).__

__Interestingly, my system was most similar to the Born Power Index, Laz Index, and Maas Ranking (three of the systems most similar to the system average), yet my ratings are among the most dissimilar from the system average. I'm not sure how to feel about this, but I think it's probably a good thing. I set out to make a unique rating system, which I think I did, but usually when you differ greatly from the aggregate knowledge of others, it means you're probably wrong.__


__I guess we'll have to wait until next season to find out.__