Those who have followed Lacrosse Reference for a while know that a lot of our inspiration for creating this site came from FiveThirtyEight. Those who follow FiveThirtyEight will be familiar with their reliance on the ELO rating system, which underpins many of their sports-related models. We’ve always planned on implementing ELO for college lacrosse, and now we’ve finally gotten around to doing it.
A secondary motivation for doing it now is that many of our analyses are starting to raise the question: “but who have they played?” To date, we’ve treated every game and every opponent as equal, which means that we’ve only been able to produce stats and tables that are opponent-agnostic. In practice, this has meant sticking to facts rather than making any assertions about who is better or worse. I still don’t plan on publishing Lacrosse Reference rankings, but having ELO in our back pocket will allow us to enhance many of our analyses to account for opponent strength. And we will continuously update our ELO rating table, which can be found here.
Since you’ll see it referenced going forward, I wanted to share a bit of background about ELO and how we’ve implemented it for college lacrosse. ELO is a system originally created by Arpad Elo as a way to rate chess players. Chess, being a fairly simple game in terms of statistics, needed a simple way to compare the relative strength of two players. Consider that the only game data available is win-loss-draw, the two players involved, and the date of the match. Simple inputs require a simple model.
The basic premise of the system is that in each match or game, ranking points are transferred from the loser to the winner. The number of points transferred is based on the outcome of the game, the likelihood of that outcome, the location of the game, and the “speed” at which the system updates its evaluation of a team.
- The “outcome of the game” in chess is win (1), draw (0.5) or loss (0). When ELO is applied to sports with more possible score outcomes, the “outcome” input is generally the final margin, with increasing margins being discounted. A 6-goal victory will count for more than a 3-goal victory, but not twice as much.
- The likelihood of each outcome is derived from the two teams’ ELO ratings prior to the game. In general, a team with an ELO rating 100 points higher than their opponent is expected to win 64% of the time. If a heavy favorite wins the game by one goal, you would expect the number of points transferred to be very close to zero. If a huge underdog wins by 5 goals, you’d expect a ton of points to be transferred.
- The location of the game matters, as the home team is given whatever home field advantage exists in the sport. In lacrosse, in the past three years, the home team has won 56.5% of the time, which equates to a bonus of 46 ELO points for the home team.
- The “speed” of the system is a way to determine how much the model should update its evaluations based on a single game. In baseball, with a 162 game season, you have a lower speed because any one game doesn’t affect how you view a team as much as it might in football (or lacrosse). The downside of shorter seasons is that you have to use a faster “speed” which means that the model may overreact to a single game (kind of like a human).
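As a minimal sketch of the likelihood step, the standard ELO logistic curve (with the usual 400-point scale) reproduces the 64% figure quoted above for a 100-point gap. The function name `win_probability` is just an illustrative choice, not something taken from our production code.

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that team A beats team B under the standard ELO curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# A team rated 100 points above its opponent:
print(round(win_probability(1600, 1500), 2))  # 0.64
```

Two evenly rated teams come out at exactly 50%, and the curve flattens as the gap grows, which is why a heavy favorite gains almost nothing from a narrow win.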
To summarize, the two teams come into any given game with their ELO at that time. The home team is given a 46 point ELO bump to reflect home field advantage and then the probability of each team winning is calculated. From there, that expectation is compared against the actual result and points are transferred from the loser to the winner, with the total dependent on the likelihood of the outcome and the final margin.
Statistical sidebar: favorites tend to win by larger margins than they lose by, so there is a risk with ELO that the stronger teams will end up with inflated ELO ratings. (We’ve used the FiveThirtyEight approach to adjust for this bias.)
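To make the whole update concrete, here is a rough sketch of a single-game update. It uses the 46-point home bump and the speed factor of 21 described in this post. The margin multiplier is the one FiveThirtyEight has published for its NFL ratings (a log of the margin, shrunk when the winner was the favorite); treating those exact constants as what we use for lacrosse is an assumption, as are the function names and the no-ties simplification.

```python
import math

K = 21           # "speed" factor from this post
HOME_BONUS = 46  # ELO points added to the home team before computing odds


def win_probability(rating_a, rating_b):
    """Standard ELO win probability for team A over team B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(home_elo, away_elo, home_goals, away_goals):
    """Return (new_home_elo, new_away_elo) after one game (assumes no ties)."""
    home_adj = home_elo + HOME_BONUS
    p_home = win_probability(home_adj, away_elo)
    home_won = home_goals > away_goals
    margin = abs(home_goals - away_goals)

    # Rating gap from the winner's point of view (negative for an upset).
    winner_gap = (home_adj - away_elo) if home_won else (away_elo - home_adj)

    # Assumed multiplier: discounts big margins, and shrinks further
    # when the winner was already favored.
    mov = math.log(margin + 1) * 2.2 / (winner_gap * 0.001 + 2.2)

    shift = K * mov * ((1.0 if home_won else 0.0) - p_home)
    return home_elo + shift, away_elo - shift


# Heavy home favorite wins by one goal: only a small transfer.
print(update(1700, 1400, 10, 9))
# Same matchup, but the underdog wins by five: a large transfer.
print(update(1700, 1400, 9, 14))
```

Whatever one team gains the other loses, so the league-wide average rating stays put from game to game.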
I’d be more than happy to provide a more detailed technical description of the model we’ve implemented if anyone is interested. Just email me.
Key Implementation Questions
The three main questions that we needed to answer were:
- How quickly should the model adjust to new games?
- How many ELO points should we apply for home field advantage?
- Should a team start a new season with last season’s final ELO or an adjusted ELO to account for roster turnover?
In trying to identify the right “speed” for our model, I took a bit of a shortcut here and started with the baseline number used in the NFL: 20. There are a similar number of games in an NFL season and a college lacrosse season, so we would expect the model to react to new information similarly. However, the variation in performance for NFL players is less than for the 18-21 year olds in college lacrosse, so instead of 20, we’ve used a speed factor of 21 to reflect the fact that performance will change over the course of a season and the model needs to react more quickly to new information.
For home field advantage, we went back through 3 years of final scores to identify the percentage of the time that the home team wins. We found it to be 56.5%. This is a bit of a rough calculation because it does only go back three years, but that is over a thousand games, and I’d expect the actual number to be fairly close. When you plug 56.5% into an ELO model, the resulting ELO point value is 46. So to incorporate home-field advantage, the model adds 46 points to the home team’s ELO score before calculating the win probabilities.
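For anyone who wants to check the conversion, a minimal sketch: inverting the standard ELO win-probability curve turns a win rate into a rating gap. The helper name `home_edge_in_elo` is hypothetical, and the 400-point scale is the conventional one rather than anything specific to lacrosse.

```python
import math


def home_edge_in_elo(home_win_rate: float) -> float:
    """Rating gap that would produce the given win rate between equal teams."""
    return 400.0 * math.log10(home_win_rate / (1.0 - home_win_rate))


print(round(home_edge_in_elo(0.565), 1))
```

Plugging in exactly 56.5% gives a value between 45 and 46 points, in line with the 46 used in the model. The small difference is presumably down to rounding in the quoted win percentage.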
New seasons present a choice for anyone implementing ELO: do you carry over the quality of last year’s team or start fresh with each new season? The challenge with ELO is that it has no clue who is on the team, who the coach is, or what injuries a team has faced. It’s the obvious trade-off that you have to deal with in order to benefit from the simplicity of the model. So starting fresh at the start of a new season literally means putting every team at 1500 (the standard average ELO rating) and letting the season play out from there. Obviously, we know at the beginning of the season that some teams are better than others, so this would be absurd.
In the FiveThirtyEight implementations, they’ve taken the prior year’s final rating and regressed it toward the mean so that a 1590 final ELO rating translates to a 1545 ELO rating to start the next year. However, in college sports, you lose a quarter of your roster every year. In lacrosse specifically, the landscape of the sport is changing geographically, which adds another layer of uncertainty to forecasting any new season. So instead of regressing 50% of the way back to 1500, we’ve gone 2/3 of the way back. So in our model, a 1590 final rating translates to a 1530 rating to start the following season.
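The carry-over arithmetic is simple enough to sketch in a few lines. The function and parameter names (`preseason_rating`, `fraction_back`) are illustrative only; the 2/3 default and the 1500 mean are the values described above.

```python
MEAN_RATING = 1500.0


def preseason_rating(final_rating: float, fraction_back: float = 2 / 3) -> float:
    """Pull last season's final rating part of the way back toward the mean."""
    return final_rating - fraction_back * (final_rating - MEAN_RATING)


print(round(preseason_rating(1590)))       # 1530 (our 2/3 regression)
print(round(preseason_rating(1590, 0.5)))  # 1545 (a 50% regression)
```

The same function works for below-average teams, which get pulled up toward 1500 by the same proportion.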
Hofstra – a team on the rise
Take Hofstra as an example. In 2015, they had a 5-9 record with several close losses and a big win in their second-to-last game against Towson. Not a complete disaster, but nothing to write home about. Fortunately, ELO is a system that is able to see a 5-9 record and adjust for the quality of the opponents and the final scores of the games. It all nets out to an “average” team, which we can see by Hofstra’s final 2015 ELO of 1516 (1500 is average).
In 2016, they finished with a 9-6 record, a significant improvement over 2015. And indeed, Hofstra ended the year with an ELO rating of 1556. But ELO looks at that record and sees several games where Hofstra, as a significant favorite (> 65% odds), ended up losing to lesser opponents (Fairfield, Georgetown). It also sees a few games where they barely squeaked out a win as a big favorite. The net result is that while their record improved quite a bit, ELO was slightly less impressed.
2017 is another story. As of late April, Hofstra is 11-1, with a lone loss against Drexel. ELO sees that loss as doubly bad (at home, Hofstra was an 84% favorite). The one mitigating factor was that it was a close one-goal loss. Either way, it had a significant negative effect on Hofstra’s ELO rating. The good news for Hofstra is that ELO also sees 2-goal wins over UNC (ELO: 1517), Monmouth (ELO: 1659) and Princeton (ELO: 1630). Taken together, ELO sees a team that built up 10 consecutive wins, several of them by multiple goals over well-regarded opponents. Even while taking into account the bad loss to below-average Drexel, ELO comes away with the impression that Hofstra is one of the most improved teams in D1.
We’ve plotted the game by game ELO ratings for Hofstra over the past three seasons so you can see how they have really taken off in 2017.
Future of ELO
Going forward, we expect to use ELO in two ways. The first, and probably the most critical, is to adjust performances by opponent strength. As I’ve said before, the analyses we’ve published to date are just the facts; they don’t make many assertions about team quality, etc. But as we get into more team-specific analysis (e.g. our stuff on Hopkins), it will be very useful to make the analysis more precise by accounting for the specific opponents a team has faced.
Second, and more for entertainment purposes, is to provide some estimate of ELO’s opinion of who will win a specific game. Since ELO ratings can be quickly translated into an expected win percentage for one team over another, we hope that these numbers will provide a little more context for the games (and results) as they happen.
We also need to come up with a lacrosse-specific name for our implementation of ELO. If you have any suggestions, use the comments.