Adjusting Lax-ELO to incorporate new games
If you read our stuff this year, you’ll remember that about half-way through the season, we introduced Lax-ELO. It was an attempt to apply an ELO-based model to lacrosse as a way to compare the relative strength of two teams. Once it was created, we used it to help provide context for individual statistical performances and as part of our previews of upcoming games. Now, with 2017 done, it’s time to revisit our Lax-ELO implementation.
If you would like a more in-depth look at our implementation, the link above is a great place to start. But the main tent-poles in the model include:
- The winner of any game takes a certain number of “strength points” from the loser.
- The number of points transferred depends on the incoming strength of the teams and how unlikely the outcome was. More unlikely outcomes, in either direction, mean more points transferred.
- When using ELO to calculate win expectancies, the home team gets a modest bump to reflect home-field advantage.
- The system is calibrated with a “speed” setting that determines how much it should react to the outcome of a single game.
- At the end of the season, you “reset” the values back toward, but not all the way to, average to account for roster turnover.
- In sports where games end with a score rather than a simple win/loss/draw result, there is a small margin-of-victory adjustment; without it, ELO models would bias toward the favorites too much. (A rough sketch of the full update follows this list.)
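To make those bullets concrete, here is a minimal sketch of the single-game update in Python. The specific numbers (a 65-point home bump, a speed of 28, the log-style margin damping) are placeholder assumptions for illustration, not the actual Lax-ELO constants.

```python
import math

HOME_BUMP = 65   # assumed home-field bump in ELO points
K = 28           # "speed" setting: how hard a single result moves the ratings

def win_expectancy(elo_a, elo_b, a_is_home=False):
    """Probability that team A beats team B in a single game."""
    diff = elo_a - elo_b + (HOME_BUMP if a_is_home else 0)
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))

def margin_multiplier(goal_margin, winner_elo_edge):
    """Damped margin-of-victory factor so favorites who run up the score
    aren't over-rewarded (log form borrowed from football-style ELO models)."""
    return math.log(abs(goal_margin) + 1) * (2.2 / (winner_elo_edge * 0.001 + 2.2))

def update(home_elo, away_elo, home_goals, away_goals, k=K):
    """Transfer "strength points" from the loser to the winner after one game."""
    exp_home = win_expectancy(home_elo, away_elo, a_is_home=True)
    home_won = home_goals > away_goals
    winner_edge = (home_elo - away_elo) if home_won else (away_elo - home_elo)
    shift = k * margin_multiplier(home_goals - away_goals, winner_edge) \
              * ((1.0 if home_won else 0.0) - exp_home)
    return home_elo + shift, away_elo - shift
```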
A note about ELO and win odds
One of the benefits of ELO is that you can use the ELO ratings of two teams to calculate the likelihood that either team will win a single game. To contextualize, a 100-point ELO favorite is expected to win about 64% of the time. I often used this calculation to provide a pre-game look at potential outcomes, as we did for the two national semifinals.
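For the curious, that 64% figure falls straight out of the standard ELO win-expectancy formula (assuming the usual 400-point scale):

```python
# win probability for a 100-point ELO favorite under the standard logistic formula
p = 1 / (1 + 10 ** (-100 / 400))
print(round(p, 2))   # 0.64
```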
There are two ways to evaluate a model like ELO (or Lax-ELO). One is to use the ratings as predictions and count how many games the model “picks” correctly. The other is to look at the ratings in aggregate and check whether the model’s “picks” are right at the rate it says they should be. We believe that the second method is better. But it means thinking about the ELO model like this: “if the model identifies 100 teams that are between 60 and 70% favorites, it should be wrong about the winner 30-40 times.”
This may sound nit-picky, but it’s an important distinction in calibrating a model like this. When Lax-ELO says that a team is a 90% favorite, it doesn’t mean that we are 90% sure that they are going to win a given game. Instead, it means that if those two teams played 10 times, we’d expect that team to win 9 of them. Similarly, if a team is a 55% favorite and they played their opponent 20 times, we’d expect them to win 11.
Fundamentally, a higher ELO rating doesn’t mean we think the team is going to win a given game. It means that we expect them to win more games than they’d lose against a specific opponent.
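To see the difference between those two evaluation methods in miniature, here is a toy comparison. The forecasts below are made up for illustration, not real Lax-ELO output.

```python
# (pre-game home win probability, did the home team win?) -- made-up data
games = [(0.65, 1), (0.65, 0), (0.65, 1), (0.62, 0),
         (0.90, 1), (0.90, 1), (0.55, 1), (0.55, 0)]

# Method 1: raw "pick" accuracy -- how often the favorite actually won
picks_right = sum((p >= 0.5) == bool(won) for p, won in games)
print(picks_right / len(games))                    # 0.625

# Method 2: within a band of similar forecasts, did winners show up at the
# stated rate? Teams rated 60-70% favorites should win 60-70% of the time.
band = [(p, won) for p, won in games if 0.60 <= p < 0.70]
print(sum(won for _, won in band) / len(band))     # 0.5 here, versus a stated ~0.64
```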
Offseason summer-cleaning
The end of the season presented two opportunities that led to this post. The first is that, with the season complete, we have more games with which to tune the model. Going into this season, we had 2015 and 2016 games in our database. By adding 2017 games, the model has more data to train on. That should mean a more accurate model overall. (By contrast, baseball’s ELO models have games in them dating back to the 19th century!)
The other opportunity is some downtime to go back and capture the play-by-play data for the 2014 season. Just finished that the other day.
The net effect of all this is that our model has twice as many games to work with.
So about that re-calibration…
To re-calibrate with all our new data, we used a fairly simple process repeated thousands of times.
To start, we picked some settings to test out. There are two settings that we could tweak: the in-season speed at which game results are incorporated into the rankings, and the degree to which the model discounts previous seasons. For the sake of explanation, let’s just pick 28 for our in-season speed and 50% for our previous-seasons discount.
For reference, the general speed factor used in a sport like football is 21. So a setting of 28 would mean that the model would, in effect, overreact to single-game outcomes, at least relative to how these models are generally built for football. The 50% discount means that if a team finished Year 1 with a Lax-ELO rating of 1600, they would start Year 2 with a Lax-ELO rating of 1550.
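That reset is just a regression toward the 1500 average, so the example above checks out:

```python
# a 50% previous-season discount keeps half of a team's distance from average
old_rating, discount = 1600, 0.50
new_rating = 1500 + (1 - discount) * (old_rating - 1500)
print(new_rating)   # 1550.0
```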
The next step is to go through each game in our database and calculate what the Lax-ELO ratings would have been for both teams, given the settings we just picked and the sequence of games each team had played up to that point. Since each game has a pre-game win expectancy and a final score, we can see how well our model does.
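Here is roughly what that replay step looks like, building on the win_expectancy() and update() helpers sketched earlier. The field names on each game record are assumptions, not our actual schema.

```python
def replay(games, k, discount):
    """Walk through games in date order, recording a pre-game forecast for each."""
    ratings, last_season, forecasts = {}, None, []
    for g in games:
        # offseason reset: pull every team part-way back toward the 1500 average
        if last_season is not None and g["season"] != last_season:
            ratings = {t: 1500 + (1 - discount) * (r - 1500) for t, r in ratings.items()}
        last_season = g["season"]
        home = ratings.get(g["home"], 1500)
        away = ratings.get(g["away"], 1500)
        forecasts.append((win_expectancy(home, away, a_is_home=True),
                          g["home_goals"] > g["away_goals"]))
        ratings[g["home"]], ratings[g["away"]] = update(
            home, away, g["home_goals"], g["away_goals"], k=k)
    return forecasts
```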
The method here is to put each game into a bucket based on the pre-game chances that the home team would win. For our calibration, we used 20 buckets. If the home team had a 3% chance of winning, then that game goes into bucket #1. This bucket holds all the games where the home team had between a 0 and 5% chance of winning.
Next, we check whether the home team won. In bucket #1, we would expect between 0 and 5% of the home teams to have won. In bucket #11 (50-55% home win odds), we would expect the home team to have won between 50 and 55% of the time. (Noticing the pattern?) When we measure how often home teams in each bucket actually won and subtract the expected rate, we get an error for that bucket. Summing the errors across all 20 buckets gives us an overall idea of how well these settings (28 & 50%) did.
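Scoring one pair of settings might then look something like this. (We treat the average forecast inside a bucket as the “expected” rate, which is one reasonable reading of the method described above.)

```python
def calibration_error(forecasts, n_buckets=20):
    """Total gap between forecast win rates and actual win rates across buckets."""
    buckets = [[] for _ in range(n_buckets)]
    for p_home, home_won in forecasts:
        idx = min(int(p_home * n_buckets), n_buckets - 1)     # 3% chance -> bucket #1 (index 0)
        buckets[idx].append((p_home, home_won))
    total = 0.0
    for bucket in buckets:
        if not bucket:
            continue
        expected = sum(p for p, _ in bucket) / len(bucket)    # average pre-game odds
        actual = sum(won for _, won in bucket) / len(bucket)  # how often home actually won
        total += abs(actual - expected)
    return total
```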
Now repeat this for 6,300 different combinations of those two settings. That gets us to a single answer for the question: “which settings minimize our error rate?”
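The search itself is just a big loop over candidate settings, reusing the replay() and calibration_error() sketches above. The ranges below are made up purely so the arithmetic comes out to 6,300 combinations; they are not the actual grid we used.

```python
speeds = range(10, 80)                          # 70 candidate speed settings (assumed range)
discounts = [d / 100 for d in range(10, 100)]   # 90 candidate discount rates (assumed range)

results = []
for k in speeds:
    for discount in discounts:
        forecasts = replay(all_games, k, discount)   # all_games: the full game history
        results.append((calibration_error(forecasts), k, discount))

best_error, best_k, best_discount = min(results)
print(best_k, best_discount, best_error)
```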
The new model will be faster in-season but slower to discount previous seasons
So what do our thousands of calibration runs tell us? The headline is that the model wants us to be more aggressive. That means reflecting the results of individual games more quickly and applying less of a discount to prior seasons. The charts below show the top 100 runs (by lowest total error) and what they suggest in terms of speed and discount rate.
This season, our model used a speed factor of 21 and a discount rate of 67%. In other words, relative to what this calibration exercise suggests, our current model is too slow to reflect new results and too quick to discount a team’s previous seasons.
That is not to say that the current settings are wrong, partly because a calibration approach like the one we’ve used is still subject to sample-size concerns as well as over-fitting.
Instead, I look at our current model as a conservative implementation. We were slow to incorporate new information but quick to discount prior seasons. That strategy is going to result in an ELO distribution that is much more bunched. Teams will be closer to 1500, which is the average ELO rating, and this will result in much more conservative win expectancies on the whole. For the initial implementation, I believe that this was the right decision.
So what’s next?
As a result of this calibration, one thing is clear: we have some room to make our model faster in-season and slower to discount previous years’ results. I have not yet determined how much we will move in that direction. We will certainly not take these calibration results as the right answer.
But they have helped to show that our model could be made more aggressive without compromising accuracy. And that is a good thing. We create, we test, we iterate, we refine. Slowly but surely, our lacrosse analytics will improve.