How to win friends…and get on ESPN
ESPN recently released their lacrosse schedule for the 2019 season. Excluding the supposed MAAC flex game, there are 23 D1 Men’s contests scheduled to appear on the network’s various properties.
I was curious to see what the games had in common and if there was any underlying logic that could help us understand why certain games were picked and why others were not.
In short, if you were miffed that your team didn’t appear enough on the calendar, consider this your guide toward righting that wrong.
The tale of the tape
First, let’s just examine the broad-brush demographics of televised vs. untelevised games. I selected several variables that we can use to compare games:
- Combined Twitter Followers
- Combined Pre-Season Ranking
- 2018 Combined Winning Percentage
- 2013 – Present Combined Winning Percentage
- Average Lax-ELO Rating
- Best Lax-ELO Rating
- Best Winning Percentage 2013-Present (between the two teams)
- Best Winning Percentage 2018 (between the two teams)
- Most Twitter Followers (between the two teams)
- Relative ranking (by Lax-ELO) against games on the same day
- Relative ranking (by Pre-Season ranking) against games on the same day
The idea here is to try and differentiate between a few sets of factors. The social media factor is really about the fan base, the idea being that teams with larger fan bases are going to have larger Twitter followings.
Compare that with the metrics around team performance. By using Lax-ELO, we can get a sense of where each program slots against all the others. The 2018 winning percentage gives us a snapshot of last season, while the 2013-present winning percentage offers an alternate view of sustained program success.
We also have the relative ranking compared to the other games scheduled on the same day. This is meant to try and control for the fact that ESPN is filling slots each week/weekend and they can’t really control the quality of the games on offer.
Lastly, we broke out some of the variables to look at the better team in isolation. The combined metrics may be more useful, but it was worth entertaining the idea that they may pick based on the quality of one of the teams versus the collective match-up.
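To make the “combined”, “better team” and “same-day ranking” ideas concrete, here is a minimal Python sketch of how those per-game features could be derived from a schedule plus a table of team stats. The `Team`/`Game` classes, field names and toy numbers are hypothetical. The post doesn’t spell out how each pair of stats was combined, so this assumes followers are summed and Lax-ELO is averaged, which fits the follower and rating figures in the comparison table.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Team:
    elo: float               # Lax-ELO rating (toy values below)
    twitter_followers: int


@dataclass
class Game:
    date: str                # e.g. "2019-03-12"
    home: str
    away: str


def game_features(games, teams):
    """Build one feature row per game from the two teams' stats."""
    rows = []
    for g in games:
        h, a = teams[g.home], teams[g.away]
        rows.append({
            "date": g.date,
            "matchup": f"{g.away} at {g.home}",
            # Match-up level ("combined") features
            "combined_followers": h.twitter_followers + a.twitter_followers,
            "avg_elo": (h.elo + a.elo) / 2,
            # Better-team-only features
            "most_followers": max(h.twitter_followers, a.twitter_followers),
            "best_elo": max(h.elo, a.elo),
        })

    # Relative ranking within each day: 1 = highest average Lax-ELO that day
    by_date = defaultdict(list)
    for row in rows:
        by_date[row["date"]].append(row)
    for day_rows in by_date.values():
        ordered = sorted(day_rows, key=lambda r: r["avg_elo"], reverse=True)
        for rank, row in enumerate(ordered, start=1):
            row["in_day_elo_rank"] = rank
    return rows


# Toy example with made-up teams and numbers
teams = {
    "Team A": Team(elo=1700, twitter_followers=30000),
    "Team B": Team(elo=1600, twitter_followers=20000),
    "Team C": Team(elo=1500, twitter_followers=5000),
    "Team D": Team(elo=1400, twitter_followers=1000),
}
games = [Game("2019-03-12", "Team A", "Team B"), Game("2019-03-12", "Team C", "Team D")]
rows = game_features(games, teams)
```

The same-day ranking by pre-season ranking would work the same way, sorting ascending on a combined ranking instead of descending on average Lax-ELO.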
The most basic thing that we can do here is compare the 23 games that were selected against the 469 scheduled games that weren’t.
Variable | Avg in ESPN Games | Avg in Other Games |
---|---|---|
In-day Ranking by Pre-Season Ranking | 2 | 12 |
Combined Pre-Season Ranking | 10 | 37 |
In-day Ranking by Lax-ELO | 4.2 | 12.3 |
Combined Twitter Followers | 50,035 | 18,735 |
Highest Twitter Followers | 32,690 | 12,977 |
Combined Lax-ELO | 1627 | 1493 |
Higher Team Lax-ELO | 1683 | 1566 |
Combined Win % 2013-Present | 63% | 47% |
Highest Win % 2013-Present | 69% | 55% |
Combined 2018 Win % | 65% | 46% |
Higher 2018 Win % | 72% | 58% |
Unsurprisingly, they picked the better teams, with larger followings, playing in the better games of the day. The question is how they balanced those factors.
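Each row of that comparison is just a per-group average. A sketch of the computation, assuming each game is a dict with a boolean label (`on_espn` is a hypothetical field name) and numeric feature columns:

```python
from statistics import mean


def compare_groups(rows, label_key, features):
    """Return {feature: (avg in televised games, avg in other games)}."""
    on_tv = [r for r in rows if r[label_key]]
    off_tv = [r for r in rows if not r[label_key]]
    return {
        f: (mean(r[f] for r in on_tv), mean(r[f] for r in off_tv))
        for f in features
    }


# Toy rows with made-up numbers
rows = [
    {"on_espn": True, "combined_followers": 60000},
    {"on_espn": True, "combined_followers": 40000},
    {"on_espn": False, "combined_followers": 20000},
    {"on_espn": False, "combined_followers": 10000},
]
summary = compare_groups(rows, "on_espn", ["combined_followers"])
print(summary)
```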
Second-guessing Bristol
What fun would this post be if we didn’t give you the tools to second-guess what the execs up in Bristol chose for their spring slate? So we are going to take a quick detour to poke holes in their master plan.
March 12 is our first chance to really kick the tires here. ESPN is showing Mount St. Mary’s vs. Johns Hopkins. Before the Towson shellacking, this match-up featured the #7 ranked team (per IL) and the #57 team in the country by Lax-ELO. The crew is probably going to stick around Baltimore because they are also showing the Jays 4 days later, which may explain the choice on the 12th.
Unfortunately, that Tuesday also features the following games:
- #13 Lehigh at #12 Rutgers
- #16 Bucknell at #15 North Carolina
- Marquette at #11 Robert Morris
- Utah at #2 Duke
Either Hopkins rights the ship and that game ends up being a blow-out, OR we get a non-thrilling slap-fest between two mediocre (at best) teams.
March 31 is another head-scratcher. #9 Penn State is in College Park to take on #3 Maryland. The way Penn State is going, that could easily be a #1 vs #2 match-up by then. Instead, we are going to be treated to #12 Rutgers at #16 Ohio State.
April 14 is our last questionable ESPN decision. The scheduled game is #16 Ohio State at #7 Johns Hopkins, which means we miss out on #3 Maryland at #12 Rutgers and #5 Cornell at #11 Notre Dame.
I get that Notre Dame would be a trek for the crew, but Terps at Scarlet Knights would be novel and is likely to be a better match-up.
Someone needs to explain why Johns Hopkins is featured in 7 out of the 23 ESPN games. Do they have a really awesome lacrosse booth? Are they allowing ESPN to keep their equipment set up all year long? Who am I kidding; I’m from Baltimore, of course the ESPN crew wants to spend their entire spring in Charm City.
Just to contextualize it, the Jays are featured in more games than Maryland, Loyola MD, Towson, and Denver combined. That just doesn’t add up.
I am not a data scientist
Anyway…
To try and suss out the factors that might have led to a game being picked over another, we turned to logistic regression since it’s typically used (the data scientists tell me anyway) when your dependent variable is a yes/no. Obviously, whether a game is being televised is pretty black and white (unless you are subscribing to the streaming services, am I right???).
(I used the steps from this tutorial because it was the first link that Google returned…)
As with any regression, the goal here is to compare the various attributes of a game and determine which of them are most useful for predicting whether a game will be televised on ESPN.
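The post follows a tutorial for the mechanics, so the following is not the author’s code. It is a minimal, dependency-free Python sketch of what fitting a logit model does: logistic regression trained by batch gradient descent on log-loss, with features standardized first because Twitter followers and Lax-ELO live on very different scales. The function names, learning rate and epoch count are illustrative assumptions; a library such as scikit-learn or statsmodels would normally handle this.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(X):
    """Rescale each column to mean 0, standard deviation 1."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def predict_proba(w, x):
    """P(televised | x) under the logit model; w = [intercept, w1, w2, ...]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on mean log-loss. Returns [intercept, w1, ...]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi   # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Toy data: one feature, label 1 = "televised"
X = standardize([[-2.0], [-1.0], [1.0], [2.0]])
y = [0, 0, 1, 1]
w = fit_logistic(X, y)
```

In a real train/test setup you would compute the standardization means and standard deviations on the training games only and reuse them for the test games.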
Step 1 was to try and figure out which of my variables were useful and which were not. Running the initial regression showed me that the following variables were not useful enough to keep:
- 2013 – Present Combined Winning Percentage
- Average Lax-ELO Rating
- Best Lax-ELO Rating
- Best Winning Percentage 2013-Present (between the two teams)
- Best Winning Percentage 2018 (between the two teams)
- Most Twitter Followers (between the two teams)
- Relative ranking (by Lax-ELO) against games on the same day
Right away, we can see one thing clearly: all the variables related to the “better” team in a given match-up were useless. In other words, ESPN wasn’t looking to get the best “team”, they were looking to get the best “match-up”. Probably obvious, but still worth pointing out that the regression backed up the conventional wisdom.
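The post doesn’t say how “not useful enough” was judged. One common approach, and only an assumption here, is to look at each coefficient’s Wald z-statistic (coefficient divided by its standard error) and drop features whose two-sided p-value exceeds a threshold such as 0.05, refitting after each removal (backward elimination). A sketch of that decision rule; the coefficients and standard errors are made up for illustration.

```python
import math


def wald_p_value(coef, std_err):
    """Two-sided p-value for H0: coefficient == 0 (normal approximation)."""
    z = coef / std_err
    return math.erfc(abs(z) / math.sqrt(2))


def features_to_drop(estimates, alpha=0.05):
    """estimates: {feature: (coef, std_err)}. Return features not significant at alpha."""
    return [f for f, (b, se) in estimates.items() if wald_p_value(b, se) > alpha]


# Made-up estimates, for illustration only
estimates = {
    "combined_preseason_rank": (-0.30, 0.08),
    "best_lax_elo": (0.002, 0.004),
}
print(features_to_drop(estimates))  # ['best_lax_elo']
```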
One complicating factor here is the TV deals that various conferences have. Of the 23 games, only 4 feature any Ivy League teams. Defending champion Yale has only one slot on the ESPN schedule. I haven’t read the Ivy League’s contract, but I suspect it precludes ESPN from cherry-picking the league’s best games. The same probably goes for the Patriot League.
The last point I’ll make about the removed variables is that winning percentage was useful, but only when limited to last season rather than the last 6 seasons. So a program with sustained past success may miss the cut if its previous season did not live up to expectations.
Towson comes to mind here as a program that has had success recently, but who also had a down year in 2018.
Time to Predict
The next step was to train a logit model on a subset of the data (70% in our case) and use that model to predict the remaining 30% of the games. If the problem is predictable, we should be pretty good at figuring out whether a game will be on ESPN.
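With so few positives, how the 23 televised games land in a 70/30 split matters. A stratified split, shown here as one option rather than what the post did, keeps the televised share the same in both halves. The `on_espn` key is a hypothetical field name and the rows carry no real features.

```python
import random


def stratified_split(rows, label_key, train_frac=0.7, seed=0):
    """Split rows so each label keeps the same train/test proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (True, False):
        group = [r for r in rows if bool(r[label_key]) == label]
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# 23 televised + 469 other games, matching the post's counts (features omitted)
rows = [{"on_espn": True}] * 23 + [{"on_espn": False}] * 469
train, test = stratified_split(rows, "on_espn")
print(len(train), len(test), sum(r["on_espn"] for r in test))  # 344 148 7
```

A stratified split puts about 7 televised games in the test set; the post’s test set ended up with 12 of the 23, which left only 11 for training.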
As it turned out, the basic model we came up with was not predictive at all. We correctly identified 136 games that were not on the schedule, and we never predicted that a game would be on the schedule when it wasn’t. That is the good news.
The bad news is that of the 12 games in our test set that were on the schedule, we guessed that only one would be televised. That means we correctly identified only 8% (1 of 12) of the televised games.
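Putting those test-set numbers into a confusion matrix gives 1 true positive, 11 false negatives, 0 false positives and 136 true negatives (148 test games, roughly 30% of 492). The sketch below computes the standard metrics and compares them with a baseline that never predicts a televised game.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
    }


# Counts from the test set described in the post
model = classification_metrics(tp=1, fp=0, fn=11, tn=136)
# Baseline that predicts "not televised" for every game
always_no = classification_metrics(tp=0, fp=0, fn=12, tn=136)
```

The model’s accuracy (137/148, about 92.6%) barely beats the do-nothing baseline (136/148, about 91.9%), which is why recall (1/12, about 8%) is the more telling number here.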
I’ll reiterate: my intent here was not to build a predictive model, and clearly I succeeded in not achieving that. My guess is that adding some location information to the model might help. Denver, for example, is way out west but has a lot of the factors we would expect in a televised team. I would bet our model got confused by all the times it thought Denver should have been televised but wasn’t, because they are too far away from everyone.
Also, this is a special challenge with data science (and life, I suppose): finding a needle in a haystack is hard. We have 492 games on the schedule this year, and only 23 of them are on ESPN. Building a model to predict such a rare outcome is always going to be hard.
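One way to act on the location idea is a travel-distance feature: the great-circle (haversine) distance from each venue to some hub. Using ESPN’s Bristol, CT base as the hub is purely an assumption (crews may travel from elsewhere), and the coordinates below are approximate.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Approximate coordinates; the hub choice is a hypothetical modelling decision
BRISTOL_CT = (41.67, -72.95)
VENUES = {"Baltimore": (39.29, -76.61), "Denver": (39.74, -104.99)}

travel_km = {city: haversine_km(*BRISTOL_CT, *coords) for city, coords in VENUES.items()}
```

The resulting distance could be added as one more column alongside the follower and ranking features.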
When in doubt, back up
Even though our predictive model failed, I still think there are valid take-aways when you back up and look at the ESPN slate in aggregate. If you look at the variables that did have the strongest correlation to “being televised”, they were:
- The combined pre-season ranking of the two teams relative to the other games that day
- Combined Twitter followings of the two teams
They took the best games (as predicted by pre-season rankings) and then, of those, took the ones associated with the largest fan bases, mostly in East Coast cities.
If there is something I wish they had done differently, it would be to rely less heavily on pre-season rankings. By leaning on them, they probably end up weighting the most passionate fan bases more than they should, at the expense of strong teams outside the mainstream. Pre-season rankings are always going to favor the blue-bloods; that’s just life.
A mixture of something like ELO or final season SOR, along with the pre-season rankings could conceivably reduce the risk of a stinker match-up that was hyped pre-season because of over-zealous rankings. (I’m looking at you Ohio State vs Johns Hopkins on Apr 14.)
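A minimal sketch of the kind of blend suggested here: convert Lax-ELO into a rank, then take a weighted average with the pre-season rank. The 50/50 weight and the toy teams are hypothetical, and strength of record (SOR) is left out.

```python
def blended_ranking(teams, weight=0.5):
    """teams: {name: (preseason_rank, elo)}. Return names ordered best first.

    weight = share given to the pre-season rank; 1 - weight goes to the ELO rank.
    """
    by_elo = sorted(teams, key=lambda t: teams[t][1], reverse=True)
    elo_rank = {t: i for i, t in enumerate(by_elo, start=1)}
    score = {t: weight * teams[t][0] + (1 - weight) * elo_rank[t] for t in teams}
    return sorted(teams, key=lambda t: score[t])  # lower blended score = better


# Toy teams: (pre-season rank, Lax-ELO), made-up numbers
toy = {"Team A": (1, 1500), "Team B": (2, 1700), "Team C": (3, 1600)}
print(blended_ranking(toy))              # ['Team B', 'Team A', 'Team C']
print(blended_ranking(toy, weight=1.0))  # ['Team A', 'Team B', 'Team C']
```

Here a team that was hyped pre-season but rates poorly by ELO (Team A) slides down once ELO gets half the weight.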