I’ll never forget my first Sloan Sports Analytics Conference. It was an event I’d wanted to go to since I first heard about it as the nerve center of the sports analytics world. I walked the halls with people whose articles and algorithms I read religiously. But that is not what made it memorable. No, what made it memorable was the horde of undergrads following Jeff Luhnow around like he was the Pied Piper.
I don’t get paid by anyone to do sports analytics, so I suppose it’s a bit presumptuous for me to write a post about getting a job in sports analytics. But I have worked for many years in the analytics space more broadly, and I know a thing or two about labor markets.
So if you are an aspiring GM, I hope these tips can help you to pursue your dreams in the most efficient way possible. And if you are Jeff Luhnow, I hope this will help you avoid paper cuts from resumes whipped at you by the horde.
Do some analytics using publicly available data
Go analyze it. Like right now. Finish reading this post, subscribe to our newsletter, follow Lacrosse Reference on Twitter, and then go do it.
The most disappointing thing about Sloan for me was the lack of novel analytics. The student presentations were great because they demonstrated new ways of looking at old things. But the teams, the ones who actually pay people to do this stuff, were conspicuously quiet. Which makes sense; why tell the whole world, and all your competitors, what you are doing to gain an advantage?
And it was even more disappointing because the people those teams employ are, in many cases, the ones who advanced the field in the first place. John Hollinger is a great example, but I can point to a dozen people in front offices who were hired because they did great work in their spare time. Teams simply hired the best amateur analytics minds because those were the people doing the best work. For their part, I suspect those analysts wanted to work for teams so they could be involved in decision-making and get access to internal-only data sets. Certainly nothing wrong with that.
So to sum up the history of sports analytics: 1) nobodies made a name for themselves using public data and publishing online and 2) teams hired said nobodies. If you aren’t on GoDaddy right now registering a domain name, I have no more advice for you. Do some analytics, write about what you did, post it on the internet. If you create interesting stuff, people will see it, and if you are dogged/clever, the right people will see it. And if it’s going to help them gain an edge against their competitors, they will hire you. (How’s that for the grossest oversimplification ever?)
When I’ve hired people into analytics roles (albeit not in sports), I didn’t want to hear about degrees, GPAs, or other resume elements. I wanted to hear about side projects that demonstrate passion and capability. Publishing your own work checks both of those boxes.
(And be thankful that you are dealing with such clean data (1))
Focus on storytelling and writing skills, not machine learning
Don’t get me wrong, machine learning is great. And deep learning is even greater. Powered by a huge explosion of annotated data, these techniques enable truly awesome new capabilities. But you have to be smart about when to apply advanced statistical techniques and when to use…algebra. I’d bet that 99% of the sports analytics that will confer any sort of competitive advantage are basic averages and regressions. We are simply not trying to solve the sort of classification problems that the more sophisticated techniques excel at (2).
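To make the "averages and regressions" point concrete, here is a minimal sketch with made-up per-game numbers (the data is entirely hypothetical). It shows that both a plain average and a one-variable least-squares regression are a few lines of arithmetic, no deep learning required:

```python
import numpy as np

# Hypothetical per-game data: minutes played vs. points scored
minutes = np.array([12, 18, 24, 30, 36, 42], dtype=float)
points = np.array([4, 7, 11, 13, 17, 20], dtype=float)

# A plain average answers "how many points per minute?"
points_per_minute = points.sum() / minutes.sum()

# A one-variable least-squares fit answers the same question
# while allowing for an intercept (e.g. slow starts, garbage time)
slope, intercept = np.polyfit(minutes, points, deg=1)

print(f"points per minute: {points_per_minute:.3f}")
print(f"regression: points ~ {slope:.3f} * minutes + {intercept:.3f}")
```

That’s the whole model. If an insight this simple moves a decision, it beats an uninterpretable black box every time.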
So if not advanced statistical modeling, what skill set should you hone? The simple answer is storytelling.
Throughout my consulting career, the most important lesson I’ve learned is that the success of an analytic is wholly dependent on the story it is packaged in. Put another way, I’d rather have a simpler, less insightful model packaged in an effective story than a model that nails some prediction but can’t be communicated easily. Unless you are already the GM, you are going to have to convince someone to act on the insight your models have generated. You cannot do this unless you are an effective storyteller.
Unfortunately, this is not something that you can take a Coursera course to learn. Storytelling is the poster-child for soft skills. Fortunately, it’s something where practice makes perfect. Explain to your grandparents what PER or WAR means. Think about how you would translate analytical content to someone who didn’t know what numbers are. Draw a sketch representing an analytical concept. Incorporate analytical insights into James Harden fan-fiction.
Just don’t assume that the people you need to convince will inherently understand what you’ve done, even if it seems obvious to you.
Make sure you are solid with SQL, Python, R, and some sort of data viz
Statistical modeling is the sexy skill (how crazy is that?) within the analytics space. That’s how people build self-driving cars and beat world champions in Go. But successful analytics have to stand on three legs: data, algorithms, and storytelling/visualization. In truth, you can get away without having sophisticated algorithms, but you can’t get away with a poor data foundation or a poorly packaged insight. Don’t skimp on the plumbing of your analysis!
And unless you are joining a massive data organization, you’ll probably be responsible for all three things. They want you to be Ben Zobrist, not Mark Trumbo. That could mean using SQL to store and clean data on Monday. Then writing an R model on Tuesday. Then using Python to format those results into a Tableau-ready input file so that you can build a visualization dashboard on Thursday. And you’d better be able to 1) do all those things or 2) be familiar enough that you can figure them out via StackOverflow.
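The Monday-to-Thursday workflow above can be sketched end to end in a few lines. This is a toy illustration, not a real pipeline: the shots table, player names, and output file are all invented, and Python’s built-in sqlite3 stands in for whatever database you actually use:

```python
import csv
import sqlite3

# Store raw (hypothetical) shot data in SQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shots (player TEXT, made INTEGER)")
conn.executemany(
    "INSERT INTO shots VALUES (?, ?)",
    [("Smith", 1), ("Smith", 0), ("Smith", 1), ("Jones", 0), ("Jones", 1)],
)

# Clean and aggregate with SQL: shooting percentage per player
rows = conn.execute(
    """SELECT player,
              COUNT(*) AS attempts,
              ROUND(AVG(made), 3) AS fg_pct
       FROM shots
       GROUP BY player
       ORDER BY player"""
).fetchall()

# Export a flat, viz-tool-ready CSV (one row per player)
with open("fg_pct.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["player", "attempts", "fg_pct"])
    writer.writerows(rows)
```

In practice each step would live in its own script or tool, but the shape is the same: store, clean, compute, hand off to the storytelling layer.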
I’m not suggesting that you must be an expert in all of these areas, but you have to be competent enough to know how to tackle a given problem or requirement. If you have a strong background in modeling, focus your learning cycles on the data engineering piece up front or the storytelling piece on the back end. If you are a data engineer, make sure you consider whether there is a better algorithm out there rather than hacking something together. And as we discussed above, don’t sleep on the importance of packaging your insights into a story that makes them easy to digest and act on.
In other words, to get a job in sports analytics, you’ve got to be a jack-of-all-trades. I realize that is still somewhat vague. And that is why I have started the Sports Analytics Academy. If you think you would benefit from some help in getting started, check it out. There is also a private slack channel for those who want to be part of a community focused on the same goals.
(1) The biggest challenge in consulting projects that include modeling is data availability/quality, by far. Even when the data is system-generated, it’s often unclear what each field means. Sometimes they give you a data set that is only a subset of the population. Or the company changed how they collected the data halfway through. Awareness of these pitfalls is crucial to being an effective advisor because if you ignore them, it’s easy to create an analysis that is misleading. But you, aspiring data-GM, are lucky because sports data doesn’t typically suffer from these same issues. For one, almost every relevant piece of game information is posted on the internet, where millions of people can alert providers to data quality issues. The simple fact that the data is public is a huge advantage because you don’t need to ask permission to analyze it. (You’d be surprised how unhelpful data people are when they think you are upstaging them.)
(2) One of the most frustrating things about deep learning, and machine learning more broadly, is the requirement that you have a lot of annotated training data to teach the models with. To be fair, I’m not frustrated with the techniques, but with the fact that people consistently gloss over this requirement when they talk about using deep learning in models. Facebook can develop these great models because they have a decade and a half’s worth of data from billions of users uploading photos and tagging them. Google was able to make advances with their speech recognition because their models are amazing, but also because every time they got a dictation wrong, users would correct it before hitting send. You shouldn’t say yes when someone asks whether machine learning could help predict something unless they are also committing to provide the money or people/time to annotate a sufficiently large data set.