If the 'posts' figure in my newsletter dashboard is to be believed (and discounting a couple of subscriber survey requests) this is issue 100 of the Get Goalside football analytics newsletter. It started just over three years ago, in early March 2019, and over 1400 subscribers later, here we are.
It's been interesting to think back to that time and how much football analytics has changed since then.
When Get Goalside #1 dropped, we were still two months away from Liverpool's research department getting their shiny feature in the New York Times magazine. The general footballing public were less skeptical of data than they had been, but it wasn't yet exciting to them. A lot of people were putting out possession value models.
There are a lot of things I've learned in these past few years, but I wanted to open this issue up to more than just my perspective. I asked a bunch of people what stood out among the things they'd learned over the last few years, and whether there was something in particular they'd want to learn over the next few.
I'll scatter some brief thoughts throughout, but for now I'll leave you in the expert company of: Javier Fernández, Lydia Vandenbergh Jackson, Jan Van Haaren, Sam Gregory, Arielle Dror, Joris Bekkers, David Sumpter, and Devin Pleuler. My thanks to each of them.
Javier Fernández
Senior Data Scientist at Zelus Analytics, currently focused on the development of cutting-edge performance metrics for football and basketball. Formerly head of sports analytics at FC Barcelona, where he built the club’s top-level analytics department from scratch. Javier holds an MSc and a PhD in Artificial Intelligence, both centered on football analytics.
xG has made it to the clubs, the press, and even the TV. It is no longer strange to find an opening for a data scientist at successful clubs, many of them not really knowing what they need, but sensing they have to be part of it. However, it is very easy and tempting for the data analyst to settle right now. We are at risk of creating, and immersing the sport in, a new and always invisible bubble: the xG bubble.
We are right now in the most exciting era for football analytics. Both the current football analysts and the bright analysts-to-be have the opportunity to make an immense impact on the sport they love. However, to do that, we have to overhaul three fundamental aspects of analytics. We need a shift in mindset, a shift in focus, and a shift in resources.
Mindset: In this new era, the football data scientist needs to gain confidence. You need to feel that you can sit at the same table as the sporting director, the head coach, the scouts, or the players. The reason is simple: you have precious things to add to the game. So we have to embrace this and work hard to demonstrate it. It will pay off.
Focus: We can't settle for aggregating on-ball stats or reducing everything to xG shooting metrics. Don't settle for the obvious. Stay curious and focused on what should be the most important thing: "understanding the game better." At the same time, we need to identify where we can add the most significant value. There is a ton of value to add in bringing in the best players and in getting to the best players (yes, players love any piece of information that can make them 1% better). The head coach is probably not the one you will help the most.
Resources: There are hundreds of smart people trying to make the sport better by sharing their work and ideas in blogs, newsletters, Twitter accounts, public posts. Resources are better than ever but are still scarce. This sport has an incredible opportunity to become even more popular and even more enjoyable. Organizations will benefit immensely if they share more data; analysts need to prepare more and make better use of the data. We will all grow and enjoy more. xG is great. But football is not simple. Don't settle.
Lydia Vandenbergh Jackson
Former professional soccer player and now Analytics Engineer for Zelus Analytics
As a former professional soccer player and collegiate coach, I’ve realized how much my experiences have shaped how I view the game and rate players. An amazing aspect of data analytics is that it isn’t (typically) as biased as I am! We can find amazing insights and make better decisions when using data. When I was growing up, soccer was always about goals and assists, but now, with advanced metrics and predictive models, we can evaluate all players on the field more effectively.
Now that we have all of this data, we have to translate it so that key decision-makers understand it. It has to be simple and presented in a way that fits how coaches and players think. Trust is key here. It should be a two-way street with the opportunity to provide feedback. Feelings and intuition are important. Being able to relate to certain moments or scenarios due to my experiences helps me excel in my current role because we have to look at every problem or question from a holistic perspective and take into account all the things we can and can’t measure.
Mark: Something I took from both Javier's and Lydia's responses is how they talk about communication. When I was starting out in the analytics space, the focus seemed to be more on 'these are ways you can win over the coach'. This might have made sense at the time, but it feels like its focus on a narrow goal framed communication as something quite transactional.
The word 'trust' stuck out to me in particular: coaches and players need to trust that you know what you're talking about, but also, considering how new analytics is, I think that they need to trust that they can safely be inexpert about statistics around you too.
Jan Van Haaren
Jan is a Data Scientist at Club Brugge, where he is involved with recruitment analysis, opposition analysis and performance analysis.
The football analytics community is extremely scattered and disconnected. Fanalysts, football clubs, analytics companies and academics are operating in the same space, but they have different interests. The fanalysts often want to land a job with a club or company, clubs want to outsmart their competitors and win trophies, companies want to earn money, and academics want to publish papers. However, fanalysts, football clubs, analytics companies and academics often do not know about each other's ideas. Fanalysts use social media and blogs, academics publish papers, companies sell commercial products that often lack transparency, and football clubs tend to be very secretive. As a result, people keep reinventing the wheel, which hampers the progress of the field as a whole.
More generally, football analytics is becoming more and more of an engineering discipline. The complexity of the data, models and metrics is ever-increasing. The early datasets (e.g., matchsheet data and basic event data) were easy to work with and the early approaches (e.g., expected goals) were quite straightforward to replicate, but those times are pretty much gone. Working with contextualized event data (e.g., StatsBomb 360 Data) and tracking data, and leveraging more sophisticated approaches (e.g., pitch control, expected possession value) requires more advanced and more diverse skills.
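(An illustrative aside, not part of Jan's response.) To show why the early approaches were easy to replicate, here is a minimal sketch of a basic expected goals model: a logistic function of shot distance and the angle of visible goal. The function names are hypothetical and the coefficients are placeholders invented for illustration; they are not fitted to any data, and a real model would estimate them from a shot dataset.

```python
import math

GOAL_WIDTH_M = 7.32  # regulation goal width in metres

# Placeholder coefficients for illustration only; NOT fitted to any data.
INTERCEPT = -1.0
COEF_DISTANCE = -0.10  # per metre from the centre of the goal
COEF_ANGLE = 1.20      # per radian of visible goal mouth


def shot_features(x, y):
    """Return (distance, angle) for a shot location.

    x: distance from the goal line in metres (assumed > 0)
    y: sideways offset from the centre of the goal in metres
    """
    distance = math.hypot(x, y)
    half = GOAL_WIDTH_M / 2
    # Angle between the lines from the shot location to each post.
    angle = math.atan2(y + half, x) - math.atan2(y - half, x)
    return distance, angle


def expected_goals(x, y):
    """Logistic model: probability-like score between 0 and 1."""
    distance, angle = shot_features(x, y)
    z = INTERCEPT + COEF_DISTANCE * distance + COEF_ANGLE * angle
    return 1 / (1 + math.exp(-z))
```

Two features and one formula is roughly all there is to it, which is the contrast with tracking data and pitch control, where the data handling alone is a substantial engineering task.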
Wishes for the future
I'm hoping for a "better-integrated" football analytics community with tighter relationships and more exchanges of ideas among the fanalysts, football clubs, analytics companies and academics. I understand that each of those groups has to protect their own interests, but I'm confident that more interaction should be possible regardless. The companies and academics would better understand what challenges the clubs are facing, while the clubs would more easily find appropriate solutions to those challenges.
I'm also hoping that the evaluation and validation of analytics methods and products will get more attention. I have the impression that many football organisations (e.g., football clubs, football associations, agencies) are clueless as to what methods or products to use for, for instance, performance analysis, opposition analysis or recruitment analysis. What metric or product is most appropriate or useful to solve a particular task in a given situation? When would you use metric X, product Y or device Z? We could, for example, design a number of benchmark problems that are inspired by real-world use cases inside football organisations. New methods and products would be tested against those benchmark problems to inform the practitioners.
Sam Gregory has been working in the football analytics industry for more than five years and is currently the Director of Analytics at Inter Miami and a graduate student at Victoria University studying the intersections of sports science and analytics.
Over the past few years I think the areas I've personally learned the most in have been some of the less-sexy parts of data science: scalability, data storage, engineering and data pipelines. As data becomes more and more accepted in football, the non-technical people in the sport need less hand-holding and want to explore data themselves. This means that, instead of you having to sit down with someone and deliver a custom report every time, people want tools, dashboards, webapps and automated reports that they can interrogate themselves. This requires lots of backend work to make sure that users can access the output of your work without constantly asking you for data or visualizations. This is especially true at a club, where you will have "users" across various departments (recruitment, performance analysis, sport science), all of whom have different questions. If I were to recommend one area for incoming analytics hires in a club to upskill in, it would be database and data pipeline management.
Looking forwards, as I've been saying for the last few years, I think collecting data from TV broadcasts (i.e. broadcast tracking) is really the next big thing in sports analytics. There are now multiple companies doing this work, so I hope in the next few years I - along with the rest of the analytics community - will have figured out better ways to get the most out of this data and all of the additional challenges it presents (occluded players, predicting off-screen loads, massive data storage etc.).
Mark: I will butt in and take a victory lap here for saying in my start-of-year piece: "2022 will be the year that [an emphasis on data engineering] starts to get really drilled into the public analytics consciousness."
Also, as someone with an analytics newsletter, I too am very much in support of Jan's wish for a better-integrated analytics community.
Arielle is a Data Scientist at Zelus Analytics and occasionally contributes NWSL pieces/visualizations for American Soccer Analysis.
It’s probably a cop out to say that I’ve learned everything in the past couple years since I’m relatively new to this world. Since I started learning about football analytics, though, I think my biggest consistent lesson is that communication skills are perhaps more important than any technical ability you might have. At the end of the day, you need to be able to translate a model or analysis into terms or concepts that folks on the other side of your screen (or across the room) can easily understand.
Sometimes, that might mean opting for a less predictive model or a less intricate visualization (bar charts and scatterplots are great for a reason; you don’t need to reinvent the wheel), but it doesn’t mean that you’re less talented, it just means you’re doing your job well. In a lot of ways that’s not too different from analytics in other fields, but I think that it might be even more important in football analytics since the concepts are so new to a lot of people who are looking at your work.
In the next couple years: From afar, it feels like analytics in the women’s game has recently advanced quite a bit, but lags behind what’s available on the men’s side. I don’t think everything we know from public analytics can necessarily translate directly to women’s football right now. For example, I want to understand the biases that might arise when using models based upon what we know about the men’s game. Do we need to adjust models? And how? Do we need to contextualize the numbers we see differently? I don’t entirely know the answer, but I’m looking forward to learning as the field learns, too.
Joris Bekkers is a Sports Data Analytics Research and Engineering Consultant who has been working with the U.S. Soccer Federation since 2018.
One thing I’ve learned over the past few years is that, especially in smaller teams, you can’t just be a data scientist, a data engineer, or a data analyst. You need to learn to be all these things. You need to learn to be a well-rounded developer who also has knowledge of the football side and a strong grasp of at least one programming language, who knows how to maintain databases, create well-balanced, visually interesting and information-dense data visualizations, and keep up with the latest in football analytics research, and who learns how to combine all of this to build automated data pipelines.
And for me, the most difficult part of all of this is to learn to take some time out of the week to develop or learn new skills, because it’s easy to get carried away by all the work you have to do, but you need to take a step back sometimes and think about ways to be smarter about the work you do.
A few years from now I hope to be able to say that I’ve successfully integrated Sport Science and Analytics, such that strategic (on-field) decisions and physical workload are almost inseparable, and tactical tradeoffs are made between energy expenditure and on- and off-ball gains.
Mark: I've grouped these two responses together because the things that Arielle and Joris mention wanting to learn in the coming years match my own very closely.
'Is a women's football xG model different to a men's football xG model?' and 'why are analytics and sports science separate spheres?' are two Get Goalside issues I've wanted to do for a while but haven't been able to do (yet).
(On the former, there's an interesting post from Lotte Bransen and Jesse Davis of KU Leuven.)
Also, Arielle's line about sometimes opting to use less intricate methods to communicate better - "but it doesn’t mean that you’re less talented, it just means you’re doing your job well" - is a great one.
David Sumpter
Professor of applied mathematics in Uppsala; author of Soccermatics and other books; and co-founder of Twelve football
I was most struck by a reminder from Jon Mackenzie that I said five years ago that clubs needed to invest in understanding the basics of football using tracking data. And really that still hasn't happened. There is some progress at Liverpool and Barcelona, and they are on their way at Manchester City and Leipzig. But the groundwork that is needed still hasn't been done. That doesn't mean there aren't good data scientists at clubs. There definitely are. We talked to many of them on Friends of Tracking. It is just that when, for example, I studied fish or pigeons or even locusts, we were a team of 5 or 6 data scientists and experimentalists (coaches, in footballing terms) working together. I still haven't seen that happen.
In the meantime, in my own research we are making some steps in that direction ... often working together with those data scientists at clubs. But my work is done at a more leisurely pace just now. If I were in charge of a big footballing club I would say, let's take some of our budget and let's set up a project to truly understand the movements in football using data and then use it in a co-ordinated way to do scouting, tactical analysis, everything... The potential edge for the first team to do this would be substantial.
Devin Pleuler
Director of Analytics – Toronto FC
The most unexpected learning was just how difficult and time-consuming it would be to extract value from raw tracking data. Before full player tracking became ubiquitous across analytics departments, there was a naive optimism that it would serve as some magical panacea for unlocking the intricacies of the game invisible at the event-data scope. But instead, we’re still left waiting for decent defensive metrics.
The reasons for this are not entirely clear. What is for certain is that friction arises between the sophisticated techniques required to work with tracking data and the language found in the sporting theatre. Pitch control and its various flavours come the closest, but it remains non-trivial to implement.
Instead, teams have mostly resorted to utilizing tracking frames to enrich individual event datum while retreating into the safe statistical corner of rates and frequencies. This is a genuine improvement over where the industry stood, but things obviously haven’t gone exactly how we promised.
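(An illustrative aside, not part of Devin's response.) A minimal sketch of what "using tracking frames to enrich events, then falling back on rates and frequencies" can look like in practice. The event and frame schema, the `Player` class, the function names and the 3-metre pressure radius are all assumptions made up for this example, not any provider's format or definition.

```python
import math
from dataclasses import dataclass


@dataclass
class Player:
    """One player's position in a single tracking frame (hypothetical schema)."""
    team: str
    x: float  # metres
    y: float  # metres


def enrich_event(event, frame, pressure_radius=3.0):
    """Add tracking-derived context to one event.

    event: dict with 'team', 'x' and 'y' keys (assumed schema)
    frame: list of Player positions at the moment of the event
    pressure_radius: assumed threshold in metres for 'under pressure'
    """
    distances = [
        math.hypot(p.x - event["x"], p.y - event["y"])
        for p in frame
        if p.team != event["team"]
    ]
    nearest = min(distances) if distances else None
    under_pressure = nearest is not None and nearest <= pressure_radius
    return {**event, "nearest_opponent_m": nearest, "under_pressure": under_pressure}


def pressure_rate(enriched_events):
    """Share of events that happened under pressure: a simple frequency."""
    if not enriched_events:
        return 0.0
    return sum(e["under_pressure"] for e in enriched_events) / len(enriched_events)
```

The tracking frame is reduced to one or two numbers attached to an event, and the analysis then goes back to counting. That is easier to explain than a full pitch control surface, and it throws away most of what the frame contains.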
Mark: Last but certainly not least, two quite focused responses on working with tracking data.
The themes of a lot of other people's responses mingle in with this too. The reason why several people touched on data engineering, for example, is that departments are quite small, which David would like to see change.
I also think there's a link between where David says "let's set up a project to truly understand the movements in football using data" and Devin says "there was a naive optimism that [tracking data] would serve as some magical panacea for unlocking the intricacies of the game invisible at the event-data scope. But instead, we’re still left waiting for decent defensive metrics."
Maybe one reason why the impact of tracking data has been limited is that so few clubs have invested in research departments/projects as David describes them. (And, as Jan says, clubs tend to be secretive.)
Thanks again to each of the contributors.
For my part, I think the last few years have taught me a lot about the day-to-day of football, and the extent to which good and sensible tech can speed things up (influenced somewhat by the fact this is something that Twenty3, my day job, excels at).
That's not to say that tech tools inherently matter more than analysis or exploration, but it's more about what is delivered. Analysts at clubs might have been using a blunt knife to chop the vegetables before; with a tech tool you can bring them a decent knife from the shop; and that's probably more use to them than the hand-sharpened artisan one that takes four months to be crafted and is probably too expensive.
(There's also probably some recency bias here; I'm sure there are things I've learned and since forgotten that I ever needed to learn.)
Looking ahead to the next 100 newsletters, the thing I most want to learn is how to take insights from things like pitch control models and make them actionable. What specific things do you draw out? Can you use it to create tactical plans? If so, how much are you reliant on coaching quality to implement them? Could the analysis of the data even help improve the coaches' coaching?
Can we get those artisan knives to the analysts?
Lastly, there are some people who deserve a bit of thanks, without whom Get Goalside wouldn't have got this far, or probably existed at all. They fall broadly into three groups.
The first are those who've encouraged, cajoled, or nudged me at some point of my time in analytics: David Perdomo Meza, Bobby Gardiner, @TiotalFootball, Mladen Sormaz. The second are people whose work has inspired me in various ways: Mohamed Mohamed, Tom Worville, Joe Mulsberry, Vosse de Boode, Karun Singh, Grace Robertson, John Muller. James Yorke and Thom Lawrence fit firmly into both camps. Apologies to others who I've missed.
And the final group is you, dear reader. Particularly if you're an email subscriber. Thank you very much; for reading this, and for reading however much of Get Goalside that you've read in the past. I hope you enjoy what's to come in the future.