2020 has been the moment in time that best embodies the use of capitalisation to signify heavy intonation. It’s been, very much, A Year.
It’s been a bit of A Year in the football analytics space, although for different reasons. Expected goals has succeeded in setting up shop in the mainstream, creating room for things like PPDA and Mikel Arteta’s win percentages. The first lockdown of the pandemic sparked Friends of Tracking, which, between its YouTube channel and GitHub code repository, is probably the single best one-stop shop for analytics learning out there.
Jan Van Haaren has put together an incredible list reviewing 2020 in football analytics (all the more incredible for this newsletter being included), so I won’t cover the same ground.
The amount of soccer analytics content has spiked in 2020. My latest blog post lists the research papers, blog posts, news articles, events, invited talks, webinars, podcasts, Python libraries and newsletters that I liked the most! https://t.co/gHdP2znJL8 (Jan Van Haaren, @JanVanHaaren, December 30, 2020)
However, there are a few things I want to touch on, before going on to think about what 2021 might have in store.
For the past several years, it’s been a semi-recurrent grumble among ‘my wave’ (c. 2014) of football analytics people that ‘there isn’t much public work being done anymore’. 2020 has blasted that complaint out of the water.
Whether through the increasing number of free datasets and learning resources, inspiration being taken from high-profile successes, people gravitating together after being online in the space for a bit, or just sheer numbers, there are a lot of people doing neat stuff. It doesn’t need to be groundbreaking; a lot of the most useful stuff isn’t.
Something as simple as Tom Worville’s point about ‘true tackle success rate’ in Opta data (see here for a reference) is a great example. Opta have specific info about which fouls are the result of failed tackle attempts, so counting those fouls alongside the recorded tackles gives a fuller picture of how often a player’s tackles actually succeed. Making sure you’re using the complete set of data available to you is such a simple thing, but isn’t necessarily obvious.
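To make that arithmetic concrete, here is a minimal sketch in Python. It assumes you already have three per-player counts: tackles won, tackles lost, and fouls that came from failed tackle attempts. Those field names are hypothetical rather than Opta’s own, and exact event definitions vary by provider, so check them against your data’s documentation before trusting the numbers.

```python
def naive_tackle_success_rate(tackles_won: int, tackles_lost: int) -> float:
    """Success rate using only the events logged as tackles."""
    attempts = tackles_won + tackles_lost
    if attempts == 0:
        raise ValueError("no tackle attempts recorded")
    return tackles_won / attempts


def true_tackle_success_rate(
    tackles_won: int, tackles_lost: int, fouls_from_failed_tackles: int
) -> float:
    """Success rate that also counts fouls from failed tackles as failed attempts."""
    attempts = tackles_won + tackles_lost + fouls_from_failed_tackles
    if attempts == 0:
        raise ValueError("no tackle attempts recorded")
    return tackles_won / attempts


# Hypothetical season totals for one defender (illustrative numbers, not real data).
won, lost, foul_tackles = 30, 10, 10
print(f"naive: {naive_tackle_success_rate(won, lost):.0%}")  # naive: 75%
print(f"true:  {true_tackle_success_rate(won, lost, foul_tackles):.0%}")  # true:  60%
```

With these made-up numbers, the same defender drops from winning three tackles in four to three in five, which is exactly the kind of gap the tackle events alone would hide.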
Speaking of Worville, him joining The Athletic towards the start of the year was a pretty big sign of where things are going. Reach PLC (think the Mirror, Liverpool Echo, Football[dot]London etc) had hired a couple of analytics writers previously, but this felt slightly different.
Part of why it felt different was just because it was The Second, i.e. not the first. This was no longer an outlier event; it was the makings of a trend. Also significant was that The Athletic, with their Hollywood names poached from the establishment of the UK football media, suddenly became Worville-central. Outta the way Ornstein, heave-ho Honigstein and Horncastle, we want Worville. We want his stats. (Disclosure: I’ve known Tom for several years and am personally happy about his success; that aside, those last two sentences remain unimpeachable media analysis.)
While the media isn’t the professional game, it’s probably a good bellwether, particularly as the two sides have a degree of overlap. At data companies, the people helping out with enquiries for articles or TV shows like Monday Night Football will also likely work on projects for pro clubs. MNF, and Sky Sports more generally (among other outlets), are increasingly using more complex data to help discover and illustrate points.
And then, to take a sharp turn, there’s this other thing: the big potential lawsuit. ‘Project Red Card’, announced in the summer, brought together hundreds of former and current players to seemingly test the waters on what kind of consent needs to be sought from footballers to gather, use, or sell data about them. It seems like a test case that’s in discussion and negotiation stages at the moment, and there’s a decent video explainer here about it.
I haven’t heard much about it since the initial reporting, though. This August article in Wired says that PRC was yet to name any specific data companies. Unrelated to Project Red Card itself, in November, Russell Slade, figurehead of the group, tweeted in reference to Zlatan Ibrahimović’s complaints about his image being used in FIFA games.
As it turns out, Ibrahimović’s (and others’) comments were probably more to do with a battle between their agents and FIFPro than sincere concerns about data rights (although these players may well still have those). It’s a thorny area, and this seems like a good time to turn to 2021…
If the mini fuss caused by Ibrahimović’s tweet is anything to go by, then this issue of data rights/privacy could get very messy if (when?) certain interests decide to get involved. What would happen if Mino Raiola pointed all of his clients in Slade and Project Red Card’s direction? What happens if Ibra starts tweeting about how gambling companies are making money off him winning a corner, or how broadcasters might be showing his distance or speed stats in coverage?
In the same kind of area, I responded to this article in the New York Times about data’s role in the future of football with one of my own. I questioned — in a daily newsletter I was doing elsewhere at the time — whether football clubs had the right processes in place to be collecting and storing data on things like player well-being. To quote myself:
[I]f clubs are collecting increasing amounts of personal information, how sure can players be that their data practices are secure? Are football clubs, which relatively regularly make clearly bad manager appointments or transfers, definitely going to be storing this data properly?
[…] If the thing coming over the horizon is an intensified monitoring of individuals by their employers, we should really think about what that means before it arrives.
However, I’m not outright predicting that this particular player-data issue will be a thing for 2021. Like Project Red Card’s question about consent over data collection and sales, this kind of thing will be an issue bubbling away out of sight until someone eventually decides to open the door to it. That door might open in 2021, but I imagine it would probably take a specific incident to prompt it.
Meanwhile, data companies are just, uh, collecting even more data…
Over at Stats Perform (née Opta), they are/will be adding player ‘controls’ to the data, meaning (I believe) that we will finally know every time a player touches the ball.
StatsBomb have dropped their own teases, including the below image from an internal hackathon, which looks a little like some kind of passing option snapshot with cover shadow (or something). Or, as Ted Knutson says in the article, an illegal soccer rave.
Wyscout, I believe, are also making their own improvements to their data, and other data providers are both existent and available; I am just less aware of their plans for 2021.
Outside of the provision of data, there’s the application. Data providers tend to also offer their own software or services around them. The most interesting space on this front is probably Second Spectrum, a tracking data company with official deals with the NBA, MLS, and the Premier League. They’ve done bits and pieces of augmented reality stuff with the NBA for a while now, but in November had their first ‘enhanced broadcast’ in football/soccer during the MLS play-offs. Note, in the attached tweet, the FIFA/PES-esque player map and hovering names. (Something similar has also been offered to certain BT Sport customers for Premier League matches.)
I haven’t had the chance to experience this for myself yet, and apparently there are kinks to work out, but I can see how both features pictured here could be immensely useful. Player names would help more casual fans who don’t know everyone (I’m particularly thinking of how this could get people more easily acquainted with players in the women’s game), and it could also give commentators scope for different styles of commentary.
I would certainly like to see a ton more of this in 2021, but it’s kind of dependent on the broadcasters. Anyone who’s watched some La Liga TV coverage will know that their match broadcasts are pretty experimental compared to a lot of others, with things like heatmaps being overlaid on the pitch. The Bundesliga’s broadcast feed has used tracking data for average positions, but not been so avant-garde (to my knowledge) as to splash the information across the entire screen mid-match.
I’d love for this kind of innovation to be more widespread, but it probably won’t be front and centre. So far, the Second Spectrum stuff has mainly been available through certain settings within apps rather than on the regular TV coverage that the majority watches.
There’s also the company I work for, Twenty3 Sport. To avoid sounding totally like an ad, I’ll use us as an example of wider trends in the industry. Without getting too salesy, I will say that our Toolbox software product has many tools but, more pertinent to a discussion of the industry, works off multiple data providers.
That the demand for football data is large enough to support an increasing number of data providers shows how healthy the sector has become. Organisations are no longer seeing data as something that they should probably have if they can afford it; I think they’re much more likely to be actively seeking data and applications that suit them. The larger number of providers offers that, with more options to suit various needs of speed, breadth, depth, and price.
To use Twenty3’s Toolbox as an opportunistic jumping-off point for some future-gazing: the appreciation of, and demand for, better-designed and on-brand data visualisations will only grow. For good topics (Tom Worville at The Athletic) and bad topics (all coronavirus, but particularly John Burn-Murdoch’s work at the Financial Times), 2020 was a big data vis year. Well-designed visualisations are more effective at communicating their message and more likely to be shared, and being on-brand means that people know who’s doing the communicating. The ‘club side’ may not care as much about branding (although clubs are increasingly media outlets of their own), but they definitely care about effective design.
This is all about data communication, something that we and The Athletic share but perhaps Mikel Arteta could use some pointers on. Worville confirmed in The Athletic that the win percentages that the Arsenal manager mentioned in press conferences were based on expected goals modelling. Ideally, this [waves hands generally at Arsenal] would all be better communicated.
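The reporting only says that those win percentages came from expected goals modelling, so what follows is one common way to turn xG into win percentages, not a description of Arsenal’s method. This minimal Python sketch treats each side’s goals as independent Poisson variables whose means equal their xG, then adds up the probability of every scoreline. The independence assumption, the Poisson choice and the 10-goal cap are all simplifications made for illustration.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """Probability of exactly k goals when goals follow a Poisson(mu) distribution."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def outcome_probabilities(
    xg_for: float, xg_against: float, max_goals: int = 10
) -> tuple[float, float, float]:
    """Return (win, draw, loss) probabilities, treating each team's xG as a Poisson mean.

    Assumes the two teams' goal counts are independent and ignores scorelines
    above max_goals, whose probability is negligible for typical xG values.
    """
    win = draw = loss = 0.0
    for goals_for in range(max_goals + 1):
        p_for = poisson_pmf(goals_for, xg_for)
        for goals_against in range(max_goals + 1):
            p = p_for * poisson_pmf(goals_against, xg_against)
            if goals_for > goals_against:
                win += p
            elif goals_for == goals_against:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Illustrative xG figures, not taken from any real Arsenal match.
win, draw, loss = outcome_probabilities(1.8, 0.9)
print(f"win {win:.0%}, draw {draw:.0%}, loss {loss:.0%}")
```

Models that simulate individual shots instead of using whole-match xG totals can give noticeably different numbers from the same match, which is part of why figures like these need careful explaining.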
There are signs that clubs are improving in this regard, even if their managers confuse things a bit. Job adverts — or, at least, the ones I’ve seen floating across my social media feeds — have improved markedly. Descriptions of data-related roles seem consistently focused and thought-through now in a way that they haven’t always been. (On this subject, I wrote about two City Football Group job ads back in October).
With not just more roles, but more roles that are paid attention to, there’ll also be more coverage of the individuals. And, through this, there may also be more instances of questionable descriptions of what these individuals, or departments, do. That’s probably worth keeping an eye on in 2021.
In general, then, there’ll just be more. More use of data, more people working with data, more understanding of data, more little innovations.
2020 was the year we firmly got past needing to convince people that data or analytics was useful. That stage, I think, had long passed, but there was still a residual feeling that there was something to prove.
In 2021, people will be more open to analytics, which will mostly be a blessing, but only mostly. For the charlatans (of which I’m sure there are precious few), it could mean they need to switch up their act. For employees at clubs or player agencies, it may mean frustration at being asked to provide statistics to back up a crummy argument. At the same time, might even more be expected of data analysts because of the success of the research team at Liverpool’s men’s side?
Mostly, though, it should bring more opportunities; hopefully more collaboration (probably not very publicly, but in group chats or employment); more refinements and more ideas.
Thanks very much for reading, both this piece and any others you read during 2020.
That was a look at 2021 in analytics, but while writing it, a tweet floated across my timeline, with a line that struck a chord with me personally.
2020 has thrown up different things for different people, and, at the risk of getting more personal than I’d like to, this has been one of mine. While professional sport itself can be said to serve a community purpose (local or just purely social), analytics sometimes leaves me cold. The relentless optimisation at the heart of analytics seems to be about winning in a way that feels tangibly different from whatever essence is left in top-level sport itself.
With the Premier League playing every game on TV in different timeslots, empty stadiums are a thoroughly familiar sight, and while some have commented that they symbolise the league’s potential to be a fan-less vacuum, I sometimes feel like they’re a similar symbol of analytics. It would be extremely tenuous to argue that developing a good EPV model benefits the community of a football club, except maybe if being without an EPV model would send the club into extinction. Which seems unlikely.
The analytics community is also generally middle-class, very male*, and, in the UK, at a guess disproportionately white. This isn’t said to chastise, it’s just… I’ll circle around to the point.
While the pandemic was in its first peak, there was a phrase that went around, I think in relation to postponements, that Google tells me was a line from Washington Nationals pitcher Sean Doolittle: ‘Sports are like the reward for a functioning society’.
In the same way, they’re often a reflection of our dysfunctional society. The pandemic, of course, but also the racial injustices that Shireen Ahmed writes about in the piece linked to above. The gender and gender identity inequalities within sport (relatedly, how many of the analytics hires at clubs work across both men’s and women’s teams?). Disability access both at football grounds and within playing sport more generally.
Arrigo Sacchi said that football is the most important of the least important things in life, and sometimes I wonder where analytics fits among that. Regardless, there’s an increasing number of people getting involved in it, and football has a powerful place in a lot of cultures around the world. To paraphrase Ahmed, maybe analytics is only worth it if we use it for something good.
*On this subject,