They say that imitation is the sincerest form of flattery.
Take the phone market: once upon a time you could get brick phones and flip phones and slide-y phones and Blackberrys. And now, almost certainly thanks to the success of the iPhone, every mobile on the market is a five-to-eight-inch slab of metal and glass, with three cameras and zero headphone jacks.
They say that imitation is the sincerest form of flattery.
On 9 May 2018, the former analytics blog StatsBomb went from consultancy to data provider, giving out USB sticks of their data (I promptly lost mine) at a launch event in London. There were two nifty features of their new data provision that were worth making a fuss about.
The first was 'freeze frame' information for shots. Any time a player took a pop at goal, StatsBomb would provide the location of where every other person on the pitch was (providing they were on camera). No longer would xG models be 'naive' to the position of defenders. Now they would know. They would see the truth.
The same day -- in fact, a little before the StatsBomb event was due to start -- Opta (who'd later merge with STATS to form Stats Perform) dropped a tweet.
The next day, Opta posted a blog explaining two new qualifiers (tags which they associate with events in their data to provide more information): shot clarity and shot pressure. Two data points, it was noted by observers at the time, which would be possible to get from freeze frames.
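To make that observers' point concrete, here is a rough sketch of how clarity-like and pressure-like numbers could be pulled out of a single freeze frame. It is emphatically not Opta's or StatsBomb's method: the 120 x 80 pitch with goalposts at y = 36 and y = 44, the `location` and `teammate` field names, and both measures (opponents inside the shooter-to-posts triangle, distance to the nearest opponent) are assumptions made for illustration.

```python
from math import hypot

# Assumed geometry: a 120 x 80 pitch, attacking towards x = 120,
# with the goalposts at y = 36 and y = 44. Adjust for your own data.
LEFT_POST = (120.0, 36.0)
RIGHT_POST = (120.0, 44.0)


def _side(a, b, p):
    """Sign of the cross product: which side of the line a->b point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def in_shot_cone(shot, point):
    """True if point lies inside the triangle shooter -> left post -> right post."""
    d1 = _side(shot, LEFT_POST, point)
    d2 = _side(LEFT_POST, RIGHT_POST, point)
    d3 = _side(RIGHT_POST, shot, point)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # Inside (or on an edge) when the signs never disagree.
    return not (has_neg and has_pos)


def shot_context(shot, freeze_frame):
    """Summarise one shot from its freeze frame.

    freeze_frame is a list of dicts with hypothetical keys:
    'location' -> (x, y) and 'teammate' -> bool (relative to the shooter).
    Players who were off camera are simply absent, so treat results as a floor.
    """
    opponents = [p["location"] for p in freeze_frame if not p["teammate"]]
    # Clarity-like measure: opponents standing between the ball and the goal.
    blockers = sum(in_shot_cone(shot, loc) for loc in opponents)
    # Pressure-like measure: how close the nearest opponent is to the shooter.
    nearest = min(
        (hypot(loc[0] - shot[0], loc[1] - shot[1]) for loc in opponents),
        default=None,
    )
    return {"opponents_in_cone": blockers, "nearest_opponent_distance": nearest}


example_shot = (108.0, 40.0)
example_frame = [
    {"location": (119.0, 40.0), "teammate": False},  # goalkeeper
    {"location": (110.0, 40.0), "teammate": False},  # defender in the way
    {"location": (100.0, 20.0), "teammate": False},  # defender out wide
    {"location": (112.0, 40.0), "teammate": True},   # team-mate, ignored
]
example_result = shot_context(example_shot, example_frame)
```

A real qualifier would presumably involve more judgement than this (body orientation, whether a defender is closing down or standing still), which is part of why the two approaches are not interchangeable.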
Now, these developments of the two datasets by the two companies may have happened purely independently of each other (setting up, or adapting, a data collection service isn't an easy thing to do). And hey, they may even have coincidentally chosen the same date to make their announcements (it was the final week of the Premier League season). But, well, you know what they say...
The second most sincere form of flattery is signing up to newsletters
I thank you for sticking with me through those previous 290 words of analytics history. It's relevant, though, because of the second big feature of StatsBomb's dataset that was launched that day in 2018.
It was 'pressure events', and it would massively increase the number of actions that could be associated with defensive players. At the time, the big player in the football data market was Opta, and their primary defensive actions were tackles and interceptions. The most high-volume players might average six or seven per game. With pressures, the highest-volume players might average four times that. That's useful data.
With the bombast of a start-up, StatsBomb talked about how their data was going to change the game. And the bombast was deserved. Because it did.
Because, nowadays, everyone seems to have pressure data.
Wyscout -- primarily known for their video services but also a data provider in their own right -- have had 'pressing duels' in their data set since 2019.
More recently, the STATS-Opta merger offspring Stats Perform are in on the act. Stats Perform's shop-window content wing, The Analyst, is producing a lot of interesting stuff that uses pressure data.
"Newcastle have the highest proportion of pressures in their defensive third & lowest proportion in the attacking third of any team in the Premier League so far in 2021-22. Will Eddie Howe make them a hard-working side in the opposition half of the pitch? ⬇️ #NUFC"
-- The Analyst (@OptaAnalyst), November 20, 2021
I'm not clear on which of Opta's various data feeds this is on (their main events feed is 'F24', their expected goals in the 'F70'-something range), and it seems probable that it's something that gets added to the mix from the STATS side of this collection marriage. [I wasn't able to get a clarification of the data feed in time for publication but will update the online version of this when I get it]
And that's not all. Sportlogiq -- a Canadian company specialising in tracking data rather than manually-collected event data -- collect information on the amount of pressure players are under, and have done so since (at least) 2018. (I spoke to them, STATS, and StatsBomb, among others, for an article that year.)
It seems likely that, at some point, pressure data will just be standard in any dataset you might be able to purchase. Some form of shot clarity (how clear a view you have of goal) might be too. And if that comes to be the case, what will it be that differentiates the data providers?
My thoughts on this come after the following box, where you can sign up to the newsletter if you haven't done so already
Cost, speed (particularly for media), and accuracy are potentially the three big differentiators. But these are all quite boring. It's like saying that a key decision-factor in buying a new phone is build-quality: true but, y'know, at a certain level the phones are all functional enough to serve your purpose just fine.
So then (speaking about phones still, you understand) you get to things that you quite frankly don't use or realistically don't care about. Number of cameras. 4K shooting-quality. Shape of the bezel.
One thing that does set some phones, or phone brands, apart is the software they come with (mainly Apple). But similarly, StatsBomb have their IQ system; Stats Perform have the Trumedia-produced ProVision; Wyscout have data built into their video platform. Perhaps these systems can be unique and valuable enough to lock customers into the provider's ecosystem.
Alternatively, perhaps other 'add-ons' could have the same effect. In recent years, the number of tracking data companies (such as Sportlogiq) has leaped. Most of these (to my knowledge) don't offer a 'complete' set of events (such as passes, tackles, etc), but often do offer the ability to match their data with a full event provider. Maybe your choice between these event data providers will be decided by which tracking data companies they work 'out of the box' with. Or which data provider works with other types of 'add-ons' (how easy is it to link with video? third-party software? VR experiences? database set-up advice? consultancy services?).
The interesting thing about this question is that customers will choose data providers who best serve their needs. But, at the moment at least, everyone has such different types of process that those needs can differ quite greatly, including what type of data they want. Unless a customer is really big or really important though, providers aren't going to change what they collect to suit one client. Otherwise they'd never stop. Their provision will likely settle on something that is generally good for most people, but not perfect for anyone.
And if data providers are all trying to service the same people, will all of their provisions end up looking the same? A row of five-to-eight-inch slabs of metal and glass staring back at you, separated by the number of megapixels in the three-camera arrays. And the shape of the bezel.
This is the end of the main newsletter, but subscribers in the data sector might want to continue reading
There's a point in this newsletter where I wrote '(to my knowledge)' in reference to what tracking data companies offer. Most of my awareness of data provider provision comes from several years immersed in and around analytics Twitter, and working for Twenty3, whose software product is built to work with various data providers.
However, I'm aware that there are gaps in my knowledge. I would like to fill them; for my benefit, for your benefit, and, to be honest, so that I can be fair to companies who don't have the Twitter clout that some others might have.
I'm aware that this newsletter has a significant readership, and while it's still very much a-thing-I-do-in-my-free-time I want to make as much effort as I can to be thorough.
So, over the coming days I'll be getting in contact with the data providers I'm aware of to ask for access to documentation and be put on any list for product updates. I don't plan to write specifically about updates or what's in the data, but, like with this newsletter, it might be useful to fold the information into a wider piece. The providers are welcome to say no, of course, and I won't hold it against them if they do.
I'd also be open to other companies who I may not be aware of getting in touch with me. If that's you, you can use the email address firstname.lastname@example.org
Finally, and only very tangentially related: in the spirit of a slight increase in professionalism, and while this is still very much a-thing-I-do-in-my-free-time, I do have a ko-fi page where you can (literally) pay some appreciation or encouragement if you wish.