Where is analytics? What is analytics? An autumn 2022 update.

I hadn't been planning on writing a post-StatsBomb conference newsletter. Blame the train ride home.

Part of the reason for not planning to was not knowing how to write about a conference where you can see most, but not all, of the content. So this isn't going to be a blow-by-blow review, more a combination of thoughts, many of which were percolating anyway, that the day helps to tie together.

The other reason for not planning a post-conference newsletter was a desire not to be seen as being swayed into positive coverage by a free pen and free StatsBomb-branded socks. (A high-quality pen as freebies go; cannot yet report on the quality of the socks).

Despite this second reason, allow me to suggest some taglines for the company's marketing team to use in their post-conference content:

  • 'StatsBomb Conference 2022: A wide array of expertise!'
  • 'StatsBomb Conference 2022: "We don't have to validate our existence anymore"'
  • 'StatsBomb Conference 2022: All analytics is web apps!'
  • 'StatsBomb Conference 2022: All grown up'

These may need some refining.

It shouldn't escape attention that a number of the speakers on the main stage during the day have worked in football virtually, if not genuinely actually, since leaving university. Within the last decade. And these weren't 'how to get your foot on the ladder' type talks, they were 'how to take charge of a department' type talks. StatsBomb itself has 'grown up' from a somewhat sparsely-attended launch as a data company in 2018 to hosting a really-very-large stage within the more attractive bowels of Wembley Stadium. No wonder that the day featured a panel titled 'The End of the Beginning? Where Does Analytics Go From Here?'.

When, during it, Javier Fernández - formerly of Barcelona, currently of Zelus Analytics - said that analytics is in a bit of a teenage moment (sometimes looking very grown-up, other times still a little lacking in maturity), it worked as a cusp-of-adulthood metaphor for much of the room. So you set up a data company/earned a very respectable job/got your hands on tracking data: what next?

Mostly software development, apparently.

The progression of football analytics goes a little like this:

  • Step 1: Find something useful to say
  • Step 2: Make sure important people are willing to hear it
  • Step 3: Enable the important people to hear the useful things without you having to do the same amount of work every time they ask

An increasing number of people in the industry have now arrived at step 3 and require the kind of infrastructure that essentially turns them into mini-tech companies (A new Warholism for you: "In the future, everyone will be a tech company for 15 minutes"). They need database engineering and maintenance. Automated tasks. User-friendly applications.

It's not like this is unique in the sporting world to football. In The MVP Machine - Ben Lindbergh and Travis Sawchik's 2019 book on baseball's drive towards data-driven player development, currently in Get Goalside's 'In Progress' pile - Hall-of-Famer Pedro Martínez gets a mention not for his on-field exploits, but in the form of PEDRO, an in-house analytics application at the Boston Red Sox.

Heck, it's not even hugely new to football. People like Joe Mulberry and Karun Singh were cooking up football analytics apps for the Stats Perform Pro Forum in 2019 and 2020 respectively. But it being the mood music is new. (Also, always keep an eye on what the people who were ahead of the curve several years ago are up to in the present).

If you don't want to call this a turn towards software (as attractive as that may be as a VC pitch), you could sum it up differently: as a drive to consolidate knowledge within football organisations.

In some cases that's some kind of internal app (as mentioned by a number of main stage speakers), with which people could look up stats, visualisations, reports all by themselves. In some places it's developing proper analytics teams, whether by expanding (or creating) through direct hiring or by bringing people already on the payroll together in a more aligned fashion. And in some places it's by going full-centralisation, through developing multi-club models or analytics consultancies where you can get some 'economies of scale' by being able to share knowledge/development-time across numerous clubs.

If you want an attempt at a pithy one-liner, where once analytics was a space of adventurers, now it's a space of entrepreneurs. Insight is one thing, productising insight is another.

But all is not lost for those souls who merely, like Star Trek, want to venture out where no-one has gone before. There will always be more new knowledge to find. There's always an elusive golden apple just out of reach, spaces on the data periodic table still to fill.

Some of this will be filled by the existing data, but the other inevitability is that there's always some new kind of dataset to yearn for. In the olden days - as recently charted by the Post Script podcast* - analytics bloggers considered a proper trove of event data as the thing to covet, the place where new answers were to be found. Then there was tracking data, the gift that would lead us to finally investigate and understand space: the final frontier.

Now there is some movement again, though more subtly.

The new element now is the broad field of computer vision (or as it's probably easiest explained, 'detecting shapes from images'). This is partly because advances in the field mean you don't necessarily need in-stadium multi-camera set-ups to produce good-quality tracking data. There are companies whose whole shebang is making it out of the kind of TV broadcast footage that anyone, in one way or another, can get their hands on.

But computer vision is partly interesting in the same old way that any wave of technology is interesting: if machines and bots can do what humans can do, it'll usually (eventually) be quicker and cheaper to get the machines and bots to do it.

Open-source computer vision packages are getting easier to use and better at what they do, to a level where even a fairly average joe could take a crack at creating their own computer vision-derived football data nowadays. And, y'know, why buy tracking data if you could create your own?

For that matter, why buy event data when you could create your own? We're not there yet (way off) but could a combination of tracking data and body posture detection one day do away with human-collected football event data for good? What's a 'tackle' if not two players in the vicinity of the ball, one of whom is making certain movements with their legs?
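To make the 'two players in the vicinity of the ball' half of that concrete, here's a very rough sketch. Everything in it is an assumption for illustration: the frame layout (a list of player tuples plus a ball position, in metres), the function name, and the 1.5m radius are all made up, and it does nothing about the 'certain movements with their legs' part, which would need pose data. It only flags candidates that something smarter would then have to classify.

```python
from math import hypot


def candidate_duels(players, ball, radius=1.5):
    """Flag pairs of opposing players who are both close to the ball.

    players: list of (player_id, team, x, y) tuples for one tracking frame
             (hypothetical layout, coordinates in metres)
    ball:    (x, y) position of the ball in the same frame
    radius:  how close counts as 'in the vicinity' (an arbitrary guess)
    """
    # Keep only the players within `radius` of the ball
    near = [
        (pid, team)
        for pid, team, x, y in players
        if hypot(x - ball[0], y - ball[1]) <= radius
    ]
    # Pair up nearby players from different teams
    return [
        (a, b)
        for i, (a, team_a) in enumerate(near)
        for b, team_b in near[i + 1:]
        if team_a != team_b
    ]
```

Run over every frame of a match, this would throw up far more 'maybe a tackle' moments than actual tackles, which is rather the point: the proximity bit is easy, the leg bit is the hard part.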

For the record, Sarah Rudd - speaking on the previously mentioned 'Where Does Analytics Go From Here' panel - doesn't think you, working inside clubs, should pursue this. And, sure, I take the point of the former StatDNA/Arsenal employee and general analytics pioneer, that trying to replicate the fine-tuning and reliability of entire companies whose sole purpose is this one thing might not be the smartest idea. It's probably a little silly. But then, this newsletter is a little silly.

The general point is that it would seem weird if the increasing accessibility of computer vision programmes didn't end up affecting relationships between data providers and clubs (or other data purchasers) in some way. Even if that way ends up being, as kind of happened with event data, forcing existing companies to adapt and evolve their offering.

That said, that comparison isn't perfect: as Rudd said in her solo talk at the conference, event data is more 'generating' than 'collecting' - "you don't pick it up off the ground" - whereas pure tracking data is much more like collecting, in that players' movements in space are as close to an objective fact as you can get.

The comparison isn't 1:1 in another way too. If you were dissatisfied with event data, as an event data customer, it would be fairly easy to collect new and different things to a good degree of reliability (as the success of Sportscode and similar software can attest) - the problem is in scaling it. With computer vision-produced data it's the reverse: pretty easy to scale, much harder to fine-tune to a reliability where a slightly unusual camera angle won't silently ruin the whole system.

But, like, what if you took the cheapest event data you could find, which was generally ok but a little inaccurate in its event locations or something, and you matched it to video and developed a computer vision programme to improve the timestamps and locations of that basic shot data? Maybe you add in some fancy features that aren't in the cheap event data too; now that the hard part of deciding 'this is a shot taking place' has been done, how hard would it be to add in things like shot height, shot technique... Could you DIY yourself some top-of-the-range shot data from a lower-grade input plus video plus basic computer vision?
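As a back-of-an-envelope version of that idea, here's a sketch of the 'improve the timestamps and locations' step. It assumes you've already got a ball track out of your computer vision set-up (a time-sorted list of (t, x, y) tuples with strictly increasing timestamps, which is doing a lot of heavy lifting), and it uses a deliberately crude rule of my own choosing: within a couple of seconds of the cheap data's timestamp, the shot is wherever the ball's speed jumps the most. The function name, the window size, and the rule itself are all illustrative assumptions, not anyone's actual method.

```python
from math import hypot


def refine_shot(event_time, ball_track, window=2.0):
    """Snap a rough shot timestamp to the frame where the ball speeds up most.

    event_time: timestamp (seconds) from the cheap event data
    ball_track: time-sorted list of (t, x, y) ball positions (hypothetical
                output of a tracking system; timestamps strictly increasing)
    window:     how many seconds either side of event_time to search
    Returns the (t, x, y) frame judged to be the moment of the shot,
    or None if there are too few frames to judge.
    """
    frames = [f for f in ball_track if abs(f[0] - event_time) <= window]
    if len(frames) < 3:
        return None

    # Ball speed over each gap between consecutive frames
    speeds = [
        hypot(x1 - x0, y1 - y0) / (t1 - t0)
        for (t0, x0, y0), (t1, x1, y1) in zip(frames, frames[1:])
    ]

    # speeds[i] covers frames[i] -> frames[i + 1], so the biggest jump from
    # speeds[i - 1] to speeds[i] happens at frames[i]
    kick = max(range(1, len(speeds)), key=lambda i: speeds[i] - speeds[i - 1])
    return frames[kick]
```

A deflection, a clearance, or a goalkeeper's punt inside the same window would fool this instantly, which is a small-scale illustration of the fine-tuning problem.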

(Even if you could though, would all the development time, and opportunity cost of not doing other things, be worth the reduced cost of the event data? That's one for the accountants and sales teams to fight it out over.)

This segues neatly into an amendment I need to make to the pithy one-line attempt from earlier. If the balance of football analytics is shifting from explorers to entrepreneurs, then there's one exploration-themed job that's still very much needed: cartographers. Data cartographers. By which I mean, skilled modellers who can take the cries of "there's something over there!" and work out its exact contours, sea depths, and mineral composition.

Nowhere was this more evident at the StatsBomb conference than the research stage. Investigating how existing model ideas applied to new datasets was a big, big theme, whether that new data be women's football leagues or StatsBomb 360. (The research paper competition winners and links to the papers are listed here).

This kind of work is particularly important, and the dynamic with the rest of analytics going forwards particularly intriguing, because "slow down and run thorough tests across different model types and parameters" gets to coexist with "move fast and break things" (and/or "move fast and market things"). Also, because companies keep producing new types of datasets nowadays.

It may not be glamorous work, but it is essential. Partly because without good cartographers you risk causing a shipwreck or wandering into a desert. And partly because this work being thoroughly done and the findings being written up makes it less likely that the knowledge will vanish from memory (or, at least, more likely that it'll be unearthed again if it does). We all need cartographers and we don't give them enough credit.

Anyway, that's a thread of thoughts that over the span of a few months would probably have formed the basis of two or three more well-put-together newsletters. It's a rough stock take of where I think things are at the moment, probably not right at the edge of analytics but sorta two steps back from the edge. The kids are growing up, putting on business suits, and going out into, and shaping, the big wide world.

Speaking of going into the big wide world, for those on the outside of the industry seeking a way in it might seem demoralising that there are fewer low-hanging branches with which to haul yourself up. While there are a lot more resources to learn from, it's unlikely you'll be able to sweep to fame and/or a career on the back of a rusty, homemade xG model like you could a number of years ago. It sometimes seems like the skill or knowledge levels required now for meaningful and/or meaningfully-paid positions are very high (which is unsurprising, it's a competitive industry).

But... But but but. Let's flip the skill-requirement idea on its head. If you really want to learn a lot about data science then there's a ton of work based on your favourite sport to sharpen your expertise on now. If you're not maths or coding technical but are interested in communication, journalism, visualisation then you can take some of the nerd-stuff and work on making it accessible and popular. If you're interested in business or product creation, there's now a whole lot of people whose skills and ideas are there but might need a little refining or packaging to take things beyond just 'ideas'.

In the very early days, (public) football analytics was mostly a thinking exercise, an imagination game with not much data to go around. Then there was a code-y mathsy stage that merged, with the increasing openness of tracking data, into a physics-y mathsy stage. At some point, maybe the physics-y people really will 'solve football', or come as close as feels possible, at which point analytics may well become a pure 'product and pounds (£)' business.

But at the moment, it's a little bit of all of those things. It's everything.

Thanks for reading

*Prompted into nostalgia by the Post Script podcast, I've re-opened the least un-worthwhile parts of my old, old blog as a kind of historical document