A Pandora's Box of football media stats use
A blog about football stats in the media
It's 2009 and you've just stepped out of the cinema when a version of yourself from 2023 appears in front of you. You exchange exclamations, pleasantries, a copy of Grays Sports Almanac, and then get onto the movie you've just seen.
2023-you reveals that, just like the original, Avatar 2 has smashed box office records. "Wow," 2009-you says, "and do you still have to wear those glasses for the 3D?" "Oh, no," says 2023-you. Your 2009 mind is blown: "Wow! Technology must have gotten really good then!" "Uh..." 2023-you stammers, "no, no-one really makes 3D movies anymore."
It would be very exciting to go back to summer 2014 and tell the readers of Colin Trainor's StatsBomb post introducing PPDA (passes per defensive action) that, almost a decade later, that metric, as well as expected goals, would be in mainstream use. 'The standard of media and punditry must be really high,' they might think. And yet...
A recent post by Casey Evans for Analytics FC ended with the following paragraph:
The data is there for the mainstream media to pull information from and there is also a massive pool of writers and analysts willing to interpret it for them. The question now is whether football media will take the next step needed to bring their coverage up to date.
The point Evans makes is one I'd broadly agree with: despite some increase in data availability and a big increase in data consciousness, the data doesn't seem to be used well by the media industry. (There are exceptions, of course. As Evans also points out, Sky Sports, with their online output and Monday Night Football in particular, have done pretty good work. But on the whole, things are slightly more 'as they were' than you might have expected.)
I think that there are three main reasons for this, which each have their own 'solutions'.
- Stats are new
- Football is hard
- [insert 'el problema es el capitalismo' ('the problem is capitalism') meme]
All three are kind of interlinked. Stats being new means people in media haven't had time to 'become fluent' in them, but another reason they haven't had that time is what the jobs look like (produce as much as possible to house ads) and what editors look like (they don't).
But football being a difficult, pesky, dynamic sport means it's hard to pinpoint statistics that really cut through the noise. Expected goals does a great job at what it does; everything else is messier than a Double Pivot Salacious Gossip podcast. Evolution will take its course and better metrics will rise, but, like we said, all these football stats are still pretty new. And with the money in football analytics flowing towards proprietary information (mostly inside clubs), the media mostly gets the leftovers in the fridge.
(Apart from media companies who purchase the services of Twenty3 and their Toolbox of goodies, the finest employer of the writer of this newsletter there ever was. But seriously: they do pro-side work too. Get in touch with them.)
Darwinian forces are starting to have more of an impact, though. Expected goals timelines and momentum charts are becoming pretty commonplace, and they're neat visual ways of capturing quite a lot of information.
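An xG timeline is essentially a running total of each team's shot xG plotted against match minutes. Here's a minimal Python sketch of the data prep, assuming a hypothetical list of `(minute, team, xG)` shot tuples; the shot values are invented for illustration, and a real data feed will look different.

```python
from itertools import accumulate

# Hypothetical shots as (minute, team, xG). Values are invented for illustration.
shots = [
    (12, "Home", 0.08),
    (27, "Away", 0.31),
    (44, "Home", 0.52),
    (71, "Away", 0.05),
    (88, "Home", 0.17),
]

def xg_timeline(shots, team, full_time=90):
    """Return (minute, cumulative xG) points for one team.

    The series starts at (0, 0.0) and is extended to full time (or the last
    shot, if that came in stoppage time) so a step chart covers the match.
    """
    team_shots = sorted((minute, xg) for minute, t, xg in shots if t == team)
    minutes = [0] + [minute for minute, _ in team_shots]
    running = [0.0] + list(accumulate(xg for _, xg in team_shots))
    points = list(zip(minutes, running))
    end = max(full_time, minutes[-1])
    points.append((end, running[-1]))  # flat line to the final whistle
    return points

for team in ("Home", "Away"):
    print(team, [(m, round(xg, 2)) for m, xg in xg_timeline(shots, team)])
```

Passing each team's minutes and totals to matplotlib's `plt.step(x, y, where="post")` gives the familiar stepped timeline; `where="post"` holds each total flat until the next shot.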
Talking strictly stats, here's a list of things I'd consider throwing into a basic toolkit:
- High turnovers
- Counterpressures (if available)
- 10+ pass sequences (spells of possession with 10 or more passes; ten is a bit of an arbitrary marker, but it makes you go 'huh!')
- Fast-break final third entries (this is one I've made up on the spot, but conceptually I prefer it to the 'direct speed' metric used in some places)
- Crosses and cut-backs (both the volume and the relation between them would be interesting)
- The passes -> final third passes -> shots chain
I think those ten or so stats would cover most of what you want to look at on a team level. They cover a range of tactical approaches, and they combine being good at explaining something with being quite easy to picture.
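To make a few of the toolkit stats concrete, here's a minimal Python sketch that counts them from possession sequences. Everything about the data is an assumption: the `Sequence` schema is hypothetical, the 40m high-turnover cut-off mirrors the kind of definition some data providers use, and the 35m final-third line and 10-second window are made-up thresholds for the fast-break idea, which has no standard definition.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds (all assumptions for this sketch).
HIGH_TURNOVER_M = 40.0    # regains this close to the opponent's goal count as "high"
FINAL_THIRD_M = 35.0      # roughly a third of a 105m pitch
FAST_BREAK_SECS = 10.0    # hypothetical window for a "fast-break" entry
LONG_SPELL_PASSES = 10    # the "10+ pass sequence" marker

@dataclass
class Sequence:
    """One possession spell (hypothetical, simplified schema)."""
    team: str
    started_by_regain: bool        # won the ball in open play, not a restart
    start_dist_to_goal_m: float    # distance from the opponent's goal at the start
    passes: int                    # completed passes in the spell
    final_third_passes: int        # completed passes into or within the final third
    shots: int
    secs_to_final_third: Optional[float]  # None if it never reached the final third

def team_toolkit(sequences, team):
    own = [s for s in sequences if s.team == team]
    passes = sum(s.passes for s in own)
    ft_passes = sum(s.final_third_passes for s in own)
    shots = sum(s.shots for s in own)
    return {
        "high_turnovers": sum(
            s.started_by_regain and s.start_dist_to_goal_m <= HIGH_TURNOVER_M
            for s in own
        ),
        "ten_plus_pass_sequences": sum(s.passes >= LONG_SPELL_PASSES for s in own),
        # Started outside the final third and got there quickly.
        "fast_break_entries": sum(
            s.start_dist_to_goal_m > FINAL_THIRD_M
            and s.secs_to_final_third is not None
            and s.secs_to_final_third <= FAST_BREAK_SECS
            for s in own
        ),
        # The passes -> final third passes -> shots chain and its conversion rates.
        "passes": passes,
        "final_third_passes": ft_passes,
        "shots": shots,
        "final_third_share": ft_passes / passes if passes else 0.0,
        "shots_per_final_third_pass": shots / ft_passes if ft_passes else 0.0,
    }

# Invented sequences, purely to show the call.
match = [
    Sequence("Home", True, 30.0, 3, 2, 1, 0.0),
    Sequence("Home", False, 80.0, 12, 4, 1, 40.0),
    Sequence("Home", True, 70.0, 2, 1, 1, 8.0),
    Sequence("Home", True, 95.0, 1, 0, 0, None),
    Sequence("Away", True, 55.0, 10, 3, 0, 25.0),
]
print(team_toolkit(match, "Home"))
```

Keeping the chain as raw counts plus two ratios means you can show either the volume (how much a team gets forward) or the efficiency (how well it turns final-third passes into shots).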
Here are some things I'd put in a booster pack:
- % of goal kicks short and % of opposition goal kicks short (the former is simple; the latter, looked at as a season average, can be a quick implied judgement on a team's high press)
- % of possession sequences starting in the defensive third that reach 5+ passes AND/OR % of possession sequences starting in their own half that reach 5+ passes without attempting to enter the opposition half (I've only used the first of these, but think they're both intriguing looks at how a team approaches build-up in a fairly simple way)
- Post-shot xG
- Expected pass completion and how it compares between two teams in different parts of the pitch/different situations (I think this is most interesting for players though)
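The booster-pack percentages are just as easy to sketch. This minimal Python version assumes hypothetical `GoalKick` and `Spell` records. What counts as a 'short' goal kick varies by provider, so `short` is simply taken as given, and `passes_before_entry_attempt` (passes completed before the first attempt to enter the opposition half) is an invented field standing in for the second build-up measure.

```python
from dataclasses import dataclass

@dataclass
class GoalKick:
    team: str
    short: bool  # provider-defined; assumed to be supplied with the data

@dataclass
class Spell:
    """A possession sequence (hypothetical, simplified schema)."""
    team: str
    starts_in_def_third: bool
    starts_in_own_half: bool
    passes: int
    passes_before_entry_attempt: int  # passes before the first try at entering the opposition half

def pct(hits, total):
    return 100.0 * hits / total if total else 0.0

def short_goal_kick_pct(kicks, team):
    """Pass the opponent's name to get '% of opposition goal kicks short'."""
    taken = [k for k in kicks if k.team == team]
    return pct(sum(k.short for k in taken), len(taken))

def def_third_five_plus_pct(spells, team):
    """% of spells starting in the defensive third that reach 5+ passes."""
    starts = [s for s in spells if s.team == team and s.starts_in_def_third]
    return pct(sum(s.passes >= 5 for s in starts), len(starts))

def patient_own_half_pct(spells, team):
    """% of own-half spells with 5+ passes before trying to enter the opposition half."""
    starts = [s for s in spells if s.team == team and s.starts_in_own_half]
    return pct(sum(s.passes_before_entry_attempt >= 5 for s in starts), len(starts))

# Invented records, purely to show the calls.
kicks = [GoalKick("Home", True), GoalKick("Home", True), GoalKick("Home", False),
         GoalKick("Away", False), GoalKick("Away", True)]
spells = [
    Spell("Home", True, True, 7, 6),
    Spell("Home", True, True, 3, 3),
    Spell("Home", False, True, 9, 2),
    Spell("Home", False, False, 6, 6),
    Spell("Away", True, True, 5, 5),
]
print(short_goal_kick_pct(kicks, "Home"), short_goal_kick_pct(kicks, "Away"))
print(def_third_five_plus_pct(spells, "Home"), patient_own_half_pct(spells, "Home"))
```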
Some stats are missing from both lists not through oversight (although some of it might be oversight) but because I, personally, find stats easier to work with if I can see them. I know what a spell of possession with ten or more passes looks like; I know what a fast-break into the final third looks like; I know what a high turnover looks like. Like band and brand names, almost anything can become a household name with enough push, but being tangible certainly helps.
Players would have different toolkits to teams. Part of this is because some of these stats (e.g. PPDA, sequence-based stats) are, by construction, team statistics. But final third passes is a great example of a different reason: it's pretty useful to know how much time a team spends in the attacking third, particularly compared to their opponent; it's not that useful to know how much time a player spends there.
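PPDA shows why some stats are team-level by construction: it divides the opponent's passes by the pressing side's defensive actions, so there's no natural way to hand it to one player. Here's a minimal Python sketch of the ratio, assuming a hypothetical `Event` record with x-coordinates on a 0 to 100 scale measured from the acting team's own goal. The press-zone cut-off (`zone_start=40`, i.e. the opponent's defensive 60% of the pitch) and the set of counted defensive actions follow the common description of the metric, but providers differ, so treat both as assumptions.

```python
from dataclasses import dataclass

# Defensive actions counted in the denominator (assumed set).
DEFENSIVE_ACTIONS = {"tackle", "interception", "challenge", "foul"}

@dataclass
class Event:
    team: str
    kind: str   # e.g. "pass", "tackle", "interception", "challenge", "foul"
    x: float    # 0-100, measured from the acting team's own goal

def ppda(events, pressing_team, opponent, zone_start=40.0):
    """Opponent passes allowed per defensive action inside the press zone.

    Lower values mean the pressing team engages more often, i.e. a more
    intense press.
    """
    def in_zone(e):
        # Flip opponent coordinates into the pressing team's frame.
        x = e.x if e.team == pressing_team else 100.0 - e.x
        return x >= zone_start

    passes = sum(e.team == opponent and e.kind == "pass" and in_zone(e) for e in events)
    actions = sum(
        e.team == pressing_team and e.kind in DEFENSIVE_ACTIONS and in_zone(e)
        for e in events
    )
    return passes / actions if actions else float("inf")

# Toy events; a real match involves hundreds of them.
events = [
    Event("Away", "pass", 20.0),          # 80 in Home's frame: inside the zone
    Event("Away", "pass", 50.0),          # 50: inside
    Event("Away", "pass", 70.0),          # 30: outside
    Event("Away", "pass", 10.0),          # 90: inside
    Event("Home", "tackle", 70.0),        # inside
    Event("Home", "interception", 30.0),  # outside
    Event("Home", "foul", 45.0),          # inside
    Event("Home", "pass", 60.0),          # not a defensive action
]
print(ppda(events, "Home", "Away"))  # prints 1.5
```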
Inevitably, we arrive now at "what is football" theory, though I will keep it mercifully brief.
I assume that part of the difference in 3D execution a decade-and-a-half ago came from experience and time, but part simply from better ideas about how and why it should be used. The same is true of football data: knowing what you want to measure helps you get to better metrics.
If we want football media to make better use of data, that's where it needs to start.