What we talk about when we talk about 'analytics'

Every movement has important dates. Football analytics has a few: 2003, the year Moneyball was published; 2014, the year of the first Opta Pro Forum, an event which has a stacked array of former presenters; October 2015, publication of the ‘air-conditioned offices’ article in the Daily Mail.

If you’re unfamiliar with this particular part of analytics lore, the basics are worth knowing (the article itself survives on the web archive). Brendan Rodgers had just been sacked by Liverpool, amid rumours of discontent with a ‘transfer committee’ at the club infiltrated by nerds. Two paragraphs in the middle of the piece encapsulate the whole:

The increasing influence of analysts, young men who have no experience of scouting or recruiting players, has meant the end of the road for good football men such as Mel Johnson. […]

Instead a new breed sits in air-conditioned offices, cutting up videos from matches all over the world and burying their heads in the stats. [Michael] Edwards, along with his vast team of analysts, constantly monitors the opposition, providing detail about playing positions, style, routines, set-pieces and other important matchday information.

This is how analytics was seen at the time. On one hand, a cosy obsession with spreadsheets that stood in opposition to, or even threatened, traditional ‘football men’ and their hard-earned expertise; on the other hand, not just a new hobby but a potential career.

The tradition-vs-nerds framing was always a suspect one, even if people did clash, and now it’s almost totally fallen by the wayside. So what do, or should, we talk about now when we talk about analytics?

Laptop analysts

Let’s briefly return to the 2015 article:

They [Edwards and his ‘vast team’ of analysts] profile players based on their last 10-20 appearances, gathering information and helping Rodgers build a presentation for his players before matches that was usually a maximum of 10 pages on each team. It is a useful, but far from infallible, tool.

For an article that was so derided, even these three paragraphs give more information about day-to-day work than you’d expect. And it’s not exactly outdated nine years on.

Opponents are monitored for playing styles and patterns (set-pieces still a key point of interest) and information drawn into reports for coaches. Teams’ reports will differ based on the ‘vastness’ of the team of analysts, their own style of play, their priorities in how to approach matches, how coaches digest information, and the tools and data available.

This is just one part of what we’ve always called ‘analytics’, with models like expected goals being another part. For those new to modelling, a 101 summary would be that statistical techniques* are used to find the value of various features, and the resulting ‘model’ is tested (and re-tested) on a new set of data to see how it stacks up.

*maths, basically; computationally-heavy maths.
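To make that 101 summary a little more concrete, here is a minimal sketch of the fit-then-test loop in Python. Everything in it is an assumption for illustration: the shots are synthetic, the only feature is shot distance, and the ‘true’ scoring curve used to simulate them is made up. It fits a one-feature logistic regression (a common starting point for expected goals models, though real ones use far more features) on one set of shots, then checks it against shots it has never seen, comparing it with a naive baseline that gives every shot the same chance.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_shots(n, seed):
    """Hypothetical synthetic shots: (distance in metres, 1 if goal else 0)."""
    rng = random.Random(seed)
    shots = []
    for _ in range(n):
        distance = rng.uniform(5.0, 35.0)
        # Assumed 'true' chance of scoring, which falls with distance.
        p_goal = sigmoid(1.5 - 0.15 * distance)
        shots.append((distance, 1 if rng.random() < p_goal else 0))
    return shots


def scale(distance):
    # Centre and scale the feature so gradient descent behaves well.
    return (distance - 20.0) / 10.0


def fit(shots, lr=0.5, epochs=300):
    """One-feature logistic regression fitted by batch gradient descent."""
    bias, weight = 0.0, 0.0
    n = len(shots)
    for _ in range(epochs):
        grad_b = grad_w = 0.0
        for distance, goal in shots:
            x = scale(distance)
            error = sigmoid(bias + weight * x) - goal
            grad_b += error
            grad_w += error * x
        bias -= lr * grad_b / n
        weight -= lr * grad_w / n
    return bias, weight


def log_loss(probs, outcomes):
    """Average log loss: lower means better-calibrated predictions."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(outcomes)


train = simulate_shots(1000, seed=1)
test = simulate_shots(500, seed=2)  # held-out shots the model never sees

bias, weight = fit(train)
test_goals = [goal for _, goal in test]
model_probs = [sigmoid(bias + weight * scale(d)) for d, _ in test]

# Baseline: every test shot gets the training-set conversion rate.
base_rate = sum(goal for _, goal in train) / len(train)
baseline_probs = [base_rate] * len(test)

model_loss = log_loss(model_probs, test_goals)
baseline_loss = log_loss(baseline_probs, test_goals)
print(f"model log loss {model_loss:.3f} vs baseline {baseline_loss:.3f}")
```

Beating the baseline on unseen shots is the ‘see how it stacks up’ part; a model that only looked good on the shots it was fitted to would be the warning sign.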

There is a degree of human decision-making, although ‘subjective’ might be a slightly misleading term for it. A data scientist does choose what to put in a model, but they can (if they want) just throw everything at the wall and see what sticks, before discarding what doesn’t. The choice of modelling techniques, deciding how to approach the problem at hand, may also be important.

But in 2024, the reach and scale of data is even broader.


More data, more data

Nine years on from the ‘air-conditioned offices’ article, it’s both technologically and culturally much easier to have data around. Inevitably, there are also more data companies, and more data technology companies, than ever.

Part of the reason for this is the cultural change, of course, and part is a broad technological advancement. Computer vision improvements have helped to collect data more easily, and methodological improvements have helped with both the collection and analysis (particularly with tracking data, on both counts). ‘Line goes up’ financial factors mean everyone sees an investment opportunity too: even if not data-specific, a timely example is the recent announcement of JP Morgan’s sports-specific investment team.

However, this preponderance means that “we’re kind of at a point in the evolution of the industry where people know enough to be dangerous.” Those are the words of Sarah Rudd - formerly of Arsenal, now co-founder and CTO of consultancy src ftbl - on a recent panel at the MIT Sloan Sports Analytics Conference. “People can get their hands on data and information that’s outside what’s being curated within the club, so you have to go through that education process with a lot of people.”

In some cases, the problem might not even involve outside sources of data. On a recent episode of the Winning with Data podcast, Parma’s chief performance and analytics officer Mathieu Lacome described a perhaps unexpected way that clubs can misuse data. “You [can] have a club that starts to buy everything, every single source of data, every single piece of software without really understanding what you can do with it. That ends up with overspending and very low usage of the technology to do something valuable for the club.”

This is the important distinction to make: ‘analytics’ isn’t simply having data around. Think of it like the ‘shot on an iPhone’ adverts compared to what most people’s phone videos look like: just because everyone has a high-quality camera in their pocket doesn’t mean that everyone is ‘doing cinematography’.

The death of ‘analytics’

“This is a book about analytics,” opens The Midrange Theory (a book about basketball rather than football). “I hate analytics.”

The author, Seth Partnow (who was previously director of research at the Milwaukee Bucks), continues: “Not the discipline mind you, but the word. The word has become hopelessly poisoned, reduced, confused, and misapplied.” A few paragraphs later, he sets out what analytics actually is:

Analytics exist at the intersections of math, statistics, and computer science. However, those are merely the tools rather than the field itself […] Rather, it is a mode of thought seeking to reduce the impact of the cognitive biases we all suffer from. In a world wrought with imperfect information and uncertain outcomes, it is about putting oneself in a position to be less wrong.

This is the thing. You can think analytically without using data, and you can use data without thinking analytically. ‘Analytics’ is where both come together, not with the aim of being perfectly correct, but to be more right, more often.

The old ‘reports and models’ understanding of ‘analytics’ as a term does fit neatly into this. Reports help coaches get up to speed more quickly, with a consistent methodology, providing useful context for (and perhaps a challenge to) their own opinions. The models seek to represent the sport more accurately, more succinctly, or more efficiently than the human eye and brain alone can.

Why are we saying that things have moved further than just ‘reports and models’ then?

When we described the human involvement in data modelling earlier, interpreting the results was very deliberately left out. How you interpret a model’s results is clearly important to whether or not you’re ‘being analytical’, but an excellently trained and validated model could still be wielded badly. Part of the battle is avoiding that.

So is ‘how you package your reports and models’ also ‘analytics’?


‘Please buy my product’

This is where Get Goalside might be partly guilty of stretching the term ‘analytics’ to breaking point. Because if the tech wrapper around the numbers is ‘analytics’, then where does it end? “Not only is ‘understanding football’ an important skill for ‘analytics people’,” GG wrote in the previous newsletter, “increasingly so is ‘understanding business’ and ‘understanding management’.”

reader, sobbing: You can’t just point at everything and call it ‘analytics’!
Get Goalside, pointing at employee-empowering policies like flexible working, high wages, and professional development budgets: Analytics.

But seriously. To take from the Midrange Theory quote, anything directly connected to maths or statistics is probably ‘analytics’. Anything else is probably in a category of ‘analytics implementation’. You can’t really ‘do analytics’: you can do an analytics project, and you can implement analytics. Both can be done badly or well.

Both categories have also seen changes since that article in 2015, although covering those fully would take, like, a book. Possibly Ian Graham’s upcoming one. More briefly, and less Premier League-winningly, let’s try summarising each in a paragraph.

The raw data (of all types) going into reports and models is of better quality in most cases. Possession value models (like expected threat or on-ball value) are now fairly widespread. There will have been far more research at the top end using tracking data, which lets you investigate space and team structure, and pitch control models are knocking about but not exactly common. Other research projects will have been embarked upon, and if an organisation started doing projects on career progression topics back then, they’ll now have a strong longitudinal dataset on those intriguing problems.

For implementation, there are simply far more people able (and being asked) to do the implementing. The packages that come with the data itself, from providers (or from/via competition organisers), have moved on, in some cases even merging datasets to make them easier to work with. This merging of datasets can also be done in third-party tools, but clubs are also building out more tech capabilities of their own. As Sarah Rudd alluded to, it can be sort of like the Apple ecosystem: you want to keep people’s data consumption streamlined and in-house (although for the organisation’s benefit, rather than your own profit).

As an example, Manchester City - sorry, City Football Group - are currently hiring for a Machine Learning Engineer who will, among other things, “research & build machine-learning models to identify and interpret complex patterns and structures in data and use this information to understand how teams play.” Liverpool have their famed department; Arsenal have theirs too, though its lower public profile is presumably down to silverware and media access. Some might say this is the real prize at stake in this season’s Premier League title race.

Little England

If Get Goalside has a fatal flaw, it is being tragically English.

For a while, this was forgivable, even understandable. It was on English-speaking blogs that expected goals entered the lexicon. (The English phrase, and the English abbreviation ‘xG’, still get used in other languages.) North Americans featured heavily, but the data was all English (from Opta), with access often coming via the English stat reference websites WhoScored and Squawka, and the app StatsZone, which was associated with the English magazine FourFourTwo at the time.

It’s not that Opta was the only data company around, even in England (as former employees of Prozone have previously been quick to write in about), but it was at the front of the public conversation, helped significantly by the annual Opta Pro Forum analytics conference.

Fast-forward to the present, and the week in which Opta’s latest Forum took place also saw the fourth edition of the Sports Data Forum in Seville and the third edition of the DFL’s SportsInnovation trade fair, in Düsseldorf. At the latter, the DFL announced a continuation/expansion of their partnership with Amazon Web Services (AWS), as their official generative AI provider.

On a similar theme, Major League Soccer launched a sports tech start-up incubator scheme - MLS Innovation Lab - at the start of the year (reminiscent of the NBA’s Launchpad program and Tennis Australia’s AO Startups, to name just two).

AO Startups is possibly a better comparison for the DFL’s and MLS’s schemes than NBA Launchpad is. In those three non-basketball cases, supporting innovative ideas and companies benefits each sport (and sport in general), but also aims to keep their own event(s) ahead of the competition. MLS may be a closed shop itself, but it can’t regulate its outside competition (although it sometimes tries its luck).

The embrace of data and ‘analytics’ by league bodies doesn’t have to mean startups, though. Current Bournemouth manager Andoni Iraola recently praised Mediacoach, a data platform developed by (or in collaboration with) LaLiga. Sportian - the new name for what was previously called LaLiga Tech - took note of the quote, adding in their tweet: “Is it any wonder LaLiga is producing the world’s top coaches?” Back in Germany, the DFB announced in September that the 3. Liga and Frauen Bundesliga would be brought into their partnership with data company Sportec, which also includes a match analysis hub. It’s not exactly Apple, but it’s a tech ecosystem; one wonders whether the NewCo for English women’s professional football could take inspiration from some of this.

And leagues aren’t the only entities giving a boost to start-ups. Arsenal, Barcelona, and Real Madrid (among others) all have or have had schemes of some sort, the latter currently in the early stages of an Asia-specific accelerator programme. Clubs also strike other kinds of agreements with young companies - Liverpool (them again) reportedly collaborated with tracking data company SkillCorner for a year, helping to improve its product, before an official partnership. (SkillCorner are also French; more internationalism.)

The point is, the modern football data industry is (proportionally) less ‘show me your math slide’ and more ‘move fast and break things’.

‘Moving fast and breaking things’-ball

You thought we were going to get to the end of this without mentioning Moneyball again?

In 2015, football was still searching for its version of the Oakland A’s. Then things got confusing, quickly. Leicester City won the Premier League, people tried to sell FC Midtjylland as football’s ‘Moneyball’, and neither really hit the spot. Liverpool won the Premier League and Champions League with the same Roberto Firmino that the Daily Mail article had maligned, but Liverpool were also too rich and storied to be ‘Moneyball’.

And now, nine years on, the three (rich) teams vying for the Premier League title may have the strongest data outfits in football. Manchester United are in on the act too. There’s still room to punch above one’s weight, but it’s mainly at the margins: promotion, Premier League midtable, mainland Europe. You could try winning a prestigious American tournament like the US Open Cup, if that’s your thing. (Sorry, being tragically English again.)

The point is overdone for effect. The underdogs have less room for shocks if the big dogs are using data well. Moneyball (movie) has the line “if we try to play like the Yankees in here, we’ll lose to the Yankees out there” - whaddya do when the Yankees are taking inspiration from Moneyball?

The thing with ‘Moneyball’ is that, like ‘analytics’, it was always flattened and compressed until it fit into a buzzword. What we call ‘analytics’ is basically using data and maths in an analytical way. You’d be hard-pressed to find a reason why it couldn’t simply be folded under the umbrella of ‘sports science’; after all, rigour, evidence and testing are all part of the scientific process.

And the thing with Moneyball is that it wasn’t about data per se. Get Goalside has argued in the past that the book is actually largely just about Billy Beane. What it’s about is questioning orthodoxies and finding edges, finding the most efficient way to get wins that you can.

So whaddya do when the Yankees are ‘doing analytics’ too? You’ve just gotta try and find another edge.