
Answering the hardest question in football

Ajax, Club Brugge, Hammarby IF, Forest Green Rovers, Toronto FC... if someone asked you to connect the dots here, would you know where to start?

Maybe it'd help if I added a few more: FC Midtjylland, Brentford, Brighton. Getting there? All of these are examples of why the hardest question in football isn't about players, isn't even about the pitch - it's about answering the damn (legitimate) query 'how do football clubs use data?'.

Those eight names are all clubs who do stuff with data (or have done in the past) that is streets ahead of not just their immediate competition, but of significantly bigger clubs too. Yet there's very little that connects them on the surface. Sure, most are western European, but that's at least partly my awareness of public info*. Two share an owner, of course, and a third is also owned by a betting man, but none of the others fit that pattern.

*(In some cases that's press coverage, in some cases that's academic papers, in some cases it's job ads)

After a bit of thought, I think I've got a better answer than 'it depends' to the seemingly simple question 'how do football clubs use data?'.

Tough on Moneyball, tough on the causes of Moneyball

There are (at least) five factors in how a club uses data.

Two are covered by Moneyball: money and belief. The ol' tale of the Oakland A's may have begun with a lack of dough, but it ends with the Boston Red Sox calling up Billy Beane and Paul DePodesta joining the LA Dodgers.

After those, there's aptitude. Converts to the Good Word of Expected Goals can go down very different paths depending on whether they know what they're doing. Factor four is equally obvious: the length of time that a club's already been 'doing data' for.

And finally, business relationships. These can relate to money, but they can also be struck through the ambition to use data, or just by plain luck. Brentford, Midtjylland, and Brighton all had the connection of their owner's gambling company; Hammarby had a local academic; Toronto's ownership also owns the Toronto Raptors NBA team, so will have been courtside to that sport's analytics boom.

Ownership is also a factor for Toulouse FC, whose RedBird Capital majority owners had also, by the time they bought into the French club, invested in data consultancy Zelus Analytics. And you can't talk about French multi-club orgs without mentioning (almost literally) one-time employers of Savinho, Troyes AC, part of City Football Group, and the wider landscape of MCOs. Not every MCO has the same level of data set-up, and not every club within an MCO will have access to the same knowledge and tech, but it's not not a factor.

Cutting down the nuance

Five factors, though, is too many to talk about comfortably. At a push, they can be boiled down into two: resources and ideas. Money, time, and business relationships are all resources, and ideas covers belief (read: ambition) and aptitude (aiming for the stars when you can't build a rocket is still a bad idea even if it's admirable).

You can put all that into a grid, like this:

A 3x3 grid, with ideas on one axis and resources on the other, each segmented into 'low', 'medium', and 'high'

But even this can be simplified further.

There's an extent to which clubs with great ideas on low budgets will be landing on similar things as the clubs with more resources and less ambition.

The 3x3 grid with diagonal squares highlighted, e.g. High Ideas, Low Resources with Medium Ideas, Medium Resources with Low Ideas, High Resources

With this in mind, even if clubs don't have much budget they can increase their resource in other ways, through strategic relationships or just by starting early (although, granted, that last one is easier said than done, more so than the others). But even with a cap on resources, better ideas can punch up a weight class or two.
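For the code-minded, the diagonal grouping can be sketched as adding up the two axes. This is only an illustration: the numeric scores for 'low', 'medium', and 'high', the function name `tier`, and the choice to number tiers from 5 (low-low) down to 1 (high-high) are my assumptions for the sketch, not a formal model.

```python
from itertools import product

# Assumed scores for each level on either axis (illustrative only)
LEVELS = {"low": 0, "medium": 1, "high": 2}


def tier(ideas: str, resources: str) -> int:
    """Collapse a grid cell into a tier: 5 is low/low, 1 is high/high."""
    return 5 - (LEVELS[ideas] + LEVELS[resources])


# Group all nine cells of the 3x3 grid by the tier they land in
cells_by_tier = {}
for ideas, resources in product(LEVELS, repeat=2):
    cells_by_tier.setdefault(tier(ideas, resources), []).append((ideas, resources))

for t in sorted(cells_by_tier, reverse=True):
    print(f"Tier {t}: {cells_by_tier[t]}")
```

Cells on the same diagonal (say, high ideas with low resources and low ideas with high resources) share a sum, which is why they end up in the same tier.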

But how, then, are football clubs within these tiers using data?

The tiers

(This is the part of the newsletter that educated dissenters are obliged to get in touch about. It's (obviously) not going to be exhaustive, but it gives a feel of how data can be used)

Tier 5 (low resource, low ideas)

In many professional leagues, these clubs will still be using data of some form. But it'd be in ways like checking the goal tallies of prospective attacking transfers or clean sheets of defensive transfers; and probably collecting running and gym data but not doing much with it.

Tier 4 (Medium resource & low ideas/low resource & medium ideas)

What does 'medium resources' mean, in the grand scheme of global football? Eyeballing a couple of sources (clubelo, UEFA coefficients) just to be able to point to a ballpark, let's say teams like those in the upper reaches of the Scottish, Croatian, Swedish, and Czech men's top flights. That does mean that 'low' resources has a very long tail, but that's sort of the way the industry cookie crumbles.

This tier probably has processes for feedback around scouting and match analysis, which will involve data in some form, but the processes might not be settled and the data available might be very limited. They'll probably have access to a data company's reports or platforms, but the ceiling on their ideas or resource stops them getting something better or more bespoke than that.

The low resource (medium ideas) clubs in this tier are probably upwardly mobile as long as money lasts, while the medium resource (low ideas) clubs are probably gonna be falling off if they see a drop in revenue.

Tier 3 (High resource & low ideas/medium-medium/low resource & high ideas)

By nature, this is the broadest tier, and therefore the most varied. All the clubs in this tier will have standard processes and reports, with a defined sense of how these should look. That sense will likely be driven either by industry knowledge absorbed by osmosis or (at the lower-resource clubs) a strong, 'first principles' set of ideas about what these should cover.

In this tier, the high-resource clubs can brute-force their way through a lot of things. For a long time, managers used data even if they professed not to: video analysts would tag and clip matches according to key moments or tactical concepts that the coach believed were important. Throwing a ton of video analysts at a problem can - for some things - get you the type of analysis that a more sophisticated data set-up could get you far quicker.

On top of that, through either money or relationships the higher-resource side will almost certainly have a bunch of 'normal' data on-hand. It may well be looked at, but that doesn't mean good analysis is done with it.

Clubs at this tier probably have a knowledge bank too, although they'll call it different things. A database of scout reports would be a form of knowledge bank, but low-resource clubs might enter this tier by building their bank with public research that they synthesise into internal documents. Even if you can't afford to do research in-house, that doesn't mean you can't benefit from research findings.

Tier 2 (High resource & medium ideas/medium resource & high ideas)

Now we're reaching the tasty part of the grid.

Clubs in this tier - even though they may look very different on the outside - will all have some kind of expertise in analytics. Reports will almost certainly have components which implicitly or explicitly touch on things like the coach's tactical game model, player advantage battles, the chain through which chances get created.

Whether through availability (high-resource) or smart prioritisation (medium-resource), clubs at this tier will be using tracking data for something. There are a bunch of competitions which offer tracking data to their competitors, which can feasibly be used for tactical analysis; while TV broadcast-based tracking data can be used for scouting.

Clubs at this tier will probably be doing bits of their own research too, which links to the tracking data. It seems to be a not uncommon use-case of broadcast tracking data to use it for running data while scouting. It's always useful to know how statistics of a player might be altered by moving leagues or teams (the classic: do Eredivisie shot monsters keep taking lots of shots elsewhere?), and this applies to physical data too. You can't just rely on SkillCorner blogs for your knowledge here, interesting as they are.

Squad-planning and player development will have data involvement at this tier too. Clubs will be able to benchmark players against particular goals and against their peers. This can sometimes be done without a huge amount of resource (it can often come directly from a data provider's platform nowadays), but for player dev in particular you'd ideally have a long history of data.
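To give a feel for what 'benchmarking against peers' can mean at its simplest, here's a minimal sketch of a percentile rank. Everything in it is hypothetical: the function name `percentile_rank`, the choice of shots per 90 as the metric, and the peer numbers are all made up for illustration, and a real set-up would need to think about things like minutes played and position.

```python
def percentile_rank(value: float, peers: list[float]) -> float:
    """Percentage of peers whose value is at or below `value`."""
    if not peers:
        raise ValueError("need at least one peer value")
    at_or_below = sum(1 for p in peers if p <= value)
    return 100.0 * at_or_below / len(peers)


# Made-up shots-per-90 figures for a peer group of forwards (illustrative only)
peer_shots_per_90 = [1.8, 2.1, 2.4, 2.9, 3.3]

# Where does a player averaging 2.9 shots per 90 sit within that group?
print(percentile_rank(2.9, peer_shots_per_90))
```

The same idea works against a goal rather than a peer group: swap the list of peers for a target value and track the gap over time, which is where that long history of data earns its keep.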

Oh, and this tier will have a good data engineering set-up, which underpins everything else.

Tier 1 (the rich nerds)

idk, someone bug the [redacted], [redacted], and Liverpool offices.

But seriously.

This is the one tier that can feasibly be doing work with body pose/skeletal data. That doesn't mean that they definitely are, but if they're not I'd like a word with them - I mean, come on, why would you not be doing the cool stuff?

Clubs in this tier will probably have a substantial internal base of knowledge, drawn from:

  • internal research
  • public research
  • general insight of their club personnel (who, at this tier, are plentiful and/or experienced)

A bunch of that internal research will be on things that underpin the regular decisions football clubs make (e.g. strength of leagues relative to cost of players; tactical trends; fatigue patterns in different in-game circumstances; possession value models).

The processes that the team has (including but not limited to their scout reports, match reports, transfer decision-making) will probably have elements that are backed up by real research. That could be the relationship between specific metrics and specific goals, or about how people absorb and interpret information.

There will likely be something that clubs in this tier do really, really, really well too, the fruits of a previous project. But maybe only a couple of things - being in this tier certainly doesn't mean that a club will be world-leading across the board. The thing with research is that outcomes are unknown, so it's a bit of a law of averages that you need to do a few projects to get just one that really shifts the needle.

But although we might expect there to be a heavy focus on tactics, that research could easily be done on other things. It feels like a safe bet to say that clubs in this tier will have some fancy recommendations about load management, given that 1) they'll be paying players a lot and 2) it's easier to measure than tactics (monitoring a single player vs monitoring the interactions of many players). (It's also probably an easier analytics sell to managers who are already ceding control over transfers.)

--

The reason for putting this together is that data usage is not like player quality: the ties between revenue and output are far looser. It's difficult, then, to say 'clubs in X league do this' or 'clubs with Y wealth do that'. It's still not particularly easy to say 'clubs in tier 3...', but it's truer to what the current landscape of football is like.

TL;DR - It depends.