"If God had wanted us to play football in the clouds, he'd have put grass up there." — Brian Clough
"If God had wanted us to collect passing data, he'd have put abacuses in the press box." — Mark Thompson
A lot of people have spent a lot of time asking why it's taken so long for football to 'have its Moneyball moment'. I have two theories:
- Michael Lewis never spent a year embedded in Bolton
- Data might have been good quality for a long time, but it wasn't cheap and it wasn't free
The foundational stone in the Moneyball mythology is Bill James, the author of the Bill James Baseball Abstracts, who spent night shift after night shift poring over all the data he could get his hands on. He was using box scores which, although not perfect or extensive, gave quite a lot of information to work with, all of which was publicly printed in newspapers. In football at that time, the data-sceptic's maxim that 'the most important statistic is the scoreline' was almost inarguable, if for no other reason than it was the only one available. Say the phrase 'box score' in 1970s England and people would probably think you meant a successful shot in the penalty area.
That's not to say that there was no, or had never been, any decent football data around. By the time that James was putting together his first Abstract, an Englishman called Charles Reep had been providing data analysis to football teams for two decades (I've written about Reep previously). And in Hungary, newspaper Nemzeti Sport had been publishing data visualisations for decades before even Reep came along. As is so often the case, there's nothing new under the sun.
But their data wasn't easy to collect. Reep noted the number of passes in sequences of play, as well as zones that they started and ended in. Nemzeti Sport published what we'd now call momentum charts, showing what looks like teams' progression up-field throughout the match. Matches which, by nature of the sport, are broken up by pauses only at random intervals of varying length. Baseball - like cricket, American football, even basketball to a degree - is almost designed for amateur analysts. One team does something, the other team perhaps does something in response, and then we all pause while we fill in our stat cards.
There's nothing 'special' about box scores. They're a kind of Darwinian representation of the sport; a distillation of what seems important, what readers find interesting, and what's cost-effective to collect. As ground-breaking as Nemzeti Sport's visualisations may have been, as intriguing as Reep's data, as rich as the tracking data of the early 2000s, none of it was that.
What Reep and Nemzeti Sport's work does do, however, is give us a way of thinking about what a mid-20th century data collection might have looked like without being too influenced by what we have in the present. It wouldn't have simply been a streamlined Opta collection process.
Both hit on the idea of zones that a team got the ball to. Both hit on the idea of shot locations. (Again, call 'em possession value and expected goals precursors and you're back to 'nothing new under the sun' territory.)
We've got to keep in mind that, to go back in our TARDIS and gift football some mid-century box scores, things need to be simple enough for someone sat at a wooden desk, with milk bottle glasses and a flat cap, to note throughout a match. There are no abacuses in the press box. And that means, as much as Reep and a lot of 21st-century analytics latched onto them, we're jettisoning passes.
I say this partly in the hope that, by writing it, I can manifest into the universe an end to people using the phrase 'tiki-taka' in relation to Pep Guardiola's teams (some people still do it!). But mostly it's because the act of collecting pass data is pretty heavy-duty, and would have been even in the days before positional play and modern pitch maintenance techniques. Get rid of pass stats. We don't need 'em.
So you're sat in the stands on a cold Tuesday night in Stoke, your warm winter flat cap on your head and a cup of Bovril by your side, and you're the stat collector for the match. You're collecting shots for sure, and some other newspaper has probably claimed a boost in sales after starting to separate shot stats into a couple of basic zones. Outside vs inside the box is an obvious candidate, but it wouldn't be a surprise if they'd come up with some kind of fancy 'danger zone' too, like the width of the six-yard box up to the penalty spot. Three zones is quite enough to be dealing with though.
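For anyone who'd rather see that three-zone scheme written down, here's a minimal sketch of how the flat-capped collector's tally could work. The zone names, the function names and the coordinate convention (yards out from the goal line, yards sideways from the centre of the goal) are all my own assumptions for illustration; the box measurements are the standard pitch markings in yards.

```python
from collections import Counter

# Standard pitch markings, in yards.
PENALTY_AREA_DEPTH = 18        # goal line to the edge of the box
PENALTY_AREA_HALF_WIDTH = 22   # the box is 44 yards wide
SIX_YARD_BOX_HALF_WIDTH = 10   # the six-yard box is 20 yards wide
PENALTY_SPOT_DISTANCE = 12     # goal line to the penalty spot


def shot_zone(distance_from_goal_line, offset_from_goal_centre):
    """Put a shot into one of three hypothetical zones.

    Both arguments are in yards. The 'danger zone' is the strip as wide
    as the six-yard box, running from the goal line to the penalty spot.
    """
    lateral = abs(offset_from_goal_centre)
    if (distance_from_goal_line <= PENALTY_SPOT_DISTANCE
            and lateral <= SIX_YARD_BOX_HALF_WIDTH):
        return "danger zone"
    if (distance_from_goal_line <= PENALTY_AREA_DEPTH
            and lateral <= PENALTY_AREA_HALF_WIDTH):
        return "inside box"
    return "outside box"


def tally_shots(shots):
    """Count shots per zone from (distance, offset) pairs."""
    return Counter(shot_zone(distance, offset) for distance, offset in shots)


if __name__ == "__main__":
    # Made-up shots, purely to show the tally at work.
    example_shots = [(8, 3), (15, 0), (10, 15), (25, 0)]
    print(tally_shots(example_shots))
```

Three branches, one scrap of paper. The order of the checks matters: the danger zone sits inside the box, so it has to be tested first.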
Shots look quite bare on the page on their own though. You once heard a story that an editor, one of the ones up north, had gone into a rage some years ago because George Best wasn't at the top of any lists. The paper spent half a season paying kids half a penny a game at Old Trafford to keep count of various things he did. After Christmas they were rushing to the evening papers instead, to see if anyone in the league had got more 'forward advances' than him (some kind of basic 'progressive passes plus carries' metric, maybe starting in the centre-circle strip of the pitch and ending roughly in the final quarter of the field).
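If you wanted to pin down that imagined 'forward advance', a rough sketch might look like the following. Everything here is an assumption on my part: the function name, the 115-yard pitch length (pitch lengths vary), and the reading of 'centre-circle strip' as the band of pitch within a centre-circle radius (10 yards) of the halfway line. Positions are measured in yards from the attacking team's own goal line.

```python
PITCH_LENGTH = 115.0          # assumed pitch length in yards; real pitches vary
CENTRE_CIRCLE_RADIUS = 10.0   # standard centre-circle radius in yards


def is_forward_advance(start_x, end_x, pitch_length=PITCH_LENGTH):
    """Return True if a pass or carry counts as a hypothetical 'forward advance'.

    start_x and end_x are distances in yards from the attacking team's
    own goal line. The move must start in the strip of pitch level with
    the centre circle and end in the final quarter of the field.
    """
    halfway = pitch_length / 2
    starts_in_centre_strip = abs(start_x - halfway) <= CENTRE_CIRCLE_RADIUS
    ends_in_final_quarter = end_x >= 0.75 * pitch_length
    return starts_in_centre_strip and ends_in_final_quarter


def count_forward_advances(moves):
    """Tally forward advances from (start_x, end_x) pairs."""
    return sum(is_forward_advance(start, end) for start, end in moves)
```

Note that it deliberately ignores whether the ball was passed or carried, which is exactly what makes it cheap enough for a kid on half a penny a game to count.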
This all seems manageable - plenty of time to rest and sip your Bovril - so why not take an opportunity to try and crowbar something defensive-based into the mix? If George Best is being referenced in this data collection fanfic then that means Bobby Moore was around too, so it's not inconceivable that there'd have been a real appetite for some defensive statistics. That'd probably end up as some kind of clearance/ball recovery stat. But what about if there'd been someone, or some team, who'd really got a handle on defending Best at source?
In your winter-weather flat cap in Stoke you ready a separate scrap of paper beside your regulation stat sheet. Your grandparents were regulars watching the team; they always talked about the only bad game they saw Stanley Matthews ever play, when he just kept getting turned back where he'd come from. You start keeping a tally of 'turn backs'. Your editor loves it. The punters love it. The crowds start cheering turn backs like goals, betting booths at grounds start offering odds on it, and when you retire some young journalist tracks you down and writes a very pleasant two-and-a-quarter pages about you in their book on the history of something called 'analytics'. It's the best-selling book of 1996.
The following year, across the Atlantic, the new general manager of the Oakland Athletics carries a copy of the book into work on his first day in charge.
There's another starting point that this 'what if' adventure could take, still on the theme of taking the importance of passing stats down a peg or two.
The story of football analytics history always has the start of football analytics blogging as a key moment; for some histories, it's practically 0 AD. Part of this is because it led to, or at least correlated strongly with, an explosion of figures working with data; part of it is just that the way we construct history tends to focus on documentary evidence. There's nothing history loves more than bloggers.
The start of this starting point is around 2009 and 2010, with another peak around 2013 and 2014 before things took off for good. At the time, the dominant teams in men's football were Barcelona (on the club stage) and Spain (on the international stage). Vicente del Bosque and Pep Guardiola possession-ball was the way to play and, by the time of that second surge in analytics bloggers, had been for half a decade. Crucially, it had also been the way to play ever since the beginning of the more detailed Opta datasets that these analytics bloggers could get their hands on.
Is it a surprise, then, that through-balls and cut-backs were identified as optimal strategies, while long balls and crosses were often dismissed?
What if blogging had taken off earlier? What if we cared less about pass data not for reasons of cheap data collection (although, still: StatsBomb, Opta, Wyscout - think about it) but because that wasn't how the good teams won games?
Sam Allardyce's Bolton Wanderers, alluded to at the very start of this post, are the obvious team of interest in this decade 'pre-blogging'. While he later became something of a pantomime villain, more meme than man, long-ball connoisseur Allardyce loved data. As recently outlined in Rory Smith's book Expected Goals:
"Two years before Billy Beane's epiphany in Oakland started to transform baseball and a decade before football clubs started investing heavily in their data departments, Allardyce was writing almost the exact same story in strikingly similar circumstances, distilling what he had found in the data into a set of unorthodox principles that defined how his team played."
Bolton - little old Bolton, who had spent just four years in the top flight between 1964 and their promotion under Allardyce in 2001 - finished between sixth and eighth every season between 2003/04 and 2006/07. Had football blogging hit 2014 levels ten years earlier, this is the story that it would have been looking into.
And it's not like Bolton was the end of this story. That period was also the rise of José Mourinho; it had Rafael Benítez - not exactly an exponent of Joga Bonito or Juego de Posición - reaching two Champions League finals in three years, winning one of them.
What if, instead of automatically looking for Lionel Messi at the top of every new attacking metric list, early analytics was looking for Ronaldinho? What if the midfielders to look to as easy examples of 'best in the world' weren't pass-extraordinaires Xavi and Andrés Iniesta, but Frank Lampard and Steven Gerrard?
Early analytics work pointed out that things like long balls and corners were inefficient ways of scoring. This was, by the numbers, true, even if it wasn't the whole story. But at the time that the growing football analytics community was taking its toddler steps, there was little to force them to seriously confront this fact. Even teams like Swansea City were getting (relative) success by copying the Guardiola template.
We, the masses, are getting around to it now though, somewhat belatedly. As the epilogue of Expected Goals points out, the 'analyticsy' teams of the moment - like Liverpool, Brentford - have been embracing set-pieces, once thought a little bit old-fashioned, a little bit, well... Allardyce: "The very inefficiency of set-pieces made them the game's juiciest low-hanging fruit."
Our environment didn't stop shaping us.
It's little surprise - to me, in hindsight, at least - that one of the big unique features of StatsBomb's data provision when it launched in 2018 was pressures and counterpressures. There was a lot of other stuff that they pointed to, but pressures were the big thing. And it made sense, given that ten outfield players are defending at any one time and existing defensive data was very sparse.
But what was the big tactical trend that took place between the possession football era of 2010 and StatsBomb's data launch eight years later? What was 2015-2018's buzzword answer to 'tiki-taka'?
"And language is also, literally, the "containment". The terms we choose - or the terms we are offered - behave as containers for our ideas, necessarily shaping and determining the form of what it is we think, or think we think." — Zadie Smith