The analytics revolution is history now
There are only three ways to tell history.
There’s ‘events, dear boy, events’ history, a focus on ‘important’ figures and/or ‘important’ events. There’s history of ideas, whether that be political or economic or societal. And there’s history of ‘the people’, those who might not have led armies or made medical breakthroughs, but who existed and had the same richness of life in large, large, often anonymous, numbers.
In the span of the past few months, three works which could broadly be described as ‘football analytics history’ have dropped into our laps. And although they don’t fit into those categories perfectly, they’re conveniently close. This post will be part-review, part-Get Goalsideian amble through the countryside of the topic, because I realised while drafting it that I’m simply too close to the subject matter to write a straight ‘review’. (For something more like that, I recommend reading Grace Robertson’s piece on the two books we’ll be discussing).
Expected Goals* by Rory Smith is in the first of those three types of history: a broadly linear telling of ‘modern’ football analytics, starting with English company Prozone in the 1990s. Along the way it hits characters that will mostly, but not necessarily all, be familiar to analytics followers: turn-of-the-century Opta, Sam Allardyce’s Bolton, Decision Technology, StatDNA, the titular metric expected goals, FC Midtjylland/Brentford, Liverpool. Don’t worry, Charles Reep gets a mention too.
*(subtitle: The story of how data conquered football and changed the game forever)
Net Gains* by Ryan O’Hanlon is far less linear, and while it still focuses on individuals it’s far more focused on the now. The now isn’t necessarily the ‘present’, but the minds and the intellectual frisson that are shaping, and have shaped, recent football analytics. It also seems more interested in the subject of its subjects: trying to understand football as a sport. There’s a passage late in the book where O’Hanlon recaps what the reader has learnt as if it were being applied to an imaginary club, but it’s the introductions of both works that most clearly show the different tacks the authors take on this general topic.
*(subtitle: Inside the Beautiful Game’s analytics revolution)
Expected Goals’ prologue opens with a data collector in Manila before talking about the way clubs’ attitudes to data have, often quite quietly, changed. It’s about the companies and the boardrooms. Net Gains’ introduction is more of a personal biography. It’s a contention of mine that Moneyball is not a book about baseball analytics but a book about Billy Beane, and one could similarly say the introduction to Net Gains isn’t an introduction to a football analytics book, but an 11-page (ebook, font-size dependent) dedication to a father and a childhood.
The books aim at different things. Expected Goals is ‘how did club football get to this point of analytics adoption’; Net Gains is ‘how come we don’t understand football, and who are the people trying to work it out’.
John Muller and (the pseudonymous) @TiotalFootball’s Post Script podcast, meanwhile, intentionally sits in the ‘history of the people’ category, focusing on a specific subset: bloggers. Its characters of interest overlap with Expected Goals and Net Gains only insofar as they happen to have been people who wrote about football stats online. Sarah Rudd (formerly of StatDNA and Arsenal) and Chris Anderson (a key figure in Expected Goals) formed episode one; Howard Hamilton (whose appearance in the books is only a passing mention in Net Gains), episode two; Ian Graham (Liverpool) the bulk of episode three.
The project grew out of Tiotal’s attempt to draw the histories of analytics blogging and tactics blogging together, as the two areas had sometimes been seen as separate and even in opposition to each other, but it expanded from there. Of the three works in this newsletter, it’s the closest to a work of historical study: the primary sources are old blog posts, largely uncovered via painstaking use of internet archive services; Muller and Tiotal are the historians, placing them within a wider context, drawing out points of interest, and interpreting the thoughts of the sources’ authors.
Cards on the table, I really enjoyed Net Gains, which aligns so closely with my particular interests that I’m featured, very briefly, in the chapter about Charles Reep. I enjoyed Expected Goals less, but would probably recommend it to friends or family members to help them understand what I write about. The Post Script podcast is so far up my alley it’s on my doorstep, but it is probably a podcast about analytics bloggers for analytics bloggers.
What I find so interesting about them, and about the fact they’ve come out around the same time, is the way they complement each other.
“And then, you and I know [that] coaches and front-office executives, sporting directors, are some of the most online motherfuckers on Earth. And they read this shit, and they always have.” – Tiotal Football, Post Script podcast (episode 1).
Not only do the three works start off in different ‘types’ of history-telling, they have markedly different energies to them too.
Talking about Expected Goals on a ‘Totally Football Show’ podcast appearance shortly after its publication, Smith gave an interesting sorta-mission statement of the book: “Behind the scenes there have been lots of people who’ve done a lot of stuff to change football, when football didn’t really want to be changed, and I think they have changed it far more than they recognise.”
I find this interesting enough to quote because much of the book is focused on people who, if they didn’t fail outright, certainly didn’t achieve what they set out to. The clearest example of this is the running thread following Chris Anderson, which is the focus of alternating chapters.
Anderson starts out as a professor at Cornell University (one of my favourite lines in the book: “Moneyball was the perfect light read for a behavioural economist”) and then starts blogging. He attends analytics conferences, co-writes a book (The Numbers Game, edited by Smith), and tries to implement an analytics revolution at a club, any club, through increasingly (though necessarily) ambitious means. The ‘Chris Anderson’ chapter titles feel telling: Trojan Horse; Proof Point; How (Not) to Buy a Football Club; Alien.
And then there’s the contrast in how Expected Goals and Net Gains cover Sarah Rudd, Anderson’s co-focus in the first Post Script episode. Both books feature interviews with her, and though there are varying degrees of positivity in her section of Net Gains, there’s nothing close to these lines from Expected Goals: “Still, though, Rudd is a little rueful at what might have been. ‘We were not as successful as we could have been, given how far ahead we were.’”
This could be read as a pessimistic way to approach football analytics history, but in light of Smith’s words on the ‘Totally Football Show’ I think it reads as quite supportive of the characters that Expected Goals features. They were people who were inspired by Moneyball but couldn’t get Moneyball to work (“He [Anderson] could not become Billy Beane in the West Midlands” is a line in a later chapter). I’m not sure that the book really shows how they changed football, or even particularly proves that they did, but it seems to want to show that their work, Anderson’s in particular, wasn’t in vain.
“Marc Andreessen[…]has a dictum that, in his business, ‘being early is the same as being wrong’. The timing of technology is as vital to its success as the substance of it: the world has to be ready to accept and embrace your idea.” – Expected Goals (chapter 3)
What this perhaps quite British sensibility of Expected Goals also does is something that I think any ‘history’ of analytics leading up to the present moment needs to do: engage with the question of ‘why now?’ and, by implication, ‘why not then?’. Were the ideas not right, the implementation not right, the conditions not right? Could StatDNA’s work under Wenger, for example, have had the same impact as Liverpool’s current research department appears to have had if circumstances in the late 00s had been different?
I don’t think any of the works set out to answer this, but it permeates them. The early episodes of Post Script talk about the bloggers’ struggles to find data; Expected Goals details the evolution of data collection, eventually moving past the need to physically drive discs of footage around England. It opens, as previously mentioned, with Ashley Flores, part-time pro footballer and part-time data collector for Impect, the German company that popularised the player-bypassing metric ‘packing’.
Although the pay is, Smith writes, “good, way above Filipino minimum wage”, you’d assume that it’s far lower than would be required in Germany, which’ll be why the collection takes place in the Philippines (or, for companies like StatDNA, Laos or Cambodia; for StatsBomb, Egypt). Imagine what that would have required in Charles Reep’s day, in the 1950s. Meanwhile, key characters of Net Gains like Luke Bornn and Paul Power have done fantastic work with tracking data, something which has only been made possible, on a mass scale certainly, by technological advances this century.
As someone whose interest in analytics tends towards ‘what’s next’, this ‘why now’ question sticks out to me particularly. The characters in these works are clearly very smart, and building on the work of others, but, for example, if Prozone had tracking data in the 1990s, why did it take until the mid-to-late 2010s for pitch control models to emerge as (apparently) new ideas, and can we learn anything from that which might improve further development and implementation?
What can we learn from these histories that helps to shape a better future?
“All interpretations made by a scientist are hypotheses, and all hypotheses are tentative” – evolutionary biologist Ernst Mayr, quoted in Net Gains (chapter 6)
It might be because the book isn’t aimed at me, but I was mildly disappointed that the main beats of Expected Goals, outside a couple of the Anderson chapters, felt so familiar. Net Gains speaks to and about a slightly less well-known, but more Online, set of people. It also feels like a smart and non-obvious choice to feature Paul Power in a fairly prominent role: someone who was doing advanced work with tracking data early on and who, to my knowledge, hasn’t been profiled or done a bunch of interviews before.
However, all three of these works have a very Anglosphere-centric focus. Maybe it’s a shared language thing, maybe it’s who is willing to talk, maybe it’s that these are the only people whose work has mattered. Some of these works prompted this feeling more than others, but it unnerves me to suspect that there are notable gaps in my knowledge of analytics history, and then to consume works that look at that suspicion and go ‘nah, I think you’re good’.
The only non-British, Irish, or North American contributions featured, that I can recall, are Impect and a mention of the Israeli scientists who created SportVU, the tracking data system which helped Luke Bornn’s entry into sports analytics (through the NBA, rather than through football). Maybe these are the only notable stories to tell, maybe Moneyball’s Anglosphere legacy is larger than we thought, but, for example, what legacy, if any, do Nemzeti Sport’s early twentieth-century data visualisations have in Hungary? Skillcorner, one of the companies making waves in broadcast footage tracking data (and where Paul Power now works), is French: is there an analytics history to explore there?
It seems worth mentioning at this point that Expected Goals and, closer in theme, Net Gains join Christoph Biermann’s book Football Hackers, originally published in Germany in 2018, on the ‘good analytics movement history books’ bookshelf. The Post Script podcast’s decision to ‘limit’ its remit specifically to ‘analytics blogging’ is a smart one in this context, particularly for a side-project; it sets out an explicitly smaller area to cover.
“The sheer quantity of brain power that hurled itself voluntarily and quixotically into the search for new baseball knowledge was either exhilarating or depressing, depending on how you felt about baseball.” – Moneyball (chapter 4)
Let me bring in Grace Robertson’s review of Expected Goals and Net Gains again here, because there’s something she writes at the end of it that crystallises something that was half-formed in the back of my mind.
“Let’s say that in ten years’ time, every club incorporates analytics into all their decisions, while using proprietary models far more advanced than anything we can see publicly. At that point, none of us will be able to know what good decision making and good strategy looks like. We’ll have come full circle [from the pre-blogging era] and we will understand football even less than we ever did before, because the astrophysicists have figured out all the things we will never know.”
I’m a little less worried, for a reason that O’Hanlon, and Luke Bornn, nod to at one point: “‘People who are analysts have very clear incentives to say everyone should be using data because they want to grow their space’, he [Bornn] said. ‘They want to sell more product, they want to make themselves more hireable’.” Few things make you look more valuable than making advanced analytics sound simple in a popular book, or explaining a smart thing you’ve been doing (albeit something you were perhaps doing a couple of years ago).
On this theme of secrecy and what is made public, I find it admirable that, in this passage on the incentive structures of analysts and companies to big up their achievements, O’Hanlon acknowledges these apply to the writers too. This newsletter has incentives too, although uncertain ones.
I sometimes worry, though, that we (by 'we' I probably just mean 'I') rely too much on the blogosphere in covering and identifying analytics. It doesn’t feel feasible to assume that, even in its heyday, it represented the full nuance of analytical knowledge and application. (That said, blogs should make a return. Blogs are good).
While editing this post I leafed through my copy of The Numbers Game. Although Anderson, who co-wrote it, had been a blogger, he was kind of in the 'early' section that feels slightly separate from what came later; before the famous Opta expected goals blog by Sam Green, before the StatsBomb blog took off in a big way. The closing chapter of the book features forecasts, one of which is that ‘Geometry – space, vectors, triangles and dynamic lattices – will be the focus of many analytical advances’. How smart would I have looked in the 'early analytics Twitter' era of blogging if I’d just repeated that over and over again?
Robertson’s right though; both the development of analytics knowledge and the telling of analytics history are affected by the incentives that practitioners have to keep their insights and edges (and screw-ups) secret. This isn’t like ‘regular’ science and technology either, where there are specialist reporters covering the field. There are no analytics beat reporters. (Regular journalists will find analytics staff useful as sources, but it seems doubtful that they’ll be as interested in the 1s and 0s as someone who’s analytics-specific).
“Still less well-known, at 65, than many far less influential managers, Bielsa is something like Velvet Underground of soccer coaches: Not many people buy his records, but everyone who does subs in an attacking midfielder.” – Brian Phillips, ‘Marcelo Bielsa and Leeds United Form a Perfect Union’, The Ringer (2020)
If you’ve stuck with this post this far, well done. We’re nearly home.
There’s a chapter in Net Gains that I found unexpectedly touching. Granted, it’s the one O’Hanlon contacted me to contribute to, the one on Charles Reep, as I’d written a Get Goalside newsletter about him. If I’d had to bet on it, though, I wouldn’t have guessed that a book chapter about a mid-century accountant would be emotive.
In it, O’Hanlon speaks to Richard Pollard, who was a friend and collaborator of Reep’s in the later years of his life. “’He [Reep] was made fun of more than anything else,’ Pollard said. ‘And a lot of coaches used to downgrade him all the time. Weird, weird old retired wing commander with a hat and a pencil and paper.’” [original italics]
Let’s be honest, if you were stood on an English terrace watching an evening game in the 1950s or 60s and a middle-aged man a couple of rows in front of you put on a miner’s helmet, torch on, and got a notebook out… you’d have thought he was a little weird; you may well have made fun of him too, loudly or quietly. Not a thing to be proud of, but it often happens.
In the hours after Robertson’s piece came out, ‘analytics twitter’, or fringes of it, came alight. People who’d been around a decade or more shared names they thought had been overlooked, people whose contribution they thought deserved recognition. Many had, at the time, been shunned or made fun of – if not by coaches this time then certainly by others online, even, in some cases, members of the media.
It turns out that the 1997 paper that my old post on Reep opened with, which featured a diagram that looks a lot like modern expected goals probability charts, wasn’t co-written by him at all. I’d sort of suspected as much, given that he was in his 90s by that time, but I still hadn’t known for sure.
“Reep,” O’Hanlon writes, of the man who’d spent matches scribbling notes and producing data on the game he loved and was fascinated by, “had nothing to do with the production of the paper; Pollard just included his name as a co-author as a tribute to his mentor.”
If you want a ‘history’ of analytics then we’re probably still too close to the turning point to properly recognise its shape. That doesn’t mean we shouldn’t try, but it probably makes it more likely that things will end up feeling unsatisfactory.
Given this, the stories we choose to tell tend to be personal in some way, and I think all three works show this in their own ways. What interests us? What inspires us? Whose tale do we want to commemorate? Not so different to Pollard: whose name would we quietly add, in a kind of tributary fraudulence, as a co-author to our own research paper?