I forgot to plug this in the past couple of weeks but my colleagues at Twenty3, Daniel Girela and David Perdomo Meza, were at the Barcelona Analytics Conference with a kickass poster featuring some of the cool work we/they are doing there.
I’m taking the ‘standing on the shoulders of [Arsenal-associated] giants’ approach this week, with the post heavily built around two extracts from David Ornstein articles on The Athletic from the past few days.
The first is one of the less-exciting paragraphs from his and James McNicholas’ piece on the demise of Unai Emery at Arsenal (emphasis added).
That month, it emerged that Arsenal’s head of recruitment Sven Mislintat would be leaving the club. The German had been assured he would graduate to the role of technical director after Wenger’s departure, but with Raul Sanllehi staging something of an executive coup, Mislintat found himself sidelined.
The second comes from his Monday column, rounding up some of the juiciest bits of goss from around the grounds in England. The below comes from a section titled ‘Rogue companies turning search for marginal gains into a minefield’.
For example, one Premier League club recently wrote off an annual six-figure sum they were paying to a firm for performance data that was eventually found to be riddled with mistakes.
The partnership had been running for several years before the club decided to recruit specialists to thoroughly scrutinise the information being provided. The feedback was damning.
They alerted the company to the problems and demanded answers, but all that came back was a “thank you” for raising the concerns, a pledge to implement improvements and a reference to the small print of the contract, which essentially stated there is a margin for error and no recourse.
And, alongside these two extracts, Unai Emery’s hiring and firing itself can be chucked into the mix as we ask: how can we trust people in football?
These three instances (executive coup, cowboy company, and unsuccessful manager) aren’t exactly the same thing, of course, but they do share commonalities. The coup and the company are clearly in the game for their own benefit, at the seemingly dispassionate expense of others. The company and Emery are both outsiders who won their way into a club by promising something that they failed, albeit for different reasons, to deliver. If the clubs in question had known more about these figures in the first place, maybe they’d never have been hired.
How does a club, or an individual within a club, know who to trust when backstabbing, underperformance, and downright grifting are all around?
I suppose that this is why so many people hire people they know: you might be able to find someone better outside of your network of knowledge, but you might also find someone worse who just interviews well.
This isn’t unique to football, of course, but a more distinctive problem is not having that network: not knowing who to ask, or who to ask who to ask, about who to hire.
(Presumably, this is why the performance data company were able to get away with dodgy data with a Premier League club. Data of all types is still a relatively new field within the professional game, and so how are you supposed to know who to trust?)
Leicester City’s Head of Performance Innovation, Paul Balsom, spoke at the recent Training Ground Guru conference about how the club came to hire their Head of Analytics Mladen Sormaz. Balsom was advised on what to look for in the Head of Analytics role and not to rush the eventual candidate when they first joined the club, as the first few months would have to be devoted to sorting out the data engineering and infrastructure.
That’s sound advice. I wonder how many Premier League clubs are seeking advice like this, not just about who to hire but about the entire concept of using data within football. I also wonder how many have the kind of networks where they can get advice from someone knowledgeable and trustworthy in the first place. Having been in and around the football stats sphere for about six years now, I’ve seen a number of job descriptions for different types of data-related roles at clubs and they have differed significantly, so it’s clear that not everyone is doing what Balsom did.
Now. I’d written the above on Sunday evening, with the aim of tying it to a neat close on Monday. But then on Monday morning, StatsBomb CEO Ted Knutson decided to give stats twitter a good ol’ shake to wake them up for the week. [As I know tweets don’t always show up right when embedded online, for the avoidance of any doubt Knutson is quote-tweeting a tweet asking people for ‘a thing that everyone in your field knows and nobody talks about because it would lead to general chaos’].
It should be noted that, as the CEO of a data provider himself, Knutson isn’t without a dog in this fight. That said, I more or less knew the first tweet to be true and I trust that he’s telling the truth on the second.
These are similar problems to the one in the extract from David Ornstein’s article: data companies providing data with flaws. The scale of these flaws differs hugely, both in severity and in how widespread within the data they are, but flaws are flaws.
(Although it should be noted that the company that made potential errors of judgement several years ago is more likely to have learnt from it and have consistent data now than the company who seemingly under-collected shots last season. Companies will also respond pretty promptly to any problems that you do find that have slipped through the net. I should also say, in this paragraph that’s rapidly becoming a pre-emptive fire extinguisher for any bridges I’m close to burning, that issues in data are a bit like refereeing errors: sure, the big ones get a hell of a lot of attention, and deserve to, but the vast, vast majority of the data is absolutely fine.)
You don’t need me to elaborate on why ‘can we actually trust the data’ is a worrying sentence for football clubs. I also don’t want to piss off the data companies more than I already have by dwelling on it, BUT Knutson logged back on later in the day to do it for me [the whole tweet, here, wasn’t relevant so I’ve quoted the important bit]:
The more the professional football world moves toward possession and ball progression value models, the more they care about tidy possession sequences and who has possession of the ball where and when.
I’ve written previously on the topic of ball progression-type models here. To nick part of that post as a quick intro to the concept:
[These types of models] have generally sought to measure how much value players further back in the goal-scoring process (ie, midfielders or defenders) add to the chances of goal-scoring. […]
Across the board, these models have shown that passes in midfield have little (if any, really) direct value in scoring goals. I have some quibbles about the fact that midfield is very dynamic and questions about whether the models capture this […] however in terms of the descriptive value of the model I think that it’s valuable, and true-to-life, to say something like ‘by and large, taking an average of events in this area, actions in midfield have little direct impact on scoring goals’.
The mess of midfield and fine margins in the impact that actions appear to have is why the people doing the modelling need the very best data.
So. On one side of the equation we have data companies who have accuracy and/or consistency issues; on the other, a cadre of data scientists needing accurate data that they can rely on for their models to be trustworthy. Not only do they need it to be accurate for the models to be right, but the problem with modelling is that it can get pretty tough to troubleshoot and work out why something’s gone wrong, or even if it’s gone wrong in the first place.
And it’s not just the big data companies who might give people trust issues.
[Note: QA = Quality Assurance]
Knutson’s right: clubs collect a bunch of their own data with an army (of varying size) of young analysts. They can be noting things that data companies don’t collect, or more subjective information that’s tough to construct from the data feeds they’re buying.
As a football club, though, I imagine that you don’t really want to spend much time reading up on, and enacting, quality assurance practices. Not having to should be part of the benefit of buying data in from a provider: it’s their job to collect accurate, reliable, consistent data, so they should be the ones who have quality control down to a T.
Yet, seemingly, they don’t. And if you’re a potential customer, it seems unlikely that the provider is going to tell you about their issues, particularly given the response that the performance data company from The Athletic article gave when confronted with their own dodgy data.
Who do you talk to to find out about which data company is most accurate? *shrugs*.
And don’t even get me started on the rash of companies popping up offering to turn video footage into tracking data. That, dear reader, is a conversation for another time.
This post wasn’t meant to be a write-up of the problems that data companies can have, though; it was about trust and how clubs are meant to establish it. It’s clearly tricky terrain with few, if any, maps to guide you. To discover who to trust, it seems you first need to know who to trust.
On the Twitter-sphere, and particularly the stats twitter-sphere, we can be quick to snark at the old men (always men, of course) who’ve been in the game for several decades and whose large contact books keep them in work. But, in the Wild West of the data frontier, it’s the same sort of people, who know everyone’s secrets and everyone’s strengths, who could be some of the most valuable.