Manchester United says 'hello world' to data science

With news that Manchester United are hiring data scientists, Get Goalside looks at the directions a clean slate at a rich club could take you

"Sometimes you have a noisy neighbour. You cannot do anything about that." – Sir Alex Ferguson (2009), speaking about newly Abu Dhabi group-owned Manchester City.

"Sometimes you have a noisy neighbour. You cannot do anything about that." – City Football Group's Data Insights Slack channel (2022), speaking about newly 'hiring for data scientists' Manchester United. Probably. Maybe.

Manchester United are hiring data scientists. Yep, the big story of the day on this corner of the internet is a super-wealthy entity (belatedly) putting in place sensible processes that'll benefit the organisation and help it to thrive amongst rivals. So there's hope for Twitter yet.

But, seriously though, what does this mean for Manchester United? What does it mean for a club with means (and now the inclination) to build out an analytics department in the year 2022?

Many blue moons ago this newsletter wrote about City Football Group's own data department expansion. It was a jaunt through the job adverts they posted, things like "developing our Narrow AI platform" and "advancing our computer vision, simulation & reinforcement learning environments". Sadly, Manchester United's advert, which is here, is not so exciting.

They're hiring for machine learning scientists and decision scientists. To quote the ad:

"ML [machine learning] Scientists will work primarily on human-out-of-the-loop algorithms applying Data Science techniques to augment or summarise data at scale to solve significant, long-term problems.  Decision Scientists will work primarily to improve human-in-the-loop operational processes, collaborating closely with our football experts across the club to integrate Data Science into their daily work.  We welcome applications from candidates who have ambition to work in either area."

(For posterity and for people starting out in and around the industry, I'll post the person specification at the end of this post.)

Beyond that, there aren't a lot of juicy specifics. However, as the Training Ground Guru site quoted, United's director of data science and the person doing this hiring, Dominic Jordan, said earlier this year that "the club is very much looking to be dominant in this space."

So, two years on from one Manchester team rapidly expanding their data operation, with other teams across Europe improving theirs in the meantime, let's consider the question: what would it mean to be dominant in football data science?

Let's assume that 'dominant' doesn't mean United's algorithms putting Liverpool's algorithms in headlocks and giving them a noogie. (That's for after-hours at analytics conferences.) There are a number of different avenues that you could go down with a data department at your disposal. Let's list as many as possible:

  • Identifying general undervalued (or overvalued) areas in the transfer market
  • Looking for players who fit specific role-based requirements for the first team
  • Researching and monitoring effects of match and training load on player fitness
  • Patching into Twitter's API to find Pep Guardiola's burner account
  • Developing 'fundamental' models, like expected possession value, and running experiments to try and work out advantageous tactical approaches
  • Goalkeepers. Nobody outside of about three people knows anything about goalkeepers.
  • How to turn as many academy kids into first-team players or valuable sales as possible
  • Optimal financial engineering (pay the players as little as possible, as much as needed)
  • Optimal C-suite engineering (retain as few of your bosses as possible, as many as needed)
  • Run and analyse medical/fitness experiments using the women's team and academy - so little research has been done on elite female athletes, building a large knowledge base could be a big win. 'Just' getting ACL injury probability down to male footballer rates would be neat.
  • Advanced research on new data sources, like 'skeletal' data
  • Find the usernames of all your rivals' data employees and grind their Elo into the dust, torpedoing their productivity as they spend their time memorising endgame strategy
  • Building out scouting KPIs or filters or models based on (video footage-based) tracking data, in addition to the ones that presumably exist based on stats or event data
  • Work with coaches to create lightning-fast, user-friendly apps to access in-game analytics-based strategy
  • Subscribe to Get Goalside
  • Use your username list to organise a Magic: The Gathering tournament. Invite Pep Guardiola. I'm sure he'd have a blast.  

A small-scale interesting thing with Manchester United when contrasted with City Football Group is that United is 'just' one club. If you're a data scientist on the Trafford side of town you won't need to care whether free-flowing attack is a path to success in the Australian A-League; if you work for the light-blue team, who also own Melbourne City, you might do.

That's a different set of incentives, a different way that resources can be distributed. Frankly, with that long a list and a full starting XI's worth of clubs to be involved in, it's a wonder that CFG's department get anything done at all. Although you'd hope that, in the spirit of institutional transparency, they've already crossed Guardiola's burner off the list.

But, when you're building out a department, what do you focus on first? What are the first projects that get done? It's not like United will be starting from nothing: there have been data professionals working on the football side for a while; there's a Data Operations department referenced in the job ad, which presumably means you won't have to do a lot of infrastructure work; and the director of data science himself has been in the role for a number of months (albeit months in which the club changed men's team manager).

You'd probably want to hire some people capable of doing all the kinds of things that the Get Goalside newsletter gets excited by, even if you don't plan on doing them immediately. Alternatively, maybe that's who and what you'd focus on most. Maybe your interpretation of "dominant in this space" is being the first(?) club to squeeze precious lemonade out of skeletal data lemons.

This would be fun for two reasons. One, learning is fun. But even more so: Two, getting one over on your rivals.

At the 2021 StatsBomb conference, Liverpool FC's director of research said, as cited by the Liverpool Echo, "Recruitment is the most important application of analytics." The fancy skeletal data is unlikely to be useful for that, but - like with pitching, batting, or bowling mechanics in baseball and cricket - it could be a treasure trove for technique improvement. Ajax used something like it for goalkeeper technique analysis. Use the fun, new, shiny data! Prove Liverpool wrong! What was it Sir Alex Ferguson said about that club and knocking off of perches? 👀

But perhaps you wouldn't want to go in that direction. Analytics types tend towards the genteel.

Perhaps, instead, you want to focus more on the here and now. Or, if you really want to make sure the men's team nails the next couple of transfer windows, the here, hopefully-not-here-for-long, and almost-now. A different focus, a different set of priorities.

You could get deep into predictions about how transfers might adjust to new leagues (i.e. the Premier League) and how certain positions (i.e. centre-forwards and central midfielders) might interact with current tactics. You could go deep into the holes in Chelsea, Arsenal, and Man City's women's teams to try and secure United's first Women's Super League title. The job ad does say that decision scientists will be "collaborating closely with our football experts across the club to integrate Data Science into their daily work."

I don't have any real insight into this. It's a chilly, rainy evening and I figured that this was a fun starting point to talk about the variety of things that new departments can do. For a professional's viewpoint, Inter Miami's director of analytics Sam Gregory spoke at this year's StatsBomb conference about building an analytics department. Coincidentally, Inter Miami's coach is ex-United player Phil Neville; Liverpool are used as a hypothetical example club in Gregory's presentation. Perch. Perched on.

We'll see what road United decide to go down. Well, let's be honest, we probably won't see. We'll probably be able to see who gets hired, and guess at what direction that means things are pointing, but clubs don't have a strong track record of talking about what they do, and it rarely gets reported on either.

In fact, this is a final way that you could be dominant in football data science. Publish a load of work. Tell us what you're doing. United's football heritage already involves long ball dossiers in press conferences (strangely, Louis van Gaal's hair-pulling quote has lasted longer in history's memory). Why stop there though? Erik ten Hag and Marc Skinner could deflect difficult questions by pointing to Appendix B of the latest research paper release. That'd show CFG and Liverpool, eh.

As promised...

'The Person' section of Manchester United's data scientist job advert is below (a link to the job ad is here, closing date is 18 November). Football analytics-related job adverts tend to be either pretty fuzzy or pretty specific and demanding; this is probably the most similar I've seen to a 'regular' data scientist job posting. Given that a lot of people early in their careers are interested in this kind of thing, this might be a useful list to take a look at even if you don't plan on applying: working towards it would probably serve you well in a lot of workplaces.

While learning is part of a Data Scientist’s day job at Manchester United, we would love you to be able to bring:

  • Excellent mathematical and statistical knowledge, gained from a degree in a quantitative discipline or equivalent courses, or demonstrable practical equivalent [Get Goalside's emphasis]
  • Excellent Python skills
  • Excellent general Data Science skills covering development of practical analytics applications to enhance established processes, data story-telling and KPI development
  • Experience in applying data science techniques to answer sports-related questions
  • Good understanding of software engineering principles, including test-driven development, CI/CD and version control.

In an ideal world you would also have:

  • Excellent SQL or similar data manipulation skills
  • Understanding of Bayesian and causal inference
  • Knowledge of cloud infrastructure and experience of working with data at scale
  • Experience of conducting code reviews
  • Experience collaborating with third parties, including academia, to solve problems collaboratively

Thanks for reading.