Hello, and welcome to Get Goalside! Take a break, you deserve it.
This week’s charity is, again, Refugee Action, who support refugees and people seeking asylum in the UK.
The idea for this post, ‘Can you use data to predict the unpredictable’, came to mind before all hell broke loose over the algorithmically-assisted exam results given out in the UK. ‘Using data to predict the unpredictable’ is gonna have to do some work now to win back some trust. Hopefully I won’t have to do a U-turn.
Anyway, football. Every now and then you get reports about the intensive levels of work that people put in to make sure they’re ultra-prepared for matches. Matches which have, in the past, been decided by deflections off beach balls.
Now, it’s all well and good me saying that this is overkill, but considering the pressure that managers and staff are under it’s not surprising that they’ll spend hours scraping under rocks for the most obscure or unlikely of details.
Stats folks have often pointed out how data (and better tech) can speed up repetitive manual tasks. Pre-match, post-match, and scouting reports are mostly done in templated formats, with a lot of topics that can be summarised with numbers. Even for things that can’t simply be summarised with numbers, data can often point coaches, scouts, and analysts in the right direction to look.
But this is all based on what the data can see. Data is unlikely to have helped Leeds in their report on the third-choice ‘keeper who hadn’t played all season. Perhaps there’d be figures from a previous year, but there’s more likely to be some video knocking around from a friendly or reserve game or training. And if there’s one game of data versus one game of video, it’s the video you’ll want.
Similarly, if a manager wants to know every plan that an opponent could hypothetically throw at him, the traditional ways of using data may not help much. It’s one thing to say ‘this team usually plays in X formation through Y players’, but what if they, y’know, do something different?
That — and it proving to be a masterstroke — is every manager’s worst nightmare.
Sadly, I don’t have the time, data, or statistical knowledge for properly modelling this all out. But I do have ideas.
Using data to make predictions is nothing new, of course. Win prediction models are everywhere and, taking inspiration from Nate Silver, career progression models have been experimented with as well. With both, the point is to look at what’s come before and use it to map out what might happen based on similar circumstances.
With Nate Silver’s PECOTA (baseball) and CARMELO (basketball) career progression models, those ‘similar circumstances’ were players throughout history who had comparable career stats to current players; with expected goals it’s the mass of previous shots in history.
And so with teams… well, there aren’t as many teams as there are players or shots, but why limit yourself to the one team you’re coming up against when there are going to be others who play, or have played, similarly.
This thing about overkill analysis all started with Leeds, so let’s use their promotion rivals from this season, West Bromwich Albion, as a hypothetical.
Per Twenty3’s Content Toolbox, using Wyscout data, West Brom averaged 349 completed passes per game. That’s a similar ballpark to Derby, Swansea, and QPR in the Championship, but let’s expand the selection of leagues to the traditional European Big Five as well as the Dutch and Portuguese top tiers. That expands the number of teams in the range from three to 16.
That’s just completed passes, but if you were doing this seriously you’d want to use some different metrics. Pace of play, pressing style, how heliocentric a team is; these could all be used as markers of a team’s similarity.
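For what it’s worth, here’s a rough sketch of how that multi-metric comparison might look. It puts each metric on the same scale (z-scores) and ranks teams by straight-line distance from the one you care about. Big caveats: the team names, the `pace` and `pressing` columns, and every number other than West Brom’s 349 completed passes are made up for illustration, and Euclidean distance is just one simple choice of similarity measure, not a recommendation.

```python
from math import sqrt
from statistics import mean, pstdev

# Placeholder numbers, NOT real data. Only West Brom's 349 completed passes
# per game comes from the post; "pace" and "pressing" are assumed metrics.
teams = {
    "West Brom": {"passes": 349, "pace": 1.9, "pressing": 10.5},
    "Team A":    {"passes": 352, "pace": 2.0, "pressing": 10.8},
    "Team B":    {"passes": 345, "pace": 1.4, "pressing": 14.0},
    "Team C":    {"passes": 520, "pace": 1.2, "pressing": 8.0},
}


def standardise(teams):
    """Convert every metric to z-scores so no single metric dominates."""
    metrics = list(next(iter(teams.values())))
    scaled = {name: {} for name in teams}
    for metric in metrics:
        values = [stats[metric] for stats in teams.values()]
        mu, sigma = mean(values), pstdev(values)
        for name, stats in teams.items():
            scaled[name][metric] = (stats[metric] - mu) / sigma if sigma else 0.0
    return scaled


def most_similar(teams, target, k=2):
    """Return the k teams closest to `target` by Euclidean distance."""
    z = standardise(teams)

    def distance(name):
        return sqrt(sum((z[name][m] - z[target][m]) ** 2 for m in z[target]))

    others = [name for name in teams if name != target]
    return sorted(others, key=distance)[:k]


similar = most_similar(teams, "West Brom")
```

With these invented numbers, Team A comes out closest because it’s near West Brom on all three metrics, while Team B matches on passes but not on pace or pressing. That’s the whole point of using more than one metric.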
You’d probably want to choose some data to chuck out as well: garbage time when a team’s winning or losing would be an obvious example, and then, perhaps counterintuitively, the games where they did things differently. To make sure that a team’s comparable to another, you want to make sure you’re looking at how the one team usually plays and how the other teams in the sample usually play.
And then, once you’ve done all that and established a pool of teams that are similar to the one you’re interested in, you can look at what all of them do when they ‘do something different’.
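As a very hand-wavy illustration of that last step, something like the following could tally up what the teams in the pool do when they stray from their usual approach. Everything here is hypothetical: the match log, the team names, and the use of formation as a stand-in for ‘doing something different’ (in reality you’d want something richer than a formation label). Garbage-time records are dropped first, as suggested above.

```python
from collections import Counter

# Hypothetical match log: (team, shape used, was it garbage time?)
matches = [
    ("Team A", "4-2-3-1", False),
    ("Team A", "4-2-3-1", False),
    ("Team A", "3-5-2", False),
    ("Team A", "4-4-2", True),   # garbage time, so it gets ignored
    ("Team B", "4-3-3", False),
    ("Team B", "4-3-3", False),
    ("Team B", "3-5-2", False),
]


def plan_b_counts(matches, pool):
    """Count the non-usual shapes used by teams in the similarity pool."""
    by_team = {}
    for team, shape, garbage_time in matches:
        if team in pool and not garbage_time:
            by_team.setdefault(team, Counter())[shape] += 1

    alternatives = Counter()
    for counts in by_team.values():
        # Treat each team's most frequent shape as how they "usually" play.
        usual, _ = counts.most_common(1)[0]
        for shape, n in counts.items():
            if shape != usual:
                alternatives[shape] += n
    return alternatives


counts = plan_b_counts(matches, {"Team A", "Team B"})
```

In this toy example both teams switch to the same back-up shape, so `counts` points at a single likely ‘plan B’. With real data you’d expect a messier spread, which is sort of the interesting bit.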
Now, there’s a chance that none of this’ll work.
The ‘different’ things that a team do will, at least partly, be based on the players that they have available. A possession-based team might play a tiki-taka style usually, but some might have a big ol’ target forward on the bench that others don’t.
That said, the make-up of a squad will probably inform the way a team plays and their primary back-up option. With the possible exception of Pep Guardiola, with enough data and enough smarts, I think this could be a way to help predict the unpredictable about how teams might play against you.
Bit pressed for time this week so I’ll link to Devin Pleuler’s great analytics handbook collection of resources and the Friends of Tracking YouTube channel, both of which are a treasure trove of things to help get a start with data analysis.
Also, not directly related to football (but is the kind of tech that could presumably be applied to football), this just made me go ‘woah’ this week:
This week’s charity is Refugee Action; please consider helping them.