Conference season, autumn 2021 - The What & the How

We know what players do, but we don't know how they do it

-- paraphrasing Vosse de Boode, Ajax's Head of Sports Science, speaking at StatsBomb Conference in 2021

It's the most wonderful time of the year
With the nerds all assembling,
The Voronois tending,
To show up on screen...
It's the most wonderful time of the year

-- paraphrasing Andy Williams, American pop singer, singing "It's the Most Wonderful Time of the Year" in 1963


Yes, it's analytics conference season. The New England Symposium on Statistics in Sports (aka NESSIS) is running online every Friday throughout October, and the StatsBomb conference took place in person this past Friday as well. There's too much to go through each conference talk by talk, but keep an eye on the respective YouTube channels (NESSIS, StatsBomb); the programmes for the events are here (NESSIS, StatsBomb).

But if you take both together, some interesting trends emerge about where football analytics is heading...

As the De Boode quote (/paraphrase, I can't remember her exact line) at the start of this newsletter says, analytics is fairly adept at answering the 'what' questions in football now, but less so at the 'how'. In her talk at the StatsBomb conference this year she gave a good example of what that latter part can look like.

Answering the 'how'

When André Onana arrived at Ajax, the stance he set himself in when preparing to make a save was different from the one Ajax goalkeepers were taught. Wider. The club could have just trained him out of it, thinking their way was best, but they just so happened to have a movement analysis lab and sports science department to hand.

De Boode and co tested reaction time and dive velocity across a number of their goalkeepers in different stances. They could use motion capture systems to get data on the angle of how the 'keepers stood and how they moved. And then they could analyse that data, and find that Onana was right after all. His wider stance, in certain circumstances at least, was linked to quicker reaction times and faster dive velocities.

Now, this is the most cutting of cutting edges in the field (and perhaps shows how Ajax keep producing such talented players). But although the presenters at these two conferences don't have what Ajax have at their disposal, their talks have also been dominated by questions that go beyond the 'what' and are getting closer to the 'how'. Principally, by investigating player decision-making.

It's all very well to know that players take this many shots or make that many passes, but how often do they have those opportunities and choose not to take them? A goalkeeper may make a number of saves, but how do they manage to pull it off?

Across the 12 talks on the 'research' track of StatsBomb's conference and the first week of NESSIS, at least six of them featured a look at 'decision-making' in some form.

Three StatsBomb research talks mentioned it explicitly: 'What drives the goalkeepers' decisions?' from Samer Fatayri, Kirill Serykh, and Egor Gumin; 'The quest for the right pass: Decision making' from Javier M. Buldú and Borja Burriel; and 'Turning with the ball & decision-making under pressure' from Soumyajit Bose and Manas Sarawat. Also of note is the fact that each of these focuses on a different area of the game.

As well as that, Hadi Sotudeh's talk 'Potential Penetrative Pass' contributed to the theme, developing a framework to identify when a player could have made a penetrative pass but either opted not to or didn't see the option.

And on top of that, two NESSIS talks each brought a slightly different flavour. Craig Fernandes looked at 'Untangling the Relationship between Intention vs Execution' -- but in tennis, rather than football. That said, there's a lot of knowledge and idea transfer that takes place between sports, so maybe someone in the non-racquet sport will take some inspiration from it in the future.

Also at NESSIS was Sam Gregory presenting 'Pace and Power: Removing Unconscious Bias from Soccer Broadcasts'. Here the focus wasn't on what players decided to do, but how viewers decide (unconsciously) to talk about women's football vs men's football and black footballers compared to white footballers.

Although it wasn't the application in the talk, you could see a use where scouts could be tested for their biases -- not necessarily even along race grounds, but on anything. The work Gregory presented (which was done alongside a team at SportLogiq and Toronto FC's Devin Pleuler back in 2019) turned video footage into anonymised 3D animations. Maybe scouts are more sympathetic to players they know vs those they don't, or to certain types of athletic build. Perhaps, in time, this type of process could be part of a scouting training course.

Furthering the 'what'

All of that being said, we're not all Ajax, and there's still a lot of 'what' to figure out. Another big theme of the conferences was widening the knowledge there through more specific questions. Defending, goalkeeping, phases of play: all are things that haven't been particularly well-covered by event data analysis so far, and all are things that StatsBomb research talks focused on.

A sizeable part of this was making use of StatsBomb's recently-unveiled '360' dataset [understandably, given it was their conference; for more on 360 data, read this]. Having a snapshot of where every player in camera-view is when every event is made is a bridge between the traditional 'event data' (what did player X do on the ball at location Y) and 'tracking data' (tracing the paths of all 22 players and the ball).

But some of it is just about asking more specific questions. Max Odenheimer and John Harrison scrutinised goalkeeping in an innovative way, while part of Will Morgan's talk looked at analysing defensive strategy while normalising for team strength and home advantage.

At NESSIS, Ethan Baron's talk -- 'Predictive Value of Off-target Shots in Soccer' -- was in a similarly inquisitive vein. Off-target shots are generally given a value of 0 in post-shot expected goals models because they went off-target. However, as Baron reasoned, knowing how close a player misses by might be able to tell you something useful about them. The metric created for the project, taking off-target shots into account, seemed stable year-to-year and a decent predictor of xG overperformance.

What's next?

Something striking about Baron's talk, Odenheimer and Harrison's, as well as the talk delivered by Maaike Van Roy from DTAI Sports Analytics Lab -- which split defensive set-ups into high/low, left-forcing/right-forcing blocks using 360 data -- is how they were splitting data in very football-specific ways.

Football analytics, as it has often been done, usually involves a lot of aggregation (combining all the data together). [I've lightly moaned about this in reference to 'finishing skill' before -- see here]. The advantage of this is that it increases your sample size in a sport where players don't play much and, outside passes, don't do things very frequently. The disadvantage is that you can mush together some situations that lose a lot of their meaning when they're mushed.

Now, you should always be aware of sample sizes. If I get seven heads from ten coin flips, that doesn't necessarily mean I'm an elite heads-coin-flipper (although I maintain that I am). But I think that you can be aware of this when interpreting the figures that you get, rather than not making those sport-specific divisions in the first place.
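To put a rough number on the coin-flip point, here is a minimal sketch of the arithmetic, assuming a fair coin and independent flips. The function name `prob_at_least` is made up for illustration; it just sums binomial probabilities.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Chance of k or more successes in n independent trials,
    each succeeding with probability p (binomial tail sum)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Seven or more heads from ten flips of a fair coin
p_seven_plus = prob_at_least(7, 10)
print(f"{p_seven_plus:.3f}")  # 0.172
```

So a perfectly ordinary coin-flipper gets seven or more heads roughly one time in six, which is why a small sample on its own shouldn't be read as skill.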

While part of this is because the 'analytics community' as a whole has built up a (much better) base of ideas about what is useful for professional football, I also think it's worth looking in the other direction as well. "I want to find creative players" is something that a football professional could ask. But they could instead ask "I want to find players who pass the ball into dangerous areas starting from relatively non-threatening areas."
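The more specific version of that question translates almost directly into a filter. As a rough sketch only: the `Pass` record, the 'threat' values attached to start and end locations, and the two thresholds are all hypothetical, not taken from any provider's data or from any of the talks.

```python
from dataclasses import dataclass


@dataclass
class Pass:
    player: str
    start_threat: float  # hypothetical value of the start location (0 to 1)
    end_threat: float    # hypothetical value of the end location (0 to 1)


def creative_pass_counts(passes, low=0.02, high=0.10):
    """Count, per player, passes that start somewhere non-threatening
    (start_threat <= low) and end somewhere dangerous (end_threat >= high).
    Both thresholds are illustrative assumptions."""
    counts = {}
    for p in passes:
        if p.start_threat <= low and p.end_threat >= high:
            counts[p.player] = counts.get(p.player, 0) + 1
    return counts


# Made-up example data
sample = [
    Pass("Player A", 0.01, 0.15),  # counts: safe start, dangerous end
    Pass("Player A", 0.08, 0.20),  # excluded: start was already threatening
    Pass("Player B", 0.02, 0.05),  # excluded: end not dangerous enough
    Pass("Player B", 0.01, 0.12),  # counts
    Pass("Player A", 0.00, 0.30),  # counts
]
print(creative_pass_counts(sample))  # {'Player A': 2, 'Player B': 1}
```

The vague question ("creative players") gives an analyst nothing to filter on; the specific one gives them two conditions, and the remaining argument is only about where to set the thresholds.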

Both things -- data practitioners learning what questions to ask and football practitioners learning how to be specific with the questions they want answered -- have been important to this process. And it'll mean that, as time goes on, both the 'what' and the 'how' questions will be increasingly easy to answer (as long as, y'know, you have the relevant data and technical expertise).