Research in Focus: DeepMind's fill-in-the-gaps tracking data

'Multiagent off‑screen behavior prediction in football', 19 co-authors from Google DeepMind and Liverpool FC (full list at end of piece)

[Paper is available in full here]

Why it's worth your time

A Google DeepMind team getting involved in football analytics is definitely worth paying attention to, particularly when accompanied by Liverpool's Ian Graham and William Spearman.

But in terms of content, this paper tackles a question that could be hugely important to analytics departments in coming years: can you recover 'full' tracking data from tracking data derived from TV broadcast footage (where the full pitch isn't visible)? And, from that, can you build a pitch control model using only broadcast tracking data?

What it says

Broadly speaking, yes you can.

This is a highly technical paper, so I can only offer a broad sketch, but a point the paper makes repeatedly is that their 'Graph Imputer' model uses data from both before and after the moment where a player goes 'off-screen'.
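To make the 'before and after' idea concrete, here is a deliberately naive sketch: it fills an off-screen gap for one player by drawing a straight line between the last position seen before the gap and the first position seen after it. This is not the Graph Imputer (the function name `fill_gaps_linear` and the data layout are my own assumptions); it only illustrates what it means to use information from both directions.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in metres; an assumed layout, not the paper's


def fill_gaps_linear(track: List[Optional[Point]]) -> List[Optional[Point]]:
    """Fill None gaps using the nearest observed frame on EACH side of the gap.

    Gaps at the very start or end of the track have an observation on only
    one side, so they are left as None.
    """
    filled = list(track)
    last_seen = None  # index of the most recent observed frame
    for i, pos in enumerate(track):
        if pos is None:
            continue
        if last_seen is not None and i - last_seen > 1:
            (x0, y0), (x1, y1) = track[last_seen], pos
            span = i - last_seen
            for j in range(last_seen + 1, i):
                t = (j - last_seen) / span  # fraction of the way through the gap
                filled[j] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        last_seen = i
    return filled


# The player is off-screen for three frames, then reappears.
track = [(0.0, 0.0), None, None, None, (4.0, 8.0), None]
filled = fill_gaps_linear(track)
```

A straight line ignores what every other player is doing, which is the kind of limitation a learned, multi-player model is meant to address.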

The method also tries to account for the fact that players' movements interact with each other to some extent. Each player is treated as a node in a graph, which means the model can 'share information' between the individual players it's modelling.

There's a stack of alternative methods that the paper compares its own model to. They evaluate it by essentially creating fake broadcast tracking data from full-pitch tracking data, imagining that only a certain portion of the pitch was visible.
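As a rough illustration of that 'fake broadcast' step, the sketch below hides any player who falls outside an imagined camera window. The window here is a simple vertical band centred on the ball's x-coordinate; that shape, the `half_width` value and the function name `mask_to_camera` are all assumptions for illustration, and the paper's own camera simulation may well differ.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in metres on an assumed 105 x 68 pitch


def mask_to_camera(
    frame: Dict[str, Point], ball_x: float, half_width: float = 20.0
) -> Dict[str, Optional[Point]]:
    """Return the frame with off-screen players replaced by None.

    A player counts as visible if their x-coordinate is within
    `half_width` metres of the ball (a crude stand-in for a camera view).
    """
    return {
        player: (pos if abs(pos[0] - ball_x) <= half_width else None)
        for player, pos in frame.items()
    }


# One frame of full-pitch data with hypothetical player IDs.
full_frame = {"home_9": (50.0, 30.0), "away_4": (80.0, 10.0)}
broadcast_frame = mask_to_camera(full_frame, ball_x=52.0)
```

Because the hidden positions are known in the original data, any model that fills them back in can be scored against the truth.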

They could then compare the 'off-screen' trajectories that their model predicted to the actual trajectories in the data, as well as comparing a pitch control model based on their imputed data to one based on the 'true' data.
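One simple way to score the trajectory comparison is the average distance between predicted and true positions, counted only on frames where the player was hidden. The sketch below assumes that measure; the paper's exact metrics aren't restated here, and the function name `mean_offscreen_error` is hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in metres


def mean_offscreen_error(
    true_track: List[Point],
    observed_track: List[Optional[Point]],
    imputed_track: List[Optional[Point]],
) -> Optional[float]:
    """Mean Euclidean distance between imputed and true positions.

    Only frames where the player was off-screen (observed is None) and the
    model produced a prediction are counted. Returns None if there are none.
    """
    errors = [
        math.dist(true_pos, pred_pos)
        for true_pos, seen, pred_pos in zip(true_track, observed_track, imputed_track)
        if seen is None and pred_pos is not None
    ]
    return sum(errors) / len(errors) if errors else None


true_track = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
observed_track = [(0.0, 0.0), None, (6.0, 8.0)]      # middle frame off-screen
imputed_track = [(0.0, 0.0), (3.0, 0.0), (6.0, 8.0)]  # a made-up prediction
error = mean_offscreen_error(true_track, observed_track, imputed_track)
```

Here the single off-screen prediction is 4 metres from the true position, so the error is 4.0.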

If you want a blast from the paper itself:

"Our overall approach autoregressively estimates agent states, using available information in both directions. The model is inherently designed to handle noisy data through two means. First, the bidirectional nature of the model helps ensure it uses information available in future timesteps to correct for such noise. Second, the model is designed to handle noisy data due to its variational nature; namely, the model itself generates noisy autoregressive predictions during its imputation phase, which capture the distribution over input noise, and can thus lead to generation of diverse samples of trajectory outputs."

Also, they created a website with some examples.

What's cool about it

Pitch control models rely on knowing where all players on the field are, which causes trouble for tracking data derived from broadcast footage, since not every player is on-screen at any given moment.
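A toy example shows why a missing player matters. The sketch below is not a proper pitch control model: it uses a crude nearest-player rule, where each point on the pitch 'belongs' to whichever team has the closest player. The function name `control_share`, the pitch dimensions and the player positions are all assumptions for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in metres


def control_share(
    home: List[Point],
    away: List[Point],
    step: float = 1.0,
    length: float = 105.0,
    width: float = 68.0,
) -> float:
    """Fraction of grid points whose nearest player is on the home team."""
    home_points = 0
    total = 0
    for i in range(int(length // step)):
        for j in range(int(width // step)):
            cell = ((i + 0.5) * step, (j + 0.5) * step)  # centre of grid cell
            nearest_home = min(math.dist(cell, p) for p in home)
            nearest_away = min(math.dist(cell, p) for p in away)
            if nearest_home < nearest_away:
                home_points += 1
            total += 1
    return home_points / total


home = [(30.0, 34.0)]
away_full = [(70.0, 34.0), (20.0, 10.0)]  # second away player is off-screen
away_visible = [(70.0, 34.0)]             # what broadcast tracking would see

share_full = control_share(home, away_full)
share_visible = control_share(home, away_visible)
```

With the off-screen away player missing, the home side appears to control space that it doesn't, which is the gap imputed positions are meant to close.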

'Broadcast' tracking data is much more readily available than full tracking data, though. If you can supercharge that easier-to-obtain data, making it more like the full tracking data, then doors open for you, such as tracking data-based scouting or opposition analysis.

Although it's pitch control that's used as an evaluation tool and use-case in this paper, it also raises the possibility of collecting more accurate physical metrics from broadcast tracking data too. I think that broadcast tracking companies say that most relevant information (e.g. sprints) happens on-screen, but a little extra data isn't going to hurt.


Full list of co-authors (Google DeepMind unless stated): Shayegan Omidshafiei, Daniel Hennes, Marta Garnelo, Zhe Wang, Adria Recasens, Eugene Tarassov, Yi Yang, Romuald Elie, Jerome T. Connor, Paul Muller, Natalie Mackraz, Kris Cao, Pol Moreno, Pablo Sprechmann, Demis Hassabis, Ian Graham (LFC), William Spearman (LFC), Nicolas Heess & Karl Tuyls


