When I was -- at a guess -- seven years old, there was a kids' TV programme that tried to make science cool. In one episode, they stretched some fake intestines out to their full length. In another, they demonstrated the extreme knowledge of physics that David Beckham must have to be able to bend the ball up and down and swerving over a wall like he did. Rubbish, I thought, it's practice not physics. Stop trying to crowbar science into my football.
Over a decade later, here I am writing about science in football. And not just any science, damned physics. If the mid-2010s of football analytics were about mathematical modelling, the mid-2020s are going to be much more about physics-based models. It all started in 2016, at the OptaPro analytics forum, with a model called 'pitch control'. It would soon become an underpinning feature of a lot of other exciting work, and this newsletter is going to tell you all about it.
The 'pitch control field' was introduced by William Spearman, then of Hudl, and it does what it sounds like -- measures who has control over areas of the pitch. The result looks like abstract art, with each team 'painting' the areas of the pitch that they have the best access to.
This is all physics. Tracking data (collected by cameras that record where every player is multiple times per second) lets you account for each player's position, current direction and speed. Factor in some general acceleration and top speeds that players can reach, chuck it all in a model, then you get your results.
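To make that a little more concrete, here's a minimal sketch of the 'how long would it take this player to get there' calculation that sits underneath this kind of model. It is not Spearman's actual formulation: the function name `time_to_reach`, the reaction time and the top speed are all assumptions I've picked for illustration, and acceleration is ignored entirely.

```python
import math

REACTION_TIME = 0.7  # seconds; assumed value, not taken from the presentation
MAX_SPEED = 5.0      # metres per second; assumed value


def time_to_reach(pos, vel, target):
    """Estimated seconds for a player to arrive at `target` (hypothetical helper)."""
    # During the reaction time the player carries on at their current velocity.
    rx = pos[0] + vel[0] * REACTION_TIME
    ry = pos[1] + vel[1] * REACTION_TIME
    # After that, assume a straight-line run at top speed.
    return REACTION_TIME + math.hypot(target[0] - rx, target[1] - ry) / MAX_SPEED


# Two players, each 10 m from the target; one is already running towards it.
target = (10.0, 0.0)
standing = time_to_reach((0.0, 0.0), (0.0, 0.0), target)
running = time_to_reach((20.0, 0.0), (-4.0, 0.0), target)
print(standing, running)
```

The player who is already moving towards the target gets there sooner despite starting the same distance away, which is exactly the information that position alone would miss.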
With this being the fast-paced world of football analytics, things would soon change.
Two years later, in 2018, the MIT Sloan Sports Analytics Conference featured not just one but two research papers that included pitch control modelling. One came from FC Barcelona's Javier Fernández alongside then-Sacramento Kings VP of Strategy and Analytics Luke Bornn. The other was by Spearman himself.
Fernández and Bornn's pitch control method took into account player location and velocity, conceptually quite similar to the model that Spearman had presented in 2016. There are some differences in the computational power and data requirements of the two 2018 models though, with Fernández and Bornn's aiming to be a little lighter.
Things diverge even more from there.
Both 2018 papers realised that the 'basic' pitch control model that had debuted two years earlier could be improved on. For example, that original pitch control field had been based on the hypothetical idea of a football placed at every location on the pitch and who would get there first. This method also gave huge amounts of 'control' to goalkeepers, which might be factually accurate but not exactly useful. Each paper tackled this in different ways.
Spearman adapted Hudl's pre-existing pitch control model into a 'potential pitch control' model. The probability of who controlled what space was now affected by things like how long it would take the ball to get from its current location to all of those other locations on the field. A lot of this built on work that he and the team at Hudl had done a year earlier, on a physics-based approach to the likelihood of passes being intercepted.
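As a rough illustration of why the ball's travel time matters, here's a toy version with one attacker and one defender. Everything in it is an assumption of mine rather than something from the paper: the function name `control_probability`, the constant speeds, and the use of a logistic curve to turn an arrival-time gap into a probability.

```python
import math

BALL_SPEED = 15.0    # m/s; assumed average speed of a pass
PLAYER_SPEED = 5.0   # m/s; assumed running speed
STEEPNESS = 2.0      # 1/s; assumed sharpness of the logistic curve


def control_probability(ball, attacker, defender, target):
    """Chance the attacker wins a ball played from `ball` to `target` (toy model)."""
    ball_time = math.dist(ball, target) / BALL_SPEED
    # Arriving before the ball gives no extra advantage: the player just waits.
    att_time = max(math.dist(attacker, target) / PLAYER_SPEED, ball_time)
    def_time = max(math.dist(defender, target) / PLAYER_SPEED, ball_time)
    # Logistic function of the arrival-time gap (a positive gap favours the attacker).
    return 1.0 / (1.0 + math.exp(-STEEPNESS * (def_time - att_time)))


target = (30.0, 0.0)
attacker, defender = (25.0, 0.0), (38.0, 0.0)
far_ball = control_probability((0.0, 0.0), attacker, defender, target)
near_ball = control_probability((28.0, 0.0), attacker, defender, target)
print(far_ball, near_ball)
```

With the ball 30 metres away, both players have time to get there before it arrives, so the space is a coin flip. With the ball two metres away, the attacker's head start counts. Same players, same patch of grass, different answer.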
Fernández and Bornn's approach, meanwhile, was a two-layered one, and is the point at which we diverge from pitch control as a model in and of itself.
They first started with the assumption that the defensive team's positioning is indicative of the most valuable spaces on the field at that particular moment in time.
However, it's also clear that that isn't quite enough. In the first visualisation of the above image, with a ball placed inside the left-hand penalty area, the position of maximum value is within the left-hand half of the pitch, quite far from the goal that is being hypothetically defended on the right-hand side.
The second way that Fernández and Bornn overcame this 'relevancy' problem of the basic pitch control calculation was by adding a layer to take into account the distance to goal.
In the rest of the 2018 paper, Fernández and Bornn use this 'pitch value' to look at who occupies these valuable spaces that their model can identify. There's a good article by Bobby Gardiner on FiveThirtyEight about some of the applications and findings of the paper here.
Spearman also added a couple of extra layers to his (potential) pitch control model to look at the value of areas on the pitch. The approach he took was pretty different though. While one of his additional layers focused on scoring probability from areas of the pitch (and so looking similar to Fernández and Bornn's 'distance from goal' layer), the other was about where the ball was likely to go next.
While Fernández and Bornn looked at a 'purer' kind of pitch value, Spearman's model was about -- as he dubbed it -- off-ball scoring opportunity. A subtle but significant difference.
From this point onwards, the work has been more about the 'value' side than the control. Fernández and Bornn teamed up again the following year (along with Dan Cervone, then of MLB's Los Angeles Dodgers) to produce a framework for 'expected possession value'. Things didn't stop there by any means, but it's just an example of how things keep moving.
The full extent of 'what came next' is a subject for another newsletter, but before this one ends we should take note of some of what came before Spearman's 2016 OptaPro forum presentation.
From 1996 to 2000, there was a series of influential papers by researchers Tsuyoshi Taki and Jun-ichi Hasegawa. The papers (cited in several of the works I've already mentioned) are astonishing for their breadth, encompassing the development of the tracking technology needed to collect the data as well as the methods to analyse it. All this around a decade and a half before the OptaPro analytics forums even started (in 2012).
They too were building on existing knowledge about applying 'Voronoi tessellation' (by drawing lines marking the boundary exactly between data points) to get a sense of 'control' of an area. This, as Taki and Hasegawa pointed out, assumes each player would take the same time to get to this 'boundary line', when in reality players are moving at different speeds starting in different directions. Just like expected goals, pitch control has a much longer history than you might think.
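For the curious, the Voronoi idea is simple enough to sketch in a few lines: chop the pitch into a grid and give each cell to whichever team has the nearest player. The function name `voronoi_share`, the 105 m by 68 m pitch and the one-metre grid are my assumptions, not anything taken from the papers.

```python
import math


def voronoi_share(home, away, length=105, width=68):
    """Fraction of 1 m grid cells whose nearest player is on the home team."""
    home_cells = 0
    for x in range(length):
        for y in range(width):
            cell = (x + 0.5, y + 0.5)  # centre of the grid cell
            nearest_home = min(math.dist(cell, p) for p in home)
            nearest_away = min(math.dist(cell, p) for p in away)
            if nearest_home < nearest_away:
                home_cells += 1
    return home_cells / (length * width)


# One player per team, for simplicity; positions are in metres.
share = voronoi_share(home=[(20.0, 34.0)], away=[(80.0, 34.0)])
print(round(share, 3))
```

Note that nothing here knows how fast anyone is moving or which way they're facing: distance is standing in for time. That is the limitation Taki and Hasegawa pointed out.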
 || "In 2016, the OptaPro analytics forum featured a presentation that would mark a new mini-era in football analytics." || Slides of the presentation are available here, but the video of the presentation appears to have been taken offline by Opta.
 || "...it weighs up the probability of each team controlling the ball if it was in that location" || As Spearman's 2016 OptaPro forum presentation only has slides available currently, there isn't much detail on how this formulation is done. However, for a detailed comparison of different methods, see Ulf Brefeld, Jan Lasek, Sebastian Mair, 'Probabilistic Movement Models and Zones of Control', copy of the paper available here
 || "...alongside then-Sacramento Kings VP of Strategy and Analytics Luke Bornn" || Bornn was also an assistant statistics professor at Simon Fraser University at the time, and had been Head of Analytics at Roma for a season prior to switching to the NBA and joining the Kings.
 || "One [paper] came from[...]Fernández [...and...] Bornn..." || Javier Fernández and Luke Bornn, 'Wide Open Spaces: A statistical technique for measuring space creation in professional soccer', MIT Sloan Sports Analytics Conference 2018; copy of the paper available here -- for pitch control method, pp. 3-6.
 || "Two years later, Spearman's paper at the MIT Sloan Analytics Forum..." || William Spearman, 'Beyond Expected Goals', MIT Sloan Sports Analytics Conference 2018; copy of the paper available here -- for pitch control method, pp. 3-7
There's also a video Spearman did for Friends of Tracking in 2020 in which he talks about the pitch control model and the things this paper introduces [link should go to the correct time; if not, the relevant section starts around 8:35]
 || "...on a physics-based approach to the likelihood of passes being intercepted" || William Spearman, Austin Basye, Greg Dick, Ryan Hotovy, Paul Pop, 'Physics-Based Modelling of Pass Probabilities in Soccer', MIT Sloan Sports Analytics Conference 2017; copy of the paper available here.
 || "They [Fernández and Bornn] started with the assumption the position of the defensive team is indicative of the most valuable spaces on the field at that particular moment in time." || 'Wide Open Spaces', pp. 7-8.
 || "While one of his additional layers focused on scoring probability from areas of the pitch [...] the other was about where the ball was likely to go next." || 'Beyond Expected Goals', pp. 7-10.
 || "...to produce a framework for 'expected possession value'." || Javier Fernández, Luke Bornn, Dan Cervone, 'Decomposing the Immeasurable Sport: A deep learning expected possession value framework for soccer', MIT Sloan Sports Analytics Conference 2019; copy of the paper available here.
 || "...a series of influential papers by researchers Tsuyoshi Taki and Jun-ichi Hasegawa." || There are numerous papers that are relatively similar and build incrementally on each other, and I can't find links for all of them, but I think any and all are worth looking at if you're able. They are:
- (1996) 'Development of Motion Analysis System for Quantitative Evaluation of Teamwork in Soccer Games', Taki, Hasegawa, and Teruo Fukumura, Proceedings of 3rd IEEE International Conference on Image Processing [link to IEEE page for the paper here]
- (1998) 'Dominant region: a basic feature for group motion analysis and its application to teamwork evaluation in soccer games', Taki and Hasegawa, Proceedings of the Society of Photo-Optical Instrumentation Engineers [link to the SPIE page for the paper here]
- (2000) 'Visualization of dominant region in team games and its application to teamwork analysis', Taki and Hasegawa, Proceedings Computer Graphics International [link to IEEE page for the paper here]