How can tracking data help defending analytics?

What would a tracking data approach to defending look like?

I’m writing this fresh off (half)watching Arsenal’s Champions League quarter-final defeat to PSG, and Leah Williamson was caught out for both goals. In the first, she was an odd distance away from her marker at a corner, meaning it was always likely that Katoto would get space away from her:

[Embedded tweet: the first goal]

And then in the second, Signe Bruun nicked in front of her to score. The ‘stretching forward from a flat-footed, falling-backward stance’ says a lot, I think, about Williamson’s defending here:

[Embedded tweet: the second goal]

I think we generally assume that, because tracking data records every player’s movement, this kind of thing will be easy to measure with it. But what would that actually look like if you had to present some findings to someone?

Both cases are examples of forwards ‘getting a yard on’ the defender, which is usually a symptom of a lack of awareness on the defender’s part, or of them not having anticipated that a forward might make a run around them.

It might actually be worth unpacking that latter point a little, because it’s a sub- or semi-conscious process that does a tremendous amount of work. If you’re a defender in your own box, there’s a striker just off your shoulder, and the ball is in a position where the opponents could cross it, you need to be predicting what that forward will do next.

You need to base that prediction on where they were when you last looked at them and on the situation around the ball. For example, if the striker has drifted a yard or so away from you and the opponent on the ball has driven to the byline and looks like they’re going to fire it in low, you should be anticipating a run across you. But if the striker’s in the same position and the opponent on the ball is having to put in more of an arced, back-post cross, then you should be ready to backpedal.

Various things can go wrong for the defender. Not scanning around them enough means they lose track of the forward, and therefore of what the forward is likely to do, leaving them purely to react to the flight of whatever pass comes in. They could also misread the situation on the ball: even if they know where the striker is, they could be slow to react to the run simply because it takes them longer to process what delivery is coming in.

Anyway. What does this have to do with tracking data?

You might be able to use it to measure a defender’s reaction time to a cross, although that presupposes that they need to move in order to deal with it. If a delivery comes straight to them and they don’t need to move, then the tracking data is unlikely to pick up how well they reacted to it.
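A minimal sketch of what that measurement could look like, assuming a feed of one defender’s (x, y) positions in metres sampled at 25 Hz and a known frame for when the cross is struck. The 2 m/s movement threshold and the name `reaction_time_s` are hypothetical choices for illustration, not an established metric:

```python
import numpy as np

FRAME_RATE_HZ = 25  # assumption: a common sample rate for tracking feeds


def reaction_time_s(xy, cross_frame, speed_threshold=2.0, frame_rate=FRAME_RATE_HZ):
    """Seconds from the cross until the defender first moves faster than speed_threshold (m/s).

    xy: array of shape (n_frames, 2), one defender's positions in metres.
    Returns None if the defender never passes the (hypothetical) threshold after the cross.
    """
    xy = np.asarray(xy, dtype=float)
    # speed[i] is the speed in m/s between frames i and i + 1
    speed = np.linalg.norm(np.diff(xy, axis=0), axis=1) * frame_rate
    moving = np.nonzero(speed[cross_frame:] > speed_threshold)[0]
    if moving.size == 0:
        return None
    return float(moving[0]) / frame_rate


if __name__ == "__main__":
    # Hypothetical defender: stands still, then moves 0.2 m per frame (5 m/s) from frame 15.
    x = np.maximum(np.arange(30) - 15, 0) * 0.2
    xy = np.column_stack([x, np.zeros(30)])
    print(reaction_time_s(xy, cross_frame=5))  # 0.4
```

Returning `None` covers the case above, where the ball comes straight to the defender and there’s no movement to measure. A defender who is already moving when the cross comes in would score zero here, so a real version would probably need to look at a change in speed or direction rather than a fixed threshold.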

There’s also the option of looking at how players react to runs across them. For Williamson’s second goal, it seems like she wasn’t aware that Bruun would be running across her, and it might be possible to use tracking data to see whether that is a consistent trend.
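One hypothetical way to flag those runs: treat a ‘run across’ as the attacker switching sides of the line from the defender to the ball while close to the defender and in front of her. The geometry, the 3 m radius and the function name are all assumptions for illustration, not a standard definition:

```python
import numpy as np


def runs_across(defender, attacker, ball, max_dist=3.0):
    """Frames where the attacker crosses the defender-to-ball line in front of the defender.

    defender, attacker, ball: arrays of shape (n_frames, 2), positions in metres.
    'In front' means the attacker is on the ball side of the defender (positive
    projection onto the defender-to-ball direction). max_dist is a hypothetical
    cut-off so that runs far away from the defender are ignored.
    """
    defender, attacker, ball = (np.asarray(a, dtype=float) for a in (defender, attacker, ball))
    to_ball = ball - defender
    to_attacker = attacker - defender

    # Sign of the 2-D cross product: which side of the defender-to-ball line the attacker is on
    side = np.sign(to_ball[:, 0] * to_attacker[:, 1] - to_ball[:, 1] * to_attacker[:, 0])
    # Carry the last non-zero side forward so a frame exactly on the line doesn't hide a crossing
    for i in range(1, len(side)):
        if side[i] == 0:
            side[i] = side[i - 1]

    in_front = np.einsum("ij,ij->i", to_attacker, to_ball) > 0
    close = np.linalg.norm(to_attacker, axis=1) <= max_dist
    flips = side[:-1] * side[1:] < 0  # side changed between frame i and i + 1
    return np.nonzero(flips & in_front[1:] & close[1:])[0] + 1


if __name__ == "__main__":
    # Hypothetical run: defender still at the origin, ball 10 m away along +y,
    # attacker sweeps from one side of her to the other, one metre in front of her.
    n = 5
    defender = np.zeros((n, 2))
    ball = np.tile([0.0, 10.0], (n, 1))
    attacker = np.column_stack([np.linspace(-2.0, 2.0, n), np.ones(n)])
    print(runs_across(defender, attacker, ball))  # [3]
```

Counting these per match would give the ‘consistent trend’ question something to work with, although a flagged run isn’t automatically a defender’s mistake.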

However, there are times when forwards run across defenders and the defender is right to leave the run. The defender might have accurately read the body positioning of the player on the ball to foresee that they would play a cut-back to the edge of the 18-yard box, in which case said defender should switch their focus to blocking a shot from that area rather than a shot from the striker running across them eight yards from goal.

There has already been talk of, and research into, determining the zones that players are responsible for and then looking at what happens within those zones. Thom Lawrence sparked an analytics love affair with polygons through PATCH, and the Barcelona Innovation Hub have done a bit with tracking data in section 4 of their paper here.

I’d be interested to know how the latter method deals with defending deep in the box, where things are a lot messier and taking an average position over the past two minutes might not work (although I may also not fully understand the process).

One big positive of the Barça method is that it uses those ‘past two minutes’ to determine an expected position for each player, and builds the ‘zones of responsibility’ around these expected positions. See the image below from the paper. The key here is that, for example, the No.6 isn’t assigned a larger zone of responsibility just because the No.8’s positioning has changed.

Screenshot from the paper, showing players’ assigned defensive zones and their positions in relation to them.

This could be an advantage because a player’s zone of responsibility doesn’t grow just because a teammate is out of position. (The potential downside is that if a teammate is out of position and the player moves across to cover, they might not be guarding their own zone of responsibility, for reasons a model might not pick up.)
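A simplified sketch of the expected-position idea, and not the paper’s actual implementation: each player’s anchor is their plain mean position over the last two minutes of frames, and a point on the pitch belongs to whichever player’s anchor is nearest (a Voronoi split). The 25 Hz frame rate, the nearest-anchor rule and the function names are assumptions:

```python
import numpy as np

FRAME_RATE_HZ = 25   # assumption: typical tracking sample rate
WINDOW_S = 120       # the "past two minutes" used to define expected positions


def expected_positions(history, frame_rate=FRAME_RATE_HZ, window_s=WINDOW_S):
    """Mean (x, y) of each player over the most recent window_s seconds.

    history: array of shape (n_frames, n_players, 2), positions in metres.
    """
    history = np.asarray(history, dtype=float)
    n_frames = int(window_s * frame_rate)
    return history[-n_frames:].mean(axis=0)


def responsible_player(point, anchors):
    """Index of the player whose anchor is nearest to point.

    Assigning every pitch location to its nearest anchor partitions the pitch
    into Voronoi cells, one (simplified) 'zone of responsibility' per player.
    """
    anchors = np.asarray(anchors, dtype=float)
    dists = np.linalg.norm(anchors - np.asarray(point, dtype=float), axis=1)
    return int(np.argmin(dists))


if __name__ == "__main__":
    # Hypothetical two-player example: index 0 is the No.6, index 1 the No.8.
    history = np.zeros((3000, 2, 2))   # two minutes at 25 Hz
    history[:, 1, 0] = 10.0            # No.8 usually sits 10 m from the No.6...
    history[-1, 1, 0] = 30.0           # ...but has just drifted out of position

    point = (7.0, 0.0)
    print("live positions:", responsible_player(point, history[-1]))                # live positions: 0
    print("expected positions:", responsible_player(point, expected_positions(history)))  # expected positions: 1
```

With live positions, the point (7, 0) falls to the No.6 once the No.8 drifts away; with two-minute anchors it stays with the No.8, which is the property described above.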

To return to Williamson, would a model like this help determine whether she has a problem with strikers making runs around her in the box?

Maybe. It seems promising. You could count all the instances where a striker gets across her and gets a touch on the ball. But because of the ‘accurately reading a deep cut-back’ issue I mentioned earlier, it might not be able to pick up instances where a striker got across her but didn’t touch the ball without throwing up a handful of false positives.
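A minimal sketch of that count, assuming you already have the frames where a striker got across the defender (from whatever detector you trust) and the frames of that striker’s touches. The 1.5-second window and the names are hypothetical. Runs with no touch are kept as a separate bucket rather than counted as errors, because some of them will be the defender correctly leaving the run:

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class RunSummary:
    touched: int    # striker got across and touched the ball soon afterwards
    untouched: int  # got across but no touch: defender error or a correct read?


def summarise_runs(crossing_frames, touch_frames, frame_rate=25, window_s=1.5):
    """Split runs across a defender by whether a touch follows within window_s seconds.

    frame_rate and window_s are assumptions, not values from any particular provider.
    """
    window = int(round(window_s * frame_rate))
    touches = sorted(touch_frames)
    touched = 0
    for frame in crossing_frames:
        # First touch at or after the crossing frame
        i = bisect_left(touches, frame)
        if i < len(touches) and touches[i] <= frame + window:
            touched += 1
    return RunSummary(touched=touched, untouched=len(crossing_frames) - touched)


if __name__ == "__main__":
    summary = summarise_runs(crossing_frames=[100, 500, 900], touch_frames=[120, 1000])
    print(summary)  # RunSummary(touched=1, untouched=2)
```

The `untouched` bucket is where the false positives live: it would still need someone to watch the clips, or extra context such as where the ball actually went, before calling any of them a mistake.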

Football is hard. And this is only one part of defending. Football is very hard.

Shout-out corner

This piece from Charles William is a neat use of tracking data and the concept of pressure to create something that seems actionable. (As someone who writes a newsletter that is very often “here are some ~thoughts~”, I feel I should always credit people who make things that are actually useful.)

Also a general shout-out to the women’s Champions League. The semi-finals take place on Tuesday (Wolfsburg vs Barcelona) and Wednesday (Lyon vs PSG), with the final on Sunday 30th. It’s good stuff.