Supporter special: 360 Review

A special for Get Goalside supporters on the labelling project for non-player-labelled StatsBomb 360 data

Hello supporter, thanks for supporting. I thought I'd write a bit about the newsletter from earlier this week on labelling out-of-possession position data on StatsBomb 360 frames. I'm gonna work on the assumption for this post that you've already read it, and proceed to blog on.

I was kind of surprised it worked as well as it did on the first try. Now, maybe 'as well as it did' isn't actually that well, considering...

StatsBomb 360 frame plotted on a pitch. The red polygon shows a visible area of about a seventh of the field, a segment of the final third not quite the full width of the pitch. A centre-forward and left winger look correctly labelled, but both centre-back labels have been assigned even though the actual centre-backs are almost certainly out of frame.
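For a rough sense of scale, each 360 frame comes with a visible-area polygon, and the shoelace formula turns that into a fraction of the pitch. This is a minimal Python sketch, assuming the polygon is stored as a flat [x1, y1, x2, y2, ...] list in 120x80 coordinates (my understanding of StatsBomb's `visible_area` field). The rectangle in the example is made up; it isn't the polygon from the frame above.

```python
# Sketch: what fraction of the pitch does a 360 frame's visible area cover?
# Assumption: vertices come as a flat [x1, y1, x2, y2, ...] list in 120x80 space.
PITCH_AREA = 120 * 80


def polygon_area(flat_coords):
    """Shoelace formula on a flat [x1, y1, x2, y2, ...] vertex list."""
    xs = flat_coords[0::2]
    ys = flat_coords[1::2]
    n = len(xs)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # wrap around so the last vertex joins the first
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(twice_area) / 2


def visible_fraction(flat_coords):
    """Share of the full 120x80 pitch covered by the polygon."""
    return polygon_area(flat_coords) / PITCH_AREA


# Made-up 30x45-yard box near one goal.
print(visible_fraction([90, 15, 120, 15, 120, 60, 90, 60]))  # 0.140625
```

So even a 30x45-yard box is already about a seventh of the pitch. If a polygon repeats its first vertex at the end, the extra closing edge contributes zero to the sum, so the result doesn't change.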

Many of the frames that I checked were like this: a couple of plausible labels but then with some pretty noticeable almost-certain errors thrown in too.

I didn't check a ton of frames (things looked good enough at the individual level, and definitely at the averaged level, for a simple newsletter), but mislabelling midfielders as defenders or vice versa seemed a frequent culprit. That's understandable given that I was basically just throwing angles together, and many of a centre-back's angles to their teammates are similar to a central midfielder's. You've got somebody either side, someone at angle(s) in front, someone at angle(s) behind...
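To show how angles alone can blur a centre-back into a central midfielder, here's a toy Python sketch. It's not the newsletter's code: the positions are a made-up flat 4-4-2 on a small grid (x increasing towards the opponent's goal), and `bearings` is a hypothetical helper that keeps only the direction to each visible teammate and throws distance away.

```python
import math


def bearings(player, teammates):
    """Sorted angles (degrees) from `player` to each visible teammate.

    Coordinates are (x, y) with x increasing towards the opponent's goal.
    Only direction is kept; distance is deliberately discarded.
    """
    px, py = player
    return sorted(
        round(math.degrees(math.atan2(ty - py, tx - px)), 1)
        for tx, ty in teammates
    )


# Made-up flat 4-4-2; a cropped frame shows only some teammates.
lcb_view = bearings((1, 3), [(1, 1), (1, 5), (3, 3), (3, 5)])  # LB, RCB, LCM, RCM
lcm_view = bearings((3, 3), [(3, 1), (3, 5), (5, 3), (5, 5)])  # LM, RCM, LS, RS
print(lcb_view == lcm_view)  # True
```

Both views come out as [-90.0, 0.0, 45.0, 90.0]: a left centre-back who can see the full-back, the other centre-back and both central midfielders looks exactly like a left central midfielder who can see the left mid, the other central midfielder and both strikers.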

There's no proximity information in the way I wrote this code, which I suspect the Shaw and Glickman work I was drawing from did have. I cut a corner there, not wanting to work out how to translate my fake 'formation template data', which I'd done on a 1-7 grid system, into something that would work with StatsBomb's 120x80 coordinate system. It wouldn't have been difficult, but I dunno if it would've fixed the problem.
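Here's roughly what that skipped translation step might look like, with distance kept alongside angle. It's a sketch under assumptions: that the template grid ran 1-7 along both the length and the width, and that each cell maps to its centre across the full 120x80 pitch. Whether a defensive-shape template should span the whole pitch rather than a compact block is a separate question. `grid_to_statsbomb` and `teammate_offsets` are hypothetical names.

```python
import math

PITCH_LENGTH, PITCH_WIDTH = 120, 80
GRID_CELLS = 7  # assumption: the template grid runs 1-7 along both axes


def grid_to_statsbomb(gx, gy):
    """Map a 1-7 template cell to the centre of that cell in 120x80 pitch space."""
    x = (gx - 0.5) * PITCH_LENGTH / GRID_CELLS
    y = (gy - 0.5) * PITCH_WIDTH / GRID_CELLS
    return x, y


def teammate_offsets(player, teammates):
    """(angle in degrees, distance in yards) to each teammate, so proximity survives."""
    px, py = player
    return [
        (math.degrees(math.atan2(ty - py, tx - px)), math.hypot(tx - px, ty - py))
        for tx, ty in teammates
    ]


print(grid_to_statsbomb(4, 4))  # (60.0, 40.0): the centre spot
```

Because the two axes scale differently (120/7 vs 80/7 yards per cell), distances measured in grid units wouldn't translate uniformly, which is one reason to convert to pitch coordinates before computing any proximity features.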

Instead, as a really basic addition, I toyed with the idea of adding some kind of 'distance from goal' feature. The idea was that the further the play was from the defending team's goal, the more likely the visible players would be, relatively speaking, to be midfielders rather than defenders. However, I ended up feeling it wasn't a good ease-vs-robustness trade-off compared to both what I already had and simply solving the proximity problem from the previous paragraph.
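For the record, the feature I decided against might look something like this minimal sketch. Everything in it is an assumption: that frame locations follow the in-possession team's orientation (attacking towards x=120, so the defending goal sits at (120, 40)), and that a logistic curve is a sensible shape for the prior. The `midpoint` and `scale` values are made up, not fitted.

```python
import math

# Assumption: locations follow the in-possession team's orientation
# (attacking towards x=120), so the defending team's goal is at (120, 40).
DEFENDING_GOAL = (120.0, 40.0)


def distance_from_defending_goal(location):
    """Straight-line distance in yards from the defending team's goal."""
    return math.dist(location, DEFENDING_GOAL)


def midfielder_prior(location, midpoint=40.0, scale=10.0):
    """Hypothetical prior that a visible defending player is a midfielder
    rather than a defender: 0.5 at `midpoint` yards from goal, rising further out.

    `midpoint` and `scale` are made-up illustrative values, not fitted ones.
    """
    d = distance_from_defending_goal(location)
    return 1 / (1 + math.exp(-(d - midpoint) / scale))


print(midfielder_prior((80, 40)))  # 0.5: exactly 40 yards from goal
```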

Now, the goalkeeper issue:

The average out-of-possession positions for Germany plotted, with their goalkeeper approximately 50 yards away from the rest of the team, who are grouped near the halfway line

Whether you care about the goalkeeper depends on what you'd want to use this for, I guess. It strikes me that the main value of accurate goalkeeper positioning here would be in gauging the success of 'passes in behind' strategies? If the keeper's average position is low relative to the defence, then maybe you've got a better shot.
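One way to turn that into a number per frame is the gap between the keeper and the deepest visible outfielder on the out-of-possession side. This is a minimal sketch, assuming freeze-frame entries with `location` and `keeper` fields and the defending goal at x=120. `keeper_gap` is a hypothetical helper, and frames without both a keeper and an outfielder in view are simply skipped, which is exactly where the selection problem bites.

```python
def keeper_gap(players):
    """Yards between the keeper and the deepest visible outfielder.

    `players` are the out-of-possession side's freeze-frame entries, each like
    {"location": [x, y], "keeper": bool}. Assumes their own goal is at x=120,
    so 'deeper' means a larger x. Returns None unless both are in frame.
    """
    keepers = [p["location"][0] for p in players if p["keeper"]]
    outfield = [p["location"][0] for p in players if not p["keeper"]]
    if not keepers or not outfield:
        return None
    return keepers[0] - max(outfield)


# Made-up frame: keeper near goal, back line around 85-88 yards.
frame = [
    {"location": [118, 40], "keeper": True},
    {"location": [85, 30], "keeper": False},
    {"location": [88, 50], "keeper": False},
    {"location": [70, 40], "keeper": False},
]
print(keeper_gap(frame))  # 30
```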

This is probably a systemic flaw in the idea of using 360 data for this, but it might be improved if I hadn't been limiting the data in the way I was. To keep things relatively clean and simple, I was only looking at teams' passes and ball receipts, and only ever factoring in positional data of the out-of-possession team. A couple of things seem like they could plausibly improve the goalkeeper's estimated location, even without estimating out-of-frame positions.

1) Add in clearances and other goalkeeper actions. This might still leave them rooted to their box, but maybe they'd make some actions further out too.

2) Count the start of an incomplete pass as an 'out of possession' action. This would likely capture more frames with the goalkeeper in view when they, or defenders, are lumping the ball forwards. Again though, maybe this would simply root them near their own box even more.
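Both ideas boil down to a different rule for which frames, and which side of each frame, count as 'out of possession'. Here's a sketch of that rule, assuming StatsBomb-style event dicts (`event["type"]["name"]`, with a `"pass"` sub-dict that only carries an `"outcome"` when the pass wasn't completed) and freeze-frame entries whose `teammate` flag is relative to the event's actor. The function names are hypothetical.

```python
def out_of_possession_sides(event, extended=False):
    """Which sides of an event's 360 freeze frame to treat as out of possession.

    Returns a set containing "opponents" (teammate == False) and/or
    "teammates" (teammate == True); an empty set means skip the frame.
    """
    kind = event["type"]["name"]
    if kind in ("Pass", "Ball Receipt*"):
        sides = {"opponents"}  # original rule: only the team without the ball
        incomplete = kind == "Pass" and "outcome" in event.get("pass", {})
        if extended and incomplete:
            # Idea 2: the passer's side is about to lose the ball, so count it too.
            sides.add("teammates")
        return sides
    if extended and kind in ("Clearance", "Goal Keeper"):
        # Idea 1: here the actor's own side is the one out of possession.
        return {"teammates"}
    return set()


def select_players(freeze_frame, sides):
    """Filter freeze-frame entries down to the requested sides."""
    flag_for_side = {"teammates": True, "opponents": False}
    wanted_flags = {flag_for_side[side] for side in sides}
    return [p for p in freeze_frame if p["teammate"] in wanted_flags]


# Made-up events for illustration.
completed = {"type": {"name": "Pass"}, "pass": {}}
lumped = {"type": {"name": "Pass"}, "pass": {"outcome": {"name": "Incomplete"}}}
clearance = {"type": {"name": "Clearance"}}
print(sorted(out_of_possession_sides(lumped, extended=True)))  # ['opponents', 'teammates']
```

One wrinkle: if, as I understand it, each frame's coordinates follow the acting team's orientation, then players taken from the actor's own side would need rotating (x to 120 - x, y to 80 - y) before being averaged with the opponents from pass and receipt frames.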

The matter of whether this kind of label data is actually useful is still something tumbling over itself in the back of my mind. I don't know how widespread StatsBomb's 360 sales are, but I imagine that there's something of a crossover between leagues with lots of likely buyers and leagues with tracking data deals. If you've got tracking data, why would you use this? And if you didn't, does it add enough, at a high enough level of accuracy, to be worth trusting?

That being said, I used the 'low-fat chocolate in high-fat chocolate shell' story as bookends for a reason. Slightly lower fat chocolate at somewhat similar satisfaction levels probably isn't going to make a huge difference to anyone's lives, buuut scientists understanding food well enough to potentially create less-unhealthy luxuries seems a pretty good path of progress. And the StatsBomb Twitter account retweeted the post, so I guess it was good enough that they didn't mind being likened to low-fat sweets.

Lemme know if you've got any thoughts. Either way, enjoy the rest of your weekend.