Analytics Mailbag!

Hello reader,

For this edition of Get Goalside we're going for the trusty Q&A mailbag. I put out a call for questions, I got some questions, I answer the questions below. Some of the questions have been shortened a little for conciseness. The answers, slightly less so.

There are questions (in this order) on: evaluating centre-backs, goalkeeper stats, variance, crediting pass receivers, and Peter Crouch. Yes, a question about Peter Crouch.

Evaluating centre-backs

Zachary Washburn asks: "Has there been an advancement in evaluating center backs or has the industry still not made much progress in that regard?"

My favourite question.

First, a bit of theory. I think that evaluating centre-backs is so difficult because it's an area that's so dependent on not just the management of space but also on a lot of 'what if's. It's kind of inherently reactive, to opponents taking up good positions or making good runs, and to teammates of those opponents finding them. And even in situations where something dangerous could happen, it often doesn't.

Now, the answer. The most clearly worthwhile stuff that I can think of is simulation-related. Five years ago, STATS (now part of Stats Perform) wrote a paper on ghosting (no, not the stopping-talking-to-someone-with-no-explanation thing). They basically took tracking data and simulated what could happen in the next few seconds of play, and they could run the simulation as if the players were league-average or belonged to specific teams.

With this, you could look at a central defender's tracking data in a given situation and compare it to what the ghosting model thinks, say, Virgil van Dijk would do.

There was also a presentation at last year's Stats Perform Pro Forum from Aditya Kothari that essentially simulated pitch control values if a defender was absent.

This all seems pretty computationally heavy, but it's a good start. (I still think that having a better theoretical idea of what good defending is will help the measurement of it, and that football analytics lucked into the 'strikers just take shots, and shots are good' part of the sport, but we live and learn).


Goalkeeper stats

@Tiotal Football asks: "Seems like there’s just a ton of decent data on goalkeepers now. And while there are always some question marks, it feels much more straightforward than all other football analytics use cases for recruiting… so everyone should be doing analytics signings at GK now, right? Or no?"

An interesting question. I agree that, provided you have the right data, scouting goalkeepers seems more straightforward than other positions. Goalkeepers, no offence, basically have one job, and it's far easier to tell whether they've prevented a goal than it is for centre-backs.

However, I'm not as sure that there's a ton of decent data on them now. The best stuff still seems to involve people collecting their own data (e.g. John Harrison). StatsBomb's 360 data probably offers a lot of scope for improving data work though: if you have freeze-frames for when a cross or through-ball is made then you'll have a much better idea of whether a 'keeper should have come for it or stayed at home.

The other grain of sand in my shoe with data in transfer processes is that it's just what a player has done, not necessarily what they can do (see: Aaron Ramsdale's distribution). Might this effect be amplified with goalkeepers, where samples are so small because of lack of involvement?

Ultimately: yeah, I basically agree, but maybe 20% less strongly.

The VAR word... VARiance

Steffen Barthel asks: "I've been thinking about the repeatability of statistical outlier events (e.g. shots with very high xG) and was wondering whether you would get more robust and predictive stats by leaving out big outliers. Phrased differently: should the variance of a stat be considered when evaluating players and whether they can sustain their performance?"

The short response is that I don't know the answer to this. The repeatability of outliers is presumably simple enough to check, with enough data and time; and the matter of how this affects analysis will follow on from that.

There was some recent work on a slightly related subject by Michael Caley, focused on the extent to which the first 12 games of the season predict the remaining matches (prompted by Arsenal being topical). It's behind a Patreon paywall but this seems a useful part to extract:

"Overall, this [his results] suggests that removing outliers isn't an obviously terrible and bad thing to do, but at the same time it doesn't appear to be a particularly useful thing to do. At minimum, you should have a good reason to remove outliers beyond the fact that outliers exist."

That was on teams, rather than players, and variance between match-level totals, rather than variance between xG or xT values for individual actions. However, I tend to have the same assumptions when thinking about anything in this family of questions:

  • Removing outliers probably makes sense - my reasoning being the weird nature of the penalty area where shots can become incredibly good value in circumstances that aren't that different to mediocre chances.
  • I'll change my mind if I see evidence though. Maybe large enough samples just mean outliers aren't that meaningful. Maybe outliers are indicative of a general trend.
  • To what extent are tactical features at play? A player who gets on the end of tap-ins is probably good at reading play, but if they play on a team who creates a lot of those opportunities does looking at their xG, with all those mammoth xG tap-in chances, give a false picture of their ability?
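To make the 'leave out the big outliers' idea concrete, here's a minimal sketch of what it does to a player's xG total. The shot values and the 0.5 cut-off are made up for illustration, and `total_xg` is a hypothetical helper rather than anything from a real data provider. It only shows how much of a total can hang on a couple of tap-ins; whether the trimmed number is actually more predictive would need real data across seasons.

```python
from typing import Optional, Sequence


def total_xg(shots: Sequence[float], cap: Optional[float] = None) -> float:
    """Sum shot xG values, optionally dropping shots above `cap`.

    `cap` is an arbitrary illustrative cut-off, not an established threshold.
    """
    if cap is None:
        return sum(shots)
    return sum(xg for xg in shots if xg <= cap)


# Made-up shot list: six ordinary chances and two near-certain tap-ins.
shots = [0.03, 0.08, 0.12, 0.05, 0.76, 0.10, 0.81, 0.07]

full = total_xg(shots)               # every shot counted
trimmed = total_xg(shots, cap=0.5)   # tap-ins left out

print(f"all shots: {full:.2f} xG")
print(f"without shots above 0.5 xG: {trimmed:.2f} xG")
print(f"share of total from big chances: {(full - trimmed) / full:.0%}")
```

In this invented example most of the total comes from just two shots, which is exactly the situation the tactical question in the last bullet is worried about.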

Receiving passes and crediting passers

Steffen Barthel also asks: "What do you think is the best way to split the value of a pass to passer and receiver? I personally like ASA [American Soccer Analysis]'s approach of considering the pass probability."

This is a great question (inspired by John Muller's recent piece in The Athletic).

ASA's method, in their Goals Added (g+) model, is to split the credit depending on the expected pass completion percentage of the pass in question. To quote them: "If a pass has an xPass% of 20%, the passer gets 20% of the credit, the receiver gets 80% of the credit."
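As a rough sketch of the rule in that quote (and only that rule; the full g+ model may well handle details this ignores), the split could look like the following. The function name `split_pass_credit` and the pass value of 0.05 are hypothetical, chosen purely for illustration.

```python
def split_pass_credit(pass_value: float, xpass: float) -> tuple[float, float]:
    """Split a completed pass's value between passer and receiver.

    Follows the quoted rule: the passer's share equals the expected
    completion probability (xPass%), and the receiver gets the rest.
    """
    if not 0.0 <= xpass <= 1.0:
        raise ValueError("xpass must be a probability between 0 and 1")
    passer_share = pass_value * xpass
    receiver_share = pass_value * (1.0 - xpass)
    return passer_share, receiver_share


# The example from the quote: a pass with a 20% expected completion rate.
# The 0.05 value for the pass itself is invented.
passer, receiver = split_pass_credit(pass_value=0.05, xpass=0.20)
print(f"passer: {passer:.3f}, receiver: {receiver:.3f}")
```

Written out like this, the consequence is easy to see: the harder the pass, the smaller the passer's share.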

I'm not entirely sure whether I like this division, but I think I prefer it to splitting it 50/50, or wholly crediting the passer, which I think are the two fall-back defaults.

What I like about this division is that it feels right to me that there are different situations where the passer and receiver deserve more or less credit. What I'm less sure of is the idea that the more difficult a pass is, the more credit the receiver deserves for it being completed. This seems sensible for long balls, but Pogba-esque diagonal through-balls?

Perhaps some combination of pass difficulty and pass uniqueness could be used to divide the credit.

Peter Crouch

Joshua Gerald Butler asks: "One of the first players I became enamoured with was Peter Crouch, but that was way before I became interested in data and truly understanding football. I want to know, how good was he from a data/analytics perspective? Is my fondness for him based on pure nostalgia, or was he actually a good player?"

This is a fun question (and reminds me of this tweet from a couple of weeks ago). Unfortunately there's not a ton of available data out there. WhoScored has some data going back to the 2009/10 season, Crouch's first at Tottenham Hotspur after notable spells at Southampton, Liverpool, and Portsmouth. FBref has basic goals, assists, minutes data for his entire career.

Just on those latter basic stats, you can see why Liverpool bought him from Southampton in 2005 (there's a story we've heard before). He'd scored 12 (one penalty) and assisted six in the equivalent of 20 games, an average of 0.87 non-penalty goals+assists per 90. Only two players in the Premier League are beating that rate this season, Paul Pogba and Mohamed Salah.
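For anyone who wants to check the arithmetic, the per-90 rate is just non-penalty goals plus assists, divided by the number of 90-minute blocks played. A minimal sketch, with a hypothetical helper name:

```python
def non_penalty_ga_per_90(goals: int, penalties: int, assists: int,
                          nineties: float) -> float:
    """Non-penalty goals plus assists per 90 minutes played.

    `nineties` is minutes played divided by 90.
    """
    if nineties <= 0:
        raise ValueError("nineties must be positive")
    return (goals - penalties + assists) / nineties


# Crouch's numbers from the paragraph above, using the rounded
# 'equivalent of 20 games' figure for minutes played.
rate = non_penalty_ga_per_90(goals=12, penalties=1, assists=6, nineties=20)
print(round(rate, 2))
```

With the rounded 20 games this comes out at 0.85 rather than 0.87. The small gap presumably comes from the exact minutes total, which isn't given here, being a little under 20 full games.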

But apart from his second season at Liverpool (9 goals, 6 assists, equivalent of 17 games, 0.9 goals+assists per 90) Crouch never quite hit those heights again. He was basically a 'one in two' striker, if you count assists as well as goals. He was also rarely a full-time starter, only starting 19 or more games in eight of his 18 Premier League seasons.

In the stuff that's on WhoScored for the back-end of Crouch's career, he tended to average between 2 and 3 shots per game, and between 1.0 and 1.5 chances created. That basically mirrors his goal record: 105 goals, 59 assists.

Maybe Crouch was just a lanky, poor man's Didier Drogba. Drogba spent half as many seasons as Crouch in the Premier League and finished with remarkably similar figures: 104 goals, 55 assists. Big men who played their teammates in far more than they were given credit for.

Thing I Learned: Peter Crouch has more Premier League assists than Paul Scholes

Thanks for reading!