Research in Focus: xG Replication with American Soccer Analysis

What happens when you test a piece of work from 2015 against new data? The 'Research in Focus' series turns its attention to an article from American Soccer Analysis that did just that.

'The Replication Project: Is xG the best predictor of future success', by Eliot McKinley (American Soccer Analysis website)

[Link to the article is here]

Why it's worth your time

The whole point of the 'Replication Project' is that old work is worth replicating to check that it still holds up. In this case, the article repeats the analysis from a 2015 article using a new, different set of data.

The article is also a good entry point into two very important things to know: why people use xG, and why MLS is weird.

What it says

Thankfully, this Replication Project post says that the main finding of the 2015 article, by Sander IJtsma, still holds up: the ratio of a team's expected goals to their opponents' is a better predictor of future goals and points than the ratio of shots, shots on target, goals, or points per game. The differences in xG's favour aren't quite as large as IJtsma's article found, though.

The thing being predicted is the number of goals or points over the rest of the league season. That is why, in this particular type of study, the correlations peak around midseason and then decline again: small sample sizes are a problem at both ends. Early in the season there are only a few past matches to calculate the metric from, and late in the season there are only a few matches left to predict.
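To make that setup concrete, here is a minimal sketch of the idea in Python. It is not the code behind either article: the function names, the data layout (a dict of team name to a list of matches in order) and the choice to express the ratio as a share of total xG are all assumptions for illustration, and the numbers are invented toy data.

```python
from math import sqrt


def xg_share(matches):
    """Share of all xG in the given matches that the team itself created."""
    xg_for = sum(m["xg_for"] for m in matches)
    xg_against = sum(m["xg_against"] for m in matches)
    return xg_for / (xg_for + xg_against)


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_after(season, k):
    """Correlate xG share over the first k matches with points per game
    over the remaining matches. k must be smaller than the season length."""
    shares, future_ppg = [], []
    for matches in season.values():
        past, future = matches[:k], matches[k:]
        shares.append(xg_share(past))
        future_ppg.append(sum(m["points"] for m in future) / len(future))
    return pearson(shares, future_ppg)


# Invented toy season: three hypothetical teams, four identical matches each.
toy_season = {
    "Team A": [{"xg_for": 2.0, "xg_against": 1.0, "points": 3}] * 4,
    "Team B": [{"xg_for": 1.0, "xg_against": 1.0, "points": 1}] * 4,
    "Team C": [{"xg_for": 1.0, "xg_against": 2.0, "points": 0}] * 4,
}

# One correlation per split point; a real season would give a curve
# that can be plotted against the number of matches played.
curve = {k: correlation_after(toy_season, k) for k in range(1, 4)}
```

With real data, swapping `xg_share` for the equivalent share of shots, shots on target, goals or points would give the comparison curves the article describes.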

But why is MLS weird, I hear you ask. In MLS, none of these metrics predicts points or goals very well. It's important to note, though, that the stats in this study are ratios of xG, shots and so on (e.g. Team A get 55% of the xG in their matches) - it's not looking at the link between total xG and total goals.

If you're interested in this MLS finding, the ASA team followed up with another blog post all about parity and disparity.

What's cool about it

The finding about MLS is cool, and so is the Replication Project in and of itself. If football analytics is going to live up to the scientific rigour that it's cloaked in then work needs to be checked, scrutinised, reassessed. It may not be flashy work but it's still important.

By coincidence, there was some discussion on Twitter about the value of replicating others' work on the day of writing this post!

[Link to the Replication Project article is here]

The 'Research in Focus' series summarises and analyses the best football analytics research out there. Follow this link for all Research in Focus pieces.