Everything you need to know about expected goals

Expected goals is everywhere, and it kind of has been for quite a while. However, that doesn't mean everyone's had it explained nicely to them.

So here's a quick, handy, cut-out-and-keep guide to everything you need to know about expected goals.

'What is expected goals' in 12 words

Expected goals is a way of measuring how good a chance is.

(If you imagine pressing the pause button right when a player connects with the ball, that's the information xG is using. It (deliberately) doesn't know what direction the ball's heading in, but more on that later.)

How is xG made?

Expected goals is calculated by a computer model. The computer starts with a lot of data about past shots on one side and whether each shot went in on the other. The model learns which bits of data matter for a goal being scored. Voila, an expected goals model.

These calculations are all basically just maths, but if you were doing them by hand it'd take years. So we use computer programs instead.
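To make that a bit more concrete, here's a minimal sketch of the idea in Python. It isn't any real provider's model: the shots are invented, distance from goal is the only input, and logistic regression (a common choice for yes/no predictions like this) is an assumption rather than something every xG model uses. All the function names are made up for illustration.

```python
import math
import random

def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def make_fake_shots(n, seed=0):
    """Invent n shots as (distance in metres, 1 if goal else 0). Illustrative only."""
    rng = random.Random(seed)
    shots = []
    for _ in range(n):
        distance = rng.uniform(2.0, 35.0)
        # Made-up 'true' scoring chance that drops as distance grows.
        true_chance = sigmoid(1.0 - 0.15 * distance)
        shots.append((distance, 1 if rng.random() < true_chance else 0))
    return shots

def fit_xg_model(shots, steps=2000, learning_rate=0.5):
    """Fit xG = sigmoid(intercept + slope * distance / 10) by gradient descent."""
    intercept, slope = 0.0, 0.0
    n = len(shots)
    for _ in range(steps):
        grad_intercept = 0.0
        grad_slope = 0.0
        for distance, goal in shots:
            scaled = distance / 10.0  # keep the input small so training is stable
            error = sigmoid(intercept + slope * scaled) - goal
            grad_intercept += error
            grad_slope += error * scaled
        intercept -= learning_rate * grad_intercept / n
        slope -= learning_rate * grad_slope / n
    return intercept, slope

def predict_xg(model, distance):
    """xG for a shot from the given distance, using a fitted (intercept, slope)."""
    intercept, slope = model
    return sigmoid(intercept + slope * distance / 10.0)

# Data on one side, outcomes on the other, and the model learns the link.
model = fit_xg_model(make_fake_shots(500))
print(round(predict_xg(model, 6.0), 2), round(predict_xg(model, 25.0), 2))
```

The fitted model should give a close-range shot a noticeably higher xG than one from 25 metres, purely because that's the pattern in the (fake) data it learned from.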

What's in the xG model (usually)?

Things like the shot's distance from goal and angle to goal, whether it was a header or struck with the foot, and how it was assisted.

Some things will depend on the data available. Some data collection companies collect info on how clear a sight of goal the shooter had. If they don't, you can often use proxies based on the data that you do have, such as whether the shot came from a quick attack or a through-ball.
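Distance and angle are the easiest of those ingredients to show. Here's a rough sketch of how they could be worked out from a shot's location. The coordinate system is an assumption (real data providers each use their own): the centre of the goal is at (0, 0), x is metres out from the goal line, and y is metres to one side of centre. 'Angle' here means how wide the goal looks from where the shot is taken, which is one common way of defining it.

```python
import math

GOAL_WIDTH_M = 7.32  # width of a standard goal, post to post

def shot_distance(x, y):
    """Distance in metres from the shot to the centre of the goal."""
    return math.hypot(x, y)

def shot_angle(x, y):
    """Angle in degrees between the two posts, as seen from the shot location."""
    to_near_post = math.atan2(GOAL_WIDTH_M / 2 - y, x)
    to_far_post = math.atan2(-GOAL_WIDTH_M / 2 - y, x)
    return math.degrees(to_near_post - to_far_post)

# Same distance out from the goal line, very different view of goal.
print(shot_angle(11.0, 0.0), shot_angle(11.0, 20.0))
```

From the penalty spot (11 metres out, dead centre) the goal spans roughly 37 degrees; move 20 metres out wide and that angle shrinks a lot, which is why shots from wide positions tend to get lower xG.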

What is it used for?

Expected goals is a useful stat for judging how strong a team or player is in attack (or how strong a team's defence is, by looking at opponent xG). It's better at predicting the number of goals a team or player will score in the future than other stats, like the shots they've already taken or the goals they've already scored.

It can be used to see whether a striker is on a hot or cold streak, although most players will score a similar number of goals to their xG over a long period of time. This has been useful in the transfer market in the past: players on a hot streak might be overvalued, and players on a cold streak undervalued.

It's also simply a useful unit for analysis. Expected goals gives you a figure for whether a shot is a decent chance or bad chance. A team might give away a lot of shots from attacks down their left flank, but those might be poor shots; the good shots they give away might come down their right. Having xG helps to easily tease this information out, and present it to others quickly.
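As a small illustration of that flank example, here's a sketch that counts shots conceded and adds up their xG by the side the attack came down. The shots and their xG values are invented, and the function name is just for this example.

```python
from collections import defaultdict

# Invented shots conceded: (flank the attack came down, xG of the shot)
shots_conceded = [
    ("left", 0.03), ("left", 0.05), ("left", 0.02),
    ("left", 0.04), ("left", 0.06), ("left", 0.03),
    ("right", 0.35), ("right", 0.22),
]

def summarise_by_flank(shots):
    """Return {flank: (number of shots, total xG)}."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for flank, xg in shots:
        counts[flank] += 1
        totals[flank] += xg
    return {flank: (counts[flank], totals[flank]) for flank in counts}

for flank, (n_shots, xg_total) in summarise_by_flank(shots_conceded).items():
    print(f"{flank}: {n_shots} shots, {xg_total:.2f} xG")
```

In this made-up data the left flank gives up three times as many shots, but the right flank gives up more than twice the xG. Shot counts alone would point you at the wrong side.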

That can help in analysis but also in the media. Once you're used to xG, it's quicker to say "Team A got 2.3xG while Team B only got 0.6" than "Team A got 16 shots, a couple of which were one-on-ones, while Team B only got 10, most of which were from distance".

Why isn't the direction of a shot included? Isn't that important in 'expecting' goals?

Knowing whether a shot is headed for the top corner or row Z is definitely useful for predicting whether it'll be a goal. However, whether you want that particular data point depends on what question you're asking.

If you want to know how leaky your defence is, you want to know the chances you're conceding - you don't necessarily want to know anything about how good the finish is. The same is true if you want to know whether a striker's getting in good positions.

There are models which take 'shot placement' into account though. These are usually called either 'post-shot xG' or 'xG on-target' (the names are fighting it out in a kind of terminology survival of the fittest). Models like these can also be useful for assessing goalkeeper performance.
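One common way of using these models for goalkeepers is to compare the post-shot xG of the on-target shots a keeper faced with the goals they actually let in. Here's a minimal sketch of that sum. The numbers are invented, and 'goals prevented' is just the label used here; different sites name and calculate this slightly differently.

```python
def goals_prevented(post_shot_xg_faced, goals_conceded):
    """Post-shot xG faced minus goals conceded.

    Positive means the keeper let in fewer than the model expected,
    negative means more.
    """
    return sum(post_shot_xg_faced) - goals_conceded

# Invented match: four shots on target, one of which went in.
faced = [0.10, 0.60, 0.30, 0.80]
print(round(goals_prevented(faced, goals_conceded=1), 2))
```

As with outfield xG, a single match tells you very little. This kind of figure only starts to mean something over a lot of shots.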

Does xG take player skill into account?

Not usually. Instead, the models work on the basis of an average value for that type of shot, and the value wouldn't change depending on whether it was a striker or goalkeeper taking it.

Part of this is for the same reason that the direction of the shot isn't included. If you want to know whether a team is creating (or conceding) good chances, then you want to separate the chance itself from who's taking it.

The other big reason is that it takes a long time for a player's 'finishing skill' to show up in the data in a robust way. That leads to articles like this, which point out that getting on the end of chances, rather than being a consistently clinical finisher, is the biggest influence on a player being a big goalscorer.

(Lionel Messi is widely recognised in the analytics community as being a rare player who both gets high xG figures and consistently scores even more. Since the start of 2017/18, he's scored 30 goals more than xG would expect of him in domestic leagues. By comparison, Robert Lewandowski, Harry Kane, and Karim Benzema have all scored around 13 extra, an average of just a couple per season. Cristiano Ronaldo has just an extra 4.2. Stats from FBref, correct as of 23 February 2022.)

Data people always talk about 'sample size' - what does that mean here?

Basically, if you flip a coin three times you might get heads 100% of the time; but if you flip a coin 300 times, you're not gonna get 100% heads.

With expected goals, there can be little bits of information missing for each shot. But over enough shots the effect of this imperfect data becomes less and less significant.
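You can see the same effect with a quick simulation. This sketch imagines a perfectly average finisher whose every shot is worth 0.1 xG, plays out lots of pretend runs of shots, and looks at how far goals divided by xG strays from 1. The 0.1 figure, the number of runs, and the function name are all arbitrary choices for illustration.

```python
import random

def ratio_range(n_shots, n_runs=200, xg_per_shot=0.1, seed=1):
    """Simulate n_runs batches of n_shots and return the (lowest, highest)
    goals-per-xG ratio seen. A ratio of 1.0 means goals matched xG exactly."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_runs):
        # Each shot goes in with probability equal to its xG.
        goals = sum(1 for _ in range(n_shots) if rng.random() < xg_per_shot)
        ratios.append(goals / (n_shots * xg_per_shot))
    return min(ratios), max(ratios)

print("20 shots:  ", ratio_range(20))
print("5000 shots:", ratio_range(5000))
```

Over 20 shots, the same average player can look hopeless or world class purely by luck. Over 5,000 shots the ratio stays close to 1, which is the sample size point in action.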

Who came up with it?

D'you know the story about Isaac Newton arguing with a German mathematician, Gottfried Leibniz, about who came up with calculus first? Probably not, to be fair. Basically, they both developed the ideas independently.

The same is kind of true with expected goals - lots of people have had similar ideas. There's a recognisable version of the expected goals concept in a 1997 academic paper by Richard Pollard and Charles Reep (read more about that here). Wikipedia notes a study that Pollard, Jake Ensum, and Samuel Taylor worked on in 2004 too.

The phrase 'expected goals', used in its current form, dates back to a blog by Sam Green, then of Opta (a data collection company now part of Stats Perform), in 2012. The blog is also an important marker in the development of online football analytics in the early to mid 2010s.