Everything you need to know about expected goals

Expected goals is everywhere, and it kind of has been for quite a while. However, that doesn't mean everyone's had it explained nicely to them.

While Get Goalside does a lot of 'hardcore' analytics writing, it's also a newsletter for anyone just getting into football stats. So this is a quick, handy, cut-out-and-keep guide to everything you need to know about expected goals. Feel free to share it around, or bookmark it for future reference.

'What is expected goals?' in 12 words

Expected goals is a way of measuring how good a chance is.

(The reason I say 'chance' instead of 'shot' is that xG uses the information you'd have if you pressed the pause button right when a player connects with the ball. It doesn't know what direction the ball's heading in. More on that in a later question.)

How is xG made? ...is 'made' the right word?

Yeah, 'made' works.

Expected goals is calculated by a computer model. You give the computer a lot of data about each shot on one side, and whether a goal was scored from that shot on the other. It learns which bits of data are important to a goal being scored. Voila, an expected goals model.
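For the curious, here's a minimal sketch of that learning step. It assumes a logistic regression (one common choice, not necessarily what any particular provider uses), a single input (distance from goal), and a handful of made-up shots. Real models use far more shots and far more inputs.

```python
import math

# Made-up toy data: (distance from goal in metres, 1 if the shot was a goal else 0).
SHOTS = [
    (4, 1), (5, 1), (6, 0), (7, 1), (8, 0), (9, 1),
    (10, 0), (11, 0), (12, 1), (14, 0), (16, 0), (18, 0),
    (20, 0), (22, 0), (25, 1), (28, 0), (30, 0), (32, 0),
]


def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_xg_model(shots, learning_rate=0.5, steps=5000):
    """Fit a one-feature logistic regression by plain gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(shots)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for distance, goal in shots:
            x = distance / 10.0  # rescale so gradient descent behaves
            error = sigmoid(weight * x + bias) - goal
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


def xg(distance, weight, bias):
    """Estimated chance of scoring from a given distance (toy model)."""
    return sigmoid(weight * (distance / 10.0) + bias)


weight, bias = fit_xg_model(SHOTS)
print(f"xG from 5 m:  {xg(5, weight, bias):.2f}")
print(f"xG from 25 m: {xg(25, weight, bias):.2f}")
```

The computer is never told that closer shots are better. It picks that up from the data, which is why the learned weight on distance comes out negative.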

These calculations are all basically just maths, but if you were doing it by hand then it'd take years. So we use computer programs instead.

What's in the xG model (usually)?

The distance from goal and angle to goal of a shot will be in there. Whether the shot is a header or a footed shot, and how the shot was assisted, will be in too.
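As a rough illustration, distance and angle can be worked out from where the shot was taken. This sketch assumes coordinates in metres measured from the centre of the goal (x is how far out from the goal line, y is how far to one side), and it takes 'angle' to mean the angle between the two posts as seen from the ball. Both are assumptions for the example. Data providers use their own coordinate systems and definitions.

```python
import math

GOAL_WIDTH_M = 7.32  # a full-size goal is 8 yards wide


def shot_distance_and_angle(x, y):
    """Return (distance in metres, angle in degrees) for a shot location.

    x: distance out from the goal line, in metres.
    y: sideways offset from the centre of the goal, in metres.
    """
    distance = math.hypot(x, y)
    # Direction to each post, then the gap between those two directions.
    to_near_post = math.atan2(GOAL_WIDTH_M / 2 - y, x)
    to_far_post = math.atan2(-GOAL_WIDTH_M / 2 - y, x)
    angle = math.degrees(to_near_post - to_far_post)
    return distance, angle


central = shot_distance_and_angle(11, 0)   # roughly the penalty spot
wide = shot_distance_and_angle(11, 15)     # same depth, but out wide
print(f"Central: {central[0]:.1f} m, {central[1]:.1f} degrees of goal to aim at")
print(f"Wide:    {wide[0]:.1f} m, {wide[1]:.1f} degrees of goal to aim at")
```

The wide shot is both further away and sees a narrower slice of the goal, which is the kind of thing the model learns to mark down.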

After that, what type of data is being used can come into play. Some data collection companies directly collect how clear a sight of goal the shooter has. If they don't, you can often use proxies based on the data that you do have, like if the shot came from a quick attack or through-ball.

What is it used for?

xG is useful for judging how strong a team or player is in attack (or, for teams, in defence, by looking at opponent xG). It's better at predicting the number of goals a team or player will score in the future than other stats, like shots or goals they've already scored.

It can be used to see whether a striker is on a hot or cold streak that's probably going to come back to normal in time. If they're on a hot streak you might not be as enthusiastic about buying them; if they're on a cold streak it might be a good time to strike.

It's also a useful unit for analysis. Expected goals takes into account whether a shot is a decent chance or bad chance. A team might give away a lot of shots from attacks down their left flank, but those might be poor shots; the good shots they give away might come down their right. Having xG helps to easily tease this information out, and present it to others quickly.
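A quick sketch of the flank example, using made-up numbers. The shot list and its xG values are invented for illustration, and so is the idea that each shot comes tagged with the flank the attack came down.

```python
from collections import defaultdict

# Made-up shots conceded: (flank the attack came down, xG of the shot).
SHOTS_CONCEDED = [
    ("left", 0.03), ("left", 0.04), ("left", 0.02),
    ("left", 0.05), ("left", 0.03), ("left", 0.04),
    ("right", 0.35), ("right", 0.28), ("right", 0.12),
]

shot_count = defaultdict(int)
xg_total = defaultdict(float)
for flank, shot_xg in SHOTS_CONCEDED:
    shot_count[flank] += 1
    xg_total[flank] += shot_xg

for flank in ("left", "right"):
    print(f"{flank}: {shot_count[flank]} shots, {xg_total[flank]:.2f} xG conceded")
```

Counting shots alone would point the finger at the left side. Adding up the xG shows the right side is where the real damage is being done.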

Finally, it's also handy from a narrative point of view, which is why it's taken off in the media. You don't necessarily need expected goals to tell which team has been better in a match -- often it'll be the team who got more shots and/or shots on target. But, once you're used to xG, "Team A got 2.3xG while Team B only got 0.6" is a lot more compelling than "Team A got 16 shots, while Team B only got 10, most of which were from distance".

Ok, but why not include the direction a shot is travelling in? Isn't that important in 'expecting' goals?

Knowing whether a shot is headed for the top corner or row Z is definitely useful for predicting whether it'll be a goal. However, whether you want that particular data point depends on what question you're asking.

If you want to know how leaky your defence is, you want to know the chances you're conceding - you don't necessarily want to know anything about how good the finish is. If you want to know if a team's attack is ticking, or if a striker's getting in good positions, the same is true.

There are models which take 'shot placement' into account though. These are usually called either 'post-shot xG' or 'xG on-target' (the names are fighting it out in a kind of terminology survival of the fittest). These are mainly useful for assessing goalkeeper performance.

Does xG take player skill into account?

No. Or, at least, very infrequently. Instead, the models work on the basis that the 'average player' is taking the shot.

One of the main reasons for this links back to the previous question and answer: does the skill of the player matter to the problem you're looking at? If you're looking at xG on a team level, the player's technique probably doesn't matter - should a team be awarded more for their chance-creation because their striker rather than their full-back got on the end of it?

The other big reason is that it takes a long time for a player's 'finishing skill' to show up in the data in a robust way. Smarter minds than mine have done proper studies into it and found that even a couple of hundred shots isn't enough to be statistically sure that a player's ability is the reason they score more or less than their xG. It could still, even then, just be chance.
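Here's a back-of-the-envelope way to see why. It is not the method used in those studies, just a sketch that treats every shot as an independent coin flip whose chance of coming up 'goal' is its xG. The 200 shots at 0.1 xG each are made up.

```python
import math


def goal_spread(xg_values):
    """Expected goals and the standard deviation of goals scored,
    assuming each shot is an independent event with probability equal to its xG."""
    expected = sum(xg_values)
    variance = sum(p * (1 - p) for p in xg_values)
    return expected, math.sqrt(variance)


shots = [0.1] * 200  # hypothetical: 200 shots, each a 10% chance
expected, spread = goal_spread(shots)
print(f"Expected goals: {expected:.1f}")
print(f"Typical swing from luck alone: about {spread:.1f} goals either way")
```

Under those assumptions, a perfectly average finisher would expect 20 goals but could easily land four or so either side of that through chance alone. A striker who is a few goals ahead of their xG after 200 shots hasn't yet shown much.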

That leads to articles like this, which point out that getting on the end of chances, rather than being a consistently clinical finisher, is the biggest influence in a player being a big goalscorer.

(Lionel Messi is widely recognised in the analytics community as being a rare player who both gets high xG figures and consistently scores even more. Since the start of 2017/18, he's scored 30 goals more than xG would expect of him in domestic leagues. By comparison, Robert Lewandowski, Harry Kane, and Karim Benzema have all scored around 13 extra, an average of just a couple per season. Cristiano Ronaldo has just an extra 4.2. Stats from FBref, correct as of 23 February 2022.)

Data people always talk about 'sample size' - what does that mean here?

Basically, if you flip a coin three times you might get heads 100% of the time; if you flip a coin 300 times, you're not gonna get 100% heads.
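To put numbers on that, assuming a fair coin:

```python
def chance_of_all_heads(flips):
    """Probability that a fair coin lands heads on every one of `flips` tosses."""
    return 0.5 ** flips


print(f"3 flips:   {chance_of_all_heads(3):.3f}")
print(f"300 flips: {chance_of_all_heads(300):.1e}")
```

Three heads in a row happens one time in eight. Three hundred in a row is so unlikely that you can safely treat it as impossible.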

With expected goals, there can be little bits of information missing for each shot. The people who said "you can't tell everything from data" when xG first hit the mainstream were sort of right.

However, over enough shots the effect of this imperfect data becomes less and less significant.

Who came up with it?

D'you know the story about Isaac Newton arguing with a German mathematician, Gottfried Leibniz, about who came up with calculus first? Basically, they both developed the ideas independently.

The same is kind of true with expected goals. The earliest thing I'm aware of that you could recognise as a calculation of the expected goals concept is in a 1997 academic paper by Richard Pollard and Charles Reep (read more about that here). Wikipedia notes a study that Pollard, Jake Ensum, and Samuel Taylor worked on in 2004.

The thing that often gets cited is a blog by Sam Green, then of Opta (a company now called/part of Stats Perform), in 2012. It appears to be the first use of the phrase 'expected goals' to describe the 'expected goals' stat we're familiar with, as well as using the shortened term 'xG'.

For more on how to build your own expected goals model, check out this Friends of Tracking (a reference to tracking data, not cookies) video on YouTube.