A scientific evaluation of advanced stats 2009-10 to 2019-20


It’s more than thirteen seasons since Corsi was coined, ten since it officially joined the score sheet and five since its publication as an NHL stat. Corsi and its advanced stat siblings have graduated from the back pages of Blogger to the noon flagship radio show. We hear about them more than ever and their new-found fortune and fame, as well as those of their progenitors, have solidified a number of narratives. Despite their growing acceptance most advanced stats have not been scientifically vetted. Solid measures and models are only possible with large data sets collected over many seasons. With ten seasons of officially collected data we have a substantive body from which to measure how advanced and enhanced Corsi is by comparing it to other measures. Can we really take a stat promoted by a small group of Albertan amateurs seriously? After all, Corsi has both an illustrious history involving a goaltending coach and an irreverent one formed by the curve of a moustache.

One way to scientifically evaluate a stat is to measure its correlation to desired outcomes. In this analysis we present coefficients for advanced stats measured against:

  • PTS%: Percentage of available standings points that the team earned.
  • GF%: Share of all goals scored in a team's games that were scored by that team: GF*100/(GF+GA).
  • GF/60: Rate of Goals-for per 60 minutes.
  • GA/60: Rate of Goals-against per 60 minutes.
  • FPTS%: Future PTS% (correlation to the next regular season).
  • FGF%: Future GF%, GF*100/(GF+GA) (correlation to the next regular season).
  • FGF/60: Future rate of Goals-for per 60 minutes (correlation to the next regular season).
  • FGA/60: Future rate of Goals-against per 60 minutes (correlation to the next regular season).
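
As a minimal sketch, the outcome measures above could be computed from season totals as follows. The function and parameter names are illustrative and are not taken from Natural Stat Trick, and PTS% assumes two standings points are available per game played.

```python
def pts_pct(points, games_played):
    """Percentage of available standings points earned.

    Assumes two points are available in every game played.
    """
    return 100.0 * points / (2 * games_played)


def gf_pct(goals_for, goals_against):
    """Share of all goals in a team's games scored by that team: GF*100/(GF+GA)."""
    return 100.0 * goals_for / (goals_for + goals_against)


def per_60(event_count, toi_minutes):
    """Rate of an event (goals, shots, attempts) per 60 minutes of ice time."""
    return 60.0 * event_count / toi_minutes
```

For example, a hypothetical team with 160 goals for and 140 against has a GF% of about 53.3, and 160 goals over 4,000 minutes of 5v5 play is a GF/60 of 2.4.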

The following correlation matrix uses data from ten seasons of the official score sheet. Values are Pearson product-moment correlation coefficients (R). The closer a value is to zero, the weaker the correlation (cells are shaded in light purple). Values closer to 1 and -1 indicate strong positive and negative correlations (orange and blue cells respectively). All measures are taken from (or calculated using) Natural Stat Trick 5v5 team rates data. Future correlations follow the method most often used to test the predictive value of advanced stats and are reported here as R (not R²). A glossary of the enhanced stats in question may be found at this link (with the exception of the "bit" measures added by the author, which are defined in the post). The complete matrix may be found here.
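
The calculation behind each cell can be sketched roughly as below. This is an illustration under stated assumptions and not the author's actual workflow: `pearson_r` is a plain implementation of the Pearson coefficient, and `future_pairs` is a hypothetical helper that pairs a team's stat in one season with its outcome in the next. The row layout (dicts with `team` and an integer `season` start year) is assumed for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient (R) of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def future_pairs(rows, stat, outcome):
    """Pair each team-season's stat with the same team's outcome one season later.

    rows: list of dicts with keys "team", "season" (int start year), plus stat columns.
    Team-seasons with no following season in the data are skipped.
    """
    by_key = {(row["team"], row["season"]): row for row in rows}
    xs, ys = [], []
    for (team, season), row in by_key.items():
        following = by_key.get((team, season + 1))
        if following is not None:
            xs.append(row[stat])
            ys.append(following[outcome])
    return xs, ys
```

A within-season coefficient is `pearson_r` applied to two columns of the same team-seasons; a future coefficient is `pearson_r(*future_pairs(rows, "CF%", "GF%"))`, so the final season in the data set contributes outcomes but no predictors.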

Within-season coefficients

Possession & Shot Quality


Corsi (CF%) and Fenwick (FF%) do not appear to offer advantages over Shots (SF%) within-season. Shots have stronger relationships with PTS% and GF%. If a shot or shot attempt metric is to be used as a proxy for possession this analysis suggests that Shots are better than Corsi or Fenwick as their relationship to desired outcomes is stronger.

Measures that incorporate shot quality, such as War-on-ice's scoring chances (SCF%), Danger Zone measures (LD, MD, HD) and the author's related bitShots (bitSF%) and bitCorsi (bitCF%), exhibit stronger relationships with PTS% and GF% than methods that do not incorporate quality (SF%, CF%, FF%). One exception is expected goals (xGF%, in this case the measure proposed by Corsica Hockey). xGF% is outperformed by simpler shot quality metrics (micro stats may be required to produce improved expected goals models).

Shot quality was an issue of debate at one point within the analytics community. Several proponents of Corsi denied that shot quality existed or stated it was not measurable (including Tim Barnes, Rob Vollman and Tom Awad). While it was true at one time that shot quality data was not available, this has changed with the collection of location-based data and the websites that scrape and republish it.

bitCorsi, bitShots, bitGoals


The novel "bit" measures presented here draw heavily from the danger zones defined by War-on-ice. A bit measure is one that excludes Low-Danger Zone events and is the sum of the associated High and Medium-Danger Zone events (bitGF% = HDGF% + MDGF%, bitSF% = HDSF% + MDSF%, bitCF% = HDCF% + MDCF%). The thesis is that Low Danger events are noisy and that more meaningful measures are produced by ignoring them. bitCF% was originally presented in this post. Following the same pattern as Shots and Corsi, bitSF% has a stronger relationship to GF% than bitCF%.
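
One way to read the bit definition is sketched below. The post states it in terms of percentages; this sketch assumes the intended calculation sums the High- and Medium-Danger event counts for and against and then takes the for-share, which is an interpretation and not something the post confirms. The function name and parameters are hypothetical.

```python
def bit_pct(hd_for, md_for, hd_against, md_against):
    """For-share of combined High- and Medium-Danger events, ignoring Low-Danger ones.

    ASSUMPTION: counts are summed before the percentage is taken. The same
    function serves goals (bitGF%), shots (bitSF%) or attempts (bitCF%),
    depending on which event counts are passed in.
    """
    bit_for = hd_for + md_for
    bit_against = hd_against + md_against
    return 100.0 * bit_for / (bit_for + bit_against)
```

With made-up counts of 330 high-danger and 510 medium-danger shots for against 280 and 480 against, `bit_pct(330, 510, 280, 480)` gives 52.5.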

PDO (Shooting + Save Percentage)


PDO (SH% + SV%) exhibits a stronger relationship to desired outcomes than any of the possession or shot quality metrics. Proponents of Corsi have generally argued that there is a "natural" shooting and save percentage and that elevated PDO (above 100) indicates that goals and saves are happening because of luck. This narrative originates from the comments section on the now-defunct Irreverent Oiler Fans blog ("PDO" was the avatar of participant Brian King). The correlates produced here indicate either that the game of hockey has more to do with luck than with Corsi, or that this interpretation of PDO is flawed. Clearly SH% and SV% are very important to the game and some teams are better at shooting and saving than others.

A more nuanced version of the PDO narrative is that each team exhibits its own SH% and SV% and that PDO can exceed 100 without being characterized as "lucky". Generally speaking, however, increases in SH% are viewed as being subject to "regression" while attempted shots are not. Likewise a decrease in shot attempts against is favoured over Save Percentage as a defensive tactic. SH% has a stronger relationship with GF/60 (0.84) than any shot, attempted-shot, or chances metric. SV% has a stronger relationship with GA/60 than any shot, attempted-shot, or chance against metric. This is of course because these percentages are a product of both goals and shots.
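
The arithmetic behind that last point can be sketched as follows. The definitions are the standard ones (SH% is goals for over shots for, SV% is one minus goals against over shots against), while the function names and the example totals are illustrative only.

```python
def sh_pct(goals_for, shots_for):
    """Shooting percentage: goals scored per 100 shots on goal."""
    return 100.0 * goals_for / shots_for


def sv_pct(goals_against, shots_against):
    """Save percentage: shots against stopped per 100 shots on goal against."""
    return 100.0 * (1.0 - goals_against / shots_against)


def pdo(goals_for, shots_for, goals_against, shots_against):
    """PDO is the sum of SH% and SV%; 100 is the league-wide balance point."""
    return sh_pct(goals_for, shots_for) + sv_pct(goals_against, shots_against)


def gf_per_60_from_rates(sh_percent, sf_per_60):
    """GF/60 rebuilt from SH% and SF/60, showing that goals sit inside SH%."""
    return sh_percent / 100.0 * sf_per_60
```

A hypothetical team with 160 goals on 2,000 shots and 150 goals against on 2,000 shots against has an SH% of 8.0, an SV% of 92.5 and a PDO of 100.5. Because GF/60 equals SH% times SF/60 (divided by 100), SH% shares its numerator with the outcome it is being correlated against.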

This analysis indicates that sustainable increases in PDO, even if small, have important effects or can be symptoms of positive change. Such increases occur through the pursuit and limiting of shot quality, or by having better shooters or goalies. In a previous analysis using all-situations data the author was able to demonstrate that PDO could be a better predictor than Corsi for FGF% (R² = 0.49). The current analysis for 5v5 data does not exactly duplicate those findings. It's likely that the previous results were the product of special teams. PDO is found here to have a much stronger relationship to goals than seems to be assumed by proponents of Corsi.

Measures of offensive opportunity


bitSF/60 has the strongest relationship to GF/60, followed by Scoring Chances (SCF/60), Shots (SF/60), Expected Goals (xGF/60), Fenwick (FF/60) and Corsi (CF/60). The rate of attempts created within the Medium-Danger Zone (MDCF/60) has a strong relationship to the rate of goals created. By observation the Medium-Danger Zone is where most odd-man opportunities are created. It is also closer to the net than the Low-Danger Zone but not so close as to remove the likelihood of play across the Royal Road. Interestingly, MDCF/60 has a stronger relationship to GF/60 than MDSF/60. When looking for measures of meaningful offensive chances bitSF/60 seems to be the best current measure.

Measures of defensive skill


Measures of defensive skill have lower correlates to GA/60 than measures of offensive opportunity have to GF/60. This is because GA/60 is more strongly influenced by SV% (and the quality of goaltending) than GF/60 is influenced by SH% and the quality of offence. NHL teams will face between 31 and 62 (or more) goaltenders in a given season but will be back-stopped by two or three, meaning the SV% faced by a team's offence is effectively randomized while the team's own SV% is not. xGA/60, SA/60 and bitSA/60 have similar relationships to GA/60 and appear to be better measures of opportunities-against than FA/60 or CA/60. Within different Danger Zones there are different strengths of relationship to GA/60 for Corsi and Shots respectively. In the Low and Medium Danger Zones there is a stronger relationship between attempts against and GA/60 (e.g. MDCA/60) than between shots against and GA/60 (e.g. MDSA/60).

Future coefficients


Using the NHL score sheet there are no statistics or measures that strongly correlate with future seasons: all coefficients are weak, with none being higher than 0.35 or lower than -0.26. Any projections of future season performances based on previous season measures should not be viewed as models which firmly predict the future (R² values are much lower than the R values published here).
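
To make the gap between R and R² concrete, here is a small sketch that squares the two extreme future coefficients quoted above. It relies on the standard result that, for a simple one-predictor linear fit, R² is the square of the Pearson R; the function name is illustrative.

```python
def r_squared(r):
    """Share of variance explained by a one-predictor linear fit with correlation r."""
    return r * r


# The strongest positive and negative future coefficients quoted in the text.
strongest_positive = r_squared(0.35)   # roughly 0.12
strongest_negative = r_squared(-0.26)  # roughly 0.07
```

In other words, even the strongest future-season relationship in the matrix accounts for only about 12% of the variance in the following season's outcome.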

The strongest measures for predicting future goals-for percentage are those related to Scoring Chances. As with within-season coefficients, xGF% does not fare as well. Shot-attempt measures produce slightly stronger coefficients with future-season outcomes than Shots. The same is true of bitCF%, which exhibits a stronger relationship to FGF% than bitSF%. One is tempted to say here that the Corsi narrative is correct, that possession hockey as measured by CF% will produce more goals in future. However, the values are so weak that it seems more likely that there are some teams (perhaps only a few) for whom a positive CF% will produce increased FGF%.

Of Corsi we still don't know what we're doing

Enhanced statistics derived from the NHL score sheet have issues with reliability and validity.

Reliability has to do with internal consistency. Shots are better than Corsi and Fenwick as measures of possession. However, when attempting to predict goals, Corsi and Fenwick appear to be better measures than Shots. It does not hold up that there should be one model within seasons and a different (weaker) model for future seasons. Additionally, there is an asymmetry between measures of offence and defence, with measures of offence having higher coefficients and being better predictors.

Validity gives us an indication of the meaning of measures. We have always made the assumption that Corsi is a proxy for puck possession, that PDO is a measure of luck, that carrying the puck in is preferable to dumping it in, etc. While collectively we have the data sets and machine vision technology to measure whether most of these assumptions are true, the collective part is missing.

The analytics community is more fractured than it has ever been, with those who have ascended into the mainstream forced (by choice or otherwise) to limit or stop the conversation. At this point in his career Tim Barnes is able to get his hands on actual possession stats from the machine vision houses instead of using proxies. One has to wonder how his proxy measure fares in comparison.

