As a follower of advanced statistics in hockey, I'm always on the lookout for stats that I consider "meaningful". Hockey is a difficult game to measure and there is always a chance that the data we track, analyze and discuss doesn't measure what we want it to measure. For instance, do "hits" really tell us anything of value after a game? Does it make sense to apply save% to defensemen?
All the discussion in the advanced stats community seems to focus on a concerted effort to measure possession. The assumption is that teams with good possession metrics will, statistically, have a better chance of winning and therefore a better record, and will ultimately finish higher in the standings. So far, I have no argument with the logic.
Corsi and Fenwick
I'm not writing this to beat up on Corsi and Fenwick. David Staples has already pointed out some limitations of Corsi, so I won't go into too much detail, but I believe such limitations are largely ignored because of a lack of reasonable alternative metrics.
Currently, the predominant thinking is that scoring chances for and against are acceptable measures of possession. In fact, the glossary at Hockey Prospectus defines Corsi as "a proxy for possession", and notes that Fenwick has "better predictive value for goal differential than Corsi."
To feed my curiosity, I conducted a review of 82 NHL games (6.7% of the 1,230 games in a regular season) from February 27 to March 9, 2014 at www.extraskater.com. My purpose was only to see if the winning team in each game had a better Corsi For % than the losing team. I didn't look at Corsi Close (or 5v5, 5v4, 4v5) because I'm not convinced that shrinking the sample size (minutes played) improves the validity of the data.
Corsi and Fenwick certainly have value in their ability to reveal a team's weak 5v4 or 4v5 play, but they have only marginally better predictive value than simply measuring shots alone. Each sub-analysis of Corsi and Fenwick shrinks the sample size, which I find concerning.
What I found in my analysis was:
- in 39 games (47.6%), the winning team had a Corsi above 50%
- in 39 games (47.6%), the winning team had a Corsi below 50%
- in four games (4.9%) both teams had a Corsi of 50% (all four were one-goal differentials)
My quick analysis tells me that the predictive value of Corsi is slightly less than a coin toss.
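The arithmetic behind that tally can be checked with a short sketch. The counts come straight from the list above; the function names and the exact binomial test are my own additions for illustration, not part of the original review, and the test assumes each game is an independent trial.

```python
from math import comb

# Counts reported in the review of 82 games.
wins_above = 39  # winning team had Corsi For % above 50
wins_below = 39  # winning team had Corsi For % below 50
even = 4         # both teams at exactly 50%

total = wins_above + wins_below + even


def pct(count, n):
    """Share of n expressed as a percentage, rounded to one decimal."""
    return round(100 * count / n, 1)


def coin_toss_p_value(k, n):
    """Exact two-sided p-value for k successes in n fair coin flips.

    Hypothetical helper: doubles the smaller tail and caps the result at 1.
    """
    tail = min(k, n - k)
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)


share_above = pct(wins_above, total)
share_even = pct(even, total)

# Drop the four even games and ask whether the rest differ from a coin toss.
decided = wins_above + wins_below
p_value = coin_toss_p_value(wins_above, decided)

print(total, share_above, share_even, p_value)
```

With a 39-39 split among the decided games, the p-value comes out at 1.0. In other words, this sample gives no evidence that the team with the better Corsi won more often than a coin toss would predict, though 82 games is a small sample.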
For statistics to have meaning, they must have some predictive value. Summary statistics (goals, assists, points, etc.) have a purpose in that they tell us what happened, but they don't help us predict future performance with any degree of accuracy.
Validity: What are we measuring?
EricT recently wrote an interesting piece on the predictive abilities of Corsi. Measurement of statistics in hockey is indeed challenging. People have tried, with mixed success, to compare it to baseball, basketball, football and soccer. I've argued in the past and I still believe that, in terms of measurement and analysis, hockey is distinctive. It certainly shares some similarities with other sports, but the fluidity (constant change), and speed of the game make it unique and difficult to measure. Soccer seems to have some important similarities, but more on that later.
Ultimately, Corsi and Fenwick are an enhanced analysis of shooting behaviour. I think it's a fair assumption that the team that directs more shots to their opponent's net than their own is likely to have the puck more. But to accept that assumption as fact simply guides us in the wrong direction and ignores other important and unmeasured variables. It seems like Corsi/Fenwick have had their applicability stretched to the limit because of the absence of any other meaningful in-game statistics.
Analysts use QualComp and WOWY to try to isolate some factors, but ultimately, we're still measuring shooting behaviour and ignoring a whole bunch of other relevant performance data. There are many variables that contribute to a "scoring chance". Is an enhanced version of plus/minus robust enough to provide a sound basis for measuring the quality of competition? Shouldn't we be measuring actual things rather than using a "proxy" as a basis for a bunch of other measures?
Finally, when looking at the performance of individual hockey players, I'm not convinced that it's important to make a distinction between even-strength minutes, power-play minutes, short-handed minutes or times when the score is close. If a player or group of players is prone to making poor decisions that lead to on-ice mistakes, why would a particular game situation have an impact on their performance? (It certainly would from a coaching/deployment standpoint, though.)
I understand the desire to identify "clutch" players who perform well under pressure, but rather than find a measure for that intangible, I would argue that better individual performance measures will ultimately identify those players.
Some new metrics
If we agree that possession is a metric that we want to focus on for both team and individual performance, why not focus on measuring it accurately at the player level? Rather than trying to find a proxy for possession, why not record events that result in a change of possession? The idea here is, if a team has a bunch of players who turn the puck over, they are a poor possession team. A team that has players who force turnovers and don't give many pucks away is a strong possession team. Starting from each face-off, we can track the events of possession -- mostly passing, but also other events that result in a change of possession. Here are some suggestions:
- Giveaways - yes, this is already measured by the NHL, but I understand that there is a lack of consistency in how a giveaway is recorded. A giveaway can be defined as a player who has possession of the puck losing that possession through either a forced or an unforced error. The focus here is to improve the reliability of the data.
- Turnovers created - this one is the flip-side of giveaways - when a player (on Team A), who does not have possession of the puck, causes a member of the opposing team (Team B) to lose possession of the puck to the opposition (Team A). This is different from a "takeaway", which is both unreliable and misleading.
- Passing success - passing the puck is a possession vulnerability. It's a crucial but unmeasured skill in hockey. There are three possible outcomes of a pass: 1) completed pass, 2) incomplete, loose puck, and 3) interception, change of possession. I'm less interested in cases where missed passes are recovered because the passer missed his target or the receiver missed the pass. Either way that stat is captured below.
- Passes attempted - total number of passes attempted.
- Passes completed (%) - percentage of attempted passes that reached their target - possession maintained.
- Passes received (%) - This is the measure of a player's ability to receive a pass. There are two possible outcomes: 1) pass received - possession maintained, 2) pass missed. There is no reason to measure whether possession was maintained after the pass was missed because we're measuring the player who missed the pass, not the team's possession.
- Passes intercepted - This is a measure of players who do not have possession of the puck, but gain possession by intercepting a pass -- a skill of positioning and anticipation. Players who give away the puck by throwing a pass that is intercepted are picked up by the "giveaways" measure.
- Dump-ins - used as a global team measure. Total number of times the puck is chipped below the goal line in the offensive zone. The limitation of this one is filtering out dump-ins for line changes. Perhaps differentiating between active (dump and chase) and passive dump-ins (line changes) would help.
- Dump-in recovery - used to measure dump-and-chase success. Total number of times the offensive team maintains possession of the puck in the offensive zone. This too has some issues because possession can change a few times before it leaves the offensive zone.
- Icing - both an individual and team measure. Who iced the puck and how many times (not including the penalty kill).
- % of shots blocked - this is a measure of a shooter, not a blocker. It is the percentage of shots taken that are blocked by the opposition. Avoiding having your shot blocked (reading the defence, quick release, move/drag) is just as much of a skill as blocking the shot.
- Performance consistency - this is exactly what it sounds like and can be easily summarized for any measure you wish - how consistent a player is over time - points, blocks, giveaways, passing, etc. It's ultimately an average of all data in a category plus a variance to show how much fluctuation exists in that category over time. This could be broken down into five-game segments by default, but could be adjusted to any period of games.
- Rebounds - a new goalie stat - how many rebounds went back into the slot (regardless of what happens afterwards).
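To make the passing measures in the list above concrete, here is a minimal sketch of how a tracker's event log could be rolled up into per-player numbers. Everything in it is hypothetical: the `PassEvent` record, the outcome labels ("completed", "loose", "intercepted") and the player names are assumptions for illustration, not an existing NHL or Opta data format. It leaves out passes received (%), because that would need an extra judgment from the tracker about whether a pass was catchable.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class PassEvent:
    """One tracked pass (hypothetical record layout)."""
    passer: str
    outcome: str  # "completed", "loose", or "intercepted"
    interceptor: Optional[str] = None  # set only when outcome is "intercepted"


def summarize_passes(events):
    """Roll pass events up into per-player possession numbers."""
    stats = defaultdict(lambda: {
        "attempted": 0,      # passes attempted
        "completed": 0,      # passes that reached their target
        "giveaways": 0,      # passes picked off by the opposition
        "interceptions": 0,  # opposition passes this player picked off
    })
    for event in events:
        passer = stats[event.passer]
        passer["attempted"] += 1
        if event.outcome == "completed":
            passer["completed"] += 1
        elif event.outcome == "intercepted":
            passer["giveaways"] += 1
            if event.interceptor is not None:
                stats[event.interceptor]["interceptions"] += 1

    summary = {}
    for player, s in stats.items():
        row = dict(s)
        # Completion % is undefined for a player with no pass attempts.
        row["completion_pct"] = (
            round(100 * s["completed"] / s["attempted"], 1)
            if s["attempted"] else None
        )
        summary[player] = row
    return summary


# Made-up events: A1 and A2 are teammates, B1 is an opponent.
events = [
    PassEvent("A1", "completed"),
    PassEvent("A1", "intercepted", interceptor="B1"),
    PassEvent("A1", "loose"),
    PassEvent("A2", "completed"),
    PassEvent("B1", "completed"),
]
summary = summarize_passes(events)
for player, row in sorted(summary.items()):
    print(player, row)
```

The same pattern would extend to turnovers created, dump-ins and icings by adding further event types, provided the people recording the events apply the definitions consistently.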
At this point, I don't see a benefit of trying to collect data relating to puck battles or loose puck situations where neither team has clear possession.
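The performance consistency measure from the list above (an average plus a variance, in five-game segments by default) could be sketched as follows. The function name and the sample points-per-game figures are made up for illustration, and the choice of population variance rather than sample variance is my assumption.

```python
from statistics import mean, pvariance


def consistency(values, segment=5):
    """Average and variance of a per-game stat, split into fixed-size segments.

    `values` holds one number per game in chronological order. The last
    segment may be shorter than `segment` games.
    """
    chunks = [values[i:i + segment] for i in range(0, len(values), segment)]
    return [
        {"games": len(chunk), "mean": mean(chunk), "variance": pvariance(chunk)}
        for chunk in chunks
    ]


# Made-up points per game for one player over ten games.
points = [1, 0, 2, 1, 1, 0, 0, 3, 0, 2]
segments = consistency(points)
for number, seg in enumerate(segments, start=1):
    print(number, seg)
```

Both five-game segments in this example average one point per game, but the second has four times the variance of the first, which is exactly the kind of streakiness a plain average hides.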
Comparison to Soccer and Potential for Hockey
The above suggestions are based on a review of advanced statistics in a number of other sports, but overall, soccer seemed to have the highest applicability to hockey (see this article). Soccer is a possession game that relies on passing to create scoring chances of varying quality. The number of goals in a game is relatively low and there are very few player substitutions, but generally, the premise of the game and object of play is very similar.
In recent years, the advanced stats field in soccer has grown by leaps and bounds. The soccer website Optasports.com makes hockey statistics look primitive. Maybe there isn't enough money in it for Opta to add North American hockey to their portfolio, but the potential for these types of stats is unbelievable. Corsi and Fenwick have been interesting, but I'm ready for the next level of advanced statistics. Maybe the NHL shouldn't be the source of the data.
If we can start tracking specific performance measures with high reliability, the potential that exists for scouts and coaches at all levels of hockey is amazing. Measuring the events that cause changes in possession will add valuable data to the analysis process and become a more accurate measure of possession than anything we've had to date.