The Numbers

Both David Staples and Lowetide (Update: and Covered in Oil) have been talking about statistics in the last couple of days. Due to the speed and efficiency of the Future Shop maintenance department, I didn't comment on either of their posts, even though it's a discussion I have relatively strong opinions on.

I like to view myself as a numbers guy, although I'm hardly in the same category as the fellows who write Mc79hockey and Irreverent Oiler Fans. I'm not huge on Corsi numbers, because (as Bruce has argued so effectively) I do believe there are strategies, particularly those Craig MacTavish used during his time in Edmonton, that make something as simplistic as a shot-count metric an unreliable indicator of team performance (sorry if I misinterpreted you, Bruce). Corsi numbers have value, but I think they should be used with caution; they generally show how strong a given player is within his own team, rather than being ideal for comparing players across teams.

In any case, I'm also not a baseball guy. I've watched three professional ball games in my entire life (one of which was Game 7 of the World Series between Arizona and somebody else), and I have no real love for, interest in, or history with the game. So, unlike most of the guys Staples talks about, I'm not basing my belief in statistics on having seen similar analysis succeed in baseball.

Wikipedia defines statistics as follows:
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the natural and social sciences to the humanities, and to government and business.

Statistics have relevance in basically every field of human knowledge. Businesses chart their futures based on statistical analysis of their own performance and that of the marketplace. Governments and political parties set policy and enact initiatives based on heavy use of statistical data. Global warming was first raised as a major concern through scientific models based on statistics and statistical trends.

Statistics are simply compilations of data. I think we can move beyond the notion that statistics, in and of themselves, are the problem. Thus, the question becomes two-fold:

1) Is it possible to analyze hockey statistically?
2) Are current statistics advanced enough for relevant analysis?

I think that the answer to both of these questions is a qualified yes. People talk about how fast-moving the game of hockey is, as though that were a reason for it to defy analysis. Thanks to the miracle of NHL play-by-play charts, I can look at every faceoff, shot, hit, giveaway, takeaway, penalty, shot block, save and goal: who did it, where it was done, and who was on the ice for each team when it happened. I can tell you when it happened, and what the game situation was at the time. It's an incredible amount of information, and it gives us the most detailed record of on-ice action we've ever had. I can go into depth on many of these events: what kind of shot it was (wrist, slap, snap), and how far from the net it was taken. The play-by-play even includes missed shots. The play-by-play charts add the most vital missing part to any hockey discussion based on statistics: context.
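To make the kind of record described above concrete, here is a minimal Python sketch of one play-by-play event and a simple query over a list of them. The field names, event codes and sample events are assumptions for illustration, not the NHL's actual play-by-play format.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlayEvent:
    """One row of a play-by-play chart. Field names are hypothetical."""
    period: int
    time_elapsed: str                    # time into the period, e.g. "12:34"
    event_type: str                      # e.g. "SHOT", "MISS", "GOAL", "HIT", "FAC"
    team: str                            # team credited with the event
    player: str                          # player credited with the event
    shot_type: Optional[str] = None      # "Wrist", "Slap", "Snap", ... (shots only)
    distance_ft: Optional[int] = None    # distance from the net (shots only)
    strength: str = "EV"                 # game situation: "EV", "PP" or "SH"
    home_on_ice: List[str] = field(default_factory=list)
    away_on_ice: List[str] = field(default_factory=list)


def shot_types_for(events: List[PlayEvent], player: str) -> Counter:
    """Count a player's shot attempts (on goal, missed or scored) by shot type."""
    return Counter(
        e.shot_type
        for e in events
        if e.player == player and e.event_type in {"SHOT", "MISS", "GOAL"}
    )


# Hypothetical sample events for a hypothetical "Player A".
events = [
    PlayEvent(1, "05:12", "SHOT", "EDM", "Player A", "Wrist", 30),
    PlayEvent(1, "07:40", "MISS", "EDM", "Player A", "Slap", 55, "PP"),
    PlayEvent(2, "03:05", "GOAL", "EDM", "Player A", "Wrist", 15),
    PlayEvent(2, "09:30", "HIT", "EDM", "Player A"),
]
print(shot_types_for(events, "Player A"))  # Counter({'Wrist': 2, 'Slap': 1})
```

The same list of events can be filtered by `strength` or by the on-ice lists to answer the "who was on the ice, and in what situation" questions.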

This brings me to the statistics themselves. I discussed my feelings about Corsi numbers earlier, so I won't repeat myself, but I will get into some of the other currently available statistics and what I feel their value is. I reference these three sets of statistics frequently, and all of them come from Behindthenet.ca.

PTS/60

This should be straightforward. Consider two players, A and B, who each played a full 82-game season. Player A had 10 goals and 20 assists at even strength, while Player B also had 10 goals and 20 assists in the same situation. However, Player A averaged 10 minutes a night, while Player B averaged 15 minutes a night (all at even strength). A quick glance at the basic stats (goals, assists, points) would tell you that these players are equally valuable as scorers. Going by PTS/60 (points scored per 60 minutes of ice time), Player A is clearly the more efficient scorer, with 2.20 PTS/60 (30 points in 820 minutes), while Player B is well back at 1.46 PTS/60 (30 points in 1,230 minutes). Thus, PTS/60 does a good job of removing ice time from the equation.
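The arithmetic behind those figures is just points divided by minutes played, scaled up to 60 minutes. A minimal Python sketch, assuming an 82-game season for both players (the season length that reproduces the figures above):

```python
def points_per_60(points: int, toi_minutes: float) -> float:
    """Points scored per 60 minutes of ice time."""
    if toi_minutes <= 0:
        raise ValueError("ice time must be positive")
    return points / toi_minutes * 60


# Players A and B: 10 goals + 20 assists each at even strength,
# at 10 and 15 minutes a night over an assumed 82 games.
games = 82
player_a = points_per_60(10 + 20, 10 * games)
player_b = points_per_60(10 + 20, 15 * games)
print(f"A: {player_a:.3f}, B: {player_b:.3f}")  # A: 2.195, B: 1.463
```

Same points, different minutes: dividing by ice time is what separates the two players.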

GFON/60 and GAON/60

The premise of these statistics is simple: in hockey, the object is to score more goals than you allow. Thus, being on the ice for a high number of goals for and a low number of goals against is good. There is a strong correlation between GFON/60 and PTS/60, as would be expected, since both measure offensive ability and contribution. A player with a low GAON/60 likely does a good job of preventing goals against; i.e. he is a strong defensive player.
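Both numbers are on-ice goal counts scaled to 60 minutes, just like PTS/60. A minimal Python sketch using a hypothetical player's numbers; the `diff/60` entry is an extra added here for convenience, not one of the Behind the Net statistics.

```python
def on_ice_rates(goals_for_on: int, goals_against_on: int, toi_minutes: float) -> dict:
    """GFON/60 and GAON/60: on-ice goals for and against per 60 minutes of ice time."""
    if toi_minutes <= 0:
        raise ValueError("ice time must be positive")
    gf60 = goals_for_on * 60 / toi_minutes
    ga60 = goals_against_on * 60 / toi_minutes
    # The difference is an extra convenience, not one of the two published stats.
    return {"GFON/60": gf60, "GAON/60": ga60, "diff/60": gf60 - ga60}


# Hypothetical player: on the ice for 45 goals for and 30 against in 900 even-strength minutes.
print(on_ice_rates(45, 30, 900))  # {'GFON/60': 3.0, 'GAON/60': 2.0, 'diff/60': 1.0}
```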

QUALCOMP and QUALTEAM

These statistics go one step further and try to account for the effect of teammates and opposition. They rest on the same premise as GFON/60 and GAON/60: a player who is on the ice for more goals for than against is a good player, while a player on the ice for fewer goals for than against is a bad player. QUALCOMP applies that yardstick to the opponents a player faces, and QUALTEAM applies it to the teammates he plays with. Relatively simple.
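To illustrate the idea, here is a simplified Python sketch that rates a group of opponents (or teammates) by the time-weighted average of their own on-ice goal differential per 60. This is an assumption about how such a rating can be built from the premise above, not Behind the Net's actual formula; the `Skater` class, its fields and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Skater:
    """Another player sharing the ice with the player being rated (hypothetical)."""
    name: str
    shared_minutes: float  # minutes on the ice against (or alongside) the rated player
    gf60: float            # this skater's own GFON/60
    ga60: float            # this skater's own GAON/60


def quality_of(skaters: List[Skater]) -> float:
    """Time-weighted average on-ice goal differential per 60 of a group of skaters.

    Feed it a player's opponents for a QUALCOMP-style number, or his teammates
    for a QUALTEAM-style number. Simplified illustration only.
    """
    total = sum(s.shared_minutes for s in skaters)
    if total <= 0:
        raise ValueError("no shared ice time")
    return sum(s.shared_minutes * (s.gf60 - s.ga60) for s in skaters) / total


# Hypothetical opponents: a strong top line faced more often than a weak fourth line.
opponents = [
    Skater("Top-line center", 200, 3.0, 2.0),
    Skater("Fourth-line winger", 100, 1.5, 2.5),
]
print(round(quality_of(opponents), 3))  # 0.333
```

Weighting by shared minutes means the opponents a player sees most often dominate the rating, which is the point: a defenseman who mostly faces weak lines ends up with a low number.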

Take Matt Greene, as an example. He's widely regarded in the traditional media as a shut-down defenseman. Yet, if that's the case, why was he consistently matched against players who are on the ice for fewer goals for than against? This is a good example of how statistics can lay waste to traditional thinking: stop looking at the fact that he's big and tough, and look at who he's playing against.

As another example, take Sam Gagner. Gagner had a tremendous season last year, and I believe he will evolve into a franchise player. Last season, however, he generally played with forwards and defensemen who do a good job of scoring more goals than they allow, and he led Oilers forwards in QUALTEAM.

Looking through the Oilers' roster, I find these statistics agree with the games I watched: guys like Jarret Stoll, Marty Reasoner, Steve Staios and Sheldon Souray played against top opponents, while the younger players saw easier ice time. It makes sense, and it's exactly how you would expect Craig MacTavish to use his players.

Another thing these statistics do is show where players are getting their points. By splitting ice time into even-strength, power-play and short-handed minutes, we can see who is getting the benefit of good ice time, and who isn't. Let's look at three otherwise identical players, Players C, D and E. Here is their average ice time:

Player C: 10:00 EV, 3:00 PP
Player D: 13:00 EV
Player E: 10:00 EV, 3:00 SH

I guarantee that Player C would score more points than Player D, who would score more points than Player E, with all other things being equal. It's just the way it is.
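Why the ordering holds can be shown with a little arithmetic. The Python sketch below assumes hypothetical scoring rates for the same player in each situation (power play highest, short-handed lowest); the rates are illustrative assumptions, not real data.

```python
# Assumed points per 60 minutes for one "identical" player in each situation.
# Hypothetical numbers, chosen only so that PP > EV > SH.
RATES = {"EV": 2.0, "PP": 5.0, "SH": 0.5}


def expected_points_per_game(toi: dict[str, float]) -> float:
    """Expected points per game given average minutes in each game situation."""
    return sum(minutes * RATES[situation] / 60 for situation, minutes in toi.items())


players = {
    "C": {"EV": 10, "PP": 3},
    "D": {"EV": 13},
    "E": {"EV": 10, "SH": 3},
}
for name, toi in players.items():
    print(name, round(expected_points_per_game(toi), 3))
# C 0.583
# D 0.433
# E 0.358
```

Because all three play 13 minutes a night, C beats D whenever the power-play rate exceeds the even-strength rate, and D beats E whenever the even-strength rate exceeds the short-handed rate, whatever the exact numbers.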

To my mind, anyone with a serious (or possibly professional) interest in the game of hockey is doing themselves and people who listen to them a disservice by ignoring the data offered by statistics.

Now that I have that out of my system, here's a caveat: statistics should agree with what you see watching the game. This is why I like Behind the Net, and the work done by various bloggers; it generally shows the same game that I watch. When I look at who Matt Greene plays against, I see the same thing Desjardins' statistics tell me: he's playing relatively weak opposition. When I watched games last season, nearly every time the other team's top line took a faceoff in the Oilers' defensive zone, Jarret Stoll and Marty Reasoner were on the ice for the Oilers.

All of this seems obvious to me, and it doesn't have a thing to do with baseball or distaste for "just watching the game".

Addendum: I hinted at this but didn't say it explicitly. In my opinion, the chief difference between statistical analysis and just eyeballing it is that one judges results while the other judges method. I firmly believe that results (scoring more goals than you allow) are more important than method (he's fast, tough, mean, etc.) for determining which players help teams win.