Scott recently wrote a great article about estimating ice time from scoring events. I actually did a very similar study last year in one of my math classes. I tried to estimate the total ES ice time of a player "i" per year (Ti) by multiplying the team's total ES ice time over the season (T) by the ratio of the number of events the player was on the ice for during the season (ni) to the total number of events for that team during the season (N):
Ti = (ni/N)*T
Then, to get the ice time per game of player "i" (ti), we just divide the total ice time of this player (Ti) by the number of games he played (G):
ti = Ti/G
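For readers who prefer code, the two formulas above can be sketched in Python roughly like this. The function names and the numbers in the usage lines are made up for illustration only; they are not taken from the study.

```python
def estimate_total_es_toi(n_i, N, T):
    """Estimated total ES ice time of a player: Ti = (ni/N)*T."""
    if N <= 0:
        raise ValueError("N (total team events) must be positive")
    return (n_i / N) * T


def estimate_es_toi_per_game(n_i, N, T, G):
    """Estimated ES ice time per game: ti = Ti/G."""
    if G <= 0:
        raise ValueError("G (games played) must be positive")
    return estimate_total_es_toi(n_i, N, T) / G


# Hypothetical inputs: a player on the ice for 60 of the team's 300 ES events,
# a team total of 4000 ES minutes, and 80 games played by the player.
total_minutes = estimate_total_es_toi(n_i=60, N=300, T=4000.0)
minutes_per_game = estimate_es_toi_per_game(n_i=60, N=300, T=4000.0, G=80)
print(total_minutes, minutes_per_game)
```

T and Ti only need to share the same unit (minutes here); the ratio ni/N is unitless.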
Like Scott, I compared the estimated ice time (ti) with the real ice time reported by the NHL (ti real), and I got correlation coefficients roughly similar to the ones he had in his article.
However, I also tried to improve the method by adding correction factors to compensate for the fact that the number of events per unit of ice time changes from player to player, which is why there is a discrepancy between the estimated ti and the real ti. My line of thinking was that there is a good chance a player will have a higher-than-average number of events per unit of ice time if:
- he has a high offensive productivity and/or
- he plays with players who have a high offensive productivity and/or
- he plays against players with high offensive productivity.
To build the models, I used the data from the 2009-2010 season of the Montreal Canadiens. For every Habs player who played more than 20 games in the season (a total of 23 players), I looked at the number of ES events for which they were on the ice during the season, and also at who they were on the ice with and against during each event. Note that I did not consider EN goals as ES events. After that, I took:
- the number of points per game of the player considered (xi) as an estimate of his offensive productivity
- the average number of points per game of the teammates of the player considered during the events (yi) as an estimate of the offensive productivity of his teammates
- the average number of points per game of the opponents of the player considered during the events (zi) as an estimate of the offensive productivity of his opponents.
Now, the downside is that pts/game depends on things other than a player's rate of events, but we are limited by what is available at the junior or AHL level. Also, it would probably have been a bit better to use ES pts/g instead of pts/g, but I was too short on time to do so.
So, for instance, if there was a goal during a Habs-Oilers game while these players were on the ice:
Gionta (0.75 pts/g)
Gomez (0.76 pts/g)
Cammalleri (0.77 pts/g)
Markov (0.76 pts/g)
Spacek (0.28 pts/g)
Jacques (0.22 pts/g)
Reddox (0.22 pts/g)
Stortini (0.17 pts/g)
Smid (0.18 pts/g)
Strudwick (0.08 pts/g)
Then, with respect to player "i" being Brian Gionta, we would calculate: xi = 0.75 pts/g, yi = 0.64 pts/g (the average of his four teammates) and zi = 0.17 pts/g (the average of the five opponents). The numbers used in the analysis for Gionta are thus the averages of xi, yi and zi over every ES event where he was on the ice during the 09-10 season. If you're a frequent C&B visitor, you'll probably recognize that this method is somewhat similar to, and actually inspired by, the way Scott/Jonathan calculate the qualcomp and qualteam for AHL teams.
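As a quick check of the arithmetic for this single event, here is a small Python sketch. The pts/g values are copied from the list above; the variable names and the `mean` helper are only illustrative.

```python
def mean(values):
    """Plain arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# pts/g values copied from the example event above
player_pts_per_game = 0.75  # Gionta
teammates = {"Gomez": 0.76, "Cammalleri": 0.77, "Markov": 0.76, "Spacek": 0.28}
opponents = {"Jacques": 0.22, "Reddox": 0.22, "Stortini": 0.17,
             "Smid": 0.18, "Strudwick": 0.08}

x_i = player_pts_per_game
y_i = mean(list(teammates.values()))   # teammates' average pts/g
z_i = mean(list(opponents.values()))   # opponents' average pts/g

print(round(x_i, 2), round(y_i, 2), round(z_i, 2))  # prints 0.75 0.64 0.17
```

The season values for a player would then be the averages of these per-event numbers over all of his ES events.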
To incorporate these data into the analysis, I looked at different types of models (linear, quadratic and logarithmic). For instance, a simple linear model could look like this:
Ti = (Axi + Byi + Czi)*(ni/N)*T
This equation is very similar to the first one in this article, except that, as explained earlier, an additional factor has been added to compensate for "low event" or "high event" players. The values of A, B and C are calculated with a least-squares algorithm to obtain the "best" values of these parameters. To give you a rough idea of the impact of this correction factor on the accuracy of the calculated ES time on ice: without the correction factor, the average absolute difference between the estimated ti and the real ti for the 23 Habs players considered was 2.0 min per game, while with the correction factor it was reduced to 1.4 min per game.
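To make the fitting step more concrete, here is a minimal Python sketch of how A, B and C in the linear model could be obtained. It assumes the fit is an ordinary least-squares fit solved through the normal equations, which may differ from the exact procedure used in the study. All function names are hypothetical, and the data are synthetic: the "real" ice times are generated from known coefficients only to check that the fit recovers them. They are not the Habs numbers.

```python
def solve_3x3(M, v):
    """Solve the 3x3 system M a = v by Gaussian elimination with pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(M, v)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: x, y, z are not independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - s) / aug[r][r]
    return sol


def fit_linear_correction(x, y, z, base, T_real):
    """Least-squares A, B, C for T_real ~ (A*x + B*y + C*z) * base,
    where base = (ni/N)*T is the uncorrected estimate for each player."""
    rows = [(xi * b, yi * b, zi * b) for xi, yi, zi, b in zip(x, y, z, base)]
    M = [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]
    v = [sum(r[j] * t for r, t in zip(rows, T_real)) for j in range(3)]
    return solve_3x3(M, v)


def mean_abs_error(estimated, real):
    """Average absolute difference between estimated and real values."""
    return sum(abs(e - r) for e, r in zip(estimated, real)) / len(real)


# Synthetic inputs for five imaginary players (NOT real data).
x = [0.75, 0.30, 0.50, 0.20, 0.60]
y = [0.64, 0.40, 0.55, 0.35, 0.45]
z = [0.17, 0.45, 0.30, 0.50, 0.40]
base = [900.0, 700.0, 800.0, 500.0, 1000.0]
true_A, true_B, true_C = 0.5, 0.6, 1.0  # arbitrary values used to build T_real
T_real = [(true_A * xi + true_B * yi + true_C * zi) * b
          for xi, yi, zi, b in zip(x, y, z, base)]

A_fit, B_fit, C_fit = fit_linear_correction(x, y, z, base, T_real)
T_est = [(A_fit * xi + B_fit * yi + C_fit * zi) * b
         for xi, yi, zi, b in zip(x, y, z, base)]
print(A_fit, B_fit, C_fit, mean_abs_error(T_est, T_real))
```

With real data the fit would not be exact, and `mean_abs_error` (after dividing each Ti by G) would give the kind of per-game figure quoted above.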
Overall, the conclusions of my study were:
- the scoring events certainly seem to have the potential to be used to calculate a rough estimate of the ice time of junior/AHL players with a level of accuracy of under ± 2 min per game with a high confidence level
- the addition of a correction factor seems to improve the accuracy of the model, but the magnitude of the improvement is still unknown
- further study is required to determine the type of correction factor (linear, quadratic, etc.) which gives the best results
- tracking the scoring events by hand takes a monster amount of time.
Like Scott said, this methodology sure isn't perfect, but I certainly think it has some value. I also took the time to crunch the numbers (for the whole NHL this time, but without correction factors) to see if the accuracy is similar for forwards and defensemen, if it varies a lot from team to team, what minimum amount of TOI a player must have to substantially reduce the background noise, etc. If some of you are interested in the results, I could always take the time to write a second FanPost. Also, the objective of this post was mostly to give Scott and anyone interested ideas on how to improve the method, which is why I didn't go too deeply into the math portion of the methodology, but if anyone wants to dig deeper into it, just ask.
*Note that English is not my first language, so please be indulgent with respect to grammar and spelling mistakes.