Estimation of ice time using scoring events: how to improve the model
Scott recently wrote a great article about estimating ice time through scoring events. I actually did a very similar study last year in one of my math classes. I tried to estimate the total ES ice time of a player "i" per year (Ti) by multiplying the team's total ES ice time over the season (T) by the ratio of the number of events the player was on the ice for during the season (ni) to the team's total number of events during the season (N):
Ti = (ni/N)*T
Then, to get the ice time per game of player "i" (ti), we just divide the total ice time of this player (Ti) by the number of games he played (G):
ti = Ti/G
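As a quick illustration, the two formulas above can be sketched in a few lines of Python. The function names and the input numbers are hypothetical and chosen only to show the arithmetic; they are not data from the study.

```python
def estimate_total_ice_time(n_i, n_team, total_es_minutes):
    """Ti = (ni / N) * T: the player's share of team events times team ES time."""
    return (n_i / n_team) * total_es_minutes


def estimate_ice_time_per_game(n_i, n_team, total_es_minutes, games_played):
    """ti = Ti / G: estimated ES minutes per game for the player."""
    return estimate_total_ice_time(n_i, n_team, total_es_minutes) / games_played


# Hypothetical inputs (not real data): the player was on the ice for 60 of the
# team's 300 ES events, the team played 4000 ES minutes, the player dressed for 80 games.
ti = estimate_ice_time_per_game(n_i=60, n_team=300,
                                total_es_minutes=4000.0, games_played=80)
print(round(ti, 2))
```

With these made-up numbers the player is credited with one fifth of the team's events, so he is assigned one fifth of the team's ES minutes, spread over his 80 games.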
Like Scott, I compared the estimated ice time (ti) with the real ice time reported by the NHL (ti real), and it led me to correlation coefficients roughly similar to the ones he had in his article.
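For readers who want to reproduce that comparison, here is a minimal sketch of a Pearson correlation coefficient between estimated and real ice times. The post does not say which correlation coefficient was used, so Pearson's r is an assumption, and the sample values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-game ice times in minutes (not the study's data).
ti_estimated = [10.2, 13.5, 15.1, 8.7, 12.0]
ti_real = [11.0, 12.8, 16.0, 9.5, 11.4]
print(pearson_r(ti_estimated, ti_real))
```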
However, what I also did was try to improve the method by adding some correction factors to compensate for the fact that the number of events per unit of ice time changes from player to player, which is why there is a discrepancy between the estimated ti and the real ti. My line of thinking was that there is a good chance a player will have a higher-than-average number of events per unit of ice time if:
- he has a high offensive productivity and/or
- he plays with players who have a high offensive productivity and/or
- he plays against players with high offensive productivity.
To build the models, I used the data from the 2009-2010 season of the Montreal Canadiens. For every Habs player who played more than 20 games in the season (a total of 23 players), I looked at the number of ES events for which they were on the ice during the season, but also at who they were on the ice with and against during each event. Note that I did not consider EN goals as ES events. After that, I took:
- the number of points per game of the player considered (xi) as an estimate of his offensive productivity
- the average number of points per game of the teammates of the player considered during the events (yi) as an estimate of the offensive productivity of his teammates
- the average number of points per game of the opponents of the player considered during the events (zi) as an estimate of the offensive productivity of his opponents.
Now, the downside is that pts/game are also related to other things than the rate of events of a player, but we are limited by what is available at the junior or AHL level. Also, it would probably have been a bit better if I had used ES pts/g instead of pts/g, but I was a bit too short on time to do so.
So, for instance, if there was a goal during a Habs-Oilers game where these players were on the ice:
Habs:
Gionta (0.75 pts/g)
Gomez (0.76 pts/g)
Cammalleri (0.77 pts/g)
Markov (0.76 pts/g)
Spacek (0.28 pts/g)
Oilers:
Jacques (0.22 pts/g)
Reddox (0.22 pts/g)
Stortini (0.17 pts/g)
Smid (0.18 pts/g)
Strudwick (0.08 pts/g)
Then, with respect to player "i" being Brian Gionta, we would calculate: xi = 0.75 pts/g, yi = 0.64 pts/g and zi = 0.17 pts/g. The numbers used in the analysis for Gionta are thus the averages of xi, yi and zi over every ES event where he was on the ice during the 09-10 season. If you're a frequent C&B visitor, you'll probably recognize that this method is somewhat similar to, and actually inspired by, the way Scott/Jonathan calculate the qualcomp and qualteam for AHL teams.
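The Gionta example can be checked with a short Python sketch. The pts/g values are the ones listed above; the function names `event_factors` and `season_factors` are hypothetical helpers written for this illustration.

```python
# Points per game of the players on the ice for the example goal.
habs = {"Gionta": 0.75, "Gomez": 0.76, "Cammalleri": 0.77,
        "Markov": 0.76, "Spacek": 0.28}
oilers = {"Jacques": 0.22, "Reddox": 0.22, "Stortini": 0.17,
          "Smid": 0.18, "Strudwick": 0.08}


def event_factors(player, own_side, other_side):
    """Return (x, y, z) for one event, from the point of view of `player`."""
    x = own_side[player]                                   # the player's own pts/g
    mates = [v for name, v in own_side.items() if name != player]
    y = sum(mates) / len(mates)                            # average of his teammates
    z = sum(other_side.values()) / len(other_side)         # average of his opponents
    return x, y, z


def season_factors(per_event):
    """Average the (x, y, z) triples over all of a player's events."""
    n = len(per_event)
    return tuple(sum(triple[k] for triple in per_event) / n for k in range(3))


x, y, z = event_factors("Gionta", habs, oilers)
print(round(x, 2), round(y, 2), round(z, 2))
```

For a full season, each event would contribute one triple and `season_factors` would average them, which is the averaging step described above.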
To incorporate these data into the analysis, I looked at different types of models (linear, quadratic and logarithmic). For instance, a simple linear model could look like this:
Ti = (Axi + Byi + Czi)*(ni/N)*T
This equation is pretty similar to the first one of this article, except that, as I explained earlier, an additional term has been added to compensate for the "low events" or "high events" players. The values of A, B and C are calculated by a least mean square algorithm to obtain the "best" values of these parameters. To give you a rough idea of the impact of this correction factor on the accuracy of the calculated ES time on ice: without the correction factor, the average absolute difference between the estimated ti and the real ti for the 23 Habs players considered was 2.0 min per game, while with the correction factor it was reduced to 1.4 min per game.
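Here is a minimal sketch of how A, B and C could be fitted, assuming that "least mean square" means an ordinary least-squares fit (the post does not give the exact procedure). Writing base_i = (ni/N)*T, the model Ti = A*(xi*base_i) + B*(yi*base_i) + C*(zi*base_i) is linear in A, B and C, so the normal equations can be solved directly. All player numbers below are synthetic, generated from known coefficients only to check that the fit recovers them.

```python
def solve_3x3(m, v):
    """Solve m * s = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    s = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s[r] = (a[r][3] - sum(a[r][c] * s[c] for c in range(r + 1, 3))) / a[r][r]
    return s


def fit_correction(players):
    """players: list of (x, y, z, base, real_total). Returns [A, B, C]."""
    rows = [(p[0] * p[3], p[1] * p[3], p[2] * p[3]) for p in players]
    targets = [p[4] for p in players]
    # Normal equations: (R^T R) [A, B, C]^T = R^T t
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve_3x3(m, v)


def mean_abs_error(estimates, reals):
    """Average absolute difference between estimated and real values."""
    return sum(abs(e - r) for e, r in zip(estimates, reals)) / len(reals)


# Synthetic check (not real data): build targets from known coefficients.
TRUE_A, TRUE_B, TRUE_C = 0.6, 0.5, 0.9
raw = [(0.75, 0.64, 0.40, 900.0), (0.20, 0.55, 0.45, 600.0),
       (0.50, 0.40, 0.60, 750.0), (0.30, 0.70, 0.35, 500.0)]
players = [(x, y, z, b, (TRUE_A * x + TRUE_B * y + TRUE_C * z) * b)
           for x, y, z, b in raw]

A, B, C = fit_correction(players)
fitted = [(A * x + B * y + C * z) * b for x, y, z, b, _ in players]
print(A, B, C, mean_abs_error(fitted, [p[4] for p in players]))
```

On real data the targets would be the NHL-reported ES totals, and `mean_abs_error` on the per-game values gives the kind of "min per game" figure quoted above. A quadratic or logarithmic correction would need a different set of columns in `rows`.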
Overall, the conclusions of my study were:
- the scoring events certainly seem to have the potential to be used to calculate a rough estimate of the ice time of junior/AHL players with a level of accuracy of under ± 2 min per game with a high confidence level
- the addition of a correction factor seems to improve the accuracy of the model, but the magnitude of the improvement is still unknown
- further study is required to determine the type of correction factor (linear, quadratic, etc.) which gives the best results
- to track the scoring events by hand takes a monster amount of time.
Like Scott said, this methodology sure isn't perfect, but I certainly think it has some value. I also took the time to crunch the numbers (for the whole NHL this time, but without correction factors) to see if the accuracy is similar for forwards versus d-men, if it varies a lot from team to team, what minimum amount of TOI a player must have to substantially reduce the background noise, etc. If some of you are interested in the results, I could always take the time to write a second FanPost. Also, the objective of this post was mostly to give Scott and anyone interested ideas on how to improve the method, which is why I didn't go too deeply into the math portion of the methodology, but if anyone is interested in going more deeply into it, just ask.
*Note that English is not my first language, so please be indulgent with respect to the grammar and spelling mistakes.
Comments
This was very well written, so I wouldn’t worry about the language barrier. You’re clear and concise. The most confusing thing is the math, but hey, I’m an Arts student.
Out of curiosity, what is your first language?
PS: Excellent study of the relation between events and TOI. The only problem is + or – 2 minutes is a pretty big discrepancy (difference between 1st and 2nd line icetime). Obviously it’s probably impossible to lower the margin of error on your calculation, but I think it adequately proves that this method is a good indicator of icetime when it’s not available.
Thanks AdR23. I’m French Canadian, thus why I took the Habs for my work (even though, for a reason I can’t explain, I’m mostly an Oilers fan).
Yep, the way we estimate TOI initially leads to a margin of error of about ± 2 min, which means that it can pretty much only give us pertinent information about players who have a TOI very different from the average. However, the good news is that there are ways we can tweak the model and use the other stats available to significantly improve the accuracy. I’ve pretty much shown in my study that we can get to about ± 1.5 min, but I’m pretty confident that with further work and a couple of good ideas it’s possible to develop a model with an accuracy near or under ± 1 min per game. The sad part is that, to do so, we need skills not only in math but also in computer programming (because doing it by hand takes way too much time), which is not really my strong suit.
I’ve been thinking:
If the whole study is predicated on the amount of “events” in order to create accuracy, couldn’t we reduce the margin of error by tracking shots as well? This would be much more difficult and labor intensive, but it would (in some cases) triple the samples. Would that work?
I also think that with some modification to the formula (one that includes shots, as some sort of Corsi per unit of time metric), we can find hockey’s version of On-Base Percentage.
Shots would be much better, but unfortunately, we don’t have that data available to us at the CHL level.
The biggest fanana of the Havana Bananas.
by Scott Reynolds on Aug 24, 2011 11:45 PM MDT
Yeah, that’s the problem at the CHL level. But at the NHL level this could be a really useful advanced metric. It would show territorial advantage (more accurately than Corsi already does) by calculating the amount of shots per icetime (you might even get to lower it down to shots/shift).
Theoretically, a player with a high number of shots per unit of ice time would spend more time in the O-zone and thus be an outscorer. Factored in with QualComp (so you can find role outscorers, i.e. PvP outscorers), you should, in theory, find hockey’s equivalent to OBP.
I am confused with how this is different from Corsi (or Fenwick or Shots) per sixty minutes. Or is that exactly what you’re talking about?
The biggest fanana of the Havana Bananas.
by Scott Reynolds on Aug 25, 2011 12:30 PM MDT
