Ever been in one of those blog discussions where the other guy says something like "Yeah, well, the chance of success of a 2nd round pick is only 14.2% so that trade is worthless"?
Ever wonder where those numbers come from? And whether they're actually useful?
Lots of folks have done work over the years to crunch draft data, but before I ever believe a number, I want to know:
- What was the data source?
- How recent is the data?
- How large is the sample size?
- What are the assumptions behind the analysis? e.g. if the success rate for a particular draft round is x%, what is meant by 'success'? A single game played? A full season played? Some game threshold?
There's more of course, but hopefully you get my point - you can't, or at least you shouldn't, just accept a number at face value.
More to the point, most of the draft success numbers I see don't help me anyway.
What does a first round draft success of 85% tell me about Darnell Nurse? Is trading a 2nd rounder for a 3rd and a 4th a good trade or a bad trade? I mean, hey, there are few things more productive or valuable than winning a blog argument, but sometimes you just wanna understand sh*t.
So I did what any self-respecting programmer / math / finance / stats guy would do - I crunched the numbers myself. Here's what I got ... hope you find it useful. (NOTE: details on what data I used, and what tools and techniques were required, are in the "FOR GEEKS" section at the end of this post.)
A picture is worth a zillion bits or something, so let's start with the pretty bits. Blue is raw data for each pick, green is a smoothed line (rolling average of 15 picks), and the red "Idealized" line is a calculated version of a highly smoothed line that mathematically fits the distribution. I have two charts (success rate and games played) for each section, and four sections (all players, forwards only, defense only, goalies only).
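For anyone who wants to reproduce the blue and green lines, here's a minimal sketch in pandas. The draft table below is randomly generated stand-in data (not the real numbers), the column names `pick` and `games_played` are made up for illustration, and I'm assuming a centered rolling window - a trailing window would be an equally valid reading of "rolling average of 15 picks".

```python
import numpy as np
import pandas as pd

def per_pick_success(drafts: pd.DataFrame, min_games: int = 10) -> pd.Series:
    """Fraction of players taken at each overall pick who reached min_games."""
    success = drafts["games_played"] >= min_games
    return success.groupby(drafts["pick"]).mean().sort_index()

def smooth(series: pd.Series, window: int = 15) -> pd.Series:
    """Centered rolling mean (assumption); min_periods=1 keeps the edges."""
    return series.rolling(window=window, center=True, min_periods=1).mean()

# Stand-in data: 20 made-up draft years of 210 picks each, with career
# games decaying by pick number. Purely illustrative, NOT real draft data.
rng = np.random.default_rng(0)
picks = np.tile(np.arange(1, 211), 20)
games = rng.poisson(lam=400 * np.exp(-picks / 40))
drafts = pd.DataFrame({"pick": picks, "games_played": games})

raw = per_pick_success(drafts)   # the "blue" line
smoothed = smooth(raw)           # the "green" line
```

Swap `games_played >= min_games` for the mean of `games_played` and you get the games-played version of the same chart.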
Not many surprises here - although when looking at games played, the dropoff from the early stages of the first round is far steeper than I thought it would be. It tells me that the value of a high first round pick is higher than I thought, and the value of a second round pick is not particularly high.
These graphs are in line with what I expected, that there is a smooth steady drop off in success as you get later into the draft. Again, the main surprise is the steepness of the game dropoff from the first to the second round.
First surprise! Traditional wisdom is that you can get lots of good defensemen late in the draft. Not entirely true, or at least not that much more so than for forwards. The second and subsequent rounds are markedly weaker than the first round.
Where the traditional wisdom comes in is that the sweet spot for defensemen is NOT at the top of the draft, it's in picks 11 to 30. As for Darnell? He's a bit outside that sweet spot, but history suggests he's going to be a player.
Voodoo - I mean, Goalies
Again, the surprise for me is that the first round is still the strongest round for these guys. Less surprisingly, the drop-off in the later rounds is the least steep of any position, with hefty peaks late in the first, in the middle of the second, and in the first half of the third. Ignore the first steep dip in the "Idealized" curve - that's not signal, that's confusion. The sparser data, combined with the high peak to start followed by three more substantial peaks, is wreaking havoc with the math.
The main thing I draw from combining the information here is to tweak drafting strategy:
- Always take forwards in the first 5 to 10 picks. (I'm looking at you, Aaron Ekblad - and then I'm looking away quickly because you are one scary dude)
- The sweet spot for loading up on D is picks 11 to 30. Not to worry, Darnell, #7 is just fine!
- The sweet spot for loading up on G is picks 45 to 85
Other than that, presumably stick to BPA as much as you can.
Here's the data in summary format, by round and by pick range (bins of 5). Use this to win those arguments!
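If you'd rather build this kind of summary yourself, here's a rough sketch of the grouping. It assumes a flat 30 picks per round (real drafts vary by year), a 10-game success threshold, and hypothetical column names `pick` and `games_played`; the seven rows of data are invented just to show the mechanics.

```python
import pandas as pd

MIN_GAMES = 10  # success threshold (games played)

def summarize(drafts: pd.DataFrame, width: int) -> pd.DataFrame:
    """Group picks into consecutive ranges of `width` and summarize each range."""
    # First pick of the range each pick falls in: 1, 1+width, 1+2*width, ...
    first_pick = (drafts["pick"] - 1) // width * width + 1
    out = (
        drafts.assign(first_pick=first_pick,
                      success=drafts["games_played"] >= MIN_GAMES)
        .groupby("first_pick")
        .agg(players=("pick", "size"),
             success_rate=("success", "mean"),
             avg_games=("games_played", "mean"))
    )
    out.insert(0, "last_pick", out.index.to_numpy() + width - 1)
    return out

# Invented rows, NOT real draft data.
drafts = pd.DataFrame({
    "pick":         [1,   2, 5,  6, 30, 31,  35],
    "games_played": [800, 0, 12, 9, 10, 250, 3],
})

by_round = summarize(drafts, width=30)  # assumes 30 picks per round
by_bin = summarize(drafts, width=5)     # bins of 5 picks
```

The same function handles both tables because a "round" is just a wider bin.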
FOR GEEKS
- My 'success factor' criterion was 10 games. One seemed too few. Gut feel!
- The only voodoo I used is in generating my "Idealized" curve. I used cubic B-spline interpolation with a smoothing factor of 4. I used 9 knots for the calculation of the spline - the endpoints were the average of the first two and last two values, and the middle 7 knots were the average of the 30 picks for the respective round. It seemed to work pretty well everywhere except for goalies - sparse, noisy data probably needs more knots.
- The source of my draft data was hockeydb.com, said data entered into a massive Excel spreadsheet
- The analysis (charts and tables) was crunched using Python - specifically SciPy, NumPy, pandas, and Matplotlib, all obtained through the Anaconda distribution.
- The tables were pretty printed in Excel, and snapped to Imgur using ShareX
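For the curious, here's one plausible reading of the "Idealized" curve recipe as a SciPy sketch. The exact calls used for the charts aren't given above, so treat the details as assumptions: I place the two endpoint values at the first and last pick, place each round's average at the middle of its round, and hand the 9 points to `scipy.interpolate.splrep` with `k=3` and `s=4`. The input is generated stand-in data, not real draft numbers.

```python
import numpy as np
from scipy.interpolate import splrep, splev

PICKS_PER_ROUND = 30  # assumption: flat 30 picks per round
N_ROUNDS = 7

def idealized_curve(values, s=4.0):
    """values[i] is the raw figure for overall pick i+1 (210 values)."""
    values = np.asarray(values, dtype=float)
    n = PICKS_PER_ROUND * N_ROUNDS
    if values.shape != (n,):
        raise ValueError(f"expected {n} values, got shape {values.shape}")
    picks = np.arange(1, n + 1)

    # 7 middle points: each round's mean, placed at the round's midpoint.
    round_x = picks.reshape(N_ROUNDS, PICKS_PER_ROUND).mean(axis=1)
    round_y = values.reshape(N_ROUNDS, PICKS_PER_ROUND).mean(axis=1)

    # 2 endpoints: mean of the first two and of the last two raw values.
    x = np.concatenate(([1.0], round_x, [float(n)]))
    y = np.concatenate(([values[:2].mean()], round_y, [values[-2:].mean()]))

    # Cubic smoothing spline through the 9 points; s=0 would interpolate exactly.
    tck = splrep(x, y, k=3, s=s)
    return picks, splev(picks, tck)

# Stand-in "success rate in percent" per pick. Illustrative only.
rng = np.random.default_rng(1)
raw_pct = 90 * np.exp(-np.arange(1, 211) / 60) + rng.normal(0, 5, 210)

picks, curve = idealized_curve(raw_pct)        # smoothed, s=4
_, exact = idealized_curve(raw_pct, s=0)       # passes through all 9 points
```

Note that how much `s=4` smooths depends on the scale of the y values (percent vs. fraction vs. games), and with only 9 points the curve can't follow several sharp peaks, which fits the trouble with the goalie data.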