LISTSERV - HOCKEY-L Archives

In response to Doug Garn's "open letter" to CHODR et al. and since I've
got some spare time as I proctor a stat final exam now - another benefit
of having computers in the classroom... a few thoughts on evaluating
computer ranking systems.
 
1.  Of course, it's still pretty early in the season.  Some teams have
already met many of the toughest teams on their schedules, others have
significant challenges yet before them.  While this means that any
rating system is likely to still be a bit unstable, it also points out
the value of being able to assess where your team or your next opponent
stands, after accounting for their record and the quality of who they
have already played.
 
2. Although we have been using "% of games predicted correctly" as a
measure of how well CHODR, HEAL, and now KRACH are "doing", it should
be noted that CHODR isn't really designed to optimize the prediction
of "winners".  We include the "% correct" in our results posting mainly
in response to the many "How well do your ratings work?" queries we
receive.  The % correct data gives one easily understandable measure
of the accuracy of the forecast, but by no means the best measure.  We
would probably be more interested in a measure like a negative log of the
likelihood of the actual game score, based on a joint Poisson probability
computed from the CHODR forecasted score - but who else would really care
about that number!  Also, since CHODR is the only one of the current
rating or ranking methods which allows forecasting of actual game scores,
we'd miss the fun of comparing different methods.
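
For the curious, here is a minimal sketch of that kind of measure.  It
assumes the two teams' regulation goal counts are independent Poisson
variables whose means are the forecasted scores; how CHODR actually forms
its joint probability is not spelled out above, and the function names
are purely illustrative.

```python
import math

def poisson_log_pmf(goals, rate):
    """Log of P(X = goals) for X ~ Poisson(rate); rate must be positive."""
    return goals * math.log(rate) - rate - math.lgamma(goals + 1)

def score_nll(pred_a, pred_b, goals_a, goals_b):
    """Negative log-likelihood of the actual score (goals_a, goals_b)
    given forecasted scores (pred_a, pred_b).
    Assumption: the two goal counts are independent Poisson variables."""
    return -(poisson_log_pmf(goals_a, pred_a) + poisson_log_pmf(goals_b, pred_b))

# Forecast 5.60-5.08, actual 2-1: the winner is right, the score is not.
nll_goalfest = score_nll(5.60, 5.08, 2, 1)
# Forecast 4.08-3.21, actual 3-4: the winner is wrong, the score is close.
nll_upset = score_nll(4.08, 3.21, 3, 4)
```

Smaller values mean the actual score was more plausible under the
forecast, so under this sketch the "wrong" pick above scores better than
the "right" one.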
 
3. To answer part of Doug's question:  Is the 81% correct figure for
CHODR and HEAL over the first two weeks particularly high or should it
get better as the season proceeds?  CHODR thinks it's unusually high.
Based on past data for typical match-ups in a season, even if we were
able to know the true scoring rates for all teams exactly, the team with
the higher scoring rate should only win about 70%-75% of the games which
are decided in regulation.  For example, if the predicted score for
Team A vs. Team B was 3.89-3.82, we would expect an "upset" nearly
50% of the time, while a 5.83-1.23 prediction should give the correct
winner much more than 90% of the time.  The current (68% correct) week
for CHODR is probably more typical of what we should expect as the season
goes along than the 80%+ values we were seeing the first two weeks.
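
The two examples above can be checked with a short sketch.  It assumes
each team's regulation goals are independent Poisson counts with the
predicted scores as means (an illustrative assumption, not necessarily
CHODR's exact model), truncates the sums at a hypothetical max_goals, and
reports the favorite's share of the games that do not end tied.

```python
import math

def poisson_pmf(k, rate):
    """P(X = k) for X ~ Poisson(rate), computed in log space."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))

def regulation_probs(rate_a, rate_b, max_goals=40):
    """Return (P(A wins), P(tie), P(B wins)) at the end of regulation.
    Assumption: independent Poisson goal counts; the sums are truncated
    at max_goals, which is ample for hockey-sized scoring rates."""
    pa = [poisson_pmf(k, rate_a) for k in range(max_goals + 1)]
    pb = [poisson_pmf(k, rate_b) for k in range(max_goals + 1)]
    win_a = sum(pa[i] * pb[j] for i in range(max_goals + 1) for j in range(i))
    win_b = sum(pa[i] * pb[j] for j in range(max_goals + 1) for i in range(j))
    tie = sum(x * y for x, y in zip(pa, pb))
    return win_a, tie, win_b

def favorite_share(rate_a, rate_b):
    """Probability that the higher-rated side wins, given no regulation tie."""
    win_a, _tie, win_b = regulation_probs(rate_a, rate_b)
    return max(win_a, win_b) / (win_a + win_b)

close_game = favorite_share(3.89, 3.82)   # barely better than a coin flip
mismatch = favorite_share(5.83, 1.23)     # well above 0.9
```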
 
4. Why do we ignore overtime wins?  CHODR itself is only concerned with
regulation time scoring rates, thus to be consistent we should only
"evaluate" the actual result at the end of regulation time.  Sure,
this sometimes lets CHODR "off the hook" when it predicts Harvard should
beat BC 4.78-2.19 and BC wins 2-1 in OT, but on the other hand CHODR
gets no credit for predicting a 2.92-3.10 score in the Minn-Duluth vs.
Alaska-Anchorage game which ended up 3-3.  In other cases, CHODR might
get credit for a correct pick, when the predicted score was really pretty
far off the mark - example: CHODR's prediction of a 5.60-5.08 goalfest
when SLU hosted Rensselaer in what turned out to be a 2-1 SLU win.  Good
job on the margin of victory, but weak on forecasting the actual score
of the game. Or CHODR will say that Michigan should win 4.08-3.21 at
Michigan State and the final is a 3-4 MSU win - a "loss" for CHODR, but
not a very unlikely outcome for the forecasted score.
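
The grading rule described above can be sketched as follows, using the
four games just mentioned.  The function names are illustrative, the
1-1 regulation score for Harvard-BC is inferred from the 2-1 OT final,
and treating an exactly level forecast as "no pick" is an assumption
added for completeness.

```python
def grade_pick(pred_a, pred_b, reg_a, reg_b):
    """Grade a forecast against the score at the end of regulation.
    Returns True (correct pick), False (wrong pick), or None when the
    game is tied after regulation or the forecast is exactly level."""
    if reg_a == reg_b or pred_a == pred_b:
        return None
    return (pred_a > pred_b) == (reg_a > reg_b)

def percent_correct(games):
    """Percent of graded games picked correctly; ungraded games are skipped."""
    graded = [g for g in (grade_pick(*game) for game in games) if g is not None]
    return 100.0 * sum(graded) / len(graded) if graded else float("nan")

# (predicted A, predicted B, regulation A, regulation B)
games = [
    (4.78, 2.19, 1, 1),  # Harvard vs. BC: tied after regulation, not graded
    (2.92, 3.10, 3, 3),  # Minn-Duluth vs. Alaska-Anchorage: 3-3, not graded
    (5.60, 5.08, 2, 1),  # SLU vs. Rensselaer: correct pick
    (4.08, 3.21, 3, 4),  # Michigan at Michigan State: wrong pick
]
pct = percent_correct(games)  # one correct out of two graded games
```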
 
Well - the exam is over and all students have left so time to stop rambling.
 
Robin Lock
St. Lawrence University
[log in to unmask]
 
HOCKEY-L is for discussion of college ice hockey;  send information to
[log in to unmask], The College Hockey Information List.