In response to Doug Garn's "open letter" to CHODR et al., and since I've got some spare time as I proctor a stat final exam now (another benefit of having computers in the classroom...), here are a few thoughts on evaluating computer ranking systems.

1. Of course, it's still pretty early in the season. Some teams have already met many of the toughest teams on their schedules; others have significant challenges yet before them. While this means that any rating system is likely to still be a bit unstable, it also points out the value of being able to assess where your team or your next opponent stands, after accounting for their record and the quality of who they have already played.

2. Although we have been using "% of games predicted correctly" as a measure of how well CHODR, HEAL, and now KRACH are "doing", it should be noted that CHODR isn't really designed to optimize the prediction of "winners". We include the "% correct" in our results posting mainly in response to the many "How well do your ratings work?" type questions. The % correct data gives one easily understandable measure of the accuracy of the forecast, but by no means the best measure. We would probably be more interested in a measure like a negative log of the likelihood of the actual game score, based on a joint Poisson probability computed from the CHODR forecasted score - but who else would really care about that number! Also, since CHODR is the only one of the current rating or ranking methods which allows forecasting of actual game scores, we'd miss the fun of comparing different methods.

3. To answer part of Doug's question: Is the 81% correct figure for CHODR and HEAL over the first two weeks particularly high, or should it get better as the season proceeds? CHODR thinks it's unusually high.
Based on past data for typical match-ups in a season, even if we were able to know the true scoring rates for all teams exactly, the team with the higher scoring rate should only win about 70%-75% of the games which are decided in regulation. For example, if the predicted score for Team A vs. Team B was 3.89-3.82, we would expect an "upset" nearly 50% of the time, while a 5.83-1.23 prediction should give the correct winner much more than 90% of the time. The current (68% correct) week for CHODR is probably more typical of what we should expect as the season goes along than the 80%+ values we were seeing the first two weeks.

4. Why do we ignore overtime wins? CHODR itself is only concerned with regulation time scoring rates, so to be consistent we should only "evaluate" the actual result at the end of regulation time. Sure, this sometimes lets CHODR "off the hook" when it predicts Harvard should beat BC 4.78-2.19 and BC wins 2-1 in OT, but on the other hand CHODR gets no credit for predicting a 2.92-3.10 score in the Minn-Duluth vs. Alaska-Anchorage game which ended up 3-3. In other cases, CHODR might get credit for a correct pick when the predicted score was really pretty far off the mark. Example: CHODR's prediction of a 5.60-5.08 goalfest when SLU hosted Rensselaer in what turned out to be a 2-1 SLU win. Good job on the margin of victory, but weak on forecasting the actual result of the game. Or CHODR will say that Michigan should win 4.08-3.21 at Michigan State and the final is a 3-4 MSU win - a "loss" for CHODR, but not a very unlikely outcome for the forecasted score.

Well - the exam is over and all students have left, so time to stop rambling.

Robin Lock
St. Lawrence University

HOCKEY-L is for discussion of college ice hockey; send information to The College Hockey Information List.
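
For anyone who wants to check the arithmetic in points 2 through 4, here is a rough sketch. It assumes that the two teams' regulation goal counts are independent Poisson variables whose means are the forecasted scores; that is only one reading of "joint Poisson", and the post does not spell out the exact model. The function names are made up for illustration and this is not CHODR's actual code. The SLU example also assumes the 5.60 forecast belonged to SLU.

```python
import math


def poisson_log_pmf(goals, rate):
    """Log of the Poisson probability of scoring `goals` given mean `rate`."""
    return goals * math.log(rate) - rate - math.lgamma(goals + 1)


def regulation_outcome_probs(rate_a, rate_b, max_goals=40):
    """Return (P(A leads), P(tie), P(B leads)) at the end of regulation.

    Assumption: the two goal counts are independent. The sum is truncated
    at `max_goals`, which is far beyond any realistic hockey score.
    """
    p_a = p_tie = p_b = 0.0
    for i in range(max_goals + 1):
        prob_i = math.exp(poisson_log_pmf(i, rate_a))
        for j in range(max_goals + 1):
            p = prob_i * math.exp(poisson_log_pmf(j, rate_b))
            if i > j:
                p_a += p
            elif i == j:
                p_tie += p
            else:
                p_b += p
    return p_a, p_tie, p_b


def favorite_win_share(rate_a, rate_b):
    """Chance the higher-rated team wins, among games decided in regulation."""
    p_a, _, p_b = regulation_outcome_probs(rate_a, rate_b)
    return max(p_a, p_b) / (p_a + p_b)


def score_nll(goals_a, goals_b, rate_a, rate_b):
    """Negative log-likelihood of the actual score under the forecast."""
    return -(poisson_log_pmf(goals_a, rate_a) + poisson_log_pmf(goals_b, rate_b))


if __name__ == "__main__":
    # Point 3: a near toss-up forecast versus a lopsided one.
    print("3.89-3.82 favorite share:", round(favorite_win_share(3.89, 3.82), 3))
    print("5.83-1.23 favorite share:", round(favorite_win_share(5.83, 1.23), 3))
    # Point 4: a "correct" pick with a poor score forecast (SLU 2-1)
    # versus a "wrong" pick with a plausible score (Michigan 3, MSU 4).
    print("SLU-Rensselaer NLL:", round(score_nll(2, 1, 5.60, 5.08), 3))
    print("Michigan-MSU NLL:  ", round(score_nll(3, 4, 4.08, 3.21), 3))
```

Under these assumptions a lower negative log-likelihood means the actual score was more plausible given the forecast, so the measure can rank a missed pick above a correct one when the forecasted score was closer to what happened. That is the distinction "% correct" cannot make.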