LISTSERV - HOCKEY-L Archives

>>Again, YOU ARE MISSING THE POINT> Top 20 are all ABOUT the same quality,
 
> Sorry, but I can't be convinced that #20 and #1 are even close to the same
> quality.
 
That was not my point anyway.  My point (or at least this part of it)
is that upsets happen; with KRACH we can even say with what
probability we expect them.  #20 UAA has a KRACH of 159.4; #1
Wisconsin's KRACH is 863.4, more than five times as high.  (All of
this is only including games before the national tournament began,
since these are the results the committee had to work with.)  So the
gap between UAA and Wisconsin is about as big as that between #44
Quinnipiac and UAA.  In either case, we would expect an upset once
every six or seven games.
 
For the game in question, UNH went in with a KRACH of 503.9, less than
three times Niagara's 175.9.  So the probability of Niagara winning
was around 26%.  Unlikely, but far from inconceivable.  (Besides
which, as people have pointed out, UNH was on something of a skid.  On
the same scale, their performance in their last 16 games would only
earn them a 250.1, while Niagara's last 16 "criterion rating" was
211.8; so in those terms, Niagara was only slightly less likely to win
than UNH.)
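
For anyone who wants to check the arithmetic, here is a minimal sketch
in Python.  The function name win_probability is my own invention for
illustration; the ratings are the ones quoted above, and the only
assumption is the proportional-odds rule that KRACH is built on.

```python
def win_probability(rating_a, rating_b):
    """Chance that the team rated rating_a beats the team rated rating_b,
    assuming the odds are proportional to the two ratings."""
    return rating_a / (rating_a + rating_b)


# Ratings quoted in this post (games before the national tournament only).
uaa, wisconsin = 159.4, 863.4
niagara, unh = 175.9, 503.9
niagara_last16, unh_last16 = 211.8, 250.1

p_uaa = win_probability(uaa, wisconsin)
p_niagara = win_probability(niagara, unh)
p_niagara_last16 = win_probability(niagara_last16, unh_last16)

print(f"UAA over Wisconsin: {p_uaa:.3f}, or one upset in {1 / p_uaa:.1f} games")
print(f"Niagara over UNH, full season: {p_niagara:.3f}")
print(f"Niagara over UNH, last 16 games: {p_niagara_last16:.3f}")
```

The first line is where "once every six or seven games" comes from,
and the second is the 26% figure.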
 
>> regardlesss.... look at the POINT. I will bow out at this point. The
>> POINT is not about niagara, but about the rating systems.
 
> I don't know what makes Tony think that I've missed the POINT.  The POINT
> is that we now have one piece of empirical data that confirms that Niagara
> should have been in the top eight of any rating system.
 
The point that we're all trying to get through to you is that THERE
ARE 931 OTHER PIECES OF DATA ALSO TO BE TAKEN INTO CONSIDERATION, in
the form of a whole season's games between Division I teams, INCLUDING
THE 27 (D1) GAMES PLAYED BY NIAGARA IN THE REGULAR SEASON.
 
[Capitals to increase the chance that Terry notices my statement of
the point, surrounded as it is by tangential paragraphs.]
 
> If anyone is going
> to try to sell a new rating system, it should reflect actual outcomes.
> Maybe I'm just being too logical about the verification and validation
> aspects of any mathematical model.
 
No rating system or selection committee can predict with complete
certainty the outcomes of games that haven't yet been played, but in
fact the whole idea of KRACH is to find a set of ratings which model
the results on which they're based as accurately as possible.  That is,
as I mentioned before, the ratio of two teams' KRACH ratings gives the
odds of one of them beating the other in a game between them.  If
teams A, B, and C have ratings of 900, 300 and 100, respectively, a
game between A and B is expected to be won by A 3/4 of the time and B
1/4 of the time; between B and C, the probability is 3/4 that B will
win, and 1/4 that C will win; when A plays C, A should win 9 times
out of 10.  So if A defeats B, the probability of that outcome was 3/4.
If A defeats B and then A defeats C, the probability for that sequence
was 3/4 times 9/10; if A defeats B and C, and then C defeats B, the
probability is 3/4 times 9/10 times 1/4, etc.  For a given set of
games, we can take any set of ratings and multiply all the
probabilities with which they predict the actual outcomes, to get the
overall probability that exactly that set of outcomes should have
occurred.  If we calculate this overall probability for various sets
of ratings, we find that it takes on its largest value when the
ratings are those defined by KRACH.  This is known as "maximum
likelihood estimation".  Of course, this is a maximum subject to the
assumption of the model, that the odds of teams beating each other
behave in a proportional way, but the point is that predictive power
is built into the KRACH rating system.
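
The multiplication described above, and the search for the ratings
that maximize it, can be sketched in Python as follows.  Treat this as
an illustration under stated assumptions rather than the actual KRACH
code: the function names and the made-up "season" are hypothetical,
ties are ignored, and the fitting loop is a standard fixed-point
iteration for this kind of proportional-odds (Bradley-Terry) model,
which I am assuming here since the details of the computation are not
spelled out in this post.

```python
from collections import defaultdict


def likelihood(ratings, games):
    """Multiply the probabilities the ratings assign to the actual results.

    games is a list of (winner, loser) pairs.
    """
    total = 1.0
    for winner, loser in games:
        total *= ratings[winner] / (ratings[winner] + ratings[loser])
    return total


def fit_ratings(games, iterations=2000):
    """Fixed-point iteration for proportional-odds ratings (ties ignored).

    Assumes every team has at least one win and one loss; otherwise the
    maximum does not sit at finite, nonzero ratings.
    """
    wins = defaultdict(int)
    meetings = defaultdict(int)  # meetings[(i, j)]: games i played against j
    for winner, loser in games:
        wins[winner] += 1
        meetings[(winner, loser)] += 1
        meetings[(loser, winner)] += 1
    teams = sorted({team for game in games for team in game})
    ratings = {team: 100.0 for team in teams}
    for _ in range(iterations):
        updated = {}
        for i in teams:
            denom = sum(n / (ratings[i] + ratings[j])
                        for (a, j), n in meetings.items() if a == i)
            updated[i] = wins[i] / denom
        # Only the ratios matter, so rescale to keep the average at 100.
        scale = 100.0 * len(teams) / sum(updated.values())
        ratings = {team: r * scale for team, r in updated.items()}
    return ratings


# The three-team example from the text: A beats B, A beats C, C beats B.
example = {"A": 900.0, "B": 300.0, "C": 100.0}
print(likelihood(example, [("A", "B"), ("A", "C"), ("C", "B")]))

# A made-up season whose results match the 3/4, 3/4 and 9/10 odds exactly.
season = ([("A", "B")] * 3 + [("B", "A")]
          + [("B", "C")] * 3 + [("C", "B")]
          + [("A", "C")] * 9 + [("C", "A")])
fitted = fit_ratings(season)
other = {"A": 500.0, "B": 300.0, "C": 100.0}
print({team: round(r, 1) for team, r in fitted.items()})
print(likelihood(fitted, season) >= likelihood(other, season))
```

For that invented season the fitted ratings should settle at ratios
close to 9:3:1, the same ratios as 900, 300 and 100, and any other set
of ratings (such as the arbitrary 500/300/100 tried in the last line)
should give a smaller overall probability.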
 
Now, part of the idea behind also including record in recent games,
against TUCs, head-to-head, and vs common opponents as selection
criteria is to allow for the fact that certain teams can end the
season on hot streaks, play better against strong opposition, or match
up well against certain other teams, and to favor those teams in
selecting and seeding the tournament field.  Returning to the question
of how well the rating system predicts this year's tournament games,
of the eight games in the regionals, five of them were won by the team
winning the pairwise comparisons, with the three results that went
against the comparisons being Niagara over UNH, Michigan over Colgate,
and BC over Wisconsin.  (Those were also, not surprisingly, the three
games won by the lower seeds.)  The Bradley-Terry modified pairwise
comparisons actually did a better job of predicting the results,
getting six of the eight games right (Michigan wins the modified
comparison with Colgate).
 
But this is fairly meaningless anyway, since eight games do not
provide a statistically significant sample.  It would be a worthwhile
project (calling Craig Powers...) to calculate PWCs in the standard
and modified systems using the pre-tournament results of each season
going back to 1992 (when the 11-game regional format was introduced)
and see how many games were won by the team winning each kind of
pairwise comparison (and perhaps also the team with the higher KRACH
or the higher RPI).  The sample size (once this season's tournament is
over) would be 99 games; not great, but something to go on.  Of
course, there would be various other effects interfering, such as
higher-seeded teams (whose seeds were, at least for some seasons, based
on the standard PWCs) having advantages such as last line change, a day's
rest, or the opportunity to play in their own region.
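
To give a rough sense of why eight games prove so little, and of what
the tally for such a project might look like, here is a short Python
sketch.  The function names are hypothetical, and the comparison
against coin flipping is just one simple yardstick I am assuming for
illustration, not a full statistical treatment.

```python
from math import comb


def correct_picks(games, ratings):
    """Count the games won by the team with the higher rating.

    games is a list of (winner, loser) pairs; ratings maps each team to
    a number from whichever criterion is being tested.
    """
    return sum(1 for winner, loser in games if ratings[winner] > ratings[loser])


def coin_flip_tail(correct, total):
    """Chance of getting at least `correct` of `total` picks right by
    flipping a fair coin for each game."""
    return sum(comb(total, k) for k in range(correct, total + 1)) / 2 ** total


# The one game whose ratings are quoted earlier in this post.
krach = {"UNH": 503.9, "Niagara": 175.9}
print(correct_picks([("Niagara", "UNH")], krach))

# Standard comparisons got 5 of 8 regional games; the modified ones got 6.
print(coin_flip_tail(5, 8))
print(coin_flip_tail(6, 8))
```

The last two numbers are 93/256 and 37/256, or about 36% and 14%, so
even a coin would match either record fairly often over eight games.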
 
                                          John Whelan, Cornell '91
                                                 [log in to unmask]
                                     http://www.amurgsval.org/joe/
 
HOCKEY-L is for discussion of college ice hockey;  send information to
[log in to unmask], The College Hockey Information List.