HOCKEY-L Archives

- Hockey-L - The College Hockey Discussion List

Hockey-L@LISTS.MAINE.EDU

From: Ken Butler
Date: Thu, 19 Dec 2002 12:12:50 +0000

On Wed, 18 Dec 2002 10:32:24 -0600, Thomas Rowe wrote:

>Ken - I'm having problems interpreting that last table.  Can you explain it a little fuller?

Sure. This is the table in question:

>> Observed results "predicted" by system and conference
>>
>>            n KRACH CHODR   RPI  HEAL RHEAL
>> cc        66    51    53    51    53    53
>> ch        13     9     6     7     8     8
>> ec        43    35    33    34    32    30
>> he        34    28    24    27    28    28
>> ma        46    37    30    37    31    33
>> nc       166   140   130   134   127   136
>> wc        43    36    33    36    36    36
>> all      411   336   309   326   315   324

Let me just pull out the College Hockey USA ("ch") games, because there aren't
many of those. Here they are:

11/01/02 Air Force           5 at Niagara             2      ch      7:05 ET
11/02/02 Air Force           2 at Niagara             6      ch      7:05 ET
11/08/02 Bemidji State       2 at Findlay             1      ch      7:05 ET
11/09/02 Bemidji State       2 at Findlay             2  ot  ch      7:05 ET
11/15/02 Wayne State         0 at Bemidji State       2      ch      7:05 CT
11/15/02 Air Force           2 at Alabama-Huntsville  4      ch      7:05 CT
11/16/02 Wayne State         1 at Bemidji State       2  ot  ch      7:05 CT
11/16/02 Air Force           2 at Alabama-Huntsville  5      ch      7:05 CT
11/22/02 Findlay             6 at Air Force           5      ch      7:05 MT
11/22/02 Alabama-Huntsville  4 at Wayne State         5  ot  ch      7:05 ET
11/23/02 Alabama-Huntsville  5 at Wayne State         2      ch      7:05 ET
11/24/02 Findlay             2 at Air Force           6      ch     12:05 MT
11/30/02 Niagara             3 at Alabama-Huntsville  3  ot  ch      1:35 CT
12/01/02 Niagara             4 at Alabama-Huntsville  7      ch      1:05 CT
12/13/02 Bemidji State       2 at Niagara             3      ch      7:05 ET
12/14/02 Bemidji State       3 at Niagara             3  ot  ch      7:05 ET

Consider Bemidji State vs Findlay. KRACH rates Bemidji State 40th and Findlay
46th, while RPI rates Findlay 37th and Bemidji State 42nd. So if you ask these
two rating systems to pick a winner, KRACH would pick Bemidji State and RPI
would pick Findlay.

The *right* way to use this idea would be for predictions: predict the next
week's games based on the previous week's ratings. But there's nothing to stop
you applying this to the games already played -- "postdictions", if you will
(to use Larry Weintraub's word). My thinking is that you would want a rating
system to "postdict" most of the winners in games that have been played
already -- perfection is impossible (teams often split weekend series, so
you'd have one right and one wrong no matter what), but from this point of
view the system that "postdicts" the most winners is the best.

Here's what happens for all the games in this conference. I've left out the
ties, because you can't predict a winner if there isn't one!

                                                 Team rankings Pred wnr  Wnr
                                                KRACH  RPI     KRACH RPI
Air Force           5 at Niagara             2  50 45  47 44    Ni   Ni   AF
Air Force           2 at Niagara             6  50 45  47 44    Ni   Ni   Ni
Bemidji State       2 at Findlay             1  40 46  42 37    Be   Fi   Be
Wayne State         0 at Bemidji State       2  41 40  48 42    Be   Be   Be
Air Force           2 at Alabama-Huntsville  4  50 44  47 46    AH   AH   AH
Wayne State         1 at Bemidji State       2  41 40  48 42    Be   Be   Be
Air Force           2 at Alabama-Huntsville  5  50 44  47 46    AH   AH   AH
Findlay             6 at Air Force           5  46 50  37 47    Fi   Fi   Fi
Alabama-Huntsville  4 at Wayne State         5  44 41  46 48    WS   AH   WS
Alabama-Huntsville  5 at Wayne State         2  44 41  46 48    WS   AH   AH
Findlay             2 at Air Force           6  46 50  37 47    Fi   Fi   AF
Niagara             4 at Alabama-Huntsville  7  45 44  44 46    AH   Ni   AH
Bemidji State       2 at Niagara             3  40 45  42 44    Be   Be   Ni

Total correct                                                    9    7

My example game was the 3rd one on the list: KRACH picked Bemidji State, RPI
picked Findlay, so for this game KRACH was right and RPI wrong. In total,
KRACH got 9 right and RPI 7. (There were 4 cases where they disagreed; in 3 of
those, KRACH was right and in the other 1 RPI was right.)
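
The counting above can be reproduced with a short script. This is only a sketch: the team abbreviations, rankings and scores are copied from the table, while the function names (`pick`, `count_correct`) are made up for illustration. Each system simply picks whichever team it ranks higher (smaller rank number).

```python
# Each game: (visitor, home, visitor_goals, home_goals); ties already removed.
games = [
    ("AF", "Ni", 5, 2), ("AF", "Ni", 2, 6), ("Be", "Fi", 2, 1),
    ("WS", "Be", 0, 2), ("AF", "AH", 2, 4), ("WS", "Be", 1, 2),
    ("AF", "AH", 2, 5), ("Fi", "AF", 6, 5), ("AH", "WS", 4, 5),
    ("AH", "WS", 5, 2), ("Fi", "AF", 2, 6), ("Ni", "AH", 4, 7),
    ("Be", "Ni", 2, 3),
]

# Rank of each team under each system (1 = best), taken from the table.
ranks = {
    "KRACH": {"AF": 50, "Ni": 45, "Be": 40, "Fi": 46, "WS": 41, "AH": 44},
    "RPI":   {"AF": 47, "Ni": 44, "Be": 42, "Fi": 37, "WS": 48, "AH": 46},
}

def pick(system, visitor, home):
    """Return the team the system ranks higher (smaller rank number)."""
    r = ranks[system]
    return visitor if r[visitor] < r[home] else home

def count_correct(system):
    """Count games where the system's pick matches the actual winner."""
    correct = 0
    for visitor, home, vg, hg in games:
        winner = visitor if vg > hg else home
        if pick(system, visitor, home) == winner:
            correct += 1
    return correct

krach_correct = count_correct("KRACH")
rpi_correct = count_correct("RPI")

# Games where the two systems pick different winners.
disagreements = [g for g in games
                 if pick("KRACH", g[0], g[1]) != pick("RPI", g[0], g[1])]

print(krach_correct, rpi_correct, len(disagreements))  # 9 7 4
```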

So that's what I did, for all the conferences (and the non-conference games),
plus overall.

Larry Weintraub said:

"I'm not sure that the number of correct 'postdictions' is the proper way to
evaluate a rating, since you would expect that lower rated teams will win
games.  KRACH accounts for this quantitatively.  The others implicitly.  Not
that I have a good alternative."

I agree; this idea is not so much "best" as "the best I could think of". If
two teams meet several times, you would certainly expect the lower-rated team
to win at least sometimes. The problem is in quantifying the "sometimes"; only
those rating systems based on probability models (KRACH and CHODR) give you
this. I wanted to give a sense of how well each of the systems was explaining
what has happened so far, and all that the RPI and HEAL-related systems tell
you is "I believe that this team is better than that one", so this is all I
had to work with.
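
To illustrate what "quantifying the sometimes" means, here is a minimal sketch assuming a Bradley-Terry style model, which is the form usually described for KRACH: the chance that A beats B is A's rating divided by the sum of the two ratings. The ratings 150 and 100 are invented for the example and are not actual KRACH values.

```python
def win_probability(rating_a, rating_b):
    """Bradley-Terry style chance that team A beats team B (assumed form)."""
    return rating_a / (rating_a + rating_b)

# Hypothetical ratings, chosen only to make the arithmetic easy.
p_upper = win_probability(150.0, 100.0)

# Expected wins for the lower-rated team over a four-game season series.
expected_lower_wins = 4 * (1 - p_upper)

print(p_upper, expected_lower_wins)
```

Under these made-up ratings the higher-rated team wins 60% of the time, so the lower-rated team would be expected to take between one and two games out of four. A pure ranking cannot express that.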

Wayne T. Smith wrote:

"I'll guess that HEAL does poorly with nc games, relative to
RPI, because RPI "reaches" further due to its 2nd order/(opponents -
opponents) calculation ... missing in HEAL."

I believe so. Comparing teams in different conferences is the hard part of
this rating business. I think this also explains the improvement of RHEAL over
HEAL.

"I'm also quite surprised that all of the "prediction" rates are so high,
  since we're probably not yet to mid-season."

There's a potential bias in "postdicting", because the games we're picking
winners for are the same ones used to construct the ratings -- there *should*
be a reasonable agreement. I'd guess that the success rates for genuine
predictions (predicting next weekend's games using these ratings) would be
lower.
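
For reference, the "all" row of the table converts to success rates as follows. The counts are from the table; the variable names are just for illustration.

```python
n_games = 411
correct = {"KRACH": 336, "CHODR": 309, "RPI": 326, "HEAL": 315, "RHEAL": 324}

# Fraction of games each system "postdicted" correctly.
rates = {name: c / n_games for name, c in correct.items()}

for name, rate in rates.items():
    print(f"{name:6s} {rate:.1%}")
# KRACH  81.8%
# CHODR  75.2%
# RPI    79.3%
# HEAL   76.6%
# RHEAL  78.8%
```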

"I'm quite surprised that RPI is doing so well!"

Me too!

Charlie Shub wrote (among other things):

"are any of these predictions, or the rankings themselves for that
matter, statistically significant at some level or are we just seeing
minor random fluctuations that aren't significant?"

Short answer is "don't know". It's not obvious to me how to compare the
predictions -- a standard chi-square test wouldn't work because each system is
predicting the same games. I think a useful approach would be to concentrate
on the games where two systems disagree, and see whether one system
outperforms the other by more than chance on those games. (This is why I
mentioned above the 4 College Hockey USA games where KRACH and RPI disagreed;
this is too small a number of games to prove anything, but other conferences
might give something useful.)
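
One way to make the "games where two systems disagree" idea concrete is an exact sign test (the same idea as an exact McNemar test): if the two systems were equally good, each would be equally likely to be the right one on any game where they disagree. The sketch below is an assumption about how such a test could be set up, not something that was run for this post, and the function name is hypothetical.

```python
from math import comb

def sign_test_p_value(wins_a, wins_b):
    """Two-sided exact binomial p-value under H0: on a disagreement,
    either system is equally likely to be the one that is right."""
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# College Hockey USA: KRACH right on 3 disagreements, RPI right on 1.
p = sign_test_p_value(3, 1)
print(p)  # 0.625
```

A p-value of 0.625 confirms that the 3-to-1 split in the College Hockey USA games is entirely compatible with chance.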

Some sort of simulation might also be helpful, to assess the amount of
variability in the number of successful "postdictions" given an overall
success rate, but it's not obvious to me how that would go.
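
A very crude version of such a simulation, offered only as a starting point: treat every game as an independent coin flip with the overall success rate, and see how much the number of correct picks varies. The independence assumption is a simplification (the same games are used to build the ratings), and the function name, number of simulations and seed are arbitrary choices for illustration.

```python
import random

def simulate_correct_counts(n_games, success_rate, n_sims=2000, seed=1):
    """Simulate the number of correct picks, assuming independent games
    that are each picked correctly with the same probability."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sims):
        counts.append(sum(rng.random() < success_rate for _ in range(n_games)))
    return counts

# KRACH's overall record from the table: 336 correct out of 411.
counts = simulate_correct_counts(411, 336 / 411)
counts.sort()

mean = sum(counts) / len(counts)
low = counts[int(0.025 * len(counts))]        # approx. 2.5th percentile
high = counts[int(0.975 * len(counts)) - 1]   # approx. 97.5th percentile
print(mean, low, high)
```

Under this simple binomial assumption the standard deviation is sqrt(411 x 0.8175 x 0.1825), or about 7.8 games, so differences of ten or so games between systems would not be far outside the noise if the systems were being scored on separate sets of games. Because they are all scored on the same games, this comparison overstates the noise.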

As to the rankings themselves: for KRACH, there is already easily enough
evidence to reject a hypothesis that all teams are equally good, though even a
whole season's worth of data won't say much statistically about the relative
strengths of teams in different conferences.

OK, guess I'd better get back to my "real" work now :-)

Cheers,
Ken.

--
Ken Butler
At home in Canterbury, England
