HOCKEY-L Archives

- Hockey-L - The College Hockey Discussion List

Hockey-L@LISTS.MAINE.EDU

Sender:
College Hockey discussion list <[log in to unmask]>
Subject:
From:
Ken Butler <[log in to unmask]>
Date:
Sun, 4 Jul 1993 14:27:15 PDT
Reply-To:
Ken Butler <[log in to unmask]>
I'm glad that my ratings posting has excited some comment!
 
In a way, I'm not surprised that other people have tried these things.
I already knew there was a proliferation of ratings for college football
(which is in dire need of *some* reasonable rating system, IMHO).
I'll be doing some digging around in the archives to see what TCHCR,
CHODR and RPICH are all about, and if I can make any sensible
comparisons, I'll post them to the list.
 
First, a little about me, and then about the rating system.
 
I'm currently working on my PhD at Simon Fraser University, in Statistics,
and more precisely, in the field of Paired Comparisons (of which hockey
games are but one example -- "can you tell butter from margarine" being
another). My MSc (so-called in Canada) concerned applications of the
rating method proposed here, and some other methods which use the actual
game score in the ratings as well. As you see, I'm trying to frighten
people with my credentials! I've applied various rating methods to
a variety of sports (e.g. tennis, soccer, basketball, Australian
football(!)), though I'm not confident enough to gamble my life savings
on my predictions. (Decide for yourself whether that reflects on the
quality of the ratings or my inherent cowardice!).
 
Mike Machnik suggests that I need a name for my rating system. How about
KRACH (pronounced "crack", not "crash"!), which stands for "Ken's Ratings
for American College Hockey" -- after all, I can't possibly buck the
trend and have a name with something other than five letters! BTW, the
German word Krach can mean "violent disagreement" -- I hope we can keep
those *on* the ice.
 
Here comes the statistical description of my method. My feeling was that
I wanted to base the ratings purely on wins and losses because the
teams' first desire is to win, not to score lots of goals. So I turned
to the idea of modelling the probability that one team would beat another.
As Tim Danzer points out, if you allow the ratings to be any values, you
have to work a bit at turning them into probabilities; the statistician's
favourite device for doing this is the "logistic transformation". For a
rating difference d, say, this gives a probability 1/(1+exp(-d)) of the
first team winning. (You can check that this works out OK by seeing that
two teams at the same rating will have 50-50 chances if they meet,
and two teams that are far apart in rating will have probabilities near
0 or 1).
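As a small illustration (a Python sketch of my own, not part of any actual ratings code), the logistic transformation turns any rating difference into a probability:

```python
import math

def win_prob(d):
    """Probability that the first team wins, for rating
    difference d, via the logistic transformation 1/(1+exp(-d))."""
    return 1.0 / (1.0 + math.exp(-d))

print(win_prob(0.0))   # evenly matched teams: 0.5
print(win_prob(5.0))   # a large rating gap: close to 1
```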

Having decided that this is to be the relationship, we need to estimate
the ratings for each team. I did this by "maximum likelihood": the idea
is that you choose the ratings so that the overall probability of
getting the results you actually did get is as large as possible.
After all, there's no point in choosing ratings that mean that the results
that actually happened were very *unlikely*! To get the likelihood function,
you need to assume that each game is independent of the others; that is,
no matter what happened in the teams' previous games, the probability
that the first team will defeat the second when they meet is still what
the logistic transformation predicts it will be. (Even, say, if the first
team had just defeated Maine and the players were still on a big high).
As tends to be the way in real life, I think the independence assumption
here is questionable but not too far off the mark.

The likelihood is a product of terms like 1/(1+exp(-rating(i)+rating(j))),
where teams numbered i and j are meeting in the match you happen to be
looking at, and team i won. (I count ties as half a win and half a loss,
so in the likelihood there are two terms in the product: the square root
of the one above, and the square root of it with i and j exchanged.)
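Written out in code (a hypothetical sketch, since no implementation is given here), the log-likelihood with ties counted as half a win plus half a loss looks like this:

```python
import math

def log_likelihood(ratings, games):
    """ratings: dict mapping team -> rating.
    games: list of (i, j, outcome) where outcome is 1 if team i won,
    0 if it lost, and 0.5 for a tie.  A tie contributes
    0.5*log(p) + 0.5*log(1-p), i.e. the two square-root terms
    described in the text, taken in log form."""
    ll = 0.0
    for i, j, outcome in games:
        p = 1.0 / (1.0 + math.exp(-(ratings[i] - ratings[j])))
        ll += outcome * math.log(p) + (1.0 - outcome) * math.log(1.0 - p)
    return ll
```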
 
To maximize this ugly product (simultaneously for *all* the ratings)
directly is hopeless, but maximizing its logarithm is mathematically
equivalent, and it turns the products into sums. So you actually
maximize the log-likelihood, which you can do using good ol' multivariate
calculus (differentiate with respect to each rating, set each derivative
equal to 0, and solve). But these equations are non-linear, so you'll need
to drag in your favourite non-linear function maximizer to solve the
problem. For small problems, the best method is the multivariate version
of Newton-Raphson, but that needs you to calculate and store the second
derivatives of the log-likelihood as well; my favourite general method
is called Conjugate Gradient, which doesn't use second derivatives at
all. (For those interested: descriptions of and programs for these are
in "Numerical Recipes" by Press, Flannery, Teukolsky and Vetterling,
published by Cambridge University Press). By the way, there is some
theory that indicates that for this likelihood, there is only one
maximum (and no minima or saddlepoints), so the answer you get is *the*
answer. I find that starting each team off at rating 10 is good enough
to get the answer in 10 or so conjugate-gradient iterations, less with
Newton's method.
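For illustration only, here is a bare-bones fit by plain gradient ascent -- *not* the conjugate-gradient or Newton-Raphson routines recommended above, just the simplest thing that climbs the same log-likelihood -- with every team started at 10 and the answers re-centred to average 10:

```python
import math

def fit_krach(teams, games, steps=3000, lr=0.1):
    """games: list of (i, j, outcome) with outcome in {0, 0.5, 1}.
    Plain gradient ascent on the log-likelihood, starting every
    team at rating 10."""
    ratings = {t: 10.0 for t in teams}
    for _ in range(steps):
        grad = {t: 0.0 for t in teams}
        for i, j, w in games:
            p = 1.0 / (1.0 + math.exp(-(ratings[i] - ratings[j])))
            grad[i] += w - p      # d(log-lik)/d(rating of i)
            grad[j] -= w - p
        for t in teams:
            ratings[t] += lr * grad[t]
    # fix the free degree of freedom: make the ratings average 10
    shift = 10.0 - sum(ratings.values()) / len(ratings)
    return {t: r + shift for t, r in ratings.items()}
```

Because the log-likelihood has a single maximum, even this crude climber lands on the same answer the fancier methods would, just more slowly.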
 
Since the only way the ratings are used is via their differences, there is
one extra "degree of freedom" here, which I resolved by having the ratings
average out to 10. There is one other possible problem (though it didn't
happen here): if a team wins (or loses) all its games, the maximum
likelihood estimate of its rating is plus (or minus) infinity, because you
can make the term in the likelihood 1, the biggest possible, for all the
games involving that team. I get around that by introducing one fictitious
game for each team, a tie against an imaginary team with rating 10;
this biases the ratings towards 10 (but only slightly if the teams have
played a reasonable number of games, as here), but you get a finite
rating for each team, and predicted probabilities for future games that
are not 1! Also, the rank ordering of the teams is not changed by
introducing these fictitious games.
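The fictitious-game trick is easy to sketch (hypothetical code again; the fit shown is a simple gradient ascent, not the conjugate-gradient routine I actually use, and the phantom team's name is invented):

```python
import math

PHANTOM = "_PHANTOM"  # imaginary opponent, rating pinned at 10

def fit_with_fictitious_ties(teams, games, steps=3000, lr=0.1):
    """Add one fictitious tie per team against an imaginary team
    whose rating is held fixed at 10, so an undefeated (or winless)
    team still gets a finite rating.  games: (i, j, outcome) with
    outcome in {0, 0.5, 1}."""
    all_games = list(games) + [(t, PHANTOM, 0.5) for t in teams]
    ratings = {t: 10.0 for t in teams}
    ratings[PHANTOM] = 10.0
    for _ in range(steps):
        grad = {t: 0.0 for t in ratings}
        for i, j, w in all_games:
            p = 1.0 / (1.0 + math.exp(-(ratings[i] - ratings[j])))
            grad[i] += w - p
            grad[j] -= w - p
        for t in teams:          # the phantom never moves
            ratings[t] += lr * grad[t]
    return {t: ratings[t] for t in teams}
```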
 
A couple of ideas I didn't implement here (I wanted to see if there was
any interest first!): a home-ice advantage can be included, and estimated,
by working out the probabilities for the likelihood not just from the
rating difference, but from the rating difference plus the current
guess at the home-ice advantage. For example, if a team rated 9.5 plays
at home against a team rated 10, and the home-ice advantage is 0.7,
the probability that the home team wins would be based not on 9.5-10=-0.5,
but 9.5-10+0.7=0.2. This of course changes all the ratings, because
everything has to be estimated, simultaneously, again.  I've found that
a single home-ice number has been good enough in most sports I've looked
at, but if you have enough data, you can estimate a separate home-ice
advantage (and road-ice disadvantage) for each team. This would give
more accurate results for a team like last year's Quebec in the NHL, who
had a better road record than most teams' home record! But, as a
statistician, I would test to see whether such a detailed model was
"significantly" better than a simpler one.
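The home-ice arithmetic above works out like this (a hypothetical helper, just following the numbers in the example):

```python
import math

def home_win_prob(home_rating, away_rating, home_adv):
    """Win probability for the home team: the logistic of the
    rating difference plus the home-ice advantage."""
    d = home_rating - away_rating + home_adv
    return 1.0 / (1.0 + math.exp(-d))

# A 9.5-rated team at home against a 10-rated visitor, with a
# home-ice advantage of 0.7: based on 9.5 - 10 + 0.7 = 0.2
print(home_win_prob(9.5, 10.0, 0.7))  # about 0.55
```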
 
I've forgotten what the other thing was I didn't implement!
 
Some other notes: it's also possible to weight different games differently
(maybe this was the other thing). To produce ratings that "describe"
what's happened so far, I'd prefer to treat all the games equally, but for
prediction, it may well be better to weight recent games more heavily.
Unfortunately, there's no nice way to estimate *how much* more heavily
you should weight them. Unless anybody else has found one, of course.
Doing predictions this way yields a probability rather than a goal-spread
for each game, which is a bit weird, but I think you can get used to it.
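Weighting is a one-line change to the log-likelihood (a sketch; any particular decay scheme, like the one suggested in the comment, is an arbitrary guess, for exactly the reason above):

```python
import math

def weighted_log_likelihood(ratings, games):
    """games: list of (i, j, outcome, weight).  A weight of 1
    treats a game normally; recent games could get larger weights,
    e.g. weight = decay ** weeks_ago for some guessed decay
    such as 0.95 -- there is no principled way to pick it."""
    ll = 0.0
    for i, j, outcome, weight in games:
        p = 1.0 / (1.0 + math.exp(-(ratings[i] - ratings[j])))
        ll += weight * (outcome * math.log(p)
                        + (1.0 - outcome) * math.log(1.0 - p))
    return ll
```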

Mike Machnik points out that Ala-Huntsville and Mankato State are not
really Division I schools (fair enough); I included them simply on the
basis of the number of games they had played, and excluded Army on the same
basis. Provided that the other teams play consistently against them, though,
it makes little difference whether they are included or not, but I would
suspect that some teams treat their games against UAH and MSU more
seriously than others, in which case they should be excluded.
 
Well, that's KRACH for you. I'd be particularly interested in
hearing from the creators of the other rating systems (or anyone else
with an opinion, come to that). As I say, if I discover anything
in comparing KRACH with TCHCR, CHODR and RPICH, I'll let the list know.
 
--
Ken Butler
[log in to unmask]
