
New Ranking Method

Clong83a

Time for a lunch break post.

Yesterday I was thinking about the various ranking systems out there, such as KenPom, Sagarin, RPI, "BPI", etc. These are attempts to rank teams based on some sort of index, which is heuristically determined from wins/losses, scoring margins, strength of schedule, and so on. They all generally keep their special formula secret, but they are all trying to do the same thing. I never liked them. They all have flaws in some way, and I always have the sneaking suspicion that problems and biases are built into the formulas somehow or another, even if unintentionally.

But suppose you have a ratings index that goes from 1-100, and you give a default 5 point home game advantage to any team. There exists, unequivocally, a "best" way to assign each NCAA team a score such that the ratings match actual outcomes as accurately as possible. However, this is extremely difficult to achieve in practice. Even if you only allow integer values for the index, that gives you 100 possibilities for each team (1-100), for a total of 100^350 possibilities to explore across the roughly 350 teams. You can limit the possibilities somewhat, but it doesn't really help. For example, you can require that Kansas is above 75, say. But even if you could feasibly constrain every team to a 25-point range, that is still 25^350 possibilities! Even if you already knew, a priori, each team's rating to within +/- 1, you'd still have to search 3^350 possible combinations to be sure you got the "best" one! God help you if you want to compute down to a quarter of a point or something. You will be computing away until the heat death of the universe, even with reasonable constraints put in. So the simple phenomenological formulas that are out there make sense in that context.

So my idea was to use a modern optimization scheme to find the right minimum in the data set without computing all 100^350 possibilities. I use a surrogate management framework (with kriging surrogate functions), coupled with a mesh-adaptive direct search (MADS) scheme, to efficiently and reliably determine a ranking set that gives the minimal error. It is guaranteed to find a (local) minimum in the data set. It makes no promises about how fast it does this, and you could theoretically compute away until the heat death of the universe anyway, but it has been used very reliably for a number of other complex applications. For reference, the processes I used are outlined pretty nicely here:

http://epubs.siam.org/doi/abs/10.1137/040603371
http://ntrs.nasa.gov/search.jsp?R=19980046640

I have some experience with these methods myself, and I've used them successfully in the past:

http://link.springer.com/article/10.1007/s00466-013-0967-z
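
For anyone curious what these schemes actually do, here is a bare-bones sketch of the poll-and-refine idea in Python. This is not the kriging/MADS code I actually ran, just the simplest possible pattern search, and the function and variable names are made up for illustration:

def pattern_search(error_fn, x0, lower, upper, step=5.0, min_step=0.0625):
    # Poll +/- step on each variable, keep any improvement, and halve the
    # mesh whenever a full poll fails to improve. Stops at the finest mesh.
    x, best = list(x0), error_fn(x0)
    while step >= min_step:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] = min(upper[i], max(lower[i], trial[i] + delta))
                err = error_fn(trial)
                if err < best:
                    x, best, improved = trial, err, True
        if not improved:
            step /= 2.0
    return x, best

The surrogate (kriging) part of the real method is about choosing smarter trial points than this blind +/- poll, which is what keeps the number of error evaluations in the hundreds instead of the billions.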

My wife was out of town last night, so when I got home from work I decided to test it out. I created a database of all AAC games this season. There were 99 of them. I stored the winner, the loser, the score differential, and the home team. I used a scoring system from 0-50, 50 being the best. I also assume each team has its own unique home court advantage, which I allowed to range from -5 to 15. This gave me 22 variables for the 11 teams. I allowed each variable to change in increments as small as 0.0625, giving roughly 800^11*320^11 possible combinations to test. To give you an idea of the scale of the problem, even our little 11-team conference would require an enormous effort on a massively parallel supercomputer to brute-force the correct values within my lifetime.
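
If you want to see where that number comes from, the bookkeeping looks roughly like this (the variable names are just mine, for illustration):

teams = ["Temple", "SMU", "Cincy", "Tulsa", "UConn", "Memphis",
         "ECU", "Houston", "Tulane", "UCF", "USF"]

power_bounds = (0.0, 50.0)   # one power rating per team
hca_bounds = (-5.0, 15.0)    # one home court advantage per team
step = 0.0625                # finest increment allowed

power_levels = int((power_bounds[1] - power_bounds[0]) / step)  # 800
hca_levels = int((hca_bounds[1] - hca_bounds[0]) / step)        # 320
grid_points = power_levels ** len(teams) * hca_levels ** len(teams)
print(grid_points)           # roughly 3 x 10^59 combinations to brute force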

I defined the 'error' based on two components. The first is simply the percentage of games predicted incorrectly. The second is more complicated, but is based on the predicted score differential versus the actual score differential, as shown:

err2 = sqrt( SUM ((predicted differential - actual differential) ^ 2) / Number of games )

This is different from simply using the average error in point differential. It skews the value more towards the worst error. Actually, in retrospect, this is probably a mistake, as the worst errors are likely outliers and we should not punish our rankings for not adequately predicting outliers. I will probably change this to a simple average, or maybe even a median value. The total error is then just the sum of the two errors described above. The optimization scheme searches through combinations of different rankings and stops when it finds the lowest possible error value.
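
In code, the whole thing boils down to something like this (my own sketch; 'games' is assumed to be a list of (predicted differential, actual differential) pairs, both signed for the same team, so a sign mismatch means the winner was picked wrong):

import math

def total_error(games):
    # err1: fraction of games whose winner was picked wrong
    err1 = sum(1 for pred, actual in games if pred * actual <= 0) / len(games)
    # err2: root-mean-square error in the score differential
    err2 = math.sqrt(sum((pred - actual) ** 2
                         for pred, actual in games) / len(games))
    return err1 + err2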

I set up the problem and ran it for a few hours last night, using ONLY AAC data. It did not finish converging, but it got close. I found a bug with the mesh refinement, where for some reason it only occasionally refined the search space correctly, which is why some numbers are whole integers. So Temple has a power rating of 40 and not, say, 37.9, because the algorithm settled early on the idea that Temple was closer to a 40 than a 35 (initial spacing was in increments of 5), and then never refined the search space for that team, which is probably the only reason they are #1. With a second run to work out the kinks, Tulsa, UConn, and Temple would get bounced around more and the whole top 5 could shake up quite a bit. It found this near-minimum in only 600 computations, which is pretty good compared to 800^11*320^11...

RANK  TEAM     POWER  HOME ADV
#1    Temple   40      -1
#2    SMU      39.06    5
#3    Cincy    38.75    0.88
#4    Tulsa    35       1.13
#5    UConn    35       5
#6    Memphis  30       5
#7    ECU      25.31    5
#8    Houston  21.25   -2
#9    Tulane   20.63    1.25
#10   UCF      20       5
#11   USF      16.25   10.88

The way to read it: if Temple (40) plays Memphis (30) on a neutral court, the model predicts a 10 point victory for Temple. If they play at Memphis, it predicts a 5 point Temple win after accounting for the Memphis home court advantage. If they play at Temple, it predicts a 9 point Temple win, since Temple apparently doesn't get any real advantage from playing at home.
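
Or in code form, with the numbers pulled straight from the table (the function name and layout are just my own illustration):

ratings = {"Temple": 40.0, "Memphis": 30.0}
home_adv = {"Temple": -1.0, "Memphis": 5.0}

def predicted_margin(team_a, team_b, home=None):
    # Positive result means team_a is predicted to win by that many points.
    margin = ratings[team_a] - ratings[team_b]
    if home == team_a:
        margin += home_adv[team_a]
    elif home == team_b:
        margin -= home_adv[team_b]
    return margin

print(predicted_margin("Temple", "Memphis"))                  # 10.0 on a neutral court
print(predicted_margin("Temple", "Memphis", home="Memphis"))  # 5.0
print(predicted_margin("Temple", "Memphis", home="Temple"))   # 9.0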

It predicted the winner in 86/99 AAC games correctly, but it missed both of our Temple victories. As mentioned, the optimization tends to obsess over minimizing the point differential error in the biggest outliers, rather than being satisfied with a large number of games with a very low error in score differential. Changing this approach might also change the results some.

I'd like to patch it up and run it again tonight and see what it does. But the wife will return this evening, so we will see if I have time. It might be a weekend project.

If this is something that people find interesting and respond to, I might try to do a weekly computation of the full NCAA next season and post it online somewhere. If this post gets ignored and/or I get dismissed as just some crazy guy, then I probably won't bother. The above ranking has its flaws (obviously Temple should not be #1...), but I'm actually pretty excited about the results given that it is only a half-complete first attempt that did not converge and had a notable bug affecting the results. And yet it still predicted the winner 87% of the time, which is really pretty good as far as these types of things go. When it has converged and is working properly, it should, by its very nature, outperform every heuristic method out there, which is an exciting idea.

So now you all know what a bored mathematician does with a free evening. I might have had a beer or two, too, which could have led to that d*mned bug that screwed up the results. Oh well, and thanks for reading this far. No lunch, back to work.
 
Also, I just noticed that the #1 team, Temple, is predicted to lose to the #2 team SMU either at home or away. They are predicted to win only on a neutral court. Most peculiar.
 
Yeah, I think you're overweighting the outliers. Looks like the mesh size also crapped out on most of the home-court advantage numbers. Given the research on factors affecting home court advantage, I would expect the true values to be very similar for every team.

I think it would also be a better test to run the model against N+1 results. So if you input all AAC games before March 1, how well does it predict the results after that date? That way you're not testing the model against results that were used to build the model.

Good stuff. I'd be interested in seeing more.
 
Thanks, ctt, your opinion certainly means something to me.

For home court advantage, I had the idea that while it is probably pretty uniform overall, there may be a few interesting outliers. Kentucky or Duke might have a significantly different home court advantage than Rice or Jackson State. Whether that is really true, and how much it affects the main results, I have no idea. The values should probably be more uniform than what is shown for the AAC, I agree, but I was thinking ahead to larger and more disparate samples.

The self-selection bias of testing against the same data I used to build the model is certainly an issue, and I will incorporate a test like you suggest into my weekend plans. A fun, if not hugely instructive, test will be whether I can get it working properly before the championship game on Sunday and try to pick the winner and score differential. :)
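
Roughly, the test would look like this (fit and predict are stand-ins for the optimization and the prediction rule described above, and the game record format is just an assumption on my part):

def holdout_accuracy(games, cutoff, fit, predict):
    # games: list of dicts with "date", "home", "away", "actual_margin" keys.
    # Fit on games before the cutoff (e.g. March 1), then score only the
    # later games, so the test set never touches the fit.
    train = [g for g in games if g["date"] < cutoff]
    test = [g for g in games if g["date"] >= cutoff]
    model = fit(train)                # e.g. the MADS optimization
    hits = sum(1 for g in test if predict(model, g) * g["actual_margin"] > 0)
    return hits / len(test)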
 
Originally posted by nevadanatural:
Man, you need a hobby.
Lol. I ski 2-3 times a week, homebrew my own beer, and otherwise tinker with computers endlessly. :)
 
Nice, back before I had a business and 6 kids, a buddy and I were working on something similar. One main thing to note is that you need something adaptive that re-weights factors based on input. The key to working with point differential is a system that tracks total game "control" from start to finish. You need to know the score at each timeout to understand who had control of the game. A 15 point win can be a tie game that got out of hand in the last 4 minutes due to missed 3s and fouls OR it can be a game that was a 30+ point game until the benches cleared and a late surge made it seem closer than it was.



ash =o)
 
Back when TU played at the old Civic Center, you also needed to know if there had been a hockey game the night before and the floor was cold and sweating.
 
You use your drinking time much more wisely than I do.
 
Originally posted by ashVID:
Nice, back before I had a business and 6 kids, a buddy and I were working on something similar. One main thing to note is that you need something adaptive that re-weights factors based on input. The key to working with point differential is a system that tracks total game "control" from start to finish. You need to know the score at each timeout to understand who had control of the game. A 15 point win can be a tie game that got out of hand in the last 4 minutes due to missed 3s and fouls OR it can be a game that was a 30+ point game until the benches cleared and a late surge made it seem closer than it was.



ash =o)
That's a great, great point!
I don't think anyone currently accounts for that, do they?

 
Originally posted by ashVID:
Nice, back before I had a business and 6 kids, a buddy and I were working on something similar. One main thing to note is that you need something adaptive that re-weights factors based on input. The key to working with point differential is a system that tracks total game "control" from start to finish. You need to know the score at each timeout to understand who had control of the game. A 15 point win can be a tie game that got out of hand in the last 4 minutes due to missed 3s and fouls OR it can be a game that was a 30+ point game until the benches cleared and a late surge made it seem closer than it was.



ash =o)
A time-integrated approach to compute an alternate 'game-balanced' point differential (one that might not even favor the actual winner) is a really good idea. Thanks for the input.

The main issue I can think of with it is gathering that data in a reliable, non-tedious way. Halftime scores would be easy enough to get and incorporate automatically, but that's still pretty "coarse". I am not sure how I would get the designated timeout scores in an automated way, and even for just the AAC, it would be an awful lot of sifting through game reports by hand to try and incorporate it. It is definitely something to think about, though...
 
No models currently use the adaptive point differential BUT the data is there. The math is not overly complex but since timeouts are not always at the same time you'd have to account for that for consistency. Actually, the data is there to pull a score at a specific time now, time out or not.

You'd have to write a script or get ESPN, etc. to put it in an API but I'm convinced that it's the only way to truly factor in point differential. I think this is REALLY valuable in predicting plus it's great information for the committee.




ash =o)
 
You might be able to write a script to browse box scores and look for the first entry that has 15:XX, 11:XX, 7:XX, and 3:XX in each half and pull the score.
 
Originally posted by Weatherdemon:
You might be able to write a script to browse box scores and look for the first entry that has 15:XX, 11:XX, 7:XX, and 3:XX in each half and pull the score.
Yup. I started with the AAC only, so I haven't yet automated very much in the way of downloading/formatting box scores from online databases. I don't know what type of data is readily available, and I did not assume that detailed score history would be there. But if it is, then since I am going to be writing scripts to gather data anyways, doing something like this shouldn't actually be that hard to add. If I expand this project for a full NCAA run next fall, I will try to include something like this.

I think the way it would work is that instead of taking the final score differential as the "actual" point differential for the error analysis, I would use the time-integrated and normalized point differential as the "actual" data point for comparison, and ignore the final score except for determining the winner/loser.
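
A rough sketch of what I mean, assuming I can get score snapshots out of the box scores as (seconds elapsed, home points, away points) tuples (that data format is an assumption on my part):

def time_integrated_margin(snapshots, game_seconds=2400):
    # snapshots: list of (seconds_elapsed, home_pts, away_pts), in any order.
    # Returns the average home-team lead over the game, in points.
    pts = sorted(snapshots)
    if pts[0][0] > 0:
        pts.insert(0, (0, 0, 0))                # game starts tied at 0-0
    pts.append((game_seconds,) + pts[-1][1:])   # hold the final lead to the buzzer
    total = 0.0
    for (t0, h, a), (t1, _, _) in zip(pts, pts[1:]):
        total += (h - a) * (t1 - t0)            # lead held over this interval
    return total / game_seconds

A game that was tied until a late 15-0 run would come out near zero, while a wire-to-wire 15 point win would come out near 15, which is exactly the distinction ash is describing.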

That still leaves the question of what to do in the event that there is a discrepancy between the winner and the point spread. The optimization scheme will either get the winner right and miss the point spread, or it might nail the point spread but pick the wrong team. I think for cases like this it is a simple matter of weighting the error to prioritize minimizing the point spread error. This is easy to do. I currently have:

err = err1+err2,

where err1 is the percentage of games picked incorrectly, and err2 is my point spread error. If I change it to something else, like:

err = err1 + 2*err2,

or:

err = err1*err2*err2,

then the point spread error plays a much larger role, and in cases like this the scheme will naturally err on the side of getting the point spread correct. Leaving out the "miss" percentage (err1) entirely is also an option since picking the right point spreads will inevitably lead to a good percentage. It's just a super easy thing to define.
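
Putting the options side by side (err1 and err2 as defined in the original post; which weighting to use is still an open question):

def combined_error(err1, err2, mode="sum"):
    # err1: fraction of games picked wrong; err2: point spread error.
    if mode == "sum":
        return err1 + err2         # original: both count equally
    if mode == "spread_heavy":
        return err1 + 2 * err2     # lean harder on the point spread
    if mode == "product":
        return err1 * err2 * err2  # spread error dominates multiplicatively
    raise ValueError(mode)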

I hope I can get back to this over the weekend; it is a fun and exciting little project.
 
NICE! I think that will work as a base formula. Maybe you can spend the summer doing a full regression analysis to properly weight everything :)




ash =o)
 