RPI is dead

They chose the only solution worse than just keeping RPI
Just curious why you say so? Asking as someone who doesn't know more about it other than what is said in the article. I am a bit torn on using margin of victory in these things, because it rewards high-scoring run-n-gun offenses over more deliberate "Princeton" style offenses with gritty defenses. But they did cap it at ten points, which seems like a reasonable compromise.
 
The cap at 10 points won’t do those games any justice, and I don’t see how it’s an improvement. I believe the only teams it helps are those in the ACC, Pac-12, Big Ten and Big 12.
 
It's a black box. So far, it doesn't sound like they have any intention of releasing the formula. In fact, they're implying that it's not even a static formula, but rather a machine learning algorithm with proprietary code. This has a number of consequences.

First is that we have to trust that both their input and their code are correct. The BCS had the same problem with its computer rankings, and there was at least one well-publicized instance in 2010 of an error in the Colley Matrix that caused the rankings to be incorrect. Say what you will about the RPI, but there are dozens of sites calculating those rankings every day, so any discrepancies are caught almost immediately. Instead, this is what we get:

"It's not really a 'formula,' so much as it's highly sophisticated and involves machine-learning is not easily digestible. It is not like the RPI, to consider this as a formula. This is not that. This is very contemporary, forward-thinking and involves machine-learning and artificial intelligence."

Second is that we don't know how any of the factors are weighted within their secret sauce, or whether the weights make any mathematical sense. The arbitrary choice of a 10-point cutoff is a good example. What's the math behind that number? The burden of proof should be on them to show that this is an improvement. At the very least, they could provide retroactive rankings so we could see how this new system compares to well-known, publicly available metrics such as KenPom, Sagarin, etc. According to Dan Gavitt, the "committee felt like there was nothing productive by going back and comparing it."
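If retroactive rankings ever did come out, the comparison itself would take about five lines. Something like this, with a made-up five-team example standing in for the full D-I lists from a past season:

from scipy.stats import spearmanr

# Made-up rankings for five teams under two systems (1 = best).
net    = {"Duke": 1, "Kansas": 2, "Gonzaga": 3, "Tulsa": 4, "Loyola": 5}
kenpom = {"Duke": 2, "Kansas": 1, "Gonzaga": 3, "Tulsa": 5, "Loyola": 4}

teams = sorted(net)
rho, _ = spearmanr([net[t] for t in teams], [kenpom[t] for t in teams])
print(f"Rank agreement (Spearman rho): {rho:.3f}")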

Another problem is that it's a predictive metric. This is more of a philosophical question, but I firmly believe that a team should be judged by whether they won games this year, not whether they will win games in the tournament. Most of the time, these values converge, but not always. In predictive metrics, a team can theoretically lose every game on their schedule by 1 point and finish within the top 50. The results of games should matter. It should matter whether a team wins or loses.
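To put a toy example on that (all numbers invented): a pure opponent-strength-plus-margin rating has no problem ranking an 0-5 team that loses every game by a point ahead of a team that actually wins.

# Invented results: (opponent_rating, point_margin); margin < 0 means a loss.
results = {
    "Close Losers": [(+5, -1), (+8, -1), (+10, -1), (+3, -1), (+7, -1)],   # 0-5
    "Ugly Winners": [(-5, +2), (-8, +1), (-10, +3), (-3, +2), (+7, -15)],  # 4-1
}

def margin_rating(games):
    # predictive-style rating: opponent strength plus scoring margin, wins ignored
    return sum(opp + margin for opp, margin in games) / len(games)

def record(games):
    wins = sum(1 for _, margin in games if margin > 0)
    return f"{wins}-{len(games) - wins}"

for team, games in results.items():
    print(f"{team}: {record(games)}, rating {margin_rating(games):+.1f}")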

So is it better than RPI? Probably is. But we knew the weaknesses of RPI. We knew the numbers were correct. We knew how to weigh them against other factors. This is just a number handed down from the NCAA gods that we're supposed to trust because Google helped them make it? How will the committee handle that? How should they? I have no idea...
 
I thought 10 was too low. I would have made it 20.
The whole thing is just to drive more NCAABB conversation and to benefit those they want it to benefit.
 
I wonder if they're running some kind of statistical probability simulations... that doesn't seem fair to me... takes out the possibility of accounting for future injuries or just improvement in team play as the season progresses. I mean, look at a team like our 2012 team... lost a ton of games to start the season, then really gelled and dominated down the stretch in conference play... how does a computer account for that? It can't.
 
Thanks. I missed the part where they were using machine learning. I know an awful lot of machine learning researchers... I am sure they will be shocked to hear that it is being used for this purpose. Using machine learning to come up with proper weights for a static and deterministic formula would be one thing... But this just screams for opportunities to have a thumb on the scales. Not to mention that it is not really deterministic in the classical sense. Input one new game of minor consequence into the 'training set', and the entire top fifteen could get resorted completely. It should theoretically get better as the training set improves, but man... Thanks for the info. "Machine Learning" is all I needed to hear to know it was garbage.
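Here's a toy version of that "one new game" worry, with a plain least-squares margin model standing in for whatever they actually built (teams and scores invented): add one more low-stakes result and every rating in the system moves, including teams that weren't even on the floor.

import numpy as np

teams = ["A", "B", "C", "D"]
idx = {t: i for i, t in enumerate(teams)}

def fit(games):
    # least-squares ratings: predicted margin = rating[a] - rating[b]
    X = np.zeros((len(games) + 1, len(teams)))
    y = np.zeros(len(games) + 1)
    for row, (a, b, margin) in enumerate(games):
        X[row, idx[a]], X[row, idx[b]], y[row] = 1, -1, margin
    X[-1, :] = 1  # extra row pins the average rating to zero so the fit is unique
    return np.linalg.lstsq(X, y, rcond=None)[0]

games = [("A", "B", 3), ("B", "C", 7), ("C", "D", 2), ("D", "A", -5)]
before = fit(games)
after = fit(games + [("C", "D", 1)])  # one extra C-vs-D game, decided by a point

for t in teams:
    print(f"{t}: {before[idx[t]]:+.2f} -> {after[idx[t]]:+.2f}")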
 
  • Net offensive and defensive efficiency
This looks like a big fudge-factor area.
Makes the early games even more important now, since they count the same as the last 10.
 
The biggest problem with ML is that there is no science behind it, just prediction. Correlating its results with other, more explainable indices will help people interpret the meaning, or validity, of the output. But there is certainly a risk of capitalizing on chance and some weird individual results, despite whatever cross-validation strategy they used.
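That "capitalizing on chance" risk is easy to demonstrate with synthetic data: throw enough junk predictors at outcomes that are pure noise and the in-sample fit still looks impressive. (Nothing to do with the NCAA's actual inputs, just the general point.)

import numpy as np

rng = np.random.default_rng(0)
n_games, n_features = 60, 50
X = rng.normal(size=(n_games, n_features))  # 50 meaningless "team stats"
y = rng.normal(size=n_games)                # outcomes that are pure noise

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)
print(f"In-sample R^2 fitting pure noise: {r2:.2f}")  # deceptively high; out-of-sample it would be ~0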
 
The other thing is that, depending on the type of ML technique they used, it might be impossible to 'release' the algorithm in any meaningful way.
 
Also, doesn’t ML need something like 100,000 training cases before it is usable?

I agree with everyone else, RPI wasn’t the best, but I’d much rather have it than whatever this is.

Seems like some people sold the NCAA on a bunch of buzzwords they don’t understand.
I sit on a committee that awards funding to various scientific computing proposals. Can't tell you how many people just add, "And we'll do machine learning on xyz!" because it is buzzword-worthy. Don't get me wrong, it can be incredibly useful (and we've funded many projects that use it). But in the worst cases, it comes across as "We'll let the computer figure out this very difficult problem that we have no original ideas about and take its solution as gospel for no reason".

I find that if you replace the phrase "machine learning" with "a fancy optimization scheme" (which is essentially what it is when it is most useful), it helps to filter out good and bad uses. In this case, the question is: "Optimizing for what? And what is your control test? That is, how can you prove/demonstrate that your algorithm is really doing that better than other metrics (such as RPI, Pomeroy, etc.), or even a run-of-the-mill optimization scheme like OrthoMADS?" I've always thought running an optimization scheme to try to sort the strength of sports teams would yield interesting results, but it should first be done by conventional methods. It would have to search a 325-dimensional space and would be computationally expensive, but I'm pretty sure the guys at the NCAA can afford some decent computer time.
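To sketch what the conventional version looks like: one unknown strength per team, and an off-the-shelf optimizer minimizing how badly those strengths explain the season's margins. Tiny invented example below; the real thing just has a few hundred unknowns instead of three.

import numpy as np
from scipy.optimize import minimize

# Invented games: (team_a, team_b, margin of a over b)
games = [("Tulsa", "Memphis", 4), ("Memphis", "Houston", -9), ("Houston", "Tulsa", 6)]
teams = sorted({t for g in games for t in g[:2]})
idx = {t: i for i, t in enumerate(teams)}

def objective(r):
    # squared error of predicted margins, plus a term pinning the average rating to zero
    err = sum((m - (r[idx[a]] - r[idx[b]])) ** 2 for a, b, m in games)
    return err + r.sum() ** 2

result = minimize(objective, x0=np.zeros(len(teams)))
for t in teams:
    print(f"{t}: {result.x[idx[t]]:+.2f}")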

The fact that they are using Pomeroy rankings as input to this thing tells me they have no idea what they are doing. If their algorithm worked well, they wouldn't need that info. They should be comparing AGAINST that to show they get better results than Pomeroy, etc.
 
You do not necessarily need 100,000 cases. It depends on how many predictor variables you have and the flavor of ML.

Need to get Chito to jump into this discussion.

It would be more interesting if they used statistical techniques that might predict a tiny bit worse but lead to a better understanding of WHY.
 
I read a good bit on this, some positives, some negatives. If the NCAA does it right, it should be an overall positive. Many people are so hesitant because it includes a lot of "trust us" statements.

Obviously RPI had issues. This new model tries to take everything relevant to the better prediction models into account:
1) "Team Value Index" --- seems a lot like RPI
2) Team Efficiency -- points scored/allowed per 100 possessions
3) Wins -- runs the risk of double counting as Team Value Index includes this metric, but is the "just win baby" portion of the formula apperently
4) Adjusted Winning Percentage -- Basically rewards for winning at home, punished for losing at home
5) Scoring Margin -- run up the score and get more points, capped at 10. Interesting relationship with efficiency. A slow and steady team like UVA or Princeton is unlikely to run the score up much. Also, teams that go into desperation mode and start fouling may see increased loss margins. "They" say they evaluated the margins, and 10 was statistically significant but not encouraging others to crush people's souls.
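A couple of these are easy to write down explicitly. This is just my reading of the public description, not the NCAA's actual code:

def efficiency_per_100(points, possessions):
    # offensive (or defensive) efficiency: points scored (or allowed) per 100 possessions
    return 100.0 * points / possessions

def capped_margin(team_score, opp_score, cap=10):
    # scoring margin capped at +/-10, so a 40-point blowout counts the same as a 10-point win
    return max(-cap, min(cap, team_score - opp_score))

print(efficiency_per_100(75, 68))  # ~110.3
print(capped_margin(95, 55))       # 10, not 40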

Overall, I think it's a neat new tool. But since it has major implications, I would really have liked to see what the new system WOULD HAVE said about the last three years' worth of games. Better yet, hand it off to the stat nerds. Do all mid-majors get punished for some reason, while anyone who plays five top-ten teams (even if they lose all of those games) gets a big bump? Is there a scheduling secret that helps climb the rankings? Is that nun for Loyola factored in properly?

Why is the formula proprietary? When there were five potential black boxes plus humans contributing to a ranking (the FBS model), it made a little more sense. But this is more like a gymnastics competition: to build the best routine I need to know how much the difficulty score matters, which elements weigh the most, how many points I get if I stick the landing, and whether there are certain elements I shouldn't risk doing (I feel like my summer Olympics watching is really paying off).

The machine learning portion is interesting. You have many data sets already, presumably the machine utilized those to tweak whatever predictive formula was the starting point. While basketball clearly advances year by year, making some stats more relevant to outcome, I wouldn't think there is a strong need to "learn" midseason. Why not just state that the formula will be updated to be more predictive at the end of each season?

Better yet, just show us how much better NET is at predicting how good a team is than RPI. Show us the trial runs. We want to believe.

CBS has a good write up on it:
https://www.cbssports.com/college-b...long-overdue-overhaul-on-an-outdated-process/

So does this piece from Ben Snider at SB Nation:
https://www.anonymouseagle.com/2018...ll-selection-committee-metric-ranking-rpi-net

NCAA story:
https://www.ncaa.com/news/basketbal...-mens-basketball-committee-adopts-new-ranking
 
The RPI and the BCS systems have the same problem: they favor teams from the Power 5. Yes, we know who the top teams in each P5 conference are, but these systems tend to overrate the next level down.

A 5th-place team in a P5 conference gets rated at the same level as a team leading a G5 conference.
 
Machine learning almost sounds as though they're inputting old values and giving players ratings like you see in Madden, FIFA, and all the EA stuff... and then they'll run a simulation and, voila, you have your field of 64/68 or whatever it is next.
 