Alternative rating system to goratings.org

gennan · December 13, 2021, 9:20pm

I feel you’re jumping to conclusions a bit quickly here.
Silence could mean something other than people trusting your ratings more. For example: I’ve been silent for a month, but that’s because I was waiting for some response to my question.

xiaodaiboy · December 13, 2021, 9:44pm

A couple of Japanese players do stand out, but only vs. other Japanese players. I think a true standout is Iyama this year in Nongshim. But a true standout on the world stage should be able to win a title or have other indicators. He made the LG Cup final once.

xiaodaiboy · December 13, 2021, 9:45pm

In my opinion, rating Japanese players so high is evidence. But that’s just an opinion. If you don’t feel that is strong enough, then that’s your opinion.

Vsotvep · December 13, 2021, 9:54pm

It’s a racist opinion, please just stop repeating your idiotic opinion and come with facts instead.

Vsotvep · December 13, 2021, 10:06pm

Let me also say, your obsession with winning titles is also completely irrational. Has Byun Sangil won any major titles? No, but he’s #4 on your ranking. Preposterous! What about your #5, Ding Hao? Nothing yet! How can these two be ranked higher than your #6 Ke Jie, who has won 15 titles, including international ones?! That ought to be impossible by your logic!

Of course it’s not impossible, since your logic is flawed. Titles aren’t important: winning matches is what is important.

(although I do believe your system may be ranking Ke Jie too low, but we’ll see)

Uberdude · December 13, 2021, 10:09pm

I’m going to make a rating system for OGS forum threads and keep tweaking it until this one comes last.

jlt · December 13, 2021, 10:11pm

Depends on what you mean by standout. Being around #10 - 15 is already standing out in my opinion, but doesn’t mean you necessarily win an international title.

Iyama did win the Asian TV cup though

https://senseis.xmp.net/?AsianTVCup

xiaodaiboy · December 13, 2021, 10:13pm

Lol. Like, if I say Chinese soccer has not performed well, does that make it racist?

I pointed to so many facts: Japanese players have not won a world title in 15 years (or so), and their 4-win haul in the Nongshim Cup this year was their first ever! Other metrics: they’ve come dead last in Nongshim for 13 years (or so), and they’ve won the Agon China-Japan challenge only twice out of 15 (or so) attempts in the recent past. All of this points to Japanese players being a lot weaker than Korea and China. How does pointing out facts make it racist?

If I say European/NA go is a lot weaker than these East Asian nations, does that make me racist too? Fact is just fact.

xiaodaiboy · December 13, 2021, 10:22pm

In machine learning parlance, this measure is called the “accuracy”. I think it’s not a great measure for 120 games, because flukes can happen, e.g. a weaker player might have a 10% chance of winning, so I am not sure that 120 games spread over so many players will average out the chances.

The measure I would suggest is called likelihood, or equivalently log-loss.

Let P = the predicted probability that black wins, and let Y = 1 if black actually won, 0 otherwise.

logloss for one game = -(Y*log(P) + (1-Y)*log(1-P))

Then just sum the above over all games. Also, by the time 120 games have been played, all 3 ratings might have moved quite a bit in between some of these games, so the best way is to use the rating to predict just the next game, then update the rating, and predict another game. Record the log loss for every game using the rating just prior to the game.
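To make the predict-then-update procedure concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the Elo-style logistic curve with a scale of 400, the `elo_update` rule with k=16, and the function names are stand-ins, not the actual formulas of any of the three rating systems.

```python
import math

def win_probability(rating_black, rating_white, scale=400.0):
    """Assumed Elo-style logistic curve: probability that black wins."""
    diff = rating_black - rating_white
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))

def game_log_loss(p_black, black_won, eps=1e-12):
    """Log-loss for one game; p is clipped so log(0) never occurs."""
    p = min(max(p_black, eps), 1.0 - eps)
    return -math.log(p) if black_won else -math.log(1.0 - p)

def elo_update(ratings, black, white, black_won, p_black, k=16.0):
    """Hypothetical stand-in update rule (plain Elo step with k=16)."""
    delta = k * ((1.0 if black_won else 0.0) - p_black)
    ratings[black] += delta
    ratings[white] -= delta

def sequential_log_loss(games, ratings, update):
    """Predict each game from the ratings just before it, then update."""
    total = 0.0
    for black, white, black_won in games:
        p = win_probability(ratings[black], ratings[white])
        total += game_log_loss(p, black_won)       # score the prediction first
        update(ratings, black, white, black_won, p)  # only then move the ratings
    return total

# Made-up players and results, purely for illustration.
ratings = {"A": 1500.0, "B": 1500.0}
games = [("A", "B", True), ("B", "A", False)]
print(sequential_log_loss(games, ratings, elo_update))
```

A lower total means better predictions. The same loop could be run once per rating system, as long as each system supplies its own probability function and update step.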

Also note that my ratings take into account the 7.5 komi white advantage, which is worth about 30 Elo points. I think goratings doesn’t, and I’m not sure about manumanu.

Vsotvep · December 13, 2021, 10:23pm

It’s because you throw the entire country on one big heap, and decide that Japanese players are weaker because they haven’t gotten many titles, thereby ignoring that titles are not won by the average performance of a country (in which case you would have had a point), but by individuals.

You discard Iyama Yuta and Ichiriki Ryo as players that have a spot in the top X, not for their actual winning rates against similarly ranked opponents, but instead because they are Japanese: you think they are weak because of their race (or perhaps the system that they play in), not because of their performance.

Let’s compare it to soccer, since you bring it up: if a player is from a country that hasn’t won any major titles, then by your logic, this cannot be a good player, because their country doesn’t win titles. By that logic, there are no good players from Norway, for example, or from Egypt. Yet, Erling Haaland has been bought for €110 million, same price for Mohamed Salah. The list of the top 10 most expensive soccer players includes 4 people from England, yet England hasn’t won a major title since 1966.

xiaodaiboy · December 13, 2021, 10:26pm

Iyama’s 4-game haul in Nongshim is his (and Ichiriki’s, Shibano’s, and Hsu’s) first ever. Do you know how many 4+ game hauls China and Korea have had in the years since they played in Nongshim? I think Iyama and Ichiriki are OK in the top 20-30 or so. But no higher. Definitely not top 10 based on their performance.

jlt · December 13, 2021, 10:26pm

Chinese athletes don’t do well in sprint, so this athlete doesn’t exist.

Vsotvep · December 13, 2021, 10:28pm

Sorry, but this doesn’t make sense in the context of finding the best ranking system. I would need a winning probability from each of the rating systems to even be able to compute this at all. How do you see this being a measure for testing whether your system has better predictive power than the other two systems?

I’m trying to decide which system is better at predicting outcomes, not trying to find out what the change in rating of a player is after a certain match result.

xiaodaiboy · December 13, 2021, 10:33pm

I can’t think of a reason why this is good science. Can you tell a system is good or not by focusing on a few players?

Why not just compute the log-loss on all players? Even if the systems ranked the players the same, the ratings can still be assessed using log-loss.

Vsotvep · December 13, 2021, 10:34pm

Winning titles is not the only metric for performance. If Iyama Yuta wins 50% of the matches against opponents within the top 20, then he belongs in the top 20, regardless of whether he tends to lose in major tournaments.

And again, why are you criticising Japanese players for their weak performances, but not players from other countries? Why do you have no problem with, for example, Dang Yifei being rated highly, even though ~~he also hasn’t won any major titles~~ has won one title and performs similarly to Iyama Yuta?

Why are you making this about race and not about the actual players?

xiaodaiboy · December 13, 2021, 10:35pm

He won the LG Cup and came 2nd in BC Card cup. Both international tournaments.

I think Iyama and Dang are quite close in the ratings. Only about a 29-point difference. Could be overcome in a few games.

I think my system is fair since it reflects well on Iyama due to his recent good performance (5 titles in Japan, and 4 wins in Nongshim, and 2 wins in Jia league).

xiaodaiboy · December 13, 2021, 10:40pm

That’s exactly what logloss does, and my suggested way is better at doing that. Perhaps you didn’t understand the method.

My suggested method doesn’t do this.

Man, if that’s the level of “scientific” enquiry, I can see why my system’s been attacked so badly…

martin3141 · December 13, 2021, 10:48pm

“Your” system has been used in chess for many years … What did you contribute, except for your opinion on Japanese players?

xiaodaiboy · December 13, 2021, 11:00pm

Wrong. Chess uses Elo, but my system is just a logistic regression. Nice attack. I can see that it’s trying various different angles to try and attack me. Also, the attack just comes without proper fact-checking, which is what I expected because it’s just an attack. It doesn’t have to be based on logic, reason, science, etc.

Vsotvep · December 13, 2021, 11:07pm

I understand your method, but let me be more precise. Your machine learning is based on logistic regression; the log-loss function measures the penalty for getting a prediction wrong. Your algorithm tries to minimise this penalty.

I cannot use log-loss on goratings, where the rating points have a different distribution (since they’re produced by WHR, not by logistic regression). The probability to win is a function of the difference in rating points, but these mean different things in your system and in the system by goratings, right? Therefore, I can’t plug the ratings of both systems into the same computation for a probability to feed to the log-loss function.
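As a rough illustration of the scale issue, the sketch below feeds the same rating gap into a logistic curve with different scale parameters. The curve and the scale values (200, 400, 800) are made up for the example and are not the parameters of goratings, manumanu, or the logistic-regression system; the point is only that a given point difference does not pin down a win probability until the system’s own scale is known.

```python
def logistic_win_probability(rating_diff, scale):
    """Assumed logistic curve: win probability for the higher-rated side."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))

same_gap = 100.0  # identical point difference in every hypothetical system
for scale in (200.0, 400.0, 800.0):  # hypothetical scales, not real ones
    p = logistic_win_probability(same_gap, scale)
    print(f"scale={scale:.0f}: P(win) = {p:.3f}")
```

The smaller the scale, the more a 100-point gap is worth, so each system’s ratings would need its own mapping to probabilities before a shared log-loss could be computed.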

Updating the rankings in between the games makes the systems more adaptable to players who suddenly become stronger or weaker, but for a short term like 3 months, we can assume that the players have a relatively constant strength. I’m interested in the predictive capabilities of each of the rating systems right now: given the current ranking, which one will be more accurate in predicting the next 120 matches played by the players where the three systems disagree most.
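A tally like this could look roughly as follows. This is only a sketch under assumptions: the ratings are frozen at the start, each system simply predicts that the higher-rated player wins, and the names `predicted_winner` and `tally_accuracy`, the system labels, and all the numbers are invented for illustration, not the actual scoring scheme or data.

```python
def predicted_winner(ratings, player_a, player_b):
    """Higher-rated player is the predicted winner; None on an exact tie."""
    ra, rb = ratings[player_a], ratings[player_b]
    if ra == rb:
        return None
    return player_a if ra > rb else player_b

def tally_accuracy(systems, results):
    """Count correct predictions per system, with ratings held fixed."""
    tally = {name: 0 for name in systems}
    for player_a, player_b, winner in results:
        for name, ratings in systems.items():
            if predicted_winner(ratings, player_a, player_b) == winner:
                tally[name] += 1
    return tally

# Made-up systems, players, and results.
systems = {
    "system_X": {"P1": 3600, "P2": 3550},
    "system_Y": {"P1": 3500, "P2": 3550},
}
results = [("P1", "P2", "P1"), ("P1", "P2", "P2"), ("P1", "P2", "P1")]
print(tally_accuracy(systems, results))
```

Because only the ordering of two ratings matters here, the tally works even when the systems use different rating scales.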

Because I’m doing this by hand. Automating things will cost me a lot more time, which I don’t have, but daily checking for the 3 or 4 new games that have been played and adding some points to a tally isn’t a big problem.

If you think you can do a better job then please do so, it’s what I’ve been asking since I first read this thread.

I could focus on different outliers. Perhaps I should indeed focus on outliers of all three systems, and not only on the outliers from your system. It seems less biased.