Alternative rating system to

These are legitimate criticisms. So I’ve been thinking about how to deal with them. The issue is that the rating estimate has an error bound, so when you play very few games, the error bound is large. There is no way around that except to set an arbitrary threshold, like 10 games, so that players with fewer games than that do not get on the list.

You make it sound like I rigged the system so that Ichiriki stays lower in my rating. Actually, as someone pointed out, a Japanese person also created his own rating list, and his ratings are closer to mine than to goratings’.

With science, you start by observing something that the theory doesn’t describe well. E.g. that’s how quantum mechanics and general relativity got invented. Ichiriki’s gorating was an observation that didn’t fit reality (my reality might be biased, but my modelling method is objective). So I tried an alternative objective method which is based on simple machine learning models and it shows that Ichiriki is nowhere near the top 10. So there.

I use a machine-learning algorithm to derive the ratings.

There are some bugs in the reporting part of the script; the actual calculation of the rating is fine. I will fix these kinds of bugs over time. But the main thing is the rating itself, which I think is a fair reflection of the players’ performance over the last 365 days, given they have played enough games.

You might want to compute ratings with 730 days of data to see if it significantly changes the results. Two years would allow the use of more data, and thus give more reliable results.

actually, the best way is to use both and test which one is more accurate.

things can change a lot in one year, so I think 365 days is reasonable.

I will see if I want to invest computational power to compute a few variants and compare.
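For what it’s worth, the windowing decision itself is cheap to implement; the cost of comparing variants is mostly in refitting the model. A minimal sketch of filtering games to a time window (dates and names are made up for illustration):

```python
from datetime import date, timedelta

# Hypothetical game records: (date_played, winner, loser)
games = [
    (date(2023, 3, 1), "A", "B"),
    (date(2024, 1, 15), "B", "C"),
    (date(2024, 6, 10), "A", "C"),
]

def window(games, days, today=date(2024, 7, 1)):
    # Keep only games played within the last `days` days
    cutoff = today - timedelta(days=days)
    return [g for g in games if g[0] >= cutoff]

print(len(window(games, 365)))  # games from the last year
print(len(window(games, 730)))  # games from the last two years
```

Fitting the same model on the 365-day and 730-day subsets and backtesting both would settle the question empirically.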

No, it’s not: how do you measure that your rating is an objectively better predictor for the outcome of matches between the players, than for example the rating provided by goratings?

You’re leaping over an entire step by making an algorithm that gives you some rating, without giving any clarification for why this algorithm is better at rating pro players than the existing ones. Your modelling method may be objective in the sense that it is purely based on the outcome of the games, and not on other factors, but you are not objective in deciding to use this particular model. Why did you choose your method and not some other method? Why did you choose a 1-year interval? Is the way that you test whether your algorithm works objective, or is it influenced by your bias?

I’m making it sound like you rigged the system, because from your comments here, it really sounds to me like you rigged the system, whether you’re aware of it or not.

If so, how, or why? What is the machine learning to do, predict the outcome of future matches, or fit the data to your expectations? Why are the machine learning models you use objective, and why do they give better prediction than the existing ratings?

Also, what I least understand of all of this, why are you so convinced that Ichiriki Ryo should be ranked lower? He’s playing on par with players of comparable rating, so what are you basing this on?


I am surprised that a ranking system needs that much, considering the size of the data (a few hundred games and players).

I guess we can spend some time to see if goratings is more predictive than my rating. But the issue is that goratings changes its ratings retrospectively. Anyway, it might be worth putting it to the test.

I will see if I can find some time to compare the prediction accuracy of goratings vs my rating. Actually, ppl can already do that, since the ratings are open on both sites. lol, it just takes effort.
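A head-to-head test like that is straightforward in principle: score each rating list by how well it predicts a set of held-out games, e.g. with mean log loss (lower is better). A sketch with invented names and numbers, assuming Elo-scale ratings:

```python
import math

# Hypothetical held-out games: (player1, player2, did player1 win?)
games = [
    ("P1", "P2", True),
    ("P2", "P3", False),
    ("P1", "P3", True),
]

# Two hypothetical rating lists to compare, on an Elo-like scale
ratings_a = {"P1": 3800, "P2": 3650, "P3": 3700}
ratings_b = {"P1": 3790, "P2": 3720, "P3": 3680}

def elo_prob(ra, rb):
    # Standard Elo expected score for the first player
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))

def log_loss(ratings, games):
    # Mean negative log-likelihood of the observed outcomes
    total = 0.0
    for p1, p2, p1_won in games:
        p = elo_prob(ratings[p1], ratings[p2])
        total += -math.log(p if p1_won else 1.0 - p)
    return total / len(games)

print(log_loss(ratings_a, games), log_loss(ratings_b, games))
```

Whichever list produces the lower loss on games neither list was fitted on is the better predictor, which is the objective comparison being asked for.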

why does the EPL play a double round-robin? Why does goratings use the whole history? You can nitpick infinitely. Oh also, why is the drinking age 21? what if the person is 20 years and 364 days of age.

I think 1 year is definitely reasonable, but an interested person can try 2 years, 3 years and see which is objectively more predictive. It takes computational power and effort, though, so I doubt anyone will be interested.

The ratings are based on games from the last 365 days. The model may not be perfect, but I don’t think you can seriously argue that the ratings from logistic regression are completely wrong. Logistic regression is probably the most widely used technique for estimating binary outcomes (win/lose), so it’s pretty robust, albeit not perfect.
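For readers unfamiliar with the approach: the simplest logistic-regression rating is the Bradley–Terry model, where each player gets one coefficient and the probability that one player beats another is the sigmoid of their rating difference. A toy sketch (players and results invented), fit by gradient ascent on the log-likelihood:

```python
import math

# Toy results: (winner, loser). "A" beats "B" 8-2, "B" beats "C" 7-3, etc.
games = (
    [("A", "B")] * 8 + [("B", "A")] * 2 +
    [("B", "C")] * 7 + [("C", "B")] * 3 +
    [("A", "C")] * 9 + [("C", "A")] * 1
)

players = sorted({p for g in games for p in g})
rating = {p: 0.0 for p in players}

def win_prob(r_winner, r_loser):
    # Logistic model: P(win) = sigmoid(rating difference)
    return 1.0 / (1.0 + math.exp(-(r_winner - r_loser)))

lr = 0.1
for _ in range(2000):
    for w, l in games:
        # Gradient step: push the winner up and the loser down
        # in proportion to how surprising the result was
        p = win_prob(rating[w], rating[l])
        rating[w] += lr * (1.0 - p)
        rating[l] -= lr * (1.0 - p)
    # Ratings are only identified up to a constant, so re-center
    mean = sum(rating.values()) / len(rating)
    for name in rating:
        rating[name] -= mean

print(sorted(rating, key=rating.get, reverse=True))
```

This recovers the expected ordering from the head-to-head results; a real system would add features like time decay or game weights on top of the same core.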

Because the onus is on YOU to provide evidence that what you claim is likely to be more accurate.



Lol. I am happy with it. Another rating done independently also more or less corroborates my findings.

So I don’t have to prove anything. It’s plain to see for me that Japanese players are no higher than 30 or 40 in the rating, since they haven’t won an international title in more than 15 years. So rating a Japanese player so high (within the top 20) doesn’t make sense to me. Again, this is corroborated by a rating list released by a Japanese person. So there.

Thank you.

I can now completely dismiss your rating system and other claims.


btw this is the rating done independently by another person

Let’s be clear: you’re trying to convince us you made a good rating system (and doing an incredibly lousy job at that), I’m only trying to ask the questions that could help you with that.

If you refuse to answer them, then don’t expect me to think your rating system is worth anything.


Lol. Good for you. Anyway, my rating, goratings, and the list I showed above are completely in the open. So ppl are free to make up their own minds.

I guess to you, goratings didn’t have to prove they are better for some reason, while my ratings are just wrong and I have to prove them right. But you are entitled to your opinions.

I am sharing this list, and my data and methodology are in the open (including code); it’s just that I haven’t spent the time to describe them. But a skilled statistician/data scientist can read my code and do an independent assessment.

no. i am not trying to convince anyone. my rating is open for all to inspect. i don’t need to spend time convincing anyone. imo, it’s a better reflection of reality. you might hold a different opinion. I can respect that.

i don’t hope to convert anyone, just to raise awareness of it. thanks

→ Subjective

Yet you claim it is better, → Subjective

→ Subjective

→ Subjective


sure. data science has its subjective elements. I agree with you.

but you can find an equivalent list of subjective decisions in ANY rating system. So i don’t think pointing out my subjective decisions makes my rating any less worthy.

My point is that goratings is wrong based on my “subjective” sense that Ichiriki is not a top 20 player. Japanese players have not won an international title for 15 years, and their record in the Nongshim Cup hasn’t been that great. So my subjective sense is informed by these factual results. IMO, goratings’ ranking does not reflect these facts. so there.

Also, as I have indicated, a rating list done independently by someone else more or less corroborates my findings on Ichiriki’s rating.

Not winning an international title doesn’t mean you don’t deserve to be in the top 10 or 20. Until recently, Shin Jinseo didn’t win an international title either.


There’s no way to discuss this reasonably.

I’d advise not calling this “data science”; the scientific part seems to be absent.


u r right, they don’t equate. but there is a high correlation. Shin has consistently made the finals of international tournaments recently!! This is also highly correlated with his #1 ranking on all the lists.