OGS rank deflation is a topic near and dear to my heart ;). So I spent a week and $20 in GPUs to look at the data.
Here’s the short summary:
KataGo trained a human policy network (HumanSL) conditioned on various server ranks and other metadata. I used that to create a rank guesser (KGS scale, medium time settings, try it here: https://rankmle.david.ma/). Then I applied this rank guesser to a sample of games on OGS.
Before the 2021 adjustment, OGS ranks below about 5 kyu were very tough. Many double-digit kyu OGS players were estimated by HumanSL as much stronger on the KGS-style axis.
From 2021 through 2024, the system looks much closer to the HumanSL KGS-style scale. The lower kyu ranks move toward the diagonal, and the median estimates are less severely compressed.
In the 2024-2025 slice, the lower ranks appear to be drifting harder again. The effect is not as extreme as the pre-2021 data, and the sample is shorter, but the direction is visible.
The yearly medians tell the same story more compactly: a sharp correction around the 2021 adjustment, then a later bend suggesting renewed rank deflation, especially below single-digit kyu.
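For anyone who wants to reproduce the yearly-median view on their own sample, here is a minimal sketch. It assumes each game has already been reduced to a `(year, ogs_rank, estimated_rank)` tuple on one numeric scale where higher means stronger; that tuple layout and the function name are hypothetical, not the format used for the graphs.

```python
from collections import defaultdict
from statistics import median

def yearly_median_gap(games):
    """Median of (estimated_rank - ogs_rank) per year.

    `games` is an iterable of (year, ogs_rank, estimated_rank) tuples.
    A positive median means players were estimated as stronger than
    their displayed OGS rank in that year.
    """
    gaps_by_year = defaultdict(list)
    for year, ogs_rank, estimated_rank in games:
        gaps_by_year[year].append(estimated_rank - ogs_rank)
    return {year: median(gaps) for year, gaps in sorted(gaps_by_year.items())}

# Made-up numbers, only to show the shape of the input and output.
sample = [
    (2019, 10, 16), (2019, 12, 17), (2019, 15, 18),
    (2022, 10, 11), (2022, 12, 12), (2022, 15, 16),
]
gaps = yearly_median_gap(sample)
```

A gap that shrinks toward zero and then grows again over the years would correspond to the "correction, then renewed drift" pattern described above.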
I am curious how robust this method is to changes in “styles”: what would happen if we matched these against games from before 2017? As for the distribution of plays, the effect in 2024-2025 at the lower ranks seemed to be more about the spread and distribution of different kinds of playing styles/strengths than an actual change. Can we rule out the hypothesis that it has to do with a mismatch in players’ declared ranks (for example, how much of this “spread” comes from players with established long-term ranks, and how much from new or rarely active accounts)?
Don’t get me wrong, this is interesting and something to think about. But the assumption that there should be a diagonal rests on the assumption that the KGS ranks are “correct” - someone please correct me if I understood this wrong. I’m not sure if KGS is the correct baseline or what the baseline should be. But taking KGS as the baseline without showing any wariness makes me question what we can learn from this.
They accumulated thousands of games, but have had long periods of pause in between, and then came back. From a few new and old games I scanned through across decades, their strength didn’t change much, but their ranks clearly have gaps and rank deflation in recent years.
Then there are players who joined in different periods, for example in 2020, and have played consistently since, like this player. There it seems pretty clear when this started (somewhere around 2023 to 2024, accelerating since last year).
I feel like we need a survey of the players’ EGF or AGA ranks again, to see how far off we actually are. By this point, this should be pretty substantial.
Rank graphs on OGS do not display which rank someone actually had some years ago in the old system. They are recalculated after every rank system update.
Also, the period under discussion is after the 2021 rank adjustment, so even if the old records have different rankings, that is not the issue here. They were still pretty consistent in 2022 to 2023.
Yeah I’m pretty sure what happened with one of the rating changes was that it recalculated everyone’s rank from the beginning. Old enough accounts used to be able to choose a starting rank, so you might have started as a 25 kyu, won a bunch of games against similar or slightly weaker opponents, and moved up to 20 kyu or 15 kyu etc.
On the recalculation though, instead of remembering what you chose as a starting rank, I believe everyone was reset to 1500, so if you won against a bunch of other new accounts that were also reset to 1500, you can have a big spike in your rating, sustained over a long enough period of time.
so the rank of every ddk-strength beginner who did slightly better than the average ddk beginner became sdk
rank graphs from before the glicko era became totally meaningless, and that data is lost
I am now looking through the first OGS database dump from Aug 2021.
The json file has a field called rank (I assume “rank”:0 is 30k, and “rank”:38 is 9d?), and a field called “egf”, which I assume was the rating? Does this data dump show the original rankings and ratings, from before all the recalculation changes? (From what I can tell they preserved the ratings from before 2021, but I have no idea how far back they are valid.)
I used to imagine sandbaggers everywhere, every time I lost. I used that tool ^^ many times to “validate” my belief and sure enough, it was rating my opponents at 1 kyu instead of their displayed 8 kyu (for example). Eventually one day the thought occurred that I should run it on myself and oh…that’s interesting, it’s also calling me a ~1 kyu. Upload another game, same thing. I know for sure that I’m not a 1 kyu, or if I am it’s not what it’s cracked up to be.
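If that reading of the `rank` field is right (0 is 30k, 38 is 9d, which is only the assumption stated above and not something confirmed from the dump), turning the number into a label is a one-liner. The function name here is made up for illustration.

```python
import math

def rank_label(rank: float) -> str:
    """Convert a numeric rank to a kyu/dan label.

    Assumes 0 <= rank < 1 is 30k and 30 <= rank < 31 is 1d,
    so an integer rank of 38 comes out as 9d.
    """
    n = math.floor(rank)
    if n < 30:
        return f"{30 - n}k"
    return f"{n - 29}d"
```

For example, `rank_label(0)` gives `"30k"`, `rank_label(29)` gives `"1k"`, and `rank_label(38)` gives `"9d"`.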
TL;DR howdeepisyourgo is very generous in estimating ratings (but is still useful for relative ratings). Pretty sure I recall @square.defender making that comment back in the day.
Ideally there should always be a diagonal; it’s the slope of that diagonal which changes with each baseline. In that sense KGS is as good as any other baseline.
Interestingly there is a log scale mapping between OGS rankings and OGS ratings, but this doesn’t seem to be present in the latest 2024 comparison, so maybe KGS ranks have a similar log scale in rank skill difference?
In a perfect ranking system, a player with rank n giving k handicap stones against a player with rank n-k should win 50% of the time if the game is with reverse komi (so slightly more than 50% if the game is without komi). If two ranking systems are perfect, then the graph comparing them should be a line parallel to the diagonal x=y. Since this is not the case for KGS vs. OGS, it means that at least one of the two ranking systems is not perfect, but we don’t know which one(s).
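One rough way to test the "parallel to the diagonal" condition on paired data is to fit a straight line and look at the slope: a slope near 1 means the two scales differ only by a constant offset, while a slope far from 1 means one scale is stretched or compressed relative to the other. The sketch below uses ordinary least squares on made-up numbers; it is not the method used for the graphs in this thread.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical paired ranks on a numeric scale (higher = stronger).
ranks_a = [5, 10, 15, 20, 25]
shifted = [r + 2 for r in ranks_a]            # same scale, constant offset
compressed = [0.5 * r + 10 for r in ranks_a]  # one scale is compressed

slope_shifted, offset_shifted = fit_line(ranks_a, shifted)
slope_compressed, _ = fit_line(ranks_a, compressed)
```

Here the shifted data gives a slope of 1 with an offset of 2 (parallel to the diagonal), while the compressed data gives a slope of 0.5, which is the kind of pattern that says at least one of the two scales has uneven rank steps.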
Yeah this. The last big rating adjustment was in April(?) 2021, and it recalculated all the previous rating histories retrospectively to match the adjusted rating system. So all rating data from before that adjustment is bad data: the ratings we see do not reflect the ratings/ranks that the players had when the games were played.
OGS uses the formula rating = 525 × exp(rank/23.15) where 30 ≤ rank ≤ 31 is 1d, 0 ≤ rank ≤ 1 is 30k. So when the rank is 0, the rating is equal to 525.
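In code, the formula and its inverse look like this. The constants 525 and 23.15 are the ones quoted above; the function names are just mine.

```python
import math

OGS_A = 525.0   # rating at rank 0 (bottom of 30k)
OGS_C = 23.15   # scale constant from the formula above

def rank_to_rating(rank: float) -> float:
    """rating = 525 * exp(rank / 23.15)"""
    return OGS_A * math.exp(rank / OGS_C)

def rating_to_rank(rating: float) -> float:
    """Inverse of rank_to_rating: rank = 23.15 * ln(rating / 525)"""
    return OGS_C * math.log(rating / OGS_A)
```

With these, a rating of 1500 maps to a rank a bit above 24, i.e. somewhere in 6k if 30 ≤ rank ≤ 31 is 1d.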
What is funny is that the EGF rating system has another formula converting ranks into ratings: https://europeangodatabase.eu/docs/about/egf-rating-system#system-description and it looks like the logarithm is the other way around. Comparing the two formulas requires a bit of math, because what is called rating on the EGF page is really the rank (in the sense of OGS) multiplied by 100, minus 950, and the rating in the sense of OGS is 400/ln(10)×β(r). To make comparisons meaningful, I’ll add a constant to the rating so that bottom 30k corresponds to rating 525, so the EGF formula becomes
rating = 525-1216×ln(1-rank/42.5)
Both curves are shown in the graph below, OGS is blue, EGF is red. Hope I didn’t make calculation mistakes.
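For anyone who wants to check the numbers, here is a small sketch that rebuilds the two constants and tabulates both curves. It assumes β(r) = -7 × ln(3300 - r) as given on the linked EGF page, plus the substitution r = 100 × rank - 950 described above; the function names are mine, and the script uses the unrounded scale factor rather than 1216.

```python
import math

# Constants of the EGF-style curve, derived under the assumptions above.
SCALE = 7 * 400 / math.log(10)   # about 1216
PIVOT = (3300 + 950) / 100       # 42.5

def ogs_rating(rank: float) -> float:
    """OGS: rating = 525 * exp(rank / 23.15)"""
    return 525 * math.exp(rank / 23.15)

def egf_style_rating(rank: float) -> float:
    """EGF-style: rating = 525 - SCALE * ln(1 - rank / PIVOT), for rank < PIVOT"""
    return 525 - SCALE * math.log(1 - rank / PIVOT)

if __name__ == "__main__":
    for rank in (0, 10, 20, 30, 38):
        print(f"rank {rank:>2}: OGS {ogs_rating(rank):7.1f}   "
              f"EGF-style {egf_style_rating(rank):7.1f}")
```

Both curves start at 525 for rank 0 by construction. The EGF-style curve blows up as the rank approaches 42.5, whereas the OGS exponential has no such limit.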
It would be interesting to compare with KGS, but the KGS rating system is a mess, it takes into account past games but gives more weight to recent games, so a player’s rating changes even without playing.
Yeah the SGF review feature is super early. I wouldn’t even advertise it yet. I’m still exploring the problem of “pick the right moves to review”. There are obvious problems with point loss. So I’ve been exploring using the HumanSL networks somehow. Stay tuned!
Ah yes, I’ve seen that one! It’s more complicated than what I’ve done. From the thread ( How Deep is your Go? - #55 by jlt ) I saw some pretty involved data cleanup operations. If there’s a way to run @Animiral’s tool on the sample of ~6000 SGFs I used to produce the graphs, we could see a scatterplot of how it compares. Also, that tool is calibrated on OGS games.
One thing I’m interested in doing is hosting HumanSL bots on OGS and seeing where they land. Though bot games aren’t rated anymore these days, so I’m not sure if the bots will even get a rank? Or is it only not ranked for the human but ranked for the bot?