Are old ratings accurate?

So, I have been playing on OGS since May 2014, and according to my rating graph (as seen on my profile yebellz), I peaked in rank on January 15, 2015, reaching 1.9 kyu ± 0.9.

Since then, it would seem that I have taken a slight decline, down to my current level of 5.5 kyu ± 0.9.

However, if I look at my games from that January 2015 era, I see that I was actually around 10 kyu. In fact, for the game where I achieved my apparent peak, I see from the chat log that OGS had ranked me at 12 kyu at the beginning of that game (Sept 2014), and I was 10 kyu by the end. I believe that around 10 kyu (or weaker) is actually a more accurate rating for me in January 2015. Note: I had also briefly changed my name to “YeGo” but then later back to “yebellz”.

I understand that the ratings system has been updated several times since then, each time with the aim of improving its accuracy and calibration, and that old ratings are typically recalculated from scratch for each of these updates, which has the effect of revising old ratings.

Does this revision procedure end up making the ranking history less meaningful? Are historical rankings inflated as a result?
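To illustrate why recalculating from scratch rewrites the past, here is a minimal sketch using a plain Elo-style update. This is not OGS's actual algorithm; the `replay` helper, the game list, the starting ratings, and the K-factor are all hypothetical values chosen for illustration. The same results are replayed under two different starting assumptions, and every historical rating comes out different.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo-style logistic expectation that A beats B (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def replay(games, start: float, k: float):
    """Replay a game list and return the rating after each game.

    games: list of (opponent_rating, score) pairs, score 1.0 = win, 0.0 = loss.
    """
    rating = start
    history = []
    for opponent, score in games:
        rating += k * (score - expected_score(rating, opponent))
        history.append(rating)
    return history


# Hypothetical results against fixed-strength opponents (not real OGS data).
games = [(1200.0, 1.0), (1250.0, 0.0), (1300.0, 1.0), (1350.0, 1.0), (1400.0, 0.0)]

# Same games, two calibrations: only the assumed starting rating differs.
old_history = replay(games, start=1000.0, k=32.0)
new_history = replay(games, start=1500.0, k=32.0)

for old, new in zip(old_history, new_history):
    print(f"old system: {old:7.1f}   recalculated: {new:7.1f}")
```

In this toy model the recalculated history sits above the original one at every point, even though the game results are identical. A real system also recalculates every opponent's rating, so the effect on any one player's graph can be larger or smaller than this.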

6 Likes

No

3 Likes

I would be cautious with rating evolution, because it seems that the new rating is more accurate around 1d than for SDK (and even worse for DDK).

The evolution should still be correct, but your old ranks could be inflated by their recalculation.

1 Like

Yes

I already pointed that out in the forum: according to my current history, I’ve never been DDK and started my go journey directly from 9k.
That’s silly.

My current OGS rank is about 5k, which is also hard to believe. But I can accept the choice of having a ranking system which is independent from others: I am 9k on EGD (also due to recalculation, since OGS isn’t the only one to do such things) and I can live with it. On OGS I’ll look for 5k and irl I’ll look for 9k to have an even game. That’s fine.

But I can’t believe that my history was compressed from TPK-9k (as it was before) to 9k-5k (as it is now).
Older ranks are definitely not reliable.

9 Likes

Looking at my history, in this 2017 game I was 15k (old rating) and 9k (new rating). (P.S. I was 13k EGF at that time.) It means the old ratings were much stronger than the new ones in the DDK range. Unlike yebellz, my all-time high is recent.

Generally when I play a lot, my rank goes down: I want to make up my losses quickly, so I play without thinking, and thus I lose more…

6 Likes

Yeah, the adjustment in January recalculated / messed up the rating history. Basically everyone who has played for a few years or more has inflated graphs in their profile. That’s because people have improved: I have many wins against people who are nowadays a lot stronger than me, but were DDKs when the games took place.

I don’t personally mind, because the new graph makes it look like I was regularly a dan-level player around 2016-18 ^_______^
(my rank used to be 5-8k back then, and to this day I still haven’t actually reached that mystical OGS 1d)

5 Likes

I don’t think the rankings are accurate. Whether this is due to online play, mixing different board sizes, botting, or sandbagging, I don’t know. I lost IRL to an AGA 9k who was slightly stronger than me; I think he would be about 4-5k on OGS. I also played an AGA 5k who would be on the cusp of Shodan here.

I am wondering if the rating in correspondence games is inflated compared to the rating in live games. At least this is the case for me, and also for most of my opponents in the two correspondence tournaments I am taking part in.

[Screenshot: WM-Screenshots-20211101135340]

No meaning left; it has become completely meaningless.

2 Likes
1 Like

I have no doubt that correspondence ranks are often one or two stones stronger than live ranks. The main effect of correspondence, I think, is to reduce greatly the blunders caused by inattention or distraction, which may be brought on by tiredness, life stress, time pressure, etc. Of course, the effect would not be uniform for everyone, since people vary in their susceptibility to these factors. With this in mind, and from my perspective of wanting game results to depend on game factors rather than extraneous factors, I would not call correspondence ranks “inflated” for people susceptible to the foregoing factors; I would call live game ranks “deflated” for them.

3 Likes

Suppose A is 3k in live games and 2k in correspondence games, and B is 2k in live games and 3k in correspondence games. A plays mainly live games and B mainly correspondence games, so both of their overall ranks are 2k. When A and B play against each other, the rating system expects the winrate to be 50%, and yet A is slightly stronger than B. How is that possible? My hypothesis is that there are two pools of players, people who play mainly live games and players who play mainly correspondence, and these two pools don’t meet often enough so the ranks never harmonize.
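The two-pool hypothesis can be sketched with a toy model. Everything here is an assumption for illustration: the Elo-style `expected_score` formula, the idea that a single overall rating behaves like a games-weighted blend of a player's live and correspondence strength (`blended`), and the specific rating numbers. It is not how OGS computes ratings.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo-style logistic expectation that A beats B (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def blended(live: float, corr: float, live_share: float) -> float:
    """Assumed overall rating: strength in each format weighted by games played."""
    return live_share * live + (1.0 - live_share) * corr


# Hypothetical players with mirrored strengths in the two formats.
a_live, a_corr = 1500.0, 1400.0   # A is better at live games
b_live, b_corr = 1400.0, 1500.0   # B is better at correspondence

a_overall = blended(a_live, a_corr, live_share=0.9)  # A plays mostly live
b_overall = blended(b_live, b_corr, live_share=0.1)  # B plays mostly correspondence

overall_prediction = expected_score(a_overall, b_overall)  # what one shared rating predicts
live_prediction = expected_score(a_live, b_live)           # A vs B in a live game
corr_prediction = expected_score(a_corr, b_corr)           # A vs B in correspondence

print(f"overall ratings: A={a_overall:.0f}, B={b_overall:.0f}")
print(f"predicted from overall rating: {overall_prediction:.2f}")
print(f"live game: {live_prediction:.2f}   correspondence: {corr_prediction:.2f}")
```

Under these assumptions both players carry the same overall rating, so the system predicts an even game, while the format-specific strengths point to a clear favorite that flips depending on whether the game is live or correspondence.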

2 Likes

Since the pools rarely mix, I don’t see that it matters much. Similarly, there are people who play mainly blitz and those who don’t, and it seems unlikely that their respective ranks “harmonize.” Or those who play mainly 9x9 and those who play 19x19. Further, the massive sandbagging has a much greater deleterious effect on ranking, IMHO.

2 Likes

Average rating decreased over time

2 Likes

The coefficients describing the shape of the Time-Space Distortion can be calculated.
To make old ranks comparable with new ranks, old ranks should be adjusted according to those coefficients.

2 Likes

Sandbaggers

1 Like

That makes sense. After all, rankings are only relative.

1 Like