Yet another ratings thread

Surprise poll time!

Consider this pie chart of game results:

What would you expect their rating to do? It is recommended to vote before expanding the mystical hidden text.

I would expect their rating to:

  • Decrease
  • Stay the same
  • Increase
  • Unknowable
Mystical hidden text

You likely guessed where this was leading, but here’s the rating graph over that period:

I guess that losses to weaker opponents are heavily penalized, and I know this has been discussed to death. It just seems counterintuitive to me that someone can win 67% of games and still be trending down over that period. I couldn’t explain that to a six-year-old.

Yes, they outplayed me at every turn - I knew that as it was happening, I’m not actually worried about it. fwiw though, the primary reason I don’t play on OGS more is that I have no idea what I’m going to get. On three other servers, auto-match games feel much more even. I’ve heard a well-known bearded 5 dan state that they don’t know what OGS ranks mean, and that’s someone who understands the game better than most.

It is what it is.

ETA: @Uberdude has enlightened me regarding ratings. In the absence of any other data, the correct answer to the original question is actually “Unknowable”.

1 Like

Hugely depends if they are playing handicap or even games.

1 Like

Roger Federer comes to your local park and plays 3 games of tennis with randos. He wins 2 and loses 1. Do you think he got better or worse than when he was winning grand slams?

4 Likes

This is one reason I advocate for handicap games on by default, it makes stats like this lead to more intuitive results.

1 Like

That’s a slightly different question than what happens to his rating. If he, or anyone, is winning more games than they’re losing, it would indeed surprise me to see his rank go down.

You don’t understand ratings then. Your rating goes down if you perform worse than expected by your rating. It takes into account the ratings of the opponents you face. Federer is super strong at tennis so should beat a weakie like me all the time. If he only beats me 2 out of 3 then he’s performing worse than expected and his rating goes down.

Expecting winning more than 50% of games to make rating go up is only valid if those games are ones with, on average, a 50-50 expectation of winning, due to playing people of close strength to you in even games, or different strength but appropriately handicapped to make the expected outcome close to 50-50.
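To make the "worse than expected" idea concrete, here is a minimal sketch. It assumes a plain Elo-style logistic curve on a 400-point scale rather than the Glicko-2 system OGS actually uses, and the ratings (1800 vs 1500) and the function names are made up for illustration.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Elo-style win probability for `rating` against `opponent` (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def surplus(rating: float, results: list[tuple[float, int]]) -> float:
    """Actual wins minus expected wins over (opponent_rating, won) pairs.

    A negative value means the player under-performed their rating,
    which is the direction in which an Elo-like system moves them.
    """
    actual = sum(won for _, won in results)
    expected = sum(expected_score(rating, opp) for opp, _ in results)
    return actual - expected


# Hypothetical numbers: an 1800 player goes 2-1 against 1500-rated opponents.
games = [(1500.0, 1), (1500.0, 1), (1500.0, 0)]
print(round(expected_score(1800.0, 1500.0), 3))  # chance of winning each game
print(surplus(1800.0, games) < 0)                # True: 67% wins, yet below expectation
```

Against equally rated opponents the expected score is 0.5 per game, which is the only case where "more wins than losses" and "rating goes up" line up automatically.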

4 Likes

I mean, I kind of get it. If someone were stronger than 2/3 of players you’d expect about a 2/3 win ratio for even games. Actually, an easier way for me to understand it is to consider the player with the very strongest rating. If the top dude (or dudette) were consistently winning every game, I wouldn’t expect their rating to change.

Let’s imagine a 5 kyu and 1 kyu play two games. Game 1 is handicapped, game 2 is not. The 1 kyu is having a bad day and loses both games. My expectation is that the penalty for losing game 1 would be less than for the game 2 loss - it’s a more difficult game for the 1 kyu. In my understanding, a 1 kyu losing against a 5 kyu with a 4 stone handicap would be penalized roughly the same as if they lost against a 1 kyu with no handicap. Is this not the case?

If handicap is not included in the rating change calc, then anyone who cares about rating would have simple logic: only play weaker ranks when there’s no handicap; only play stronger ranks when there IS a handicap. If everyone applied this logic there’d be few match ups other than equal ranks.

The answer lies in the fact that losing to a weaker player costs you more points than losing to a stronger one. And beating a stronger player… etc. (In games with no handicap.)

2 Likes

With Elo-like systems nobody is expected to have a full 100 % winning percentage, even the strongest player is expected to lose once in a while. So if they only ever win, their rating constantly goes up (but by fewer and fewer points if their opponents’ ratings are constant).
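A rough sketch of that shrinking gain, assuming a textbook Elo update with a K-factor of 32 (both the K value and the 2000 starting ratings are arbitrary choices for illustration, and OGS uses Glicko-2 rather than plain Elo):

```python
def expected_score(rating: float, opponent: float) -> float:
    """Elo-style win probability on a 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def win_streak_gains(start: float, opponent: float, games: int, k: float = 32.0) -> list[float]:
    """Rating gained per game by a player who wins every game against a fixed-rating opponent."""
    rating = start
    gains = []
    for _ in range(games):
        gain = k * (1.0 - expected_score(rating, opponent))  # a win scores 1
        gains.append(gain)
        rating += gain
    return gains


# Hypothetical: a 2000-rated player beats a 2000-rated opponent ten times in a row.
gains = win_streak_gains(2000.0, 2000.0, 10)
print([round(g, 2) for g in gains])
```

Each gain is positive but smaller than the one before, because the expected score creeps toward 1 as the winner's rating pulls away from the opponent's.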

That’s how it should be, yes.

1 Like

It would increase, just not much (if even games).

It is the case. So surely you can then see how

doesn’t make sense, if they are mostly playing weaker players even?

Say a 1k beating a 1k even gets +1 rating, and losing to a 1k gets -1 rating. Beating a 5k even only gets +0.2 rating (because it’s an easy game, that’s expected), whilst losing to a 5k even gets -4 rating (unlikely, should win, so bigger loss of rating). So beat a 5k even 2 times, lose 1 time =
0.2+0.2-4 = -3.6
which is negative.
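The same arithmetic as a small sketch, which also works out how often the 1k would have to win just to stand still. The +0.2 and -4 figures are the illustrative numbers from this post, not real OGS rating changes, and the function names are made up.

```python
from fractions import Fraction

# Illustrative per-game rating changes from the post (not real OGS values).
WIN_VS_5K = Fraction(1, 5)   # +0.2 for beating a 5k in an even game
LOSS_VS_5K = Fraction(-4)    # -4 for losing to a 5k in an even game


def net_change(wins: int, losses: int) -> Fraction:
    """Total rating change after the given number of wins and losses."""
    return wins * WIN_VS_5K + losses * LOSS_VS_5K


def break_even_win_rate() -> Fraction:
    """Win rate at which gains from wins exactly cancel losses."""
    wins_per_loss = -LOSS_VS_5K / WIN_VS_5K
    return wins_per_loss / (wins_per_loss + 1)


print(float(net_change(2, 1)))       # -3.6
print(float(break_even_win_rate()))  # about 0.952, i.e. 20 wins per loss
```

With these made-up numbers, a 67% win rate is far below the roughly 95% needed to hold a rating steady against 5k opponents in even games.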

2 Likes

I’m not quite so sanguine about what looks like general deflation as other commenters here appear to be. I would not be surprised if it’s real and the rankings need regular adjustment so that a given rank keeps corresponding to the same ability over time. The tricky thing is that it might just be caused by general behavior on the site, which may or may not be considered reasonable. In fact, there was recently an adjustment to international chess ratings for juniors. This was due to the lockdowns causing players’ ratings to lag significantly behind their playing strength, or their ability to exhibit it in rated tournaments. FIDE Adjusts Ratings For 350,000 Players In Massive Change - Chess.com

I did notice an interesting thing about the ratings implementation here recently: the kyu/dan scale is not a linear mapping from the Glicko2 scale (this can be seen on the rating chart directly). Rankings are mapped on a log scale to ratings.

The interesting thing about this is that pure Glicko2 treats strength disparity as a function of rating difference (a 2500 vs 2400 game has the same expected results as a 1500 vs 1400 game, at least with matching variances). But a one-stone handicap is also considered a 1 rank difference. With a log scale between the two, the rating scale and the rank scale can’t both be correct anymore. I didn’t work out a good way to estimate whether this is expected to deflate or inflate the rating scale in general, however. Basically, I believe the effect is that at higher ranks stronger players give a smaller handicap relative to the difference in rated playing strength. On the other hand, handicap games seem less common at higher ranks too.
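A quick sketch of the "same difference, same expectation" property, using the expected-score function from the published Glicko-2 description. The opponent rating deviation of 80 is an arbitrary value, and this is not taken from the OGS code.

```python
import math

GLICKO2_SCALE = 173.7178  # conversion factor from the published Glicko-2 description


def g(phi: float) -> float:
    """Glicko-2 weighting that shrinks the effect of an uncertain opponent rating."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected_score(rating: float, opp_rating: float, opp_rd: float) -> float:
    """Expected score of `rating` against an opponent with the given rating and deviation."""
    mu = (rating - 1500.0) / GLICKO2_SCALE
    mu_j = (opp_rating - 1500.0) / GLICKO2_SCALE
    phi_j = opp_rd / GLICKO2_SCALE
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))


# Hypothetical pairings 100 points apart, with the same opponent deviation (80).
high = expected_score(2500.0, 2400.0, 80.0)
low = expected_score(1500.0, 1400.0, 80.0)
print(math.isclose(high, low))  # True
```

Only the difference `mu - mu_j` enters the formula, so where the pair sits on the scale makes no difference to the expected result.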

1 Like

If we get extra information about the player whose pie chart was shown that they played all those games against weaker players at the correct handicap, then I think it starts to be more interesting and unexpected that their rank goes down. But this being OGS where lots of people don’t like handicap, my working assumption is there’s a fair chance some if not most of those games were even, hence my “Unknowable” vote.

2 Likes

Win ratio means nothing, on every server.
Here is my win ratio on DGS:

Abysmal, right? :sweat_smile:
Here is my ratings chart:

What’s the “secret”? Simple.
Most of my rated games are against people that are 4+ ranks above me and I never willingly challenge anyone other than players like these. This means that every time I lose, my ranking just goes down a bit, but each victory, rare as it is, counts a lot.

Here are my current opponents in the ladder:

Those are practically “free teaching games”. :slight_smile:
If any of those people lose (unlikely), if that was OGS it would appear just as a “loss against a weaker opponent”, but since I am not just “weaker”, but “a lot weaker”, it makes sense that their rank will drop significantly.

1 Like

I do think there’s a general rank deflation on OGS and maybe this user’s rating graph is influenced by that, but I don’t think the two charts offer strong support for the hypothesis.

Nonetheless, this thread made me just realize something.

Rank deflation is an inconvenience for those who play regularly on OGS (because it obfuscates one’s progress), but for those who just play once in a while here it could have a worse impact:
Your rank stays the same while you’re off to play somewhere else, but when you come back the players of the same rank like you are now actually stronger, so your games become harder and maybe less pleasant for competitive people.

Of course that depends on the speed of deflation and might not even be noticeable with slow deflation.

I actually voted “unknowable”, and even though I’m not surprised the rating graph went down, that doesn’t mean there isn’t an issue with the graph. And no, it isn’t about whether the rating system is working properly. If you are expected to have a 75% win rate and you end up with 67%, your rating would go down. That part is normal.
I also don’t know how handicap games are shown in the graph, but if they are counted in the lower-rated group, they are basically added to the pool of “you should be the favorite against” despite the odds being roughly 50-50. But that isn’t the issue either.

The real issue is that the system is trying to estimate the rating from a 1/7 ratio of higher-rated to lower-rated games. That already is a problem, but it comes with 2 more: constantly playing lower-rated players has a negative effect on individual development, and even if you kept improving, it would be so time-consuming and tiring that you’d more likely end up staying level or dropping despite getting better.

My assumption is that over a long enough time period, people would meet a blend of weaker and stronger players effectively cancelling out the differential.


I have revised my understanding of ratings since the thread opening, but I think my confusion is somewhat understandable. I offer one final thought experiment to justify my erroneous views.

Identical Twins Jubango

At table 1 are identical twins with exactly the same play strength. They play a jubango and win five games each. Hang on, only 50% win ratio? That's not very good considering that they are both 7 dan. Let's downrate them.

At table 2… [blah x3]. Wow! A 50% win rate?! That’s great considering that they are both 30 kyu. Let’s uprate them.

So both tables had 50% win ratios, yet we downrate one pair and uprate the other. The thing is that no-one improved or degraded.


No. I have likely misunderstood again. The expected win rate of a 7 dan vs 7 dan is 50%. No change to ratings. Same for 30 kyu. Both pairs of identical twins rejoice and go home richer for the experience.

I don’t think we found out if most of those games were played with handicaps on or not.

My point about the rating vs ranking system is that they don’t agree on the question, what is the expected skill difference between these two players. This applies even if you don’t play handicap games, in that the distance between 12k and 11k is smaller than the distance between 2k and 1k in Glicko2 points. The difference between 12k and 11k is 49 Glicko2 points. The difference between 2k and 1k is 68 Glicko2 points.

Rating to Rank Conversion Visualized - Internet Go / OGS - Online Go Forum

As I understand it those are the differences in playing strength when a 1 stone handicap begins as well.
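To illustrate why the gap per rank widens, here is a sketch of an exponential rank-to-rating mapping. The constants `A` and `C` and the numeric rank values are placeholder assumptions for illustration, not checked against the OGS source, so the printed gaps need not match the 49 and 68 quoted above.

```python
import math

# Placeholder constants for an exponential rank-to-rating mapping.
# Assumptions for illustration only, not checked against the OGS source.
A = 525.0
C = 23.15


def rank_to_rating(rank: float) -> float:
    """Rating at a numeric rank, where a larger rank number means a stronger player."""
    return A * math.exp(rank / C)


def gap_to_next_rank(rank: float) -> float:
    """Rating points separating `rank` from the next stronger rank."""
    return rank_to_rating(rank + 1) - rank_to_rating(rank)


# Two hypothetical numeric ranks ten ranks apart.
for rank in (18, 28):
    print(rank, round(gap_to_next_rank(rank), 1))
```

Under any mapping of this shape, each one-rank step costs a fixed percentage more rating points than the step below it, so one rank cannot correspond to a constant rating difference across the whole scale.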

I don’t think there is a bug where people’s challenge rank restrictions get ignored, so the difference in strength playing up is always slightly wider than the difference in strength playing down as well on OGS. The underlying system is Glicko2, however, so ultimately it’s going to be the closest thing to telling us what the difference between two players is.

I had a quick scan through the DeepWiki article linked by SouthernGoPlayer. It’s actually quite humbling how complex it gets. I unpost all thoughts and slink away into the background. :upside_down_face:

I didn’t think about it rigorously, but yes I think you are probably right that there is an inconsistency between the glicko2 rating gaps and probabilities, the log scaling to kyu/dan ranks, and the handicap stones difference that they lead to which should result in an even game.

To me the log scaling on rank explains most of the shortage of dan activity on the server. Not only are dan players rare (a small percentile of the player pool), on OGS these percentiles are stretched thinner again.