I would like to add to that:

From the difference between your OGS number and another player’s OGS number, you can calculate the probability of winning. That probability is called the “expectation” of the game.
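A minimal sketch of that expectation, using the classic Elo logistic curve (OGS actually uses Glicko-2, which has a similar shape but also accounts for rating deviation; the 400-point scale below is the Elo convention, used here just for illustration):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo-style win expectation for player A, from the rating difference only."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

print(expected_score(1500, 1500))            # equal ratings: 0.5, an even game
print(round(expected_score(1700, 1500), 2))  # 200 points stronger: about 0.76
```

If both players’ actual results match this expectation over many games, the updates cancel out and their ratings stay put, which is the point made below about stable ratings.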

If two players with equally stable OGS numbers play many games against each other and their results converge to that expectation, their ratings won’t change.

I think what @n0w3l meant to say is that a rating system does not measure strength on some sort of absolute scale, since it is entirely arbitrary what something like “9 kyu” means in isolation. However, while rating systems do aim to measure relative strength within a system (comparing two players rated under the same system), it is difficult to compare two players rated under separate rating systems.

Sure, there is no absolute scale. And @n0w3l also lists that fact separately as his first point.

If you have two different rating systems (each of which tells you the probability of winning based on the rating difference), it’s very easy to align them, provided at least one player plays in both systems regularly and wins and loses often enough in each.

If aliens who also play the game of Go were to land tomorrow, all it would take to align their rating system with ours would be a number of games against some of them (with enough wins and losses).

Assuming there is a reasonable amount of overlap. If they are all 60k, we will never know, because a loss is a loss.

yes, obviously @FritzS is right: if an algorithm sorts players by “chance of winning”, then it is, in some sense, measuring their strength… relative to one another.

I just wanted to make it simple and obvious that one thing is Glicko, which sorts a pool of players by “chance of winning”, and another thing is how OGS decides to translate that into kyu/dan.

The first one is pure math. The second one you can choose, in order to give different meanings to your rankings.
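For concreteness, here is a sketch of that “second thing”: the rating-to-rank conversion OGS has published, as I understand it (the constants 525 and 23.15 are OGS’s own choices and could change; they are pure convention layered on top of the Glicko math):

```python
import math

def ogs_rank(rating: float) -> str:
    """Convert an OGS Glicko rating to a kyu/dan label.

    Uses rank = ln(rating / 525) * 23.15, where rank 30.0 is the
    1 dan boundary -- OGS's published conversion as I understand it.
    """
    rank = math.log(rating / 525.0) * 23.15
    if rank < 30:
        return f"{30 - rank:.1f}k"
    return f"{rank - 29:.1f}d"

print(ogs_rank(1000))  # roughly 15k on this scale
print(ogs_rank(2200))  # a dan-level rating
```

Swapping in different constants would relabel everyone’s kyu/dan without changing a single win probability, which is exactly the point: the sorting is math, the labels are a choice.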

First, I appreciate the effort to try and summarize the ideas nicely.

From what I’ve read and understand, there’s no reason to expect the overall to be the average (in any sense) of the other ratings. They’re all separately calculated.

Also, there’s more than just one number changing each game. If you play a blitz 9x9 game, your overall, blitz, 9x9, and blitz 9x9 ratings should all change with that game (if I understand correctly).
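A toy sketch of that idea, using plain Elo updates rather than OGS’s actual Glicko-2 code, just to show one game result feeding several independent rating pools (the pool names and numbers here are made up):

```python
K = 32  # illustrative Elo K-factor, not an OGS value

def expected(r_a: float, r_b: float) -> float:
    """Elo win expectation for the player rated r_a."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_pools(my, opp, won, pools=("overall", "blitz", "9x9", "blitz-9x9")):
    """Apply the same game result independently to every applicable pool."""
    score = 1.0 if won else 0.0
    return {p: my[p] + K * (score - expected(my[p], opp[p])) for p in pools}

me  = {"overall": 1500, "blitz": 1450, "9x9": 1520, "blitz-9x9": 1400}
you = {"overall": 1500, "blitz": 1500, "9x9": 1500, "blitz-9x9": 1500}
print(update_pools(me, you, won=True))
```

Because each pool starts from a different rating and sees different games over time, the overall number drifts away from any average of the others, which matches the point above that they are all calculated separately.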

I tried to come up with an analogy using a much simpler rating system than Elo or Glicko, while still trying to capture some of the points you highlighted

the analogy being here for example Overall rank calculation - #10 by shinuito

Again, feel free to pick apart that analogy. I don’t expect it to be perfect; I just wanted to imagine a system in which your ranks vary and the overall isn’t necessarily an average of the others. Also, one could have an overall rank higher than any of the others (as can happen both in that example and on OGS with Glicko; what kind of average does that!)

True. I mean… one can still take a weighted mean, but OGS doesn’t. As @benjito pointed out, OGS calculates the overall rating **using Glicko, not averaging**.

About this…

Since the overall rankings are calculated via Glicko, they do change with every game… but then we should also correct ourselves and say that each player has 12? `OGS numbers`

… too many details for the simplified explanation that I was trying to make

(by the way, when you think about it, it makes more sense to use Glicko, so well done, OGS!!)

This is why I said “with enough wins and losses”.

By the way: if they manage to land on Earth tomorrow, chances are that we humans are the 60 kyus to them…

Good discussion.

Rating systems are, so far, mathematically isolated from each other. But this could change in the future, using completely new AI methods. Consider that a human can look at a single game record and estimate the relative strength of the players; AI may be able to do this too, and even more accurately. Such a system would not be limited to wins and losses only. It’s more like looking at all the sides of a die with different values to see its strengths and weaknesses. It could go even further than giving simple odds of player A beating player B, and actually make predictions about play style, margin of victory, and so forth.

Already!

OGS does analyse every game; I guess moderators may already have access to such a rating to search for bot users.

Right, it’s not such a great leap from there to an AI that can look at game records across servers and make assessments and comparisons. It might be able to say, e.g., that a 5 kyu on one server has even chances of winning against an 8 kyu on another server, and plot the alignments on a chart.

In the end, are ranks broken or not? At the very least, do they work as intended?

Loving this comment right now after 500 comments!!!

Yes, I’d say everything works perfectly.

There’s a problem with annulled games not updating ranks, which means we currently have no way to deal with sand- or airbaggers. This should be fixed with the next update, though.

For the rest, as far as I know, things work fine.

I don’t even think AI (assuming you mean Leela-style ML) is necessary. All you need is enough data points (people who are ranked in multiple systems) and a reasonable statistical model (polynomial regression would be fine, I think), and you can compare just about any of the major systems.

The only issue is that we never have enough up-to-date data (but OGS is trying to address that with account linking).
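A toy sketch of that kind of alignment: fit a line (degree-1 “polynomial regression”) through players who hold ratings on both servers, then use it to translate ratings for everyone else. The rating pairs below are invented for illustration:

```python
# Hypothetical (server_a_rating, server_b_rating) pairs for dual-ranked players.
pairs = [(800, 1100), (1200, 1420), (1500, 1690), (1900, 2010), (2300, 2330)]

def linear_fit(points):
    """Ordinary least-squares line y = m*x + b through (x, y) points."""
    n = len(points)
    sx  = sum(x for x, _ in points)
    sy  = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

m, b = linear_fit(pairs)
print(f"server B ~= {m:.2f} * server A + {b:.0f}")
# Translate a rating for a player known only on server A:
print(round(m * 1600 + b))
```

A higher-degree polynomial would handle the curvature real rating scales tend to have at the extremes, but the idea is the same: enough dual-ranked players, plus a simple model, gives a workable conversion table.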

It’s not working well for 9x9 handicaps in my little tournament: so far Black has been winning everything. My best guess is that handicaps are still using the old ranks.

What about the beginner rank problem?

What is the beginner rank problem?