Are OGS rankings inflated, deflated or neither?

I’m not sure it’s truly a drawback, though, just something to account for (I’m still not suggesting it). If we are pretending to know nothing about a new player’s rank, then, after they win their first game, “infinity-dan” is actually, genuinely the best statistical estimate (maximum likelihood estimate) of their rank. It’s more of a display issue, or a problem with the assumption that we don’t know anything about a new player’s rank.

Of course, we do know things about a new player’s rank, and this is the actual problem, but the current system does pretend not to know anything. Glicko-2 does kind of sort of, at least a little bit, require that assumption.
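
To spell out the "maximum likelihood" point (a sketch, assuming the plain Bradley-Terry expected-score model at Glicko's core): the likelihood of a single win against an opponent rated $r_o$ is

$$L(r) = \frac{1}{1 + e^{-(r - r_o)}},$$

which is strictly increasing in $r$. After one win and zero losses, nothing pulls the estimate back down, so the likelihood is maximized only as $r \to \infty$ – hence "infinity-dan". In practice it's Glicko-2's rating deviation (the prior around the player's current rating) that keeps the actual update finite.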

1 Like

Oh, I was talking about the automatic matchmaking system, sorry. Yes, for ranked games, including custom challenges, you can only play against people within 9 ranks of you, which has often been (correctly) pointed out as another thing making the beginner experience pretty bad.

I’m honestly not sure of the ins and outs of how that system goes.

But as @jlt pointed out, provisional players should be able to play custom unranked games with any other user. (By “should” I mean “unless there’s a bug”).

For the rest, you raise good points, and I especially agree with the final bullet list :slight_smile:

@paisley: your comment kinda goes over my head, and it’s definitely too technical for this discussion – we’ve been derailing it enough already :sweat_smile:

2 Likes

You can get ranked games against anyone via tournaments and ladders. So the system can at least handle that kind of game when it comes up.

3 Likes

I see no reason not to remove the “no ranked challenges beyond 9-stone rank difference” rule entirely. It’s almost certainly just a matter of deleting the check. As you say, the tournament and ladder scenarios are already testing the system’s ability to handle it, so the risk should be minimal.

Ranked games with more than 9 stones of handicap, though, should still be forbidden.
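
To give a feel for how small the change probably is, here's a hypothetical sketch in Python (illustrative names only, not actual OGS code) of dropping the rank-gap check while keeping the handicap cap:

```python
# Hypothetical challenge validation: the rank-gap limit is gone,
# but ranked handicap stays capped at 9 stones.
MAX_RANKED_HANDICAP = 9

def ranked_challenge_allowed(rank_gap: int, handicap: int) -> bool:
    # Previously something like: rank_gap <= 9 and handicap <= 9.
    # Proposal: any rank gap is fine; only the handicap stays capped.
    return handicap <= MAX_RANKED_HANDICAP
```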

I think all this is shaping up to be a pretty good set of improvements to the status quo, if I understand the suggestions:

  1. Show “?” players’ ranks as “?” everywhere, so they aren’t discouraged by incorrect notions about their rank falling, or about who “should” be their strength peers.
  2. Allow ranked matches between any rank, so that new players can immediately start playing whoever they think is appropriate. (A form of “self-identifying” strength, I suppose.)
  3. Some kind of “beginner room” system to show beginners the ropes. Either a new system, or bolted onto the existing groups system or whatever, which would probably be easier to implement.

I like it.

6 Likes

I love it! :slight_smile:

I think 1 and 2 are easy to implement from a programming standpoint (3 would require a lot more work). Programmers, feel free to correct me if I am wrong!

1 Like

Maybe a year ago (can’t remember well, which is why I never improve at Go) my rating at OGS suddenly improved by an impossible amount. I didn’t believe my new rating, but lots of people told me it was reliable. Now my rating has sunk back a bit, although I think I’ve improved slightly since then.

It may be that the basic algorithm of adjusting a rating based on results against others works only when you consider one person at a time, but, applied to everyone at once, creates a drift over time, much like the genetic drift of small, isolated populations of plants or animals, such as the island populations Darwin studied in the Galápagos.

If so, then comparisons with other rating systems like AGA or KGS will eventually reveal the drift and a system-wide correction might have to be done again, and again.

You could test for system drift by writing a simplified simulation in your favorite programming language, giving it the same rating algorithm and random “game results” having similar statistical properties to OGS game statistics.
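
For the curious, here’s a minimal sketch of that kind of simulation in Python. It uses plain Elo updates rather than full Glicko-2, and made-up population parameters, so treat it as a starting point rather than a model of OGS:

```python
# Minimal drift simulation: hidden true skills vs. visible ratings.
import random

random.seed(0)

K = 32           # Elo update step
POP = 500        # concurrent players
ROUNDS = 20000   # total games simulated
TURNOVER = 0.02  # chance per game that a random player is replaced

def expected(r_a, r_b):
    """Elo expected score for a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# New players start at a fixed provisional rating regardless of true
# skill, mimicking a fixed starting rank.
skills = [random.gauss(1500, 300) for _ in range(POP)]
ratings = [1150.0] * POP

for game in range(ROUNDS):
    a, b = random.sample(range(POP), 2)
    # The result is decided by true skills; the update uses visible ratings.
    a_wins = random.random() < expected(skills[a], skills[b])
    e_a = expected(ratings[a], ratings[b])
    ratings[a] += K * ((1.0 if a_wins else 0.0) - e_a)
    ratings[b] += K * ((0.0 if a_wins else 1.0) - (1.0 - e_a))
    # Population turnover: a veteran leaves, a new provisional player joins.
    if random.random() < TURNOVER:
        i = random.randrange(POP)
        skills[i] = random.gauss(1500, 300)
        ratings[i] = 1150.0
    if game % 5000 == 0:
        print(f"game {game:6d}: mean rating {sum(ratings) / POP:7.1f}")
```

The interesting knobs are the turnover rate and the fixed starting rating: if newcomers systematically enter above or below their true strength and later leave, the population mean can wander even though every individual update is locally “correct”.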

Just my opinion. I’m no expert at rating systems. Note that I play only 9x9 games due to being busy, if that skews my experience in some way.

1 Like

Looking at your OGS account, that was when the rating system was changed. I believe most ranks got “inflated” back then in an attempt to match AGA and EGF ranks.

This is becoming a broken record at this point (not your fault, but I’ve heard this suggestion many times), but why would you assume other ranking systems don’t drift over time? So far I have never heard of a ranking system where ranks are provably stable over time, and intuitively the traditional Japanese ranking system would be even more liable to drifting.
Comparing the relative drifts of two ranking systems doesn’t necessarily reveal which one is actually drifting.

I’m genuinely interested in this, but I don’t understand it. Are you talking of checking if the distribution of virtual ratings shifts as a whole? How is that different from looking at the actual OGS statistics?

Thank you, espojaram, for your reply. The value of a simulation would be twofold. First, it would demonstrate the drift mostly independently of OGS statistics, and independently of other rating systems. Second, it would serve as a vehicle for investigating improvements to the rating system; it would be superior to testing on OGS itself because it would eliminate the long time a real test of specific rating changes on OGS would require.

If it is true that all rating systems drift (we need objective proof), then rating by comparison with other players is inherently flawed and should be abandoned as the basis for ratings, even if it continues to be used for automatic player matching and handicap determination. Alternative rating methods would need to be as objective as the current OGS method, but absolute, not relative to other players.

For example, special ranking games could periodically be played against a certified AI player with very stable and adjustable playing strength. So, a 4 kyu player would play several games against the AI player set to 4 kyu at some time interval in order to confirm the 4 kyu rating, or to change it by at most 1 rank.
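
Roughly, the confirmation step might look like this in code (the thresholds and game counts are illustrative guesses on my part, not a calibrated procedure):

```python
# Sketch of rank confirmation against a fixed-strength reference bot.
def confirm_rank(claimed_rank: int, wins: int, games: int) -> int:
    """Return the confirmed rank after `games` games against a bot set
    to `claimed_rank` (kyu ranks: smaller number = stronger)."""
    win_rate = wins / games
    if win_rate > 0.65:           # clearly stronger than the bot
        return claimed_rank - 1   # promote by one kyu rank
    if win_rate < 0.35:           # clearly weaker than the bot
        return claimed_rank + 1   # demote by one kyu rank
    return claimed_rank           # within the expected band: confirmed

# Example: 7 wins out of 10 against a "4 kyu" bot -> promoted to 3 kyu.
print(confirm_rank(4, 7, 10))  # prints 3
```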

(Newcomers to OGS need to be kept unrated for many games. Nevertheless, I suspect an AI evaluation player could play at levels like 25k to 35k, and adjustments could be made several ranks at a time to accommodate the rapid improvement of beginners. Players with known ratings on other systems could obtain the same rating when joining OGS by showing evidence of their current rating.)

(This method might fail at higher dan ratings or professional ratings, but those could be addressed by relative play, since drift probably slows down with higher rank. I say this because, for example, tesuji used in the beginning of advanced games is perfect play, and because I conjecture that a 9p or even 9d player has much less variability of playing than those in lower ranks.)

These are all my opinions, so they need corroboration.

You can read more about the KGS system if you want:

https://www.gokgs.com/help/rank.html

and

https://www.gokgs.com/help/rmath.html

and

https://www.gokgs.com/help/anchor.html

My feeling is that it’s much more chaotic than OGS or any other system. They supposedly have a set of anchors, players whose ratings they believe are very stable, and then base everything else off of that.

I feel like the net result is that your rating jumps all over the place, even from inactivity.

There was a joke I saw a while back about how to become 1d if you were stuck at 2-3 kyu.

A few of the other replies to the post were also examples of players who stopped playing and whose ratings drifted noisily upwards.

2 Likes

I believe this just doesn’t make sense. If the whole population’s ratings drift together, then the automatic matchmaking will still match people of similar strength together. The auto handicap system is trickier of course, but it wouldn’t necessarily get worse either, especially if the drifting is somewhat homogeneous.
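
To make the “drift together” point concrete (using the standard Elo-style win expectancy, which Glicko shares in spirit): the expected score depends only on the rating difference, so shifting everyone by the same constant $c$ changes nothing:

$$E(r_A + c,\, r_B + c) = \frac{1}{1 + 10^{-\left((r_A + c) - (r_B + c)\right)/400}} = E(r_A, r_B).$$

Uniform drift is therefore invisible to any matchmaking or handicap logic that looks only at rating gaps.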

The main reason why drifting is undesirable, I believe, is that the same player can end up having different ranks in different populations, but that’s more a problem of the optics and cultural implications of having a specific rank than of the rating system itself.

I agree that, to solve the equalization of ranks across populations, some kind of bot-based system is the most promising idea to investigate – but I also believe it’s actually surprisingly tricky.

For example, the reason this doesn’t work easily is that most bots which play at kyu level are down there because they have some specific “flaw” or “blind spot”.
A “4 kyu” bot is usually, perhaps always, not a good representation of what being a “4 kyu human” means – in practice, this means that players would be able to manipulate their rankings just by learning the specific “blind spots” of the bot and beating it consistently.

I can imagine some ways to try to circumvent this drawback, but I think it would be surprisingly tricky to build an actually solid system.
In a way, the issue really comes down to the fact that we don’t really know what it means for a Go-playing entity to “be 4 kyu”, or, to use a population-specific rating, to “be 1600 Glicko on OGS”.
What does that “mean” in terms of your play, really? I don’t think anybody in the world can answer that at the moment.

I think the most promising idea is actually to have a machine-trained bot that learns “what a game played by a (insert arbitrary rating number here) looks like”, for example, but until someone actually tries it, there’s no way of predicting if such a system could be reliable at all.

3 Likes

I agree with all your points. We need someone interested in the problem of Go rating, and also in programming and AI, to move forward with this and find out if it will work. As an experienced programmer (though not in AI), I feel that an initial implementation, using a probabilistic simulation of OGS statistics, would not require cutting-edge abilities. Using a neural network and training it would not be much harder. Investigating actual comparative strategy and tactics would be very hard, but also very rewarding. It might make for a good PhD dissertation.

I especially agree that the value of a rating is not in its utility as a matching mechanism, but in its meaning outside of OGS. It would serve as better motivation for learning Go if everyone had a reliable rating as a guide. Also, if I meet a stranger in Central Park to play Go, it would be helpful to know their rating relative to mine, if only to determine the handicap. For this, a better understanding of good rating systems is needed.

1 Like

Agree that this is the technical purpose of a rank.

My view:

  • It is not so much “the same player can end up having different ranks in different populations” but a similar effect on a large scale (the average X kyu on OGS would be X+n kyu in a national association).
  • Since certain countries or servers have a stronger pool of players and ranks of individuals naturally fluctuate, small differences (like n ≤ 2) are normal and usually not problematic, except for measuring milestones (like reaching shodan).
  • A large difference (like n ≥ 5) could mislead players (indirectly pointing them to study materials that are way too advanced for them) and cause credibility issues for the server or association.

1 Like

I don’t think these issues of rank differences between different populations and servers can be fixed by a PhD student investigating rating systems.
It’s more a political issue than a technical/mathematical issue. Different go populations across the world won’t agree on what 7d, 1d, 10k and 20k are supposed to mean exactly.

3 Likes

I’m getting a feeling that people pushing for this unification of ranks, typically, don’t realize just how bad the situation really is. It’s not like there’s a “universal” rank level and online Go servers are the only places drifting away from that: essentially all the major Go associations and hubs in the world have sometimes vastly different standards for what level is “1 kyu” and so on.

According to the statistics presented here, for example, a Japanese “5 kyu” amateur would probably be about “15 kyu” in the AGA.

Before people whine about OGS ranks being slightly off with respect to the AGA and the EGF, perhaps they should wonder why the AGA and the EGF are so wildly off relative to Japan and China. Who is right? To which standard should OGS comply?

2 Likes

Thank you for clarifying. I was not aware. This makes it even more frustrating for players motivated to improve. Many study resources are targeted at certain levels (TPK, DDK, SDK or dan). With this situation, what does a goal like shodan even mean?

Perhaps because the standard is much higher in East Asia than the West?

1 Like

Not really. For example, 1d EGF is more like 4d in Japan. And Chinese low-mid dan ranks seem to be all over the place in recent years, especially for younger players (I assume because dan diplomas have become a source of income for go schools in China).
But AFAIK, Korean IRL higher dan ranks are solid, probably a bit tougher than EGF higher dan ranks, which are a bit tougher than AGA higher dan ranks.

As for servers, IGS dan ranks seem solid from what I hear, but other Asian servers have much softer dan ranks. KGS is fairly similar to OGS at the higher end, but both are American servers, so perhaps expected to match higher AGA ranks somewhat.

2 Likes

I don’t think dan diplomas are directly a source of income, but surely having your students graduate makes the school more attractive. You get diplomas through results in a special official tournament; you don’t buy them (to my knowledge).

1 Like