Are OGS rankings inflated, deflated or neither?

Well, when I was 1d OGS I managed to get 5d Foxy but I’m not sure if I could hold it, I kinda got bored there. But 1d OGS is solid 4d Foxy, 2d OGS is surely 5d Foxy.

But on Foxy ranks are jammed. You have like 9d pros and AIs. Some weaker pros even got knocked down to 8d. And you have 1d amateurs at like 4-5d. So jumps in difficulty with 6-7d are big.

3 Likes

It’s not difficult for me to imagine a system where beginners can play rated games together and so have a 50% chance of winning.

3 Likes

I didn’t say it was difficult, I just said it was idealistic, but whatever. How would you solve the sandbagger problem?

Don’t you expect the increased presence of sandbaggers at the lower ranks to make the true beginner’s experience worse?

So how would you achieve that?

Well, that has been debated a lot in the threads I mentioned before. Want me to write it all again?

One thing to mention is that the Glicko system doesn’t need a single entry point to work; imposing one is completely useless.
Besides, a beginner could enter as a beginner, with no need even to set a rank on their own, and get what they want.
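
To illustrate why the entry point is a free choice: in the Glicko-2 formulas as published by Glickman, the predicted result of a game depends only on the difference between the two ratings, so the 1500 used to centre the scale cancels out. Below is a minimal Python sketch of that one formula. The function name and the `offset` parameter are my own, and it says nothing about how OGS actually implements its ratings.

```python
import math

# Conversion constant between the rating scale and the internal Glicko-2
# scale (equal to 400 / ln 10), as given in Glickman's description.
GLICKO2_SCALE = 173.7178


def expected_score(r, r_opp, rd_opp, offset=1500.0):
    """Expected score of a player rated r against an opponent (r_opp, rd_opp).

    `offset` is a hypothetical parameter added for illustration: it is the
    value subtracted from ratings when converting to the internal scale.
    """
    mu = (r - offset) / GLICKO2_SCALE
    mu_opp = (r_opp - offset) / GLICKO2_SCALE
    phi_opp = rd_opp / GLICKO2_SCALE
    # g() discounts the rating gap when the opponent's rating is uncertain.
    g = 1.0 / math.sqrt(1.0 + 3.0 * phi_opp ** 2 / math.pi ** 2)
    # Only (mu - mu_opp) appears here, so `offset` cancels out.
    return 1.0 / (1.0 + math.exp(-g * (mu - mu_opp)))


# Two newcomers entering at the same rating are predicted to split evenly,
# whatever that rating is.
even_at_1500 = expected_score(1500, 1500, 350)
even_at_800 = expected_score(800, 800, 350)

# A 200-point gap is predicted the same way high or low on the scale.
gap_high = expected_score(1500, 1300, 100)
gap_low = expected_score(900, 700, 100)
```

So, as far as this formula goes, starting everyone at one fixed number is a convention; where newcomers enter is a design decision about matchmaking, not a mathematical requirement.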

Anyway, please read the debates which have already happened and which are not out of date (unless you find some new ones, in which case I would be pleased to update the compendium).

4 Likes

It’s the OGS version of Godwin’s law: all conversations about ratings eventually end up about the terrible system for beginner ranks!

3 Likes

Which is a special case of rating inflation and its impact (my intended topic), thus relevant.

1 Like

Well, you gave me quite a bit of homework there, but I think I’m mostly caught up now.

I believe the only place in all of those links where the sandbagger problem is discussed extensively is in this topic, starting from about this reply, and my personal verdict is

no good solution has been proposed to the sandbagger problem.

I would agree that the system has changed a lot in recent years and that testing would be required to verify or falsify the following hypothesis, but:

  • the most likely scenario of letting beginners self-identify is that the situation would revert back to the old landscape, where beginners get stuck at the lowest ranks unable to ever get up, because that’s where all the sandbaggers are, and moderators have a huge burden of having to deal with all the reports of intentional and unintentional sandbaggers and airbaggers.
     Most sandbaggers at the lowest ranks would usually not get caught because beginners usually don’t have the experience needed to recognize a sandbagger and report them. And “getting stuck at the lowest ranks” would be psychologically awful for the beginners not because of purely cosmetic reasons of them wanting to rank up, but simply because they’re constantly being stepped on by sandbaggers in the process.

This is the best counter-point I found:

Intuitively, this sounds naive to me, for the reason mentioned above: if you get stuck as a beginner with all the sandbaggers, you are already getting stepped on. Except in the current system you only get stepped on for at most 20 games if you’re extremely weak, and fewer than 6 games in the majority of cases.

So I have to say, the current system might actually be the best case scenario in the interest of the beginners themselves, excluding scenarios where, say, we have the resources to have trusted users manually vet each and every self-identified beginner, or direct beginners to some kind of tutorial instead of letting them play.

(Instead, one group of people that is undoubtedly burdened with the current system are the established SDKs, who have to deal with all the new accounts, including true weak players who often don’t understand etiquette, fake weak players who are setting up to sandbag, alt-sandbaggers, and strong players essentially sandbagging unintentionally.)

But then again, my hypothesis might be wrong and letting beginners self-identify might lead to a better situation.

I guess this is the point where we need a forum-wide poll to let people vote on whether it’s worth it to try out that system. Though we do also need to agree on the details of the system.
(Well, about one year ago was the point to do it, apparently, but better late than never)


Before somebody comes along and chants the old adage that "there's no empathy for the beginners":

I want to clarify that I (believe I) do empathize with the beginners, and my personal view, perhaps a very unpopular opinion, is that I don’t even believe letting beginners play with one another without supervision is usually the best way for them to learn, and I feel it’s kind of “the lazy way” to do it instead.

 Especially in such an uncontrolled environment as a worldwide public online platform, it’s absolutely unpredictable what their experience will be, as every person is different, both in character and in technical proclivity.
 Some beginners are somehow 20-25 kyu from the start, some are closer to 35-40 kyu. Some come here with an understanding that losing often means learning, some of them want to conquer the world and get soul-crushed by the perception of being average or even weaker than average.
 The beginners who are technically or psychologically weaker especially need to be handheld one way or another. Even an automated tutorial is better than just throwing them in the pit, if we don’t have the resources to actually give them a teacher or someone to supervise their games.

That’s what I believe, and that’s why I believe changing the rating system to let beginners self-identify is not necessarily the main priority.

Letting beginners play each other might be the most efficient way to have them learn, but nowhere near the most effective, and if those games are not supervised, I personally don’t believe it’s likely to do more good than harm.


To be clear, my personal position is that I’d like to see the experiment happen: I’d like for OGS to try out a system where beginners are allowed to self-identify.

Even though I personally believe it will lead to chaos, we need the test to know for sure.

I believe a combination of this proposal and this proposal, with a few tweaks and polish, is a good starting point.


To conclude, I want to get back to this thread (:sweat_smile: I guess in the future I might have to copy-paste this reply somewhere else, but I didn’t really know where to put it) and point out that there’s at least one thing we can probably all agree on, and it’s that (with the current system) letting users see the estimated rank of provisional players, in thumbnails and in the profile page graph, leads to a lot of confusion.

At the very least, the thumbnails should not display a rank without a question mark; they should just display a question mark.

I personally believe the graphs should be kinda overhauled to be more readable for science/maths/statistics laymen, and I do have specific ideas on how, but I don’t know if anyone is gonna listen to me, so unless someone expresses interest, I won’t bother creating the concept graphics.

If the graphs can’t be overhauled that way, at least the current rank estimate should be hidden everywhere until it’s somewhat established, to avoid confusions like this one

Which (and I believe this is the most likely explanation) happened because of the thumbnail/“humble rank” discrepancy, where the confused player probably first saw the graph saying something like “12 kyu” and then saw the thumbnail say “6 kyu”, and that’s why they said “that’s not what I saw initially!”


Oh, and by the way

This is an opinion that has long echoed through the halls of the forum, and I’m not sure it’s as straight-forward as it may seem.
 First of all, the idea that 6 kyu is not a good estimate for the average strength of a newcomer to the site: according to the histogram it’s actually not that far off (it’s bang on if we talk about modes, and it’s only slightly off if we talk about medians), though that might partially be caused by a self-fulfilling prophecy.
 But the rating doesn’t appear to have drifted too much since the 2021 system has been in place, so I don’t think that applies.

 One could also say that if it’s true that it’s too high an estimate, that arguably makes it a good countermeasure to alt-sandbagging – by making setting up a new sandbagging account annoying because you have to keep losing on purpose until the rank is established.
 Specifically, it might be a good countermeasure for sandbagging in the lowest ranks, the kind that afflicts beginners in self-identifying rank systems. In other words, the longer it takes for the true weakest players to reach their true rating, the more annoying it is for people who want to sandbag at those levels to get there, thus discouraging them from continuing to do it.

I don’t know if that’s true, and even then I don’t know if it’s necessarily worth making it more difficult for beginners, but… it’s a point :laughing:

2 Likes

Could someone clarify what sandbagging is in a Go/OGS context? I know that in some other sports, it is deliberately playing weaker and losing rating points in less significant games, to gain an unfair advantage in more important games. Never heard of airbagging.

Other questions:

  • What is the relationship between sandbagging and rating inflation or deflation? Does one cause the other?
  • How do other Go servers handle ratings for new accounts?
  • Does OGS have many established TPK players?

While beginners should expect to lose most of their games, the process does affect their experience (and whether they continue playing). There is a huge difference between:

  • A beginner playing a proper game against a “real” 6 kyu who simply captures every single group of theirs.
  • A beginner playing a teaching game against a “real” 6 kyu.
  • A beginner playing a proper game against a 16 kyu and losing, but at least making a few living groups and territory.

Having new accounts self-identify their rank was the system when I joined in Dec. 2016. OGS provided a short rubric to aid in self-identification. That system was abandoned when the first iteration of Glicko-2 was introduced.

Sandbagging is very rarely reported, and then only in vague terms (“that guy is a lot stronger than…”) or of rank manipulators (resigning from won games, etc.). The influx of experienced players new to OGS makes it impossible to say someone is sandbagging just because they are strong. And rank manipulation is trivial compared to alt sandbagging, which is virtually undetectable by non-moderators.

Although starting players with a pseudo-rank (whether 6k or 12k) slightly limits the effect of sandbagging, it is too slight to give it consideration.

In sum, sandbagging is a separate issue. I think it is best to devise the best rating system without taking it into consideration, with one exception: talking about the win-rates of provisional accounts is an exercise in fantasy since the exact effect of sandbagging on those accounts is unknowable.

3 Likes

As always semantics gets in the way, but I use “sandbagging” as an umbrella term for “anyone who deliberately acts in a way that will cause their rank to be lower than their true ability, for whatever reason”.
 “Unintentional sandbagging” should be self-explanatory. “Airbagging” is the same as sandbagging but substitute “higher” in the place of “lower”. Usually done to boast, troll strong players, or just to get to play them.

I imagine that “alt-sandbagging”, or the practice of creating new (alt-ernative) accounts to have an easier time sandbagging, is expected to have a small inflating effect on the mid-level player base, and a deflating effect on the lower level player base. But it all depends on at what level alt-sandbagging is more common. It might just be that it’s spread out enough to cancel out and not have any net effect.

According to the 2022 histogram, about 7-8% of the “established” players are TPK. It doesn’t take long to become “established” in the rating system, though, and it might not reflect an intuitive idea, so I’m not sure that answers your question. I imagine where you were going was “are there long-time active and established TPK players that we can consult for their experience?”, the answer to which is I don’t know :laughing:

For the other question (“How do other Go servers handle ratings for new accounts?”), I just know there is quite a variety of systems, but hopefully somebody else can answer in detail.

3 Likes

There are two types of sandbagging: rank manipulation (people deliberately lose games, usually by resigning won games, or they weaken their rank by losing to bots) and alt creation (they rank up a new account until it is within a couple stones of their real rank, and then they abandon it and create another new account, ad infinitum). For report purposes on OGS (i.e., not speaking ethically), sandbagging only applies to ranked games.

Airbagging is when someone’s rank rises to a stronger level than their real strength. This results from undeserved wins: score cheating, successful stalling, wins given to them from rank manipulators, and computer exploits. Such wins are annulled by the mods when they are discovered, in order to preserve the integrity of the ranking system.

OGS has lots of TPK players. We used to get about 200-250 new accounts per day. During the lockdowns, this rose to about 900 per day (one day was even about 1,200 as I recall), but subsequently fell back again. I do not know how many it is now; only the mods know.

You are right about the possible differences in beginner experiences. I seem to recall that someone once proposed having two or more different “rooms” on OGS. For example, one room where a provisional player would be matched with other provisional players, and one room for players with established ranks. Players could freely choose which room they wanted to play in, thereby accommodating new players who are not beginners.

2 Likes

@espoojaram previous post
Well, I will give a short answer just about beginners, the rating system and sandbagging.

I don’t see the point of worrying about sandbagging against beginners, as this is just what the system is pushing: go play your first games against much stronger players. So we sandbag because we don’t want sandbagging? Just absurdity.

Now, as a matter of fact, we got feedback here from beginners who were suffering under our system. They basically wanted to play fair games, but that’s not what they got. Even worse, when they tried on their own to play a rated game with another beginner, it was refused because of the rating difference (I guess the other beginner had already gone through the process of successive crushes). Absurd again.

We should respect experimentation and discovery instead of bragging about our strength, and let them enjoy the game even if it feels like laziness (your words, sorry).

In my opinion, self-registering as a beginner (without asking for a specific rank) won’t lead to that much sandbagging because, well, it will be so obvious when someone is not a beginner. I still have a bit of confidence in human beings.

4 Likes

Whom are you addressing? Nothing you have written here has anything to do with anything I said.

Self-identifying as a beginner at registration (i.e., with some kind of special marking aside from a rank) sounds like a great idea. I don’t recall it ever being suggested before.

1 Like

That was answering @espoojaram, but it seems the link didn’t work. Sorry for the annoyance.

Okay, thank you.

I made that suggestion; I think it’s linked directly in the first thread of the compendium.

…obvious to who? On a 9x9 game, I can barely assess my opponent’s strength as a 15 kyu, and by “assess” I mean mostly “if they get an easy life&death problem wrong, I know they’re inexperienced”.

By the way, there’s another little flaw in the idea, though it’s not a big deal.

Technically, we can’t actually promise this (“fair games”), because of what I said here:

which means even in an ideal system where all beginners can self-identify and play with other self-identified beginners and there’s no sandbaggers, many beginners will be significantly weaker than the average of beginners, and they’ll have to go through the same exact rigamarole of losing games until they find the appropriate rating.

If the sandbagger problem is contained, it will usually be a faster and more painless process than starting at 6 kyu, but they’ll still get kind of stepped on.

Though this is a great counter-point:

But on the other hand, since beginners typically don’t have familiarity with etiquette, I expect the on-steppers won’t be very polite in chat while on-stepping.

But I’m probably supposed to be the one who doesn’t care about beginners, so I guess I won’t care?

Again, I’ll support the plan if you people ever stop walking in circles and actually start taking action (if anyone’s wondering, this is a jab at the fact that there are proposals for this kind of system at least as far back as 2018, and every time someone was like “but Glicko-2 requires 1500” and every time someone else correctly was like “actually no” and everybody was like “oh, let’s do it then” and here we are in December 2022).

I’m genuinely not sure what the results will (would?) be.

I did perhaps think of a way to somewhat “measure” the success of the system though: if we compile statistics about abandoned young accounts, with a self-identifying beginner system, I’d expect to see most of the alt-sandbaggers moving up in the ranks, while the accounts of discouraged beginners should either stagnate or go down in rating before being abandoned.

With our current system, discouraged beginners should be the ones that go down and then abandon, while I imagine alt-sandbaggers will win at least a few games before abandoning.

So, of course it’s not a rigorous thing, but if we measure those quantities, maybe that will give us an idea of which system is better.
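
For what it's worth, the bookkeeping could be as simple as the rough Python sketch below. Everything in it is hypothetical: the function names, the 50-point threshold and the rating histories are made up for illustration, and I'm assuming we could somehow export the rating history of each abandoned account, which I don't know to be possible.

```python
from collections import Counter


def classify_trajectory(ratings, threshold=50.0):
    """Label an abandoned account by its net rating change.

    `ratings` is the account's rating after each game, oldest first.
    `threshold` (an arbitrary assumption) is how many points the rating
    must move before we call it a rise or a fall.
    """
    if len(ratings) < 2:
        return "too_short"
    change = ratings[-1] - ratings[0]
    if change > threshold:
        return "rose"
    if change < -threshold:
        return "fell"
    return "flat"


def summarize(accounts, threshold=50.0):
    """Return the share of abandoned accounts in each category."""
    counts = Counter(
        classify_trajectory(history, threshold) for history in accounts.values()
    )
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Made-up rating histories, purely illustrative.
abandoned = {
    "acct_a": [1500, 1420, 1350, 1300],  # lost repeatedly, then quit
    "acct_b": [1500, 1580, 1650],        # climbed, then quit
    "acct_c": [1500, 1510, 1490],        # went nowhere
    "acct_d": [1500],                    # a single data point
}
shares = summarize(abandoned)
```

Comparing the "rose" and "fell" shares before and after a change of system would be the quantity to watch, keeping in mind that a net change says nothing about why an account was abandoned.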

3 Likes

I know you’ve already linked one example, but is there really a significant amount of 26+ kyu players? The lowest rank I ever “achieved” was 25k (in today’s rating conversion - back then, it was 40k). I knew nothing about the game except that black and white take turns, and that chains with only one liberty can be captured (although I didn’t know that this was true even if the liberty was an eye on the inside). Until the example you showed, I’d never even seen a rating significantly lower than 640!

According to the usual histogram, it’s about 2% of the current player base on OGS, though most of them are above 32k – and if the hypothesis that the current rating system discourages weak beginners is correct, there may be many more that never even get to their rank and don’t appear in the statistics. Other than this, I have no idea :laughing: