Are OGS rankings inflated, deflated or neither?

Maharani · December 28, 2022, 7:52pm

Do you think it is reasonable for people who know nothing about go to lose 90% of their games in the beginning?

Yes, this is to be expected; no need for change
No, it is humiliating; we should rebuild the ranking system so complete beginners can win 50% of their games from the start
I don’t have a strong opinion on the matter


espoojaram · December 28, 2022, 7:57pm

I can’t answer the poll

I believe it might be humiliating, but at the same time it’s idealistic to imagine a system where beginners (or weak players) win 50% of the games from the get-go.

If beginners can self-identify without vetting, sandbaggers will self-identify as beginners. If we start players with a low rank, sandbaggers don’t even need to do anything.

Unless we have a solution, I can’t answer 1.

Conrad_Melville · December 28, 2022, 7:57pm

None of this has any relevance to my comment. My only point was you seem to have misunderstood the saying. I wouldn’t even care except that you criticized it for being a myth. You also seem to miss the kind of “gallows humor” quality of it, which I guess is understandable with a foreign language.

It is pretty obvious that discouragement is a serious problem in any difficult game (or sport). I think few people would disagree with that. Any other truisms you want to debate?

Maharani · December 28, 2022, 7:58pm

Btw wow, is the discrepancy really that enormous? dwyrin is 7d on Foxy and easily beats 3d players on OGS without even remarking that they seem stronger than Fox 3d.

Conrad_Melville · December 28, 2022, 8:02pm

I, too, can’t answer because it mixes together what we expect and what we want. And, of course, it depends entirely on whom they are playing.

Groin · December 28, 2022, 8:06pm

7d on Fox is very different from 5d on Fox.

Maharani · December 28, 2022, 8:10pm

I intentionally worded the poll that way because if you do believe that losing 90% of your games as a complete beginner is humiliating and unacceptable, you’d have to devise yet another complete overhaul of the ranking system. Otherwise, why complain?

Allerleirauh · December 28, 2022, 8:11pm

Well, when I was 1d OGS I managed to get 5d Foxy, but I’m not sure I could have held it; I kinda got bored there. But 1d OGS is a solid 4d Foxy, and 2d OGS is surely 5d Foxy.

But on Foxy the ranks are jammed. You have like 9d pros and AIs. Some weaker pros even got knocked down to 8d. And you have 1d amateurs at like 4-5d. So the jumps in difficulty at 6-7d are big.

Groin · December 28, 2022, 8:13pm

It’s not difficult for me to imagine a system where beginners can play rated games against each other and so have a 50% chance of winning.
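For reference, the "50% between equals" point follows from how Glicko-style ratings turn a rating gap into an expected score. Below is a minimal sketch of the standard Glicko-2 expected-score formula as published by Glickman; it is not OGS's code, and the ratings and RD values in the example are hypothetical (nothing here assumes how OGS maps ratings to kyu ranks). The helper `rating_gap_for` is also hypothetical: it inverts the formula to show how far below an opponent a player must sit to expect to lose 90% of the time.

```python
import math

# Glicko-2 works on an internal scale: mu = (r - 1500) / 173.7178, phi = RD / 173.7178.
SCALE = 173.7178


def g(phi: float) -> float:
    """Discount factor that shrinks a rating gap when the opponent's rating is uncertain."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected_score(r: float, r_opp: float, rd_opp: float) -> float:
    """Glicko-2 expected score of a player rated r against an opponent rated r_opp +/- rd_opp."""
    mu = (r - 1500.0) / SCALE
    mu_opp = (r_opp - 1500.0) / SCALE
    return 1.0 / (1.0 + math.exp(-g(rd_opp / SCALE) * (mu - mu_opp)))


def rating_gap_for(p: float, rd_opp: float) -> float:
    """Hypothetical helper: rating points a player must sit below an opponent to expect score p."""
    return SCALE * math.log(1.0 / p - 1.0) / g(rd_opp / SCALE)


# Two players with equal ratings expect 50%, whatever that rating is.
print(expected_score(1000, 1000, 60))  # 0.5
# Rating gap behind a "lose 90% of games" experience, against an opponent with a hypothetical RD of 60.
print(rating_gap_for(0.1, 60))
```

The point is only qualitative: equal ratings always give 0.5, and a sustained 90% loss rate means the beginner's true strength sits a fixed, fairly large rating gap below the opponents they are being paired with.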

espoojaram · December 28, 2022, 8:15pm

I didn’t say it was difficult, I just said it was idealistic, but whatever. How would you solve the sandbagger problem?

Don’t you expect the increased presence of sandbaggers at the lower ranks to make the true beginner’s experience worse?

Maharani · December 28, 2022, 8:16pm

So how would you achieve that?

Groin · December 28, 2022, 8:22pm

Well, that has been debated a lot in the threads I mentioned before. Want me to write it all again?

One thing to mention is that the Glicko system doesn’t need a single entry point (one starting rank for everyone) in order to work.
Besides, a beginner could simply enter as a beginner, without even needing to pick a rank on his own, and get what he wants.

Anyway, please read the debates that have already happened, which are not out of date (and if you find new ones, I would be pleased to add them to the compendium).

Uberdude · December 29, 2022, 6:45am

It’s the OGS version of Godwin’s law: all conversations about ratings eventually end up about the terrible system for beginner ranks!

meowkorkor · December 29, 2022, 11:59am

Which is a special case of rating inflation and its impact (my intended topic), thus relevant.

espoojaram · December 29, 2022, 11:59am

Well, you gave me quite a bit of homework there, but I think I’m mostly caught up now.

I believe the only place in all of those links where the sandbagger problem is discussed extensively is in this topic, starting from about this reply, and my personal verdict is

no good solution has been proposed to the sandbagger problem.

I would agree that the system has changed a lot in recent years and that testing would be required to verify or falsify the following hypothesis, but:

the most likely scenario of letting beginners self-identify is that the situation would revert back to the old landscape, where beginners get stuck at the lowest ranks unable to ever get up, because that’s where all the sandbaggers are, and moderators have a huge burden of having to deal with all the reports of intentional and unintentional sandbaggers and airbaggers.
Most sandbaggers at the lowest ranks would not get caught, because beginners usually lack the experience needed to recognize a sandbagger and report them. And “getting stuck at the lowest ranks” would be psychologically awful for beginners, not for the purely cosmetic reason that they want to rank up, but simply because they’re constantly being stepped on by sandbaggers in the process.

This is the best counter-point I found:

Intuitively, this sounds naive to me, for the reason mentioned above: if you get stuck as a beginner with all the sandbaggers, you are already getting stepped on. Except in the current system you only get stepped on for at most 20 games if you’re extremely weak, and fewer than 6 games in the majority of cases.

So I have to say, the current system might actually be the best case scenario in the interest of the beginners themselves, excluding scenarios where, say, we have the resources to have trusted users manually vet each and every self-identified beginner, or direct beginners to some kind of tutorial instead of letting them play.

(Instead, one group of people that is undoubtedly burdened with the current system are the established SDKs, who have to deal with all the new accounts, including true weak players who often don’t understand etiquette, fake weak players who are setting up to sandbag, alt-sandbaggers, and strong players essentially sandbagging unintentionally.)

But then again, my hypothesis might be wrong and letting beginners self-identify might lead to a better situation.

I guess this is the point where we need a forum-wide poll to let people vote on whether it’s worth it to try out that system. Though we do also need to agree on the details of the system.
(Well, about one year ago was the point to do it, apparently, but better late than never)

Before somebody comes along and chants the old adage that “there’s no empathy for the beginners”:

I want to clarify that I (believe I) do empathize with the beginners. My personal view, perhaps a very unpopular one, is that I don’t even believe letting beginners play one another without supervision is usually the best way for them to learn; I feel it’s kind of “the lazy way” to do it.

Especially in such an uncontrolled environment as a worldwide public online platform, it’s absolutely unpredictable what their experience will be, as every person is different, both in character and in technical proclivity.
Some beginners are somehow 20-25 kyu from the start, some are closer to 35-40 kyu. Some come here with an understanding that losing often means learning, some of them want to conquer the world and get soul-crushed by the perception of being average or even weaker than average.
The beginners who are technically or psychologically weaker especially need to be handheld one way or another. Even an automated tutorial is better than just throwing them in the pit, if we don’t have the resources to actually give them a teacher or someone to supervise their games.

That’s what I believe, and that’s why I believe changing the rating system to let beginners self-identify is not necessarily the main priority.

Letting beginners play each other might be the most efficient way to have them learn, but nowhere near the most effective, and if those games are not supervised, I personally don’t believe it’s likely to do more good than harm.

To be clear, my personal position is that I’d like to see the experiment happen: I’d like for OGS to try out a system where beginners are allowed to self-identify.

Even though I personally believe it will lead to chaos, we need the test to know for sure.

I believe a combination of this proposal and this proposal, with a few tweaks and polish, is a good starting point.

To conclude, I want to get back to this thread (I guess in the future I might have to copy-paste this reply somewhere else, but I didn’t really know where to put it) and point out that there’s at least one thing we can probably all agree on: with the current system, letting users see the estimated rank of provisional players, in thumbnails and in the profile page graph, leads to a lot of confusion.

At the very least, the thumbnails should not display the rank without a question mark; they should just display a question mark.

I personally believe the graphs should be kinda overhauled to be more readable for science/maths/statistics laymen, and I do have specific ideas on how, but I don’t know if anyone is gonna listen to me, so unless someone expresses interest, I won’t bother creating the concept graphics.

If the graphs can’t be overhauled that way, at least the current rank estimate should be hidden everywhere until it’s somewhat established, to avoid confusion like this one.

This, I believe, is most likely explained by the thumbnail/“humble rank” discrepancy: the confused player probably first saw the graph saying something like “12 kyu”, then saw the thumbnail say “6 kyu”, and that’s why they said “that’s not what I saw initially!”

Oh, and by the way

This is an opinion that has long echoed through the halls of the forum, and I’m not sure it’s as straightforward as it may seem.
First of all, consider the idea that 6 kyu is not a good estimate of the average strength of a newcomer to the site: according to the histogram, it’s actually not that far off (it’s bang on if we talk about modes, and only slightly off if we talk about medians), though that might partially be caused by a self-fulfilling prophecy.
But the rating doesn’t appear to have drifted much since the 2021 system was put in place, so I don’t think that applies.

One could also say that if it is indeed too high an estimate, that arguably makes it a good countermeasure to alt-sandbagging, since it makes setting up a new sandbagging account annoying: you have to keep losing on purpose until the rank is established.
Specifically, it might be a good countermeasure to sandbagging in the lowest ranks, the kind that afflicts beginners in self-identifying rank systems. In other words, the longer it takes the truly weakest players to reach their true rating, the more annoying it is for would-be sandbaggers to get down to those levels, which discourages them from continuing.

I don’t know if that’s true, and even then I don’t know if it’s necessarily worth making it more difficult for beginners, but… it’s a point
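To make the "you have to keep losing on purpose" argument concrete, here is a rough sketch that counts how many straight losses a new account needs to fall from a starting rating to a lower target. It applies the single-game Glicko-2 update from Glickman's published description, with several loudly stated simplifications: each game is its own rating period, the volatility step is skipped (sigma held at 0.06), new accounts start with RD 350, and matchmaking is hypothetical, pairing the player with an opponent rated exactly like them (RD 60). The start and target ratings are placeholders, not OGS's real rank mapping, and OGS's actual implementation may differ.

```python
import math

SCALE = 173.7178  # Glicko-2 internal scale factor


def g(phi: float) -> float:
    """Glicko-2 factor discounting a rating gap by the opponent's uncertainty."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def update_after_game(r: float, rd: float, r_opp: float, rd_opp: float,
                      score: float, sigma: float = 0.06) -> tuple[float, float]:
    """One Glicko-2 update for a single game, holding volatility sigma fixed (simplification)."""
    mu, phi = (r - 1500.0) / SCALE, rd / SCALE
    mu_j, phi_j = (r_opp - 1500.0) / SCALE, rd_opp / SCALE
    g_j = g(phi_j)
    e = 1.0 / (1.0 + math.exp(-g_j * (mu - mu_j)))       # expected score
    v = 1.0 / (g_j ** 2 * e * (1.0 - e))                  # estimated variance from this game
    phi_star = math.sqrt(phi ** 2 + sigma ** 2)           # uncertainty grows before the game counts
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * g_j * (score - e)
    return 1500.0 + SCALE * mu_new, SCALE * phi_new


def losses_to_reach(start: float, target: float, start_rd: float = 350.0,
                    opp_rd: float = 60.0, max_games: int = 1000) -> int:
    """Hypothetical: count straight losses needed to fall from `start` to below `target`.

    Opponents are assumed to be rated exactly like the player's current rating.
    """
    r, rd = start, start_rd
    games = 0
    while r >= target and games < max_games:
        r, rd = update_after_game(r, rd, r, opp_rd, score=0.0)
        games += 1
    return games


print(losses_to_reach(1500, 1000))
```

Because every opponent here is rated like the player, the expected score stays at 0.5 and the RD shrinks the same way whatever the starting rating, so the only thing that changes the count is the distance between the starting estimate and the target. That matches the mechanism described above; it says nothing about how large the effect actually is on OGS.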

meowkorkor · December 29, 2022, 1:48pm

Could someone clarify what sandbagging is in a Go/OGS context? I know that in some other sports, it is deliberately playing weaker and losing rating points in less significant games, to gain an unfair advantage in more important games. Never heard of airbagging.

Other questions:

What is the relationship between sandbagging and rating inflation or deflation? Does one cause the other?
How do other Go servers handle ratings for new accounts?
Does OGS have many established TPK players?

While beginners should expect to lose most of their games, the process does affect their experience (and whether they continue playing). There is a huge difference between:

A beginner playing a proper game against a “real” 6 kyu who simply captures every single group of theirs.
A beginner playing a teaching game against a “real” 6 kyu.
A beginner playing a proper game against a 16 kyu and losing, but at least making a few living groups and territory.

Conrad_Melville · December 29, 2022, 1:55pm

Having new accounts self-identify their rank was the system when I joined in Dec. 2016. OGS provided a short rubric to aid in self-identification. That system was abandoned when the first iteration of Glicko-2 was introduced.

Sandbagging is very rarely reported, and then only in vague terms (“that guy is a lot stronger than…”) or in reports of rank manipulators (resigning from won games, etc.). The influx of experienced players new to OGS makes it impossible to say someone is sandbagging just because they are strong. And rank manipulation is trivial compared to alt sandbagging, which is virtually undetectable by non-moderators.

Although starting players with a pseudo-rank (whether 6k or 12k) slightly limits the effect of sandbagging, it is too slight to give it consideration.

In sum, sandbagging is a separate issue. I think it is best to devise the best rating system without taking it into consideration, with one exception: talking about the win-rates of provisional accounts is an exercise in fantasy since the exact effect of sandbagging on those accounts is unknowable.

espoojaram · December 29, 2022, 2:18pm

As always semantics gets in the way, but I use “sandbagging” as an umbrella term for “anyone who deliberately acts in a way that will cause their rank to be lower than their true ability, for whatever reason”.
“Unintentional sandbagging” should be self-explanatory. “Airbagging” is the same as sandbagging but substitute “higher” in the place of “lower”. Usually done to boast, troll strong players, or just to get to play them.

I imagine that “alt-sandbagging”, or the practice of creating new (alt-ernative) accounts to have an easier time sandbagging, is expected to have a small inflating effect on the mid-level player base, and a deflating effect on the lower level player base. But it all depends on the level at which alt-sandbagging is most common. It might just be that it’s spread out enough to cancel out and not have any net effect.

According to the 2022 histogram, about 7-8% of the “established” players are TPK. It doesn’t take long to become “established” in the rating system, though, and that might not match the intuitive meaning of the word, so I’m not sure that answers your question. I imagine where you were going was “are there long-time active and established TPK players that we can consult for their experience?”, to which the answer is: I don’t know.

For the other question (“How do other Go servers handle ratings for new accounts?”), I just know there is quite a lot of variety among systems, but hopefully somebody else can answer in detail.

Conrad_Melville · December 29, 2022, 2:19pm

There are two types of sandbagging: rank manipulation (people deliberately lose games, usually by resigning won games, or they weaken their rank by losing to bots) and alt creation (they rank up a new account until it is within a couple stones of their real rank, and then they abandon it and create another new account, ad infinitum). For report purposes on OGS (i.e., not speaking ethically), sandbagging only applies to ranked games.

Airbagging is when someone’s rank rises to a stronger level than their real strength. This results from undeserved wins: score cheating, successful stalling, wins given to them from rank manipulators, and computer exploits. Such wins are annulled by the mods when they are discovered, in order to preserve the integrity of the ranking system.

OGS has lots of TPK players. We used to get about 200-250 new accounts per day. During the lockdowns, this rose to about 900 per day (one day was even about 1,200 as I recall), but subsequently fell back again. I do not know how many it is now; only the mods know.

You are right about the possible differences in beginner experiences. I seem to recall that someone once proposed having two or more different “rooms” on OGS. For example, one room where a provisional player would be matched with other provisional players, and one room for players with established ranks. Players could freely choose which room they wanted to play in, thereby accommodating new players who are not beginners.

Groin · December 29, 2022, 2:54pm

@espoojaram, about your previous post:
Well, I will give a short answer just about beginners, the rating system and sandbagging.

I don’t see the point of worrying about sandbaggers playing beginners, as that is exactly what the system is pushing: go play your first games against much stronger players. So we sandbag because we don’t want sandbagging? That’s just absurd.

As a matter of fact, we got feedback here from beginners who were suffering under our system. They basically wanted to play fair games, but that’s not what they got. Even worse, when they tried on their own to play a rated game with another beginner, it was refused because of the rating difference (I guess the other beginner had already gone through the process of successive crushes). Absurd again.

We should respect experimentation and discovery instead of bragging about our strength, and let them enjoy the game, even if it feels like laziness (your words, sorry).

In my opinion, letting players self-register as beginners (without asking for a specific rank) won’t lead to that much sandbagging, because it will be so obvious when someone is not a beginner. I still have a bit of confidence in human beings.