Ranking system is BROKEN

Groin · August 10, 2025, 12:50pm

Absolutely. The first step is to not offer opportunities as big as these. If we get a stupid rating effect, just restrict the max handicap. Myself, I feel that 5 stones on a 9x9 is already out of bounds; it’s much more than 9 on a 19x19.

My suggestion IMHO:

9 stones on 19x19
5 stones on 13x13
3 stones on 9x9. Maybe even only 2. I dunno.

It can still be abused I guess, but much less than in the current state.
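A minimal sketch of what such a cap could look like, using the numbers suggested above. The table and the function name `clamp_ranked_handicap` are hypothetical and are not taken from the OGS code base; leaving unlisted board sizes without ranked handicap is also just an assumption.

```python
# Hypothetical per-board-size caps, taken from the suggestion above.
MAX_RANKED_HANDICAP = {19: 9, 13: 5, 9: 3}


def clamp_ranked_handicap(board_size: int, requested: int) -> int:
    """Return the handicap a ranked game would be allowed to use.

    Board sizes that are not in the table get no ranked handicap at all
    (an assumption made for this sketch).
    """
    cap = MAX_RANKED_HANDICAP.get(board_size, 0)
    return max(0, min(requested, cap))


print(clamp_ranked_handicap(9, 9))    # the request is cut down to the 9x9 cap
print(clamp_ranked_handicap(19, 4))   # within the cap, so unchanged
```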

Samraku · August 10, 2025, 12:58pm

yeah, I’d just go with the equivalent to 19x19 9 stones (technically 8.5) as a limit for being ranked. If the 6x more powerful estimate is accurate, that gives 1 stone and 1 komi (7 komi minus 6 points of handicap above the one stone) for area scoring, and 1 stone and 1 komi (6.5 komi minus 5.5 points of handicap above the one stone) for territory scoring (these figures account for the ½ offset)
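One way to reproduce these figures, as a rough sketch. It assumes that a full stone is worth about twice the even-game komi, which is a common rule of thumb and not something stated in the post, and that the leftover fraction of a stone is rounded to the nearest half point. The function name `small_board_limit` is made up for illustration.

```python
def small_board_limit(limit_19: float = 8.5, scale: float = 6.0,
                      even_komi: float = 7.0) -> tuple[int, float]:
    """Convert a 19x19 handicap limit into (stones, komi) on 9x9.

    limit_19  : effective 19x19 limit (9 stones counted as 8.5)
    scale     : how many times stronger a 9x9 stone is assumed to be
    even_komi : komi of an even game (7 for area, 6.5 for territory)
    """
    stones = limit_19 / scale            # about 1.42 stones on 9x9
    whole = int(stones)                  # the one full stone
    fraction = stones - whole            # what is left above that stone
    stone_value = 2 * even_komi          # assumption: stone = 2 x komi
    points = round(fraction * stone_value * 2) / 2   # nearest half point
    return whole, even_komi - points


print(small_board_limit(even_komi=7.0))   # area scoring
print(small_board_limit(even_komi=6.5))   # territory scoring
```

Under these assumptions the leftover fraction comes out at 6 points for area scoring and 5.5 points for territory scoring, which matches the figures in the post.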

EDIT: I like that all games can be ranked no matter the gap, I’m just talking about ranking games with handicaps this large

Groin · August 10, 2025, 1:05pm

The other complementary way is to keep those weak bots for rating for beginners only. I don’t see the point to calculate Dan or SDK rating with them.

Feijoa · August 10, 2025, 2:31pm

A few thoughts.

It shouldn’t be considered an illegal exploit to set up ranked handicap games against bots, if that’s what you enjoy. Fix the ranking system instead.
From having tried this against some beginners recently, a win or loss at 9 stones on 9x9 definitely says something about the players’ strengths. It’s harder than 5 stones for sure, not sure about 6-8.
Beginners and bad bots place their handicap stones poorly, which makes it easier to beat them, so maybe free/fixed placement should be factored in to the rankings.
The effective rank effect of the handicap (52) is displayed in the game info:

[image: game info screenshot]
Our automatic handicap system is described here - it’s less than 6 ranks per stone.
Even if it’s supposedly “ranked” in the game info, a good ranking system could make the effect of an effectively 52-handicap game very small or zero.
A good ranking system should not reward repeated wins over and over again against the same player. Even if the first game says something about their strengths, the 10th game or 100th game does not add any more information unless they both play a lot of other players in between those games. This is something that probably doesn’t come up in IRL ranking systems since you would never play 100 games against the same player at a tournament, but I think it causes a lot of issues with bot rankings at OGS.
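To illustrate the last point, here is a minimal sketch of discounting repeated games between the same two accounts. It is only an assumption about how such a rule might look, not how OGS or Glicko-2 works: the geometric `decay` and the names `repeat_weight` and `weighted_results` are hypothetical, and the sketch ignores the "unless they play other players in between" part.

```python
from collections import Counter


def repeat_weight(prior_games: int, decay: float = 0.5) -> float:
    """Weight of a game given how often this pairing was already played."""
    return decay ** prior_games


def weighted_results(games, decay: float = 0.5):
    """Attach a weight to each (player, opponent, won) result in order."""
    seen = Counter()
    out = []
    for player, opponent, won in games:
        pair = frozenset((player, opponent))   # order of the two does not matter
        out.append((player, opponent, won, repeat_weight(seen[pair], decay)))
        seen[pair] += 1
    return out


games = [("dan", "bot", True), ("dan", "bot", True), ("dan", "bot", True)]
for row in weighted_results(games):
    print(row)
```

A rating update would then scale each game's effect by its weight, so the 10th or 100th win over the same opponent moves the rating by almost nothing.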

yahel.or.3 · August 10, 2025, 6:23pm

Some people want to take advantage of the ranking system to raise their level; in the end it’s good for us, as it lets us level up faster.

yahel.or.3 · August 10, 2025, 6:24pm

I’m sure there are more of these people.

Samraku · August 10, 2025, 6:28pm

Personally, I’m for condoning bot exploits (which don’t rely on a website bug, of course): if you want people to not exploit your bot, build a better one. If you think this is too hard, yeah, Go is infamously hard to program for. And?

That said, that appears in my estimation to be a minority opinion here, and I think the more important consideration is ranking as many human vs. human games as can fairly be ranked. As for bots, well, people shouldn’t be playing them anyway; humans are better.

yahel.or.3 · August 10, 2025, 6:33pm

Go is a difficult game and they think it’s like the game is over if they reach 9th dan. They don’t want to play after they reach 9th dan. If that’s true then that’s fine. They just know they’re not 9th dan. They weren’t 9th dan until they went to the tournament.

Samraku · August 10, 2025, 7:02pm

I’m confused. How does this connect to the thread?

mikhail.trusfus · August 10, 2025, 7:52pm

What is the point? This creates a problem for them to find even opponents.

GreenAsJade · August 11, 2025, 12:18am

I think that a bot exploit is an “exploit” if it is fully predictable.

For example if you create a game of particular parameters (say, 9 handicap) and the bot will predictably resign, then that is an exploit.

It does not serve a “ranking purpose” - ranking needs to be based on go playing skill, not the ability to discover a predictable exploit and repeatedly use it.

So:

Ranking up by repeatedly defeating a bot using Go skill: fine.
Ranking up by following a predictable recipe to make a bot lose: not fine.

This is the “bot consideration”.

Separate from this is the consideration of how much rank do you get from defeating any opponent at specific parameters.

This is the “ranking system broken?” consideration. It really has nothing to do with bots.

We can observe, from the bot games provided, that you can rank up fast to high ranks defeating a specific opponent (25k) with specific parameters (9x9 9 handicap). The question is whether this is “a correct representation of the skill involved or not”.

If I could have a wish, it would be that we would stop talking about bot exploits in a thread that is about ranking, because it confuses the issue (despite the fact that the ranking system issue was uncovered by a bot exploit).

Samraku · August 11, 2025, 12:48am

Yeah, this is the common view. I hold that if your bot can be reliably defeated through an easily replicable method that does not rely on site bugs, then it deserves to be exploited and tank its rating, just like a human who could be defeated that easily would deserve to. But I agree that this does not change the fundamental observation that something weird is going on

One aspect of this problem may be that the ratings are tied together for 9x9 and 19x19. I have long been of the view that we should fundamentally change the ratings after the model of lichess, with the most important part being to completely eliminate any “overall rating” from both the back end and the front end.

GreenAsJade · August 11, 2025, 12:51am

“It” deserves it, but WE can’t have that.

What would you have us do: ban bots that have exploits, till they are fixed?

Disallowing the use of exploits is not to protect the bot, it is to protect the ranking system.

I maintain that this thread should be about the specific problem which is:

“how much rating should a dan victory over a 25k at 9 handicap deliver?”

Other topics, such as “hmmm, actually how should we handle exploits” or “hey, let’s talk about overall vs specific rank” warrant their own thread - here they distract from the specific short term problem that has been identified.

Counting_Zenist · August 11, 2025, 1:09am

Does this “exploit” work between “human accounts”? Let’s say you ask a friend to tank their account’s rating to below 25k: do they need to stay around that strength, or could they tank it all the way down to the floor and still be used to boost ranks repeatedly with handicaps?

GreenAsJade · August 11, 2025, 1:12am

I don’t think we know.

If you ask your friend to tank their rating, and they do that, then hopefully someone notices and you both end up banned.

But aside from that, this is the exact question we’re asking: does the rating system perform as we would expect, or in a “weird exploitable way”, at this corner of the parameters?

It’s not even especially about exploitability. There’s the basic functional question, like this: if a 3d defeats a 25k at 9 handicap on 9x9, does that really mean that their skill is more like 4d?

It is possible that the answer is “yes - if you are currently 3d and can do this, then you should be 4d”.

That’s what the outcome we see seems to say.

It doesn’t seem intuitive though does it? As someone observed, such a game doesn’t seem to say much about the dan player at all.
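A rough sketch of why the answer depends so heavily on how many ranks the handicap is assumed to be worth. Everything here is an assumption for illustration: a plain Elo update stands in for the Glicko-2 system the site actually runs, the conversion rating = 525 · exp(rank / 23.15) with rank 30 = 1d is the one commonly cited for OGS but is not stated in this thread, and the handicap is modelled by adding ranks to the weaker player. The 52 is the effective rank effect quoted earlier in the thread; the 20 is an arbitrary smaller value for comparison.

```python
import math


def rank_to_rating(rank: float) -> float:
    """Assumed OGS-style conversion (rank 30 = 1d, rank 5 = 25k)."""
    return 525.0 * math.exp(rank / 23.15)


def elo_gain(player: float, opponent: float, k: float = 32.0) -> float:
    """Points gained for a win under plain Elo (NOT Glicko-2)."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - player) / 400.0))
    return k * (1.0 - expected)


def gain_for_handicap(handicap_ranks: float) -> float:
    """Gain for a 3d beating a 25k whose rank is raised by the handicap."""
    dan = rank_to_rating(32)                     # 3d
    weak = rank_to_rating(5 + handicap_ranks)    # 25k plus handicap effect
    return elo_gain(dan, weak)


for ranks in (20, 52):
    print(ranks, round(gain_for_handicap(ranks), 2))
```

If the handicap is valued at 52 ranks, the 25k is treated as far stronger than the dan, so the win pays close to the maximum. If the same stones are valued at 20 ranks, the dan is the clear favourite and the win pays very little.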

Counting_Zenist · August 11, 2025, 1:34am

But there could be a legit reason rather than a “rank exploit”. Assume the stronger account is a teacher who asks students to join OGS. They are all beginners, they all play “teaching games with handicaps”, and none of them can beat the teacher. So instead of one dan account repeatedly beating one 25k with handicap, you have a dan teacher repeatedly beating a group of 25k students who also play other games with other beginners. If this resulted in a significant rating boost for the teacher, then the ranking system might also be problematic?

GreenAsJade · August 11, 2025, 1:49am

Teaching games are not allowed to be ranked.

Ranked games must have “no outside assistance”.

However, if the teacher beats the students in a ranked handicap 9x9 game where the student is unassisted, the ranking system should maintain a correct rank for them both.

This is the exact reason why it is unintuitive that a dan should get any rating boost from such a victory.

It feels like such a victory says nothing about the Dan’s skill, so it should not impact their rating.

The only reason why I’m hedging with words like “unintuitive” is because it was observed that

… I personally have no experience with this sort of thinking, no recollection of discussions about that.

What this tells me is that my intuition about 9x9 and handicaps can’t be trusted.

So someone may come along and explain why a dan victory over a 25k at 9 handicap is in fact a worthy thing that is supposed to give the dan a rating boost.

Personally, I doubt it, I think it’s a bug.

Specifically, it sounds like an unanticipated effect of removing the +/- 9 handicap range limitation for rated games. If that were in place we would not be having this discussion.

Counting_Zenist · August 11, 2025, 2:42am

I agree that regardless of the nature of the match between students and teacher (whether teaching games, tests for the teacher to evaluate how well students have learned, or simply a serious game to see how much students have improved overall and whether the handicap needs adjusting), it shouldn’t boost the teacher’s rank at all. Most teachers’ ranks are pretty stable after teaching for a while. Preparing teaching materials may solidify some knowledge and help organize their thoughts, but once it becomes a system they pretty much just stick to it, and that has mostly nothing to do with playing or testing students; more than anything, they learn their students’ common bad moves and bad habits.

Or it can be even simpler: a club in a place with few Go players in the area, where one strong player from CJKT has moved in. They want to hold online club meetings/tournaments here on OGS, with one super strong player and the rest weak players, and they simply need handicap stones to make it work. Chances are the strong CJKT player would beat everyone even with the handicaps.

A_Normal_Name · August 11, 2025, 3:37am

OGS’s built-in rating calculator shows that a 2000-rated (1.9D) player should gain very few rating points even when winning with 9 handicap stones.

Unless I’m missing something, that is not nearly enough to rank up rapidly.
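For reference, a small sketch of how a rating of 2000 lines up with "1.9D". It assumes the conversion rank = ln(rating / 525) × 23.15 with rank 30 = 1d, which is the formula commonly cited for OGS but is not given in this thread, and it assumes the displayed value is truncated to one decimal. The function names are made up for illustration.

```python
import math


def rating_to_rank(rating: float) -> float:
    """Assumed OGS-style conversion from rating to a numeric rank."""
    return math.log(rating / 525.0) * 23.15


def rank_label(rank: float) -> str:
    """Format a numeric rank as kyu/dan, truncated to one decimal (assumed)."""
    if rank >= 30:
        return f"{math.floor((rank - 29) * 10) / 10:.1f}d"
    return f"{math.floor((30 - rank) * 10) / 10:.1f}k"


rank = rating_to_rank(2000)
print(round(rank, 2), rank_label(rank))
```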

GreenAsJade · August 11, 2025, 3:45am

Nice one!