Ranking system is BROKEN

Eugene · August 10, 2025, 10:50am

Great find. Obviously, I’ve been oblivious to that.

In which case, during the process of this change (or simply as an effect of it) the ranking formula is clearly broken. @anoek

Samraku · August 10, 2025, 10:53am

I’ve seen suggestions in other threads that OGS still seems to be miscounting handicap by ½ stone (a “1-stone handicap” should be treated as a ½-stone handicap by all the parts of the system that actually matter, since it undoes half the handicap by removing komi). Would this be a large enough effect to cause this?

Eugene · August 10, 2025, 10:55am

Doesn’t seem like it: here we see a person leaping up through Dan ranks by defeating a 25k playing at an effective rank of 16k (9 handicap) … a Dan should still only get a minute increment from such a victory.
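To put a rough number on “minute increment”, here is a minimal sketch using the plain Elo expected-score formula. This is not the formula OGS uses (the site's system is Glicko-based), and the ratings and K-factor are hypothetical values chosen only to show the shape of the effect.

```python
# Illustration only: plain Elo, NOT the rating formula OGS actually runs.
# All ratings and the K-factor below are hypothetical example values.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_gain_on_win(rating_a: float, rating_b: float, k: float = 32.0) -> float:
    """Points player A gains for a win over player B."""
    return k * (1.0 - expected_score(rating_a, rating_b))


# Win against an equally rated opponent.
even_gain = rating_gain_on_win(2000, 2000)

# Win against an opponent rated 800 points lower (a heavy favourite winning).
lopsided_gain = rating_gain_on_win(2000, 1200)

print(f"even game: +{even_gain:.2f}, lopsided game: +{lopsided_gain:.2f}")
```

Under these assumptions the favourite gains well under one point for the lopsided win, which is the intuition behind expecting only a tiny bump.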

Samraku · August 10, 2025, 10:57am

When we say “remove ±9 rank restrictions”, we don’t mean “allow greater than ±9 handicap”, right? Because I trust the rating system to handle the former, but not the latter.

Samraku · August 10, 2025, 10:59am

well, on 9x9, isn’t our current estimate that handicaps are 6x stronger? so 16k-9d isn’t 24 ungiven handicaps apart, it’s 4
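The arithmetic here can be sketched as follows. The factor of 6 is the estimate quoted in this post, not a verified constant, and the helper names are made up for the example.

```python
# Hypothetical helpers for the rank arithmetic in this post.
# RANKS_PER_STONE_9X9 is the "6x" estimate from the thread, not a verified value.

RANKS_PER_STONE_9X9 = 6


def rank_to_int(rank: str) -> int:
    """Map a rank like '16k' or '9d' to an integer scale where 1k = 0 and 1d = 1."""
    number = int(rank[:-1])
    suffix = rank[-1]
    if suffix == "k":
        return 1 - number
    if suffix == "d":
        return number
    raise ValueError(f"unrecognised rank: {rank!r}")


def rank_gap(weaker: str, stronger: str) -> int:
    """Number of ranks separating two players."""
    return rank_to_int(stronger) - rank_to_int(weaker)


# A 25k given 9 stones, each counted as one rank, lands on the 16k slot.
effective = rank_to_int("25k") + 9

# Remaining gap from 16k up to 9d, in ranks and in 9x9 handicap stones.
gap_in_ranks = rank_gap("16k", "9d")
gap_in_9x9_stones = gap_in_ranks / RANKS_PER_STONE_9X9

print(gap_in_ranks, gap_in_9x9_stones)
```

So 16k to 9d is 24 ranks, which under the 6x estimate comes to 4 stones on 9x9.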

Eugene · August 10, 2025, 11:10am

I’m pretty sure we mean the former.

Hah, well I’m totally out of my depth now, especially if that is true.

Maybe it is actually OK! Maybe if a 3d defeats a 25k with 9 handi on a 9x9 board, that really is worth a dan of rank!

This doesn’t match my intuition, but I’m totally aware my intuition is tuned to 19x19.

If it’s true, then there is nothing broken. Just some people abusing bots that need their games annulled!

shinuito · August 10, 2025, 11:55am

I don’t know, that sounds like a massive problem, especially for the rating system combining ranks on 9x9 and 19x19.

It sounds like quite an extreme case of what would otherwise be maybe an OK approximation.

Eugene · August 10, 2025, 11:58am

Wouldn’t it only be a problem if it were even a possible outcome?

My statement was based on the assumption that “it is so hard to win at 9x9 with 9 handicap that in effect this huge theoretical rank boost would never happen”.

(Or if it did, then the winner would have demonstrated their prowess suitably).

I dunno - it’s probably beyond speculation now: the maths needs to be debugged.

smog山人 · August 10, 2025, 12:01pm

my favourite is looking at royalleela’s losses.. idk at what level a bot is considered superhuman in strength, so i guess when it loses it can be argued to lose to a strong human.. (example; 친선 대국)

as for ranks being broken or not.. i think it’s very difficult to tell whether ranked bots have a negative or positive impact on people’s ranks, since ranked bots were implemented on the same day the ranking system was fundamentally changed from Elo to Glicko.. and since the vast majority of ogs games are not live 19x19 games, I have no idea what the value of any given ogs rank is without inspecting the user’s history in detail (and also branching out to the opponents they played)..

Groin · August 10, 2025, 12:05pm

That’s the problem. It’s absolutely not that difficult when the bot is so dumb

Eugene · August 10, 2025, 12:07pm

Yes - but abusing dumb bots is a different problem with an easy solution: report it and it gets fixed.

The remaining discussion is around whether outside of that case, there is any problem.

Groin · August 10, 2025, 12:12pm

It gets fixed case by case, not globally. The trick is still available, and you don’t need to be very strong to work out a strategy for using it.

shinuito · August 10, 2025, 12:13pm

Maybe it won’t happen for normal matchups between players that know how to play Go.

But is that because of the players’ skill at Go, or is it purely a limitation of the board size? Playing with 9 stones on a 9x9 board might be more like playing with 25+ stones on a 19x19 board. Not to mention that the Chinese rules the bots play under don’t have a standardised placement for those 9 stones, so you can’t really guarantee consistent handicap stone placement.

Combining 9x9, 13x13 and 19x19 ranks into one makes sense if, regardless of what board you’re playing on, your general skill at Go correlates well enough to be useful to predict wins on other board sizes.

I think reading, life and death, endgame, tesuji and so on don’t care too much (except for some special cases) what board size they’re on. So training these and playing on a 9x9 or on a 19x19 should in theory still be a way to improve overall.

We’re in a situation though where

Did the winner really demonstrate anything? Or was the loser (25kyu bot) actually the one that really demonstrated something about their ability or knowledge?

I don’t think the skill of winning against 9 stones on a 9x9 board says anything about the strong player, but says something about what the weaker player still needs to learn about Go.

Groin · August 10, 2025, 12:14pm

Did you look at the games?

Eugene · August 10, 2025, 12:15pm

I feel like “hmm… whatever… you make a persuasive argument, and yet … still really we need the maths investigated in detail: intuition and speculation can’t take us further”

Eugene · August 10, 2025, 12:16pm

It’s earlier in the thread. They were demonstrated to be bot-abuse … IIUC!

Groin · August 10, 2025, 12:20pm

Bot abuse: how good is a 25k bot supposed to be?

I don’t want to encourage any bot abuse. But it seems quite debatable when a relatively weak player manages to engineer wins with 9 stones on 9x9 against bots that are too dumb. Of course he might have some doubts about being ranked as a very high dan player so soon.

shinuito · August 10, 2025, 12:25pm

Yes I agree.

I don’t even mean to make a persuasive argument just for the sake of it.

What I mean is suppose something like

Let’s suppose we’ve done some extrapolation of what handicap should look like on 9x9 (it did get revamped at one point). Should it have a similar cutoff on handicap stones, the way 19x19 does?

We have a cutoff of 9 stones on 19x19, partly because that’s where the standard placements end.

Ranked games should probably be cut off after about 5 handicap stones on 9x9. I don’t mean whatever the number 5 translates to (maybe it’s 2 stones and 4.5 komi), I mean 5 physical stones on the board.

There’s a good chance that when a player knows how to play Go, they can beat strong enough bots when they have 5 stones handicap on the board in 9x9, just connect your stones together and there’s not enough room to live for white.

For having more than 9 stones on 19x19 maybe you can argue that it starts getting progressively harder to win by demonstrating some skill as a strong player as opposed to just general lack of knowledge by the weaker player. If you never learned to close the borders of territory, is that going to tell us the difference between a 1d and 2d beating someone after giving 40 stones handicap? Probably not.

If you had 25 or 30 handicap stones on 19x19 and lost, it could’ve been because you didn’t count the board and the strong player did, should that be worth some sizeable amount of rating points?
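The cutoff idea above could be sketched like this. It is only an illustration of the proposal, not how OGS decides whether a game is ranked: the 9-stone cap on 19x19 is the existing limit mentioned above, the 5-stone cap on 9x9 is just the suggestion from this post, and the function and table names are hypothetical.

```python
from typing import Optional

# Hypothetical caps on physical handicap stones for ranked games.
# 19x19 -> 9 is the existing cutoff mentioned in the post;
# 9x9 -> 5 is only the suggestion made here. No cap is proposed for 13x13.
PROPOSED_MAX_STONES = {19: 9, 9: 5}


def max_ranked_handicap(board_size: int) -> Optional[int]:
    """Return the proposed stone cap for a board size, or None if none is proposed."""
    return PROPOSED_MAX_STONES.get(board_size)


def is_rankable(board_size: int, handicap_stones: int) -> bool:
    """True if a game with this many physical handicap stones would stay ranked."""
    cap = max_ranked_handicap(board_size)
    # With no proposed cap for this board size, this sketch leaves the game ranked.
    return cap is None or handicap_stones <= cap


print(is_rankable(9, 9))   # the 9-stone 9x9 games discussed in this thread
print(is_rankable(9, 5))
print(is_rankable(19, 9))
```

With these caps the 9-stone 9x9 games against the bot would simply not count towards rating, regardless of who won.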

Groin · August 10, 2025, 12:32pm

Common sense, like no more than 9 stones on 19x19, should be enough to set a limit. We don’t have to dig deeper. There are already other restrictions on which games are allowed into the rating system, like standard size, komi… So a simple first step is to follow common practice here too.

For 9x9 and 13x13, I dunno if there is any common practice.

shinuito · August 10, 2025, 12:39pm

But is it bot abuse if the bot just doesn’t close its borders?

At what point does it transition from playing the bot normally to abuse?

The fact that it’s ranked at all is probably the real issue, and the real thing that’s transitioning it from ordinary playing to seeming like abuse.

Example game:

Normal-ish game

Except the bot doesn’t close its borders and fills in its territory or adds some dead stones.

But this is what I’m talking about. The games “exploited” were typically 9x9 games with 9 stones from what I can see.

I think something like 5 physical stones is roughly where people will not want to give more handicap stones on 9x9, except against complete beginners.

If we look at what they are exploiting, it wasn’t a ladder or something like that. The bot is just bad, but playing a bad bot isn’t an issue, is it? The real issue seems to be that they’re playing ranked games against the bot at high handicap.

So it’s worth looking at whether we need more sensible cutoffs on, say, 9x9 or other boards, and whether there’s some bug in the code, rather than just chalking it up to exploiting the bot.