Separate ratings between board sizes

shamisen · March 3, 2026, 6:27am

I want to combine these two concerns to fix a problem with OGS ratings:

I didn’t think @prokyu’s suggestion was needed at first, but after seeing someone abuse 9x9 handicap games against drunken master bot, bringing down the bot’s rank to 3 kyu when it should be at least 3 dan in 19x19 strength, I changed my mind:

Collectively, bots are essentially even bigger sandbaggers (Sandbagging at Sensei's Library) than some players who maliciously sandbag once in a while.

I think prokyu’s solution to isolate ratings between game types has merit now. It would help human players play vs. other human players too. Some players already feel the need to make separate accounts for 19x19 and 9x9. The only additional measure we might need to implement to make the rating separation more fair is to use overall rating ONLY when the player doesn’t have an established rating for a certain game mode yet. For example, if I’m 3 dan in 19x19 but unrated in 9x9, the first rated 9x9 game I play should consider me as close to 3 dan in strength.
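The fallback rule above could look something like the sketch below. The `Rating` and `Player` types and the `ESTABLISHED_DEVIATION` cutoff are made up for illustration; they are not OGS's actual data model or threshold.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical cutoff: a rating with deviation at or below this counts as "established".
ESTABLISHED_DEVIATION = 160.0


@dataclass
class Rating:
    value: float      # rating points
    deviation: float  # uncertainty; smaller means more established


@dataclass
class Player:
    overall: Rating
    # Per-board-size ratings, keyed by board size (9, 13, 19).
    per_size: Dict[int, Rating] = field(default_factory=dict)


def rating_for_game(player: Player, board_size: int) -> Rating:
    """Use the board-size rating when it is established, else fall back to overall."""
    r = player.per_size.get(board_size)
    if r is not None and r.deviation <= ESTABLISHED_DEVIATION:
        return r
    return player.overall
```

With this, a player who is established on 19x19 but has no 9x9 rating (or only a very uncertain one) would be matched on 9x9 using their overall rating until the 9x9 rating settles.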

Another reason we need to separate 9x9 from 19x19 is that 9x9 is in some sense partially solved (see “KataGo 9x9 Opening Books update - 2026 version”). Some players may have memorized it. The set of skills needed for 9x9 is vastly different from the strategy in 19x19, even if they share a lot of the same tactics.

I don’t think we need static bot ranks. I also don’t think separate time controls need separate ratings.

square.defender · March 3, 2026, 8:55am

If OGS is going to return to separate ranks, the existence of an overall rank will confuse everyone. A clearer and simpler design would be to have separate ranks only.

but custom “other” board sizes would have a problem

square.defender · March 3, 2026, 9:03am

If someone played their first 100 games as 25k on 9x9, then played 1000 19x19 games and reached dan, they would have problems with their 9x9 rank. They would be like a sandbagger, and it would take a lot of time to fix the 9x9 rank.

shamisen · March 3, 2026, 10:16am

Then make old ranks more volatile (higher plus-minus) as time passes without playing a single game for a specific board size. OGS ratings get stronger (or weaker) every few years anyways – I see people with 9 dan accounts who would never be 9 dan in the current ecosystem probably refusing to play games because they know they’ll lose their 9 dan rating.
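A minimal sketch of what "more volatile as time passes" could mean, borrowing the Glicko-1 style rule where the deviation grows with the number of idle rating periods and is capped at the unrated value. The constant `C_PER_PERIOD` is a made-up number that would need tuning, and I am not claiming this is how OGS handles inactivity today.

```python
import math

MAX_DEVIATION = 350.0  # Glicko-1's deviation for an unrated player
C_PER_PERIOD = 30.0    # hypothetical growth constant; would need tuning on real data


def inflated_deviation(deviation: float, idle_periods: int) -> float:
    """Widen a rating's deviation after idle_periods rating periods without a game."""
    grown = math.sqrt(deviation ** 2 + (C_PER_PERIOD ** 2) * idle_periods)
    return min(grown, MAX_DEVIATION)
```

Applied per board size, an old 9x9 rating that has not been touched for a long time would start moving quickly again after the first few new 9x9 games.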

Even without this fix, I can’t imagine returning to separate ranks being much worse than the current state. Ratings for 9x9 should never be mixed with 19x19 under ANY circumstances. They are almost separate board games. Of course getting stronger in one would lead to getting slightly stronger in the other, but the same can be said about switching between Chess and Go. Players who are good at one board game usually reach a similar Elo rating in the other game without too much effort, from what I’ve seen.

I agree with this. The only loss would be the lack of a constant number for bot ratings, but that’s nothing. The webpage UI should be able to update the rating of bots depending on which board size is selected.

shamisen · March 3, 2026, 10:44am

The decision to merge all the ratings was apparently made by a ~15 kyu player with this logic:

I don’t think that’s a valid opinion at all.

There’s also this option from an old post if all else fails. I don’t think OGS needs to do this though:

Any worries about UI inconsistencies can be fixed with better UI design or frontend development. At the core ideologically, it simply doesn’t make sense to equate 9x9 rating to 19x19 rating. I’ve always been weaker by a few stones at 9x9 than 19x19 for at least 10 years, because I never cared to study corner life-and-death situations or 9x9 theory. And for a long time on OGS, I was 3-4 dan at 13x13 but it took me a lot of effort to reach 3 dan or even maintain 2 dan on 19x19, because the opening I used for 13x13 was not as effective for 19x19 and I didn’t know how to use influence or how to evaluate who is ahead/behind in points on the bigger board. It gave me a mild case of imposter syndrome.

shamisen · March 3, 2026, 10:56am

After some discussion and thought, I realize it might take a lot of UI work to make this change. So I understand if the suggestion is ignored for the time being. There are probably more important things to worry about.

Sadaharu · March 3, 2026, 10:59am

I agree with this because the 9x9 games make my ratings drop a lot

JethOrensin · March 3, 2026, 11:22am

Something that should be considered is that the rank inconsistency also makes it hard to find a game to practice on a certain board size or time setting.

I did an experiment yesterday and played against bots for the first time. I put a lot of time on the clock, mostly for the bot, and arbitrarily decided that I had only 5 seconds per move on my end. I lost most of the games, obviously due to silly mistakes, and my rank in “Live games” fell from 8.4k to 9.4k.
My rank in correspondence 19x19, where I obviously make fewer mistakes, is 1.3k.
The combined rank according to the server is 4k.

Now, if I try to find a game against a human player in a “Live game” I should locate someone around 8-9k to play against, however most such players would see the “4k” and either avoid me or, even worse, they will think that I am there to prey on weaker players’ ranks. Such a thing makes both sides unhappy I think.

An easy fix for that, using the already existing separate ratings and needing no UI changes, is for the rank displayed on the interface when looking for (or playing) a game to be the corresponding rank for that particular board size and time setting, not the “general rank”.

shinuito · March 3, 2026, 2:27pm

Maybe look a bit further.

The most recent decision I remember was

2020 Rating and rank tweaks and analysis

From this data there’s two things to note:

Using a combined rating works quite well, certainly comparable or better than looking at per-size strengths by themselves. It seems to me like it makes sense to keep using it.

Using overall ratings to predict 9x9 games works pretty good at HC 0 and at HC 1, indicating to me that the strength bands are pretty compatible with 19x19 or just “go ranks” in general. However, going beyond HC 1, predictions start to get bad pretty quick. I believe this is an indication that the “Old Japanese Recommendation” is not so great for us, and that we should strongly consider figuring out what the best 9x9 (and probably 13x13) handicap setup should be.

EDIT: The question arose about considering blitz vs live vs correspondence ranks, here’s the data from that, which I believe is still very supportive of using an overall rank for picking handicap.

It’s not just a random decision, there’s data to support that it works (or worked) just as well or better than the individual ratings at predicting who would win in a matchup.

Now I think it can be fair to ask for another up to date analysis, or to suggest other methods of testing whether the overall rank works better than the individual ones. That seems reasonable to me.

I don’t think this is very accurate. It wouldn’t take much effort to get a 9x9 rank to match your 19x19 rank because they’re very similar games, they have very similar tactics, reading, life and death, endgame, tesuji. You’ll have to adjust things slightly for the board size sure, but the skill set is very similar.

Chess is not very like Go at all, and I’m pretty sure it would take a lot of effort for people to match their Go rating in Chess unless they weren’t that highly rated.

shinuito · March 3, 2026, 2:42pm

In fact I would say these are not valid reasons to upend a rating system.

Basically “I’m really good at this specific thing, so we should change the rating system so that it impacts me less”.

It’s being presented like a top 1% problem, but we’re talking about changing a system for the other 99% as well.

On the other hand, from the OP, I think bot ratings might be phased out sometime in the future, or work based on a separate system. I believe the default settings have been changed so that new challenges are unranked, so you have to choose to make them ranked (doesn’t fix the issue, but it’s a small change).

Sadaharu · March 3, 2026, 2:47pm

I mean, if chess can have different rating for the speed which the same game is played, I don’t see how Go can’t have a different rating for different board sizes.

shinuito · March 3, 2026, 2:50pm

If chess can have a checkmate, I don’t see why Go can’t have a checkmate?

Bullet chess with no increment can be a thing, but 1 min each with no increment in Go is just a click fest because there is no checkmate. If I click faster I will win.

Some things can work well for some games and not others.

I think we could have separate ratings for board size and time setting, it’s possible it could work well or better, but if we end up with data that says overall rank is working just as well, should we just change things because chess does it differently?

shamisen · March 3, 2026, 6:54pm

You can find plenty of posts throughout the years of others complaining about inaccurate bot ranks. This isn’t a personal complaint. Beginners of all ranks can feel discouraged after losing to a bot, and I think it’s in an odd way psychologically worse than losing to a human player because bot ranks feel more authoritative if coming from other or older servers where bot ranks are more stable.

but the quote you mentioned which references such data draws such vague conclusions. And “Using overall ratings to predict 9x9 games works pretty good at HC 0 and at HC 1” is not even very relevant to this discussion – I could tell you the same thing without seeing any data.

I guess I’ll consider making my own Go server at some point. I don’t want to argue and bother the devs here.

shinuito · March 3, 2026, 6:59pm

I didn’t disagree about bot ranks. I said bots are even likely to be made unranked in future. There are a lot of issues with players abusing bots in different ways.

You can look at the data yourself and draw your own conclusions or suggest tests which might show otherwise.

The tests seem to suggest that for low handicap (which I think is still very likely the most common matchup) the overall rank seems to predict winrates as well as individual ranks.

There could be other types of tests that could show different, maybe we could list them.

qnpnpmqppnp · March 3, 2026, 9:55pm

Yes, when reading the beginning of this thread this sounded like the obvious solution rather than changing the ranking system.

Not saying it’s trivial either, but players abusing bots’ weaknesses to artificially change their ranking is not new and wouldn’t disappear by splitting the rankings.

PRHG · March 3, 2026, 9:57pm

I do wonder if there could be some unwanted behaviour that, when looked at in aggregate, gives nice averages.

It would be interesting to look at individuals. Like, take the set of individuals who have a reasonably established rank on two different board sizes, then see how many of those people have their wins better predicted by the overall rating, and how many have their wins better predicted by individual board sizes.
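Roughly, the per-individual comparison could be sketched like this. The `Game` record, the Elo-style logistic win expectancy, and the use of log loss as the score are all my own assumptions for illustration; the real ratings code uses its own model and data format.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Game(NamedTuple):
    player: str
    overall_diff: float  # player's overall rating minus the opponent's
    size_diff: float     # same difference, using board-size-specific ratings
    won: bool


def expected_win(diff: float) -> float:
    """Elo-style logistic win expectancy (an assumption, not OGS's exact model)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def log_loss(p: float, won: bool) -> float:
    """Penalty for predicting win probability p; lower is better."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)  # clamp to avoid log(0)
    return -math.log(p if won else 1.0 - p)


def compare_predictors(games: Iterable[Game]) -> Tuple[int, int]:
    """Count players better predicted by overall rating vs. by per-size rating."""
    # totals[player] = [summed loss using overall, summed loss using per-size]
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for g in games:
        totals[g.player][0] += log_loss(expected_win(g.overall_diff), g.won)
        totals[g.player][1] += log_loss(expected_win(g.size_diff), g.won)
    overall_better = sum(1 for o, s in totals.values() if o < s)
    size_better = sum(1 for o, s in totals.values() if s < o)
    return overall_better, size_better
```

The input would be restricted to players with a reasonably established rank on both board sizes, and the two counts would show whether the nice aggregate numbers hide a group of individuals for whom the overall rating predicts badly.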

Maybe it would be interesting to see a tighter deviation limit than 120? It might also be interesting to only look at games after the new handicap & komi values started to be used for 9x9 and 13x13 boards.

shamisen · March 4, 2026, 6:23am

The problem wouldn’t disappear, but it would be mitigated. And when it comes to balancing a rating system and user retention, mitigation can go a long way.

Making bot games unrated is the easiest solution, but it’s not the best. Bot games can be useful to quickly get a rating on a new account at the very least.

jlt · March 4, 2026, 6:35am

The best would be to allow ranked games against bots only for provisional players, and only with low handicap. But then the difficulty is that if bots only play [?] players, perhaps bot ranks won’t be estimated correctly.

Groin · March 4, 2026, 8:50am

Since the origin of OGS it seems to me that the general policy is to integrate all kinds of rated games on the usual sizes. A separate or selective rating process would be a revolution.

or would be estimated correctly?

shinuito · March 4, 2026, 9:03am

Right as in it’s doing poorly to predict individual games but when grouped together it’s working out somewhat fine.

I think I’ve seen a capability in the ratings code to track individuals’ ratings if you make any tweaks to the rating system.

And it makes sense to look at updated data, sure, and there are still 2 years of data since then.

It could just be that you do a calculation to estimate the bot’s rank and freeze it, or take a middle value of sorts of its past rankings before making future games unranked. Though you still need a way to rank new bots.
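One reading of "a middle value of sorts" is the median of the bot's most recent ratings, which is less sensitive to a short burst of abuse than the last value alone. This is only a sketch: `frozen_bot_rating` and its `window` parameter are hypothetical names, not something from the OGS codebase.

```python
from statistics import median
from typing import Sequence


def frozen_bot_rating(history: Sequence[float], window: int = 200) -> float:
    """Median of the last `window` ratings, used as the bot's fixed rating."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = history[-window:]
    if not recent:
        raise ValueError("bot has no rating history to freeze")
    return median(recent)
```

A new bot has no history, so it would still need some other way to get its first rating.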