Yet another ratings thread

Yes. In the case that Black is 10k, White is 9k, and they play handicap 1 (no extra stones, Komi 0.5), Black is seen as ~9.5k, so White is still expected to win more.

In reality (based on data) White does win more often. This makes sense because an extra stone is valued at 12 or 13 points, so a komi reduction of 6 isn’t enough to overcome an entire rank.

One rank is still one stone, but the first handicap doesn’t add a stone, and that’s where this oddity comes from. If you want to change this, you simply have to change OGS handicap 1 so that it actually gives Black an extra stone and leaves komi unchanged.

The formula that modifies the input to Glicko takes both komi and stones into account, so every combination of these is treated as unique. And then there are additional modifiers for rulesets and board size.

https://raw.githubusercontent.com/online-go/goratings/master/RatingsV6.md

In short:
effective player rank adjustment = ( perfect Komi - actual Komi + stone value * actual extra stones ) / 12

Where perfect Komi and stone value depend on the ruleset.

In the example above it was
(6-0.5+12*0)/12=0.46
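
For anyone who wants to play with the numbers, the formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as written in this post, not the goratings code itself: the function and parameter names are made up, and the defaults (perfect komi 6, stone value 12) are just the values from the example, since the real ones depend on the ruleset.

```python
def effective_rank_adjustment(actual_komi, extra_stones,
                              perfect_komi=6.0, stone_value=12.0):
    """Effective rank adjustment, per the formula quoted above.

    perfect_komi and stone_value are assumptions taken from the worked
    example; in practice they depend on the ruleset.
    """
    return (perfect_komi - actual_komi + stone_value * extra_stones) / 12.0


# OGS "handicap 1": no extra stones, komi 0.5
h1 = effective_rank_adjustment(actual_komi=0.5, extra_stones=0)
print(round(h1, 2))  # 0.46
```

With these defaults an even game (komi 6, no stones) gives an adjustment of 0, and each extra stone adds one full rank.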

I recommend taking a look at the document for the way OGS treats handicaps, because it is not the Glicko2 or Elo default.

Look at it this way: ranks are a mapping from Glicko2 ratings, used to figure out which handicap is needed to even the board between players of different strength (i.e. rating).

If the relation between rating and rank is non-linear due to that, that’s just the way it is.

Note: I’m not entirely sure this is the sole proclaimed purpose of the ranking system (not rating system) though.

This is generally the idea behind handicaps and ranks, but it is not true of how they are actually implemented on OGS.

If players of one rank difference (my example of 10k and 9k above) apply what OGS calls handicap 1, then the human expectation is that the game will be even.

Statistically this isn’t true, because the first handicap doesn’t give a stone (worth about 12 points), but “merely” 6 points of komi (half a stone).

This Handicap 1 (no extra stone, Komi 0.5) does not overcome a rank, and as SouthernGoPlay remarks, White is still expected to win in the 10k vs 9k situation.

Perhaps unexpectedly, however, this doesn’t lead to issues in the ratings. The unequal win chance is correctly accounted for in goratings/glicko.

So even though the handicap isn’t creating the (human) expectation of a fair game, glicko knows the game isn’t fair and updates ratings correctly.

It is true, as there’s a difference between figuring out what handicap would be needed to even the board and actually applying that handicap. And I never claimed the latter is done (because it isn’t).

And as komi is worth half a stone, I’d say a stone is worth 14 points. 6.5 komi slightly favors black, 7.5 white.

The 1 point Komi difference between Territory (Japanese) and Area (Chinese) scoring is because of the difference in counting rules. In Area rules Black gets an opportunity to place the last scoring stone so the komi is higher by a point, with the standard half point tie breaker to avoid draws. With their respective komi each set of rules favours White/Black by the least possible amount according to AI (Before the first move AIs seem to think White has about a 6% or 8% advantage in both rule sets, much less than a point).

On OGS, pressing AI estimate on an empty board gives White by 0.2 (stones) in Japanese rules. It gives White by 0.9 (stones) in Chinese (I had thought this was less), though this may depend on the strength of the AI.

While I’m not saying it necessarily ought to be changed, it’s definitely not correct. There is simply no way to adjust ratings so that handicap games are rated with the same meaning as even games. This is because losing (by less than the komi margin) to an opponent one rank apart adjusts ratings in the opposite direction to winning against the same opponent in an even game. Within the Glicko2 rating adjustment formulas it’s not possible to adjust the ranks so that the result is effectively reversed. The implication is that OGS’s Glicko2 adjustments for handicap games are over-rewarding/under-punishing handicap, at least slightly.

You could fully correctly rate only the handicap games where White wins, or the ones where the Black score exceeds the AI handicap margin of the initial position (and the game goes to counting), maybe, though as a rule that sounds worse than what we have now.

Even if the over-rewarding/under-punishing was found to be a source of systematic deflation (at certain ranks where the rates of giving/accepting handicap change), you could also periodically adjust ratings/just live with it/minimize it by reducing Glicko2 adjustments for non-standard game settings (the equivalent of increasing uncertainty of the ratings for these games).

In my opinion the actual cause of OGS’s very thin dan ranks (i.e. a lot of what is called deflation) is the built-in exponential rating-to-rank conversion. Players with 6 dan skill are rare to begin with, but OGS requires them to beat 5 dan players 61.6% of the time, rather than at a rate that is uniform across all ranks (something like 58.5%), and so on down the ranks, so the dan ranks on OGS end up very thinly populated.
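
To see where a figure like 61.6% can come from, here is a small Python sketch. It assumes the conversion rating = 850 * exp(0.032 * rank) with rank 30 = 1d (so 10k = 20), which I’m using only because it reproduces the 1612 and 1664 ratings quoted elsewhere in this thread; check the goratings document for what OGS actually uses. The win rate is the plain Elo logistic curve, and the function names are mine.

```python
import math


def rank_to_rating(rank):
    # Assumed conversion (not verified against goratings):
    # rank 30 = 1d, rank 20 = 10k, rank 35 = 6d.
    return 850.0 * math.exp(0.032 * rank)


def expected_win(rating_a, rating_b):
    # Elo logistic expectation for the player rated rating_a.
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# One-rank gaps at different points of the scale, even games.
for label, rank in [("25k vs 26k", 5), ("5k vs 6k", 25), ("6d vs 5d", 35)]:
    p = expected_win(rank_to_rating(rank), rank_to_rating(rank - 1))
    print(f"{label}: {p:.1%}")
```

Under these assumptions the 6d vs 5d line comes out at about 61.6%, and the one-rank win rate shrinks as you go down the ranks, because each rank covers fewer rating points there.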

It’s a normative decision.
In System A 6 dans should be able to beat 5 dans with the same percentage as 5 dans beat 4 dans (or 25 kyus beat 26 kyus) in games without handicap.
In System B proper handicap (black starts with an additional stone, white gets the same komi as in even games) gives an average black 5 dan (x dan) a 50 % chance to win against an average white 6 dan (x+1 dan).

Neither System A nor System B is correct. It’s purely a matter of choice.

Think it depends on the implementation of the rating/ranking system. If the system operates directly on rankings (as AGA does) then the log scale would clash with this. If the system operates on ratings which are a log mapping of the rankings (as EGF does) then there is no inherent clash there. I wouldn’t expect OGS rankings to remain in line with AGA rankings for this reason.

I’m not sure what your point is. If

  • A is stronger than B
  • A is 1 stone stronger than A’
  • B is 1 stone stronger than B’

then the winrate of A against A’ is always higher than the winrate of B against B’. This doesn’t depend on the rating system.

In the Elo system (and so also in Glicko2), the expected win rate is specified by the rating difference. In the AGA system it’s specified by the ranking difference.

Let’s say the OGS modeling that implies a log scale mapping between Glicko2 and rank is correct (so also a stone of handicap translates to the slightly widening rating bands as ranks go up). Then you should expect the OGS ranks to drift away from the AGA rank bands in places over time. AGA doesn’t expect a 1 stone handicap between ranks (as 1 stone advantage gets slightly more important further up the ranks).

Really? That’s strange.

This seems one of the better descriptions.

I’m thinking of it as Glicko2 scaled down in rating number. But the point is that the win percentage between players is a function of the rating difference.

And yet, in handicap games, AGA considers that 1 handicap stone = 1 rating point difference.

It does (and I’m not totally sure you are contradicting what I summarized anyway). But according to the rank scale on OGS, the gap of 1 handicap widens slightly at higher ranks. It’s fixed at RD_j = r_white - r_black - d_j in AGA.

This is probably not a problem for the AGA system in practice, as handicap games are infrequent at higher ranks.

I mean, the AGA rating system doesn’t look coherent. It says that

  • winning probabilities only depend on rating difference
  • in handicap games, 1 stone = 1 rating point.
    These two facts are contradictory, since

Anyway I don’t care much about the AGA rating system, we are on OGS, and OGS made a different choice, to match 1 rank difference with 1 handicap stone. The EGF made the same choice.

I think this and

might point at the confusion.

I think AGA is using the word rating but they mean rank.

The rating system is scaled so that it corresponds to the traditional kyu/dan ranks.
Ratings in the range (3.0, 4.0), for example, correspond to that of a 3-dan, while those in the range (−4.0,−3.0) correspond to that of a 3-kyu.
There are no ratings between -1.0 and 1.0; there is a ratings difference of 0.02 between -1.01 and 1.01.
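
Read literally, the quoted description amounts to something like the following Python sketch. It is only my reading of those three sentences, not AGA code: the function names are invented, and how exact boundary values such as 3.0 are labelled is an assumption.

```python
def aga_rank_label(rating):
    """Map an AGA-style rating to a kyu/dan label, e.g. 3.5 -> '3d'."""
    if -1.0 < rating < 1.0:
        raise ValueError("no ratings exist between -1.0 and 1.0")
    whole = int(abs(rating))  # truncate toward zero
    return f"{whole}d" if rating > 0 else f"{whole}k"


def aga_rating_difference(a, b):
    """Difference a - b with the empty (-1.0, 1.0) band closed up."""
    def collapse(r):
        return r - 1.0 if r > 0 else r + 1.0
    return collapse(a) - collapse(b)


print(aga_rank_label(3.5))    # 3d
print(aga_rank_label(-3.5))   # 3k
print(round(aga_rating_difference(1.01, -1.01), 2))  # 0.02
```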

I mean there is a slight difference. I think they want rank to be something like 3kyu or 4 dan, and rating to be -3.5 and 4.5 etc.

So I think that’s somewhat related to the point being made about an exponential/log mapping between ratings and ranks.

I don’t know the details of the system; I’ve only taken a quick glance at the docs.

The AGA system says that if rating < 0 then rank = rating - 1 and if rating > 0 then rank = rating + 1. But otherwise, there is no log or exp relation between ranks and ratings, unlike in EGF or OGS.

There is a way to do this, and that is by adjusting effective ranks based on the games’ starting condition.

The “goratings v6” document I posted earlier contains effective rank adjustment formulas that happen before glicko2.

But not by the same amount as in an even game. It is not treated as an even game. I think that’s the main point your argument misses.

You also seem to imply that White will lose Handicap 1: 0.5 Komi games as often as real even games.

However, with a Handicap 1: 0.5 Komi game, White will still win more than half of the games, both in reality and in the OGS-Goratings-glicko expectation.

This is also only true if you assume that “Handicap 1: 0.5 Komi” would be treated as enough to bridge the skill gap between ranks. You seem to be arguing that if Black wins an H1 game, glicko rewards it as “Black won an even game”, but it doesn’t. The size of the rating change is different in all cases.

To clear possible lost-in-translation, would you be willing to work out a calculated example of two players having:

  • one rank difference in a zero handicap game
  • one rank difference in a “Handicap 1: 0.5 Komi” game
  • no rank difference in a zero handicap game (even game)

And point out which expected win chance and/or rating change is wrong?

Based on the text, it seems to me that you omit key parts of how OGS does ratings. I fully accept that it may be me who misunderstands you. Even so, I think this conversation would benefit from moving towards using the math to point out where the flaws are.

That would have to be based on the goratings.md document I linked to earlier. The reason for this is that the ratings calculator on the OGS webpage might not be updated to goratings v6, but I haven’t checked that yet.

——

This assumes that there are players who deserve to be higher dans, but aren’t.

If that’s true (I’m not saying it is), you can simply scale the log formula by a factor, for example so that all current 3d become 8d. 25k would stay 25k and 10k would shift to around 8k, a manageable change.

Would that solve the issue of thin dan ranks?

This seemed reasonable, so I’ve worked some numbers up to go through this now.

First to address your final comment, it would be possible to rescale the ratings to create more dan players of course, however… I commented above about the rating width of each rank along the exponential scale (see the rank to rating conversion). Having created more higher dan ranked players the system expects them to maintain a higher win rate at higher rank against 1 rank down opposition. If the players you ranked up were only holding their rank previously, they are now on a slight downward trajectory should they maintain the same win rate. The dan ranks are thinning back out slightly again as the system goes forward.

For the following calculations I’ve applied the Elo system. The difference between Glicko2 and Elo is that the delta factor is fixed in Elo. This is not material to what’s going on, however.

Take an exactly 10k player (playing Black) facing an exactly 9k player (playing White) as our starting point. The 10k has a rating of 1612 and the 9k has a rating of 1664. In an even game the rank gap is 1 and the rating gap is 52. This makes the expected win rate 42.572% for Black and 57.428% for White. With a K factor of 25, Black will gain / White will lose 14.357 points for a Black win, or Black will lose / White will gain 10.643 points for a White win.

In a 1 handicap game the 0.46 rank adjustment is split between the two players, so Black’s effective rank becomes 9.77k and White’s becomes 9.23k. Black has a rating of 1623.96 and White has a rating of 1652.04. The rank gap is 0.54 and the rating gap is 28.08. This makes the expected win rate 45.97% for Black and 54.03% for White. With the same K factor of 25, Black will gain / White will lose 13.508 points for a Black win, or Black will lose / White will gain 11.492 points for a White win.
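
These figures can be reproduced with the standard Elo expectation formula. A minimal Python sketch, using only the ratings and the K factor of 25 stated above; the helper names are mine and this is plain Elo, not the OGS Glicko2 code:

```python
def expected_score(own, opp):
    """Elo expected score of the player rated `own` against `opp`."""
    return 1.0 / (1.0 + 10.0 ** ((opp - own) / 400.0))


def describe(name, black, white, k=25.0):
    e_black = expected_score(black, white)
    black_win = k * (1.0 - e_black)  # Black gains / White loses
    white_win = k * e_black          # White gains / Black loses
    print(f"{name}: Black expected {e_black:.2%}, "
          f"Black win {black_win:.3f}, White win {white_win:.3f}")
    return e_black, black_win, white_win


even = describe("even", 1612.0, 1664.0)
hcap = describe("handicap 1", 1623.96, 1652.04)
```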

At this point I simulated some scores to get a feeling for it (drawn from a Gaussian distribution with mean 6.5 and standard deviation 40). If the score was less than 6.5, White wins at even; if the score was less than 0.5, White wins at 1 handicap. Black wins otherwise. So you can randomly generate a row of results. Here is one sample at even: B,B,W,W,W,B,W,B,W,W,B,W,B,B,W. At 1 handicap the same sample becomes: B,B,W,B,W,B,W,B,W,W,B,B,B,B,W. Two results differ between these sequences, where a White win is flipped to a Black win. As a result of these games Black gains / White loses 7.927 points in even games, and Black gains / White loses 23.589 points in handicap games (using the K factor of 25 above).
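
For what it’s worth, under this toy model the share of games whose result flips between even and 1 handicap can be worked out directly instead of sampled. A sketch in Python, assuming the same Gaussian (mean 6.5, standard deviation 40) and the same 6.5 / 0.5 thresholds; the variable names are mine and the model is only the rough one described above:

```python
from statistics import NormalDist

# Score distribution assumed in the simulation above.
score = NormalDist(mu=6.5, sigma=40.0)

p_white_even = score.cdf(6.5)  # score below 6.5: White wins at even
p_white_hcap = score.cdf(0.5)  # score below 0.5: White wins at 1 handicap
p_flip = p_white_even - p_white_hcap  # White win at even, Black win at handicap

print(f"White wins at even:       {p_white_even:.3f}")
print(f"White wins at 1 handicap: {p_white_hcap:.3f}")
print(f"Results flipped to Black: {p_flip:.3f}")
```

With these parameters the flipped share comes out at roughly 6% of games; a narrower score distribution would make it larger.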

On the other hand if both sequences happen to match exactly then Black still gains / White loses 7.927 points in even games and Black loses / White gains 1.412 points in handicap. What this should make clear however is that the difference is related to the rate at which 1 handicap results don’t match even results.

What White gains by playing at handicap rather than even is the difference in the points for a White win in each game: 11.492 - 10.643 = 0.849 points. When a result is flipped at handicap, however, the penalty is the difference in points between White winning at even and Black winning at handicap: 10.643 + 11.492 = 22.135 points. So for handicap play to be fully compensated, flips can only happen at a rate of 0.849 / 22.135 = 0.0384, or less than 4%. White can therefore afford fewer than 1 in 25 game results to be reversed by the lack of komi; beyond that they are being over-punished / under-rewarded by the rating of handicap games. Note that the K factor of the Elo update cancels between the top and bottom of this ratio, so this <4% rate will be similar in Glicko2.
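
The ratio, and the fact that K drops out of it, can be checked with a short Python sketch. It simply repeats the arithmetic above with plain Elo and the ratings from the example; the function names are mine.

```python
def expected_score(own, opp):
    """Elo expected score of the player rated `own` against `opp`."""
    return 1.0 / (1.0 + 10.0 ** ((opp - own) / 400.0))


def break_even_flip_rate(k):
    # Points White gains for a win = k * Black's expected score.
    white_win_even = k * expected_score(1612.0, 1664.0)    # 10.643 at k = 25
    white_win_hcap = k * expected_score(1623.96, 1652.04)  # 11.492 at k = 25
    gain = white_win_hcap - white_win_even     # extra points per White win
    penalty = white_win_even + white_win_hcap  # cost of a flip, as summed above
    return gain / penalty


for k in (10.0, 25.0, 40.0):
    print(k, round(break_even_flip_rate(k), 4))
```

Every K prints the same value, because K multiplies both the gain and the penalty.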

Also note, at higher ranks this handicap rate gets worse and the games seem to get closer in score as well.

How did you arrive at those numbers?