OGS has a new Glicko-2 based rating system!

Under the old rating system I’m currently 9k, and I’m playing a number of games against opponents who are rated around 10k under the new system. Under the new system I’m 14k.

I just finished a game against someone who is 8k who congratulated me on a close game that would have been a 6 stone handicap. I thanked him for his praise but pointed out that under the old system I’m around 10k.

I’ve pointed out in a previous post that my game results show a consistent, regular up-and-down performance, oscillating between about 13k and 8k.

You’re currently playing a superhuman 72 simultaneous games. That means that if you have a bad couple of days, you can potentially spoil 72 positions:

Suddenly the oscillation makes sense. You probably are closer to 8k - just that you play so many simultaneous games that your bad days (which everyone has) may cause you to drop 5 stones in rank.

You appear to be in one of your 13k phases right now. According to the trend indicated by your graph, you’ll be back up to 8k within the next 8 weeks.


Beginners also need an unranked (21k-) option. For them the difference between 20k and 24k is huge.
I can create a game without restrictions, but then I need to click cancel each time someone else accepts. People may remember that I cancelled on them. And if I play white, the cancel button doesn’t work once black has placed a stone and I have placed none.

The new Glicko rating does indeed seem to be inflated by 1-2 stones compared to the old Elo system. At least in the 2-6 kyu range, where I can judge properly, it is really noticeable. This is based on players who show fairly constant performance over time, so that the Elo system is showing the correct rating with confidence.

I support froofy in wanting tougher ranking on the server. That way you really feel that you merit your rank, and you are confident about your level if you go somewhere else. It is always good to know that “our” dans will beat “their” dans.

To BHydden: according to https://senseis.xmp.net/?RankWorldwideComparison, OGS was already on par with European ratings, so it is quite unclear why one should inflate it, especially since the declared goal is to make it more comparable with EGF.

Yes, before we were more in line with EGF… but now we are more closely aligned with Japan, China, America and KGS… seems like a step forward to me.


I just made a script for converting the Glicko rating table:

But I re-read this about things being on different scales. What does that mean? Does the formula glicko = 850 * exp(0.032 * r) use different constants? Are at least all the 19x19 and overall constants the same, and it’s only different for the smaller board sizes? Can you clarify this?
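
For reference, here is a minimal sketch of that formula and its inverse in Python. It takes glicko = 850 * exp(0.032 * r) exactly as quoted above and nothing more. The function names and the kyu/dan labelling (r = 0 read as 30k, r = 30 as 1d) are my own assumptions, and it does not answer whether other board sizes use other constants.

```python
import math

# Constants taken from the formula quoted in this thread.
SCALE = 850.0
GROWTH = 0.032


def rank_to_rating(r: float) -> float:
    """Glicko rating for rank number r, per glicko = 850 * exp(0.032 * r)."""
    return SCALE * math.exp(GROWTH * r)


def rating_to_rank(rating: float) -> float:
    """Inverse of rank_to_rating: r = ln(rating / 850) / 0.032."""
    return math.log(rating / SCALE) / GROWTH


def rank_label(r: float) -> str:
    """Assumed labelling (not confirmed here): r = 0 is 30k, r = 29 is 1k, r = 30 is 1d."""
    n = math.floor(r)
    if n < 30:
        return f"{30 - n}k"
    return f"{n - 29}d"


if __name__ == "__main__":
    # Print the fractional rank number and assumed label for a few ratings.
    for rating in (1000, 1500, 2000):
        r = rating_to_rank(rating)
        print(rating, round(r, 2), rank_label(r))
```

If the constants really do differ per board size, only SCALE and GROWTH would need to change.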

Traditionally the kyu/dan ranks are spaced out such that each rank represents exactly 1 stone of strength in either direction. To get that to work functionally, you need to play around with the constants in the glicko rating algorithm so that it can be as accurate as possible. Because the “breakout table” has multiple different data points, it is impossible to have maximum accuracy for each rating using the same constants. This means that, though you may find it simple enough to associate a kyu/dan number with the glicko rating, the number you come up with is unlikely to accurately represent a 1:1 ratio between stones and ranks.


Hmm - you know, the only way that there can be a 1-1 relationship between stones and ranks is if people regularly play ranked handicapped games, and these feed into the results pool for glicko.

If this isn’t happening (and I don’t see it happening) then there is absolutely no way anyone can say that a difference of 1 in ranking is equivalent to a 1 stone handicap.


I disagree, but as yet have not given it enough thought to back up my feelings mathematically. I am of the opinion that a strong correlation between stones and ranks would be achievable without specifically ranked handicap games being required, due in large part to our understanding of standard deviations on a normal curve.


But how does that help you know what the effect of one stone is?

Surely that can only be measured by players of 1 rank difference playing each other with one stone handicap in place and seeing that 50/50 results are achieved.

If there are no games in the results pool where the effect of one stone handicap is measured, I can’t see how you can deduce anything about that mathematically…


Statistics can be abused but they also have a large variety of tremendous uses.

Put simply, if a + b = c and b + c = d, we can infer that a + 2b = d even though we have no data containing both a and d, as we have observed their mutual interaction and effect on b.


But if you don’t have any observations of d at all, then what?

This is what I am saying: given that there are as good as no data points in our results pool for handicapped games, how can you correlate anything?

Or are we relying on correlation with data from other pools where there are handicap results?

OGS used to use the EGF rating formula for estimated win rates across kyu/dan ranks. There’s no reason why it can’t still use the same win rates with the Glicko rating system. I’m guessing that’s why the new k/d ranks are scaled to rating using an exponential model.

Though, if Glicko win expectancies across a constant rating delta are the same as pure Elo, I’d expect the ranks to be narrower than the EGF spec (Elo is already 100 points/rank at EGF 19 kyu). That might be why the ~20 kyus are seeing some rank deflation, while higher ranks are seeing larger rank inflation.

EDIT: Though my attempt to derive an equation from the EGF spec, 14000 - 80 ln | x - 20 | * 100 (~1200 at 20 kyu), seems to be way too wide (~2x rank size compared to the current OGS formula)
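
To put rough numbers on how wide one rank is, here is a small sketch. It uses the glicko = 850 * exp(0.032 * r) formula quoted earlier in the thread, and the plain Elo logistic curve as a stand-in for win expectancy. Glicko’s own expectation also depends on rating deviation, which is ignored here, so treat this as an approximation. The function names are made up for illustration.

```python
import math


def rank_to_rating(r: float) -> float:
    """Rating for rank number r, per glicko = 850 * exp(0.032 * r)."""
    return 850.0 * math.exp(0.032 * r)


def rank_width(r: float) -> float:
    """Rating points between rank number r and rank number r + 1."""
    return rank_to_rating(r + 1) - rank_to_rating(r)


def elo_expected_score(delta: float) -> float:
    """Standard Elo logistic curve: expected score when ahead by delta points."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))


if __name__ == "__main__":
    # Width of one rank and the implied even-game win expectancy (Elo approximation).
    for r in (5, 10, 20, 30):
        width = rank_width(r)
        print(f"r={r}: width={width:.1f}, expectancy={elo_expected_score(width):.3f}")
```

Under this formula, one rank is always about 3.3% of the rating (exp(0.032) - 1). The point width of a rank therefore grows with strength, and in this model so does the win expectancy for a one-rank gap.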


Why, then, does this game not seem to be recorded in my win/loss graph?

I agree that this looks “wrong”.

There is a game showing in the history, on a day where there are no other games, and there is no corresponding point in the ranking graph.

So, what am I supposed to do in such a case?

The first thing is to put this report into a separate topic in the Feedback->Bug Report category of the forum.

Honestly, if there is a problem with just one game, I would just let it go…

I’d rather see it reported - it indicates there’s a bug, so there’s no knowing how many games are being affected that we don’t notice because we’re not really looking.

If every correspondence player has a problem with “just one game”

…that’s a lot of games.