OGS has a new Glicko-2 based rating system! [2017]

Eugene · August 21, 2017, 9:58pm

No, as you can probably see on your profile, all your history remains and your rank is calculated based on all those games you’ve played.

AhmedA · August 22, 2017, 2:04am

Actually I’m kind of a beginner who was familiar with the old system and had reached nearly 21K. After that I suddenly became 25K (as I was when I first signed up) and have stayed there without any change, even after wins.

Does this mean that something is wrong with my ranking, or will it be updated when the new system implementation is completed?

Thanks for reading, and I hope to get an answer soon.

Eugene · August 22, 2017, 3:12am

I think you are confused, as I was, by the fact that the new graph only shows one point per day.

So on the 22nd you won one game but lost two. The overall effect was downwards, which is what your graph for 22nd shows.

Personally, I think the new graph should work like the old one: one point per game, not one point per day.

Eugene · August 22, 2017, 3:14am

I couldn’t find where the problem is.

If we look on the 22nd, you played 2 ranked games. You lost one to an 8k, which takes you down a tiny bit, and you won one against a 25k which takes you up. Overall the graph shows you went up on the 22nd, so that seems correct.

As with the other poster: note that the graph only shows one point per day, not one point per game. It unfortunately means that you can’t see the effect of each game on your rank, unless you play less than one game per day.
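To illustrate why individual games disappear from a one-point-per-day graph, here is a minimal Python sketch. It assumes the graph plots the rating after the last game of each day, which this thread does not confirm; the function name `daily_points` and all ratings are made up for illustration.

```python
from datetime import date


def daily_points(games):
    """Collapse per-game ratings into one point per day.

    `games` is a chronological list of (day, rating_after_game) pairs.
    Assumption: the point kept for a day is the rating after that day's
    last game.
    """
    last_of_day = {}
    for day, rating in games:
        # A later game on the same day overwrites the earlier one.
        last_of_day[day] = rating
    return sorted(last_of_day.items())


if __name__ == "__main__":
    # Hypothetical ratings: one game on the 21st, two on the 22nd.
    games = [
        (date(2017, 8, 21), 1000.0),
        (date(2017, 8, 22), 990.0),   # loss earlier in the day
        (date(2017, 8, 22), 1005.0),  # win later in the day
    ]
    for day, rating in daily_points(games):
        print(day, rating)
```

With data like this, the loss on the 22nd never appears on the graph; only the net result for the day does.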

orbitaleccentric · August 22, 2017, 3:51pm

Under the old rating system I’m currently 9k and I’m playing a number of games against opponents who have a rating in the region of +/-10k under the new system. Under the new system I’m 14k.

I just finished a game against someone who is 8k who congratulated me on a close game that would have been a 6 stone handicap. I thanked him for his praise but pointed out that under the old system I’m around 10k.

I’ve pointed out in a previous post that my game results show a consistent, regular up-and-down performance, oscillating between about 13k and 8k.

Farraway · August 23, 2017, 7:51am

You’re currently playing a superhuman 72 simultaneous games. That means that if you have a bad couple of days, you can potentially spoil 72 positions.

Suddenly the oscillation makes sense. You probably are closer to 8k - just that you play so many simultaneous games that your bad days (which everyone has) may cause you to drop 5 stones in rank.

You appear to be in one of your 13k phases right now. According to the trend indicated by your graph, you’ll be back up to 8k within the next 8 weeks.

stone.defender · August 23, 2017, 8:48pm

Beginners also need an unranked (21k-) option. For them the difference between 20k and 24k is huge.
I can create a game without restrictions, but then I need to click cancel each time someone else accepts. People may remember that I cancelled on them. And if my color is white, the cancel button doesn’t work after black places a stone and I place none.

SilentPlayer · August 24, 2017, 11:15pm

The new Glicko rating does indeed seem to be inflated compared to the old Elo system by 1-2 stones; at least in the range of 2-6 kyu, where I can properly judge, it is really noticeable. And this is based on players who show fairly constant performance over time, so that the Elo system shows the correct rating with confidence.

I support froofy in wanting tougher ranking on the server. That way you really feel that you merit your rank, and you are confident about your level if you go somewhere else. It is always good to know that “our” dans will beat “their” dans.

To BHydden: according to https://senseis.xmp.net/?RankWorldwideComparison, OGS was already on par with European ratings, so it is quite unclear why one should inflate it, especially since the declared goal is to make it more comparable with EGF.

BHydden · August 25, 2017, 3:32am

Yes, before we were more in line with EGF… but now we are more closely related with Japan, China, America and KGS… seems like a step forward to me.

KillerDucky · August 25, 2017, 11:04pm

I just made a script for converting the Glicko rating table:

But I re-read this about things being on different scales. What does that mean? Does the formula glicko = 850exp(0.032r) use different constants? Are at least all the 19x19 and overall constants the same, and it’s only different for the smaller board sizes? Can you clarify this?
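For reference, the formula quoted above can be sketched in Python together with its inverse. This is only an illustration: the constants 850 and 0.032 come from the formula as written in this post, while the function names are hypothetical and the meaning of the rank index `r` (how it maps to kyu/dan, and whether the same constants apply to every board size) is exactly what is being asked and is not assumed here.

```python
import math

# Constants as quoted in the post: glicko = 850 * exp(0.032 * r).
SCALE = 850.0
GROWTH = 0.032


def rating_from_rank(r: float) -> float:
    """Glicko rating for a numeric rank index r (index meaning not assumed)."""
    return SCALE * math.exp(GROWTH * r)


def rank_from_rating(glicko: float) -> float:
    """Inverse of rating_from_rank: recover the rank index from a rating."""
    return math.log(glicko / SCALE) / GROWTH


if __name__ == "__main__":
    # Print a small conversion table for a few rank indices.
    for r in (0, 10, 20, 30):
        print(r, round(rating_from_rank(r), 1))
```

Because the model is exponential, one step in `r` corresponds to a fixed ratio between ratings rather than a fixed number of rating points.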

BHydden · August 26, 2017, 2:02am

Traditionally the kyu/dan ranks are spaced out such that each rank represents exactly 1 stone of strength in either direction. To get that to work functionally, you need to play around with the constants in the Glicko rating algorithm so that it can be as accurate as possible. Because the “breakout table” has multiple different data points, it is impossible to have maximum accuracy for each rating using the same constants. This means that, though you may find it simple enough to associate a kyu/dan number with the Glicko rating, the number you come up with is unlikely to accurately represent a 1:1 ratio between stones and ranks.

Eugene · August 26, 2017, 2:05am

Hmm - you know, the only way that there can be a 1-1 relationship between stones and ranks is if people regularly play ranked handicapped games, and these feed into the results pool for glicko.

If this isn’t happening (and I don’t see it happening) then there is absolutely no way anyone can say that a difference of 1 in ranking is equivalent to a 1 stone handicap.

BHydden · August 26, 2017, 2:16am

I disagree, but as yet have not given it enough thought to back up my feelings mathematically. I am of the opinion that a strong correlation between stones and ranks would be achievable without specifically ranked handicap games being required, due in large part to our understanding of standard deviations on a normal curve.

Eugene · August 26, 2017, 2:19am

But how does that help you know what the effect of one stone is?

Surely that can only be measured by players of 1 rank difference playing each other with one stone handicap in place and seeing that 50/50 results are achieved.

If there are no games in the results pool where the effect of one stone handicap is measured, I can’t see how you can deduce anything about that mathematically…

BHydden · August 26, 2017, 2:21am

Correlation.

Statistics can be abused but they also have a large variety of tremendous uses.

Put simply, if a + b = c and b + c = d, we can infer that a + 2b = d even though we have no data containing both a and d, as we have observed their mutual interaction and effect on b.

Eugene · August 26, 2017, 2:47am

But if you don’t have any observations of d at all, then what?

This is what I am saying: given that there are as good as no data points in our results pool for handicapped games, how can you correlate anything?

Or are we relying on correlation with data from other pools where there are handicap results?

timuzhti · August 28, 2017, 1:53am

OGS used to use the EGF rating formula for estimated win rates across kyu/dan ranks. There’s no reason why it can’t still use the same win rates with the Glicko rating system. I’m guessing that’s why the new k/d ranks are scaled to rating using an exponential model.

Though, if Glicko win expectancies across a constant rating delta are the same as in pure Elo, I’m expecting the ranks to be smaller than the EGF spec (Elo is already 100 points/rank at EGF 19 kyu). That might be why the ~20 kyus are seeing some rank deflation, while higher ranks are seeing larger rank inflation.

EDIT: Though my attempt to derive an equation from the EGF spec, 14000 - 80 ln | x - 20 | * 100 (~1200 at 20 kyu), seems to be way too wide (~2x rank size compared to the current OGS formula)
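As a rough check on the comparison between Glicko and Elo win expectancies, here is a small Python sketch. It uses the standard Elo expected-score formula and the expected-score formula from Glickman's original Glicko description; whether OGS uses exactly these expressions is an assumption, and the function names are made up for illustration.

```python
import math

Q = math.log(10) / 400.0  # scale constant from the Glicko description


def elo_expected(delta: float) -> float:
    """Elo expected score for a player rated `delta` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))


def g(rd: float) -> float:
    """Glicko attenuation factor for an opponent with rating deviation `rd`."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / math.pi ** 2)


def glicko_expected(delta: float, opponent_rd: float) -> float:
    """Glicko expected score; reduces to Elo when opponent_rd is 0."""
    return 1.0 / (1.0 + 10.0 ** (-g(opponent_rd) * delta / 400.0))


if __name__ == "__main__":
    for rd in (0.0, 100.0, 350.0):
        print(rd, elo_expected(100.0), glicko_expected(100.0, rd))
```

Under these formulas the two expectancies coincide only when the opponent's rating deviation is zero; a larger deviation pulls the Glicko expectancy toward 50%.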

rantash · November 21, 2017, 10:48am

Why, then, does this game not seem to be recorded in my win/loss graph?

Eugene · November 21, 2017, 1:10pm

I agree that this looks “wrong”.

There is a game showing in the history, on a day where there are no other games, and there is no corresponding point in the ranking graph.

rantash · November 21, 2017, 8:06pm

So, what am I supposed to do in such a case?