OGS has a new Glicko-2 based rating system! [2017]

DyPxLaMo · August 13, 2017, 2:26pm

Is there a table showing the approximate relationship between Glicko-2 ratings and Kyu/Dan ranks?

D. Paul La Montagne

Pempu · August 13, 2017, 2:40pm

It was discussed earlier on this same thread. Here is a link: OGS has a new Glicko-2 based rating system!

froofy · August 14, 2017, 5:00pm

Is there a specific reason why the new rankings are (in general) inflated compared to the old ones?

I would assume the conversion from Glicko to kyu/dan is straightforward and was determined by what conversion factor would keep the majority of the rankings the same. If so, why does the poll (currently) show:

22% - same ranking as before
16% - lower ranking than before
63% - higher ranking than before

(please excuse the rounding errors if you are a perfectionist)

I understand that your rank can only be used to find similar strength players on the server you are ranked on so in the end it really doesn’t matter . . . but . . .

If you’re already used to your rank and all of a sudden it changes for . . . well I’m not sure why it changed. Anyway, if the changes were distributed more evenly, it would make sense – perhaps the rankings are more accurate that way. As it is currently distributed, I can’t understand it.

Compared to other servers, it seems like OGS has been more conservative in their rankings, and I prefer it that way psychologically – you can feel more confident that you’ll play up to your ranking in other places instead of worrying that your server inflated your ranking just so you can feel good about yourself.

In any case, I’m sure it isn’t as simple as changing from:

New OGS ranking = Glicko number * Unknown Variable

to:

New OGS ranking = Glicko number * Unknown Variable - 2

Is it?

VARoadstter · August 14, 2017, 6:08pm

Well, obviously not. Otherwise all ranks would just be uniformly higher.

The explanation could be as simple as the fact that we changed from one rating system to an entirely new one. A poor analogy would be giving temperature readings in degrees Fahrenheit vs Celsius. Both are reported in “degrees”.

Additionally, this thread has actually gone to great lengths to attempt to clarify how the new rankings are computed. One could argue that the conversion from Glicko rating to Kyu could be adjusted but in the end the old rating can simply be disregarded as it is no longer relevant. Some people obviously have a fondness for the old system as they understand it better - that’s human nature. There’s no right or wrong on that score - everyone feels the way they feel for their own reasons.

In the end, assuming there isn’t a huge “repeal and replace” effort undertaken by the members, we will all come to understand the new system to the same extent that we understood the last. The admins feel it’s more accurate and that’s good enough for me. As long as I can have some sense of my opponent’s strength I’m fine with whatever they do.

BHydden · August 14, 2017, 9:42pm

There is. Due to the inaccuracies of the old system, it was known, as you mentioned, that OGS ranks were deflated by 2 stones compared to most other rating pools. The fact that when we switched to glicko we saw an average shift forward of 2 stones is actually evidence that the new system is much more accurate.

Misawa · August 14, 2017, 11:46pm

Can this excellent chart be posted somewhere easy to get to? I keep digging it out of this very long thread. Since the ratings don’t translate easily, this chart is critical to understanding what is going on.

moocowpong1 · August 15, 2017, 12:22am

I wonder how reasonable it would be for the devs to run through all the data and build some rough correspondences and use them to calibrate a separate rank scale for each rating – e.g., players with X blitz 13x13 rating tend to have about Y overall rating, making them about Z kyu. It wouldn’t have to be exact to be informative, any sort of reference point would be helpful.

BHydden · August 15, 2017, 12:35am

I think all of this discussion of “list the expanded rating chart in k/d” is forgetting that with the old system, every other day we had a forum post asking which rank was used in pairing players and “why we don’t just have one rank”

…and now that we have one rank, suddenly everyone is saying why doesn’t the expanded statistics have a rank attached??

I don’t think we can have it both ways…

aulavik · August 15, 2017, 5:55am

I guess it’s a good thing I was still 20 kyu when this new rating system kicked in. I’ve got nothing to lose. However, it must be tough for players who are at SDK or dan level to see corrections in their ratings and ranks.

Anyways, I still think we should just go play, and give this new rating system a chance.

Kreur · August 15, 2017, 6:11am

I would prefer to keep both rating systems: Glicko-2 as the main one, used for everything like now, and elo purely for informational value. There are several reasons for it:

“Validation” of glicko-2 ratings - it would be interesting to see how the ratings converge, and honestly it would be a greater achievement for me to get shodan in both glicko-2 and elo than just in glicko.
Familiarity - almost every go player knows how the elo system works, at least on a superficial level.
Ease of comparison to other ratings - there was a quite accurate rating comparison table, which is now much less reliable (at least until glicko-2 ranks are “validated”); this harms, for example, the ability to set the right handicap in RL games.
Accuracy in historical data - one of the very few glicko-2 flaws, and elo works ok-ish here.
Backup - just in case something with glicko-2 (which has more moving parts) goes horribly wrong.

There is no need to keep the elo rating chart on the profile page, but e.g. a button with a link to it would be really helpful. Afaik elo ratings are quite easy to calculate computing power-wise.

BHydden · August 15, 2017, 7:52am

IMO keeping both would only be a source of confusion. At its core Glicko-2 is essentially elo; it just factors in a confidence rating alongside it for extra accuracy. The chart you reference was, from my understanding, taken from some sketchy limited data from OGS 2 versions ago and as such was not reliable in the first place. From what I’ve heard so far I believe the new Glicko-2 ratings should more or less line up with AGA and KGS +/- a stone.

Not sure what you mean by accuracy in historical data?

Kreur · August 15, 2017, 10:42am

Elo factors in (aside from constants) just the opponent’s current rating and the result. Glicko has more moving parts; it’s not just elo + confidence interval. Having two ratings (even if elo would be “semi-hidden”) is indeed a source of confusion, but just glicko-2 UNTIL IT IS “VALIDATED” is an even bigger source of confusion imho.
Glicko and historical data problems are discussed in this thread.
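
To illustrate the "just opponent's rating and result" point, here is a minimal sketch of the textbook Elo update. This is the standard formula, not necessarily what OGS ran before the switch; the function names and the K-factor of 32 are assumptions picked for the example.

```python
def elo_expected(r_a, r_b):
    """Expected score of player A against player B (standard Elo logistic curve)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, score_a, k=32):
    """New rating for A after one game. score_a is 1.0 for a win, 0.0 for a loss.

    k=32 is an assumed K-factor, not a value taken from OGS.
    """
    return r_a + k * (score_a - elo_expected(r_a, r_b))


# Two equally rated players: the winner gains half of K.
print(elo_update(1500, 1500, 1.0))  # 1516.0
```

The only inputs are the two current ratings and the result. Glicko-2 additionally tracks a deviation and a volatility per player, which is where the extra moving parts come from.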

The main point of keeping elo is to give credibility to glicko-2. During the elo period I think I won even games against players at most 3 stones stronger and lost against players at most 3 stones weaker, while during the shorter glicko period I won vs 5 stones stronger and lost vs 4 stones weaker. Might be coincidence, might be glicko-2 shenanigans - this is the main reason why I think it’s good to let elo ratings “validate” glicko-2 ratings.

sTan · August 15, 2017, 4:57pm

After a few days:
I still feel less informed about my opponent.

As a normal guy who likes to play go, I am starting to get frustrated with the new system (not with the rank computing, but with the information about it). I kind of hoped there would be some improvements after the discussion here, but there aren’t.

It looks like I have to become a numbers guy to understand and use OGS and its tools properly … or is usability just not so important to OGS anymore?
(Or is there maybe a further discussion on the other platform, where I have to register additionally?)

mlopezviedma · August 15, 2017, 6:26pm

Numbers are not hard to understand, and they are one click away. Click on a username, and you’ll see something like 15.6k ± 1.2 meaning that the system is confident that his/her rank is between 14.4k and 16.8k.
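
As a tiny worked version of that arithmetic (the helper name is made up for this post, and I round to one decimal only to keep the output tidy):

```python
def rank_interval(rank, deviation):
    """Return the (low, high) bounds of a displayed rank such as 15.6k +/- 1.2."""
    return (round(rank - deviation, 1), round(rank + deviation, 1))


print(rank_interval(15.6, 1.2))  # (14.4, 16.8)
```

For kyu ranks the smaller number is the stronger end, so 14.4k is the optimistic bound and 16.8k the pessimistic one.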

The only thing you need to get used to is that ranks are rounded towards the weaker one, and that weird jump between 0.0k (rounded 1k) and 1.1d (rounded 1d).

Now, going back to all those glicko numbers, there’s nothing more than a direct translation:
Points: < 1000 means beginner, 1000 < r < 1650 means ddk, 1650 < r < 2200 means sdk, > 2200 means dan
Deviation: > 220 means provisional, 100 < d < 220 means somewhat established, < 100 means well established.

Now it’s a matter of mental interpolation to translate those numbers to common language. E.g. 1950 ± 75 means very well established strong sdk.
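
If it helps, the thresholds above can be written out as a small lookup. This is only a sketch of my rough bands, not anything from the OGS code; the function name is made up, and how the exact boundary values (1000, 1650, 2200, 100, 220) are assigned is an assumption, since the ranges above leave them open.

```python
def describe(rating, deviation):
    """Translate a Glicko-2 rating and deviation into the rough labels above.

    Boundary values are assigned to the upper band for rating and to
    "somewhat established" for deviation; that choice is an assumption.
    """
    if rating < 1000:
        band = "beginner"
    elif rating < 1650:
        band = "ddk"
    elif rating < 2200:
        band = "sdk"
    else:
        band = "dan"

    if deviation > 220:
        confidence = "provisional"
    elif deviation >= 100:
        confidence = "somewhat established"
    else:
        confidence = "well established"

    return f"{confidence} {band}"


print(describe(1950, 75))  # well established sdk
```

It does not say where inside a band a player sits ("strong sdk"), so that part is still mental interpolation.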

Does this make sense?

kickaha · August 15, 2017, 7:36pm

Of course it makes sense. Unfortunately this argument isn’t about logic, it’s about quality of life.

EDIT: having these benchmarks helps though. thanks.

mlopezviedma · August 15, 2017, 7:39pm

Before (with the Elo system) you had rank and rating. Now you also have that at first glance, and much more on a user’s profile. How can that not improve quality of life?

Russjass · August 15, 2017, 7:53pm

What is more frustrating is when you lose to someone “weaker” than you, and under the old rating system they were much stronger than you. I suppose it is just the system correcting itself, and I don’t play handicap games, so it’s not the end of the world.

mlopezviedma · August 15, 2017, 7:53pm

Still, you can’t validate one system with another system, because there’s no way to prove that Elo is better than Glicko. You’ll never know which system is more accurate in absolute terms.

jgk · August 15, 2017, 8:03pm

It would be great if perhaps the devs could post a summary of the analysis they did on the system. For those who know the internals of Glicko vs Elo, most believe that Glicko is more likely to do a better job of predicting the winner of a match.

Perhaps some people would feel better knowing that the OGS data validates this assumption.

Something like increased prediction rate, or root mean square error or something.
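
To make that concrete, here is a rough sketch of how those two numbers could be computed from a list of finished games. Everything in it is an assumption for illustration: the game tuples are invented, and the win probability uses the plain Elo-style logistic curve rather than the Glicko-2 expectation (which also weighs the opponent's deviation).

```python
import math


def expected_score(r_a, r_b):
    """Assumed win probability for A: plain logistic curve on the rating gap."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def evaluate(games):
    """games: list of (rating_a, rating_b, score_a), score_a = 1.0 win / 0.0 loss.

    Returns (prediction rate, root mean square error) of the predictions.
    A game counts as correctly predicted when the favourite (p > 0.5) wins;
    an exact 0.5 is treated as predicting a loss for A.
    """
    correct = 0
    squared_error = 0.0
    for r_a, r_b, score_a in games:
        p = expected_score(r_a, r_b)
        if (p > 0.5) == (score_a == 1.0):
            correct += 1
        squared_error += (score_a - p) ** 2
    n = len(games)
    return correct / n, math.sqrt(squared_error / n)


# Invented sample: one win by the favourite, one upset.
sample = [(1600, 1500, 1.0), (1500, 1600, 1.0)]
rate, rmse = evaluate(sample)
print(rate)  # 0.5
```

Running the same evaluation with ratings from the old system and from the new one, on games neither system has seen yet, would give directly comparable numbers.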

kickaha · August 15, 2017, 8:19pm

agreed. i do like all the additional info that is provided now! i am also totally fine with having a different rating algorithm than ELO, so no issue there.

i just think that the information was more readily available before the update, from a purely graphic point of view that is. … i’ll just get used to how it is now i guess, it’s not such a concern.