There has been some mention in various topics recently that an OGS rating system update is imminent. Can we have some information about what will be changed and what effects this is likely to produce please?
FWIW - I like the OGS site. It is one of only 2 servers that I play on regularly, the other being DGS. I especially like OGS’s live games feature, the joseki dictionary, AI review and the aesthetically pleasing look and feel of the site. Great work @anoek - keep it up dude! However (and I don’t mean to be harsh here), the OGS rating system is seriously dodgy! So much so that I basically ignore my OGS rank now and just use my DGS rank.
The main problem is that OGS ranks fluctuate way too much.
And the whole idea of retroactively updating ratings for previously completed games based on how the opponents’ ratings have changed since seems to me inherently flawed. The result of a game reflects the two players’ strengths at the time it was played, which do not necessarily correlate with their strengths now. I mean, most players will increase in strength over time, some will stay the same if they don’t work so hard at improving or are stuck in a rut, but very few will decrease in strength unless there are exceptional circumstances. So a rating with a gradually increasing overall trend and some small (less than one stone) variability about that trend is probably a much more accurate representation of reality than a rating that fluctuates wildly.
For example, a friend of mine who is a fairly strong SDK player plays a lot of games on OGS, and their OGS rank fluctuates anywhere between about 8k and 2k. Are you seriously telling me that their strength can vary by 6 stones over a time span of weeks? My OGS rank varies less because I play fewer games on OGS, but it is currently around 10-8k. So, depending on which day you look, my friend and I might have the same OGS rank (8k). I can say with certainty that there is no way my friend and I are the same strength, as shown by our many over-the-board (OTB) games where they kick my butt with a 4-stone handicap! However, if you look at the mid-points of our fluctuating OGS ranks (9k vs 5k) then you get a better representation of our strength difference (4 stones, as per our OTB handicap). I also face a similar situation with another friend who currently has a lower rank than me on OGS but gives me a 3-stone handicap in OTB games. So maybe OGS would be better off adopting a rating system that does not fluctuate so wildly and somehow captures more of an average of the current highly-variable rating.
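To make the mid-point idea concrete, here is a rough Python sketch. The mid-point numbers are the ones from my example above; the `rank_history` list is made up purely for illustration, and the trailing moving average is just one possible way of smoothing, not anything OGS actually does.

```python
def midpoint_kyu(weakest_kyu, strongest_kyu):
    """Mid-point of a kyu range (a larger kyu number means a weaker player)."""
    return (weakest_kyu + strongest_kyu) / 2


def moving_average(values, window):
    """Trailing moving average; early entries average over what is available."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


friend_mid = midpoint_kyu(8, 2)   # 5.0, i.e. about 5k
my_mid = midpoint_kyu(10, 8)      # 9.0, i.e. about 9k
stone_gap = my_mid - friend_mid   # 4.0, matching the 4-stone OTB handicap

# Hypothetical rank history (in kyu) swinging between 8k and 2k.
rank_history = [8, 3, 6, 2, 7, 4, 5, 8, 2, 5]
smoothed = moving_average(rank_history, window=5)

raw_spread = max(rank_history) - min(rank_history)
# Only compare entries where the full 5-game window is available.
smoothed_spread = max(smoothed[4:]) - min(smoothed[4:])
```

With this made-up history the raw rank swings across 6 stones, while the smoothed values stay within about one stone of 5k once the window has filled up.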
So, maybe Glicko-2 worked OK in tests but in practice it is somehow numerically unstable? And/or maybe it’s just over-complicating things and a simpler (Elo?) system might work better? It might be worth looking at how DGS do their ratings (it’s all open source so you can just look - I think it’s Elo based) as their rating system seems pretty stable and reliable - I rarely, if ever, play an opponent on DGS whose rank seems wrong, and I feel like my own rank changes on DGS quite accurately reflect my increase in playing strength. But on OGS, I have no idea because everyone’s rank is different from one day to the next (and there’s no way anyone can increase / decrease by more than a stone in strength overnight!).
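For reference, this is roughly what I mean by a simpler Elo-style update. It is a minimal sketch of the textbook formula only: the K-factor of 32 and the 400-point scale are conventional chess-style values that I am assuming here, not the actual parameters or code of DGS or OGS.

```python
def expected_score(rating_a, rating_b):
    """Textbook Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """New rating for A after one game.

    score_a is 1.0 for a win, 0.5 for a draw (jigo), 0.0 for a loss.
    k=32 is an assumed K-factor, not a value taken from any Go server.
    """
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


# Two equally rated players: the winner gains exactly half of K.
after_win = elo_update(1500.0, 1500.0, 1.0)   # 1516.0
after_loss = elo_update(1500.0, 1500.0, 0.0)  # 1484.0
```

The point is that a single game can never move a rating by more than K points, and past results are never revisited, so the rating cannot jump around much from one day to the next unless you play a lot of games with lopsided results.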
Thanks. And I say again, great site overall!