Ratings Update January 2021

dragon-devourer · January 19, 2021, 12:33pm

There has been some mention in various topics recently that an OGS rating system update is imminent. Can we have some information about what will be changed and what effects this is likely to produce please?

FWIW - I like the OGS site. It is one of only 2 servers that I play on regularly, the other being DGS. I especially like OGS’s live games feature, the joseki dictionary, AI review and the aesthetically pleasing look and feel of the site. Great work @anoek - keep it up dude! However (and I don’t mean to be harsh here), the OGS rating system is seriously dodgy! So much so that I basically ignore my OGS rank now and just use my DGS rank.

The main problem is that OGS ranks fluctuate way too much.

And the whole idea of retroactively updating ratings for previously completed games based on how the opponents’ ratings have changed since seems to me inherently flawed; the result then was based on the two players’ strength then, which does not necessarily correlate with their strength now. I mean, most players will increase in strength with time, some will stay the same strength if they don’t work so hard at improving or are stuck in a rut or something, but very few players will decrease in strength over time unless there are exceptional circumstances. So a rating with a gradually increasing overall trend and with some small (less than one stone) variability about that trend is probably quite an accurate representation of reality, rather than a rating that fluctuates wildly.

For example, a friend of mine who is a fairly strong SDK player plays a lot of games on OGS and they have an OGS rank that fluctuates anywhere between about 8k and 2k. Are you seriously telling me that their strength can vary by 6 stones over a time-span of weeks? My OGS rank varies less because I play fewer games on OGS but is currently around 10-8k. So, depending on which day you look, my friend and I might have the same OGS rank (8k). I can say with certainty, there is no way my friend and I are the same strength, as shown by our many over-the-board (OTB) games where they kick my butt with a 4-stone handicap! However, if you look at the mid-point of our fluctuating OGS ranks (9k vs 5k) then you get a better representation of our strength difference (4 stones as per our OTB handicap). I also face a similar situation with another friend who is currently a lower rank than me on OGS but they give me a 3-stone handicap in OTB games. So maybe OGS would be better off adopting a rating system that does not fluctuate so wildly and somehow captures more of an average of the current highly-variable rating.
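To make the "average of a fluctuating rank" idea concrete, here is a minimal sketch in Python. The daily rank samples are made-up numbers chosen to match the ranges mentioned above (they are not real OGS data), and `rolling_mean` is a hypothetical helper, not anything OGS actually does.

```python
from statistics import mean


def rolling_mean(ranks, window):
    """Average each sample with up to `window - 1` samples before it."""
    smoothed = []
    for i in range(len(ranks)):
        start = max(0, i - window + 1)
        smoothed.append(mean(ranks[start:i + 1]))
    return smoothed


# Hypothetical daily ranks in kyu (lower number = stronger player).
friend = [8, 2, 6, 3, 7, 4, 5]    # swings between 8k and 2k
me = [10, 8, 9, 10, 8, 9, 9]      # swings between 10k and 8k

friend_avg = mean(friend)
me_avg = mean(me)

# On the first day both samples can look close (8k vs 10k), but the
# averages are about four stones apart.
print("friend average:", friend_avg, "kyu")
print("my average:", me_avg, "kyu")
print("difference in stones:", me_avg - friend_avg)
print("friend, 3-day rolling mean:", rolling_mean(friend, 3))
```

The trade-off is the usual one for smoothing: a longer window gives a steadier number but reacts more slowly when a player really does improve.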

So, maybe Glicko-2 worked OK in tests but in practice it is somehow numerically unstable? And/or maybe it’s just over-complicating things and a simpler (Elo?) system might work better? It might be worth looking at how DGS do their ratings (it’s all open source so you can just look - I think it’s Elo based) as their rating system seems pretty stable and reliable - I rarely, if ever, play an opponent on DGS whose rank seems wrong and I feel like my own rank changes on DGS quite accurately reflect my increase in playing strength. But on OGS, I have no idea because everyone’s rank is different from one day to the next (and there’s no way anyone can increase / decrease by more than a stone in strength overnight!).

Thanks. And I say again, great site overall

shinuito · January 19, 2021, 1:25pm

I’m also not really a fan of how the ratings work at the moment, and I think there is a lot of confusion in the community about it. Why ratings go down after a win is a regularly asked question.

That said, just to comment on a few things

I think you’re focusing too much on a case where both players have a stable rating/rank. Lots of players improve rapidly, and they can play many more consecutive games online than they can over the board to facilitate this.

Not only that, it wouldn’t surprise me if for instance one player could have three different ranks depending on whether they play blitz, live or correspondence games. I expect I am much weaker (many stones) at blitz than live or correspondence, in part because I either time out (10s default byo-yomi is a bit too fast for me) or have to play moves I can’t even read one or two responses to. OGS merges all those ratings into one overall rating (I know it has those separately as well) and I don’t think it’s surprising if that would give big rating fluctuations.

I think there were some issues in the past with a volatility parameter and with how handicap games were treated. It was also suggested that there is still some issue to do with handicap games, which I hope was reported.

One could argue to split off the other side boards and other time settings from the overall rating, which was discussed here with data

benjito · January 19, 2021, 6:00pm

If volatility is the issue, I don’t feel the Glicko-2 system would be at fault so much as the chosen “rating period”. If the rating period were increased from 15 to 30, then the ranks would naturally be more stable.

But keep in mind that this comes at the cost of having less accurate ranks for people that are actually improving or getting worse.

As far as the anecdote about your friend, I do feel that +/-3 stones is a bit much, even for OGS. I’ve played many games on OGS and really haven’t fluctuated outside of a range of 10-6k. I’m curious if there could be other factors at play? Do they vary between live/blitz and correspondence? I find that if I play a lot of live, my rank tends to go up, and when I play more correspondence, my rank goes down.

BHydden · January 19, 2021, 6:52pm

Nobody knows the exact particulars other than anoek, and he will cover it all when he announces the change officially. He is aware of the general concerns and issues with the current implementation (hence the fix).

Gia · January 19, 2021, 6:54pm

Oh, I thought the fix was a 2020 exorcism, with sage and all.

benjito · January 19, 2021, 8:12pm

But one can track the progress!

dragon-devourer · January 19, 2021, 9:48pm

Thanks for the GitHub link @benjito - that’s exactly what I was looking for. As for my friend, I don’t know exactly how they split their games between blitz, live and correspondence but I know they at least do some live and some correspondence as we do both of those together and I know they do both with our other friends.

dragon-devourer · January 20, 2021, 10:47am

Haha! Yeah, like if you become the best player on OGS, ranks don’t matter at all - just beat everyone

shinuito · January 20, 2021, 11:00am

There are rumours about something happening soon. Not sure how much, but something.

dragon-devourer · January 20, 2021, 11:38am

Just to illustrate how it might be better to have something simple rather than complicated…

DGS takes into account both players’ ranks at the start of the game and at the end of the game (along with handicap, komi, who plays as which color, etc.) to work out who should win (start and end of the game considered to allow for rating changes due to other simultaneous games as DGS is all correspondence). If the result is in line with expectation, the rating change for both players is smaller than if the result is against expectation. Obviously, the winner gets an increase, loser gets a decrease; there is never the case where it’s the other way round. Larger strength differences mean a larger rating change for results against expectations, or a smaller rating change for results in line with expectations (potentially zero change if the game was a foregone conclusion, e.g. 1 dan beats 20 kyu in an even game, probably means zero rating change for both). There is no messing about on DGS with later changes in strength altering the rating change from earlier games - the DGS rating change is based on the players’ strengths at the time of the game only. Simple! And it works!

On DGS, the rating change is also proportional to board size (smaller board = smaller rating change). This also seems to make sense and works in practice.
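As a rough illustration of the behaviour described in these two paragraphs, here is a minimal Elo-style sketch in Python. To be clear, this is not the DGS code or its actual formula: the 400-point logistic curve, the `k` value and the `size_factor` board-size scaling are all assumptions chosen for illustration, and handicap, komi and color are left out.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo win probability for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, a_won, k=32.0, board_size=19):
    """Return the new (rating_a, rating_b) after one game.

    `k` and the board-size scaling are hypothetical choices, not values
    taken from DGS or OGS.
    """
    # Assumed scaling: smaller boards move ratings less (area ratio).
    size_factor = (board_size / 19.0) ** 2
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    # The change is proportional to how surprising the result was.
    delta = k * size_factor * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


# Even game between equals: a moderate change.
print(elo_update(1500, 1500, a_won=True))
# Favourite wins as expected: a small change.
print(elo_update(1900, 1500, a_won=True))
# Underdog wins against expectation: a large change.
print(elo_update(1500, 1900, a_won=True))
# Same even game on 9x9: a smaller change than on 19x19.
print(elo_update(1500, 1500, a_won=True, board_size=9))
```

In this sketch the winner always gains and the loser always loses the same amount, and the change shrinks towards zero as the result becomes a foregone conclusion. Nothing is recomputed later, so a game's effect depends only on the ratings at the time it was played.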

Might be worth OGS considering the use of some of these ideas.

Animiral · January 20, 2021, 11:59am

Remember when the rank graph was 4 lines and you could just pick whichever one was the most believable?

Good times.

flovo · January 20, 2021, 5:14pm

Rating systems are usually compared by the quality of their predictions, e.g. which one predicts the outcome (win/loss) of a game correctly more often.
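A minimal sketch of such a comparison in Python might look like the following. The game records are made up, the Elo-style `expected_score` curve is only an assumption standing in for whatever prediction a given rating system makes, and `evaluate` is a hypothetical helper. It reports both the fraction of correctly predicted winners and the average log loss, which also penalises overconfident wrong predictions.

```python
import math


def expected_score(rating_a, rating_b):
    """Assumed Elo-style probability that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def evaluate(games):
    """Score predictions over (rating_black, rating_white, black_won) records.

    Returns (accuracy, mean_log_loss). Ratings are those held before
    each game was played.
    """
    correct = 0
    total_log_loss = 0.0
    for rating_black, rating_white, black_won in games:
        p_black = expected_score(rating_black, rating_white)
        predicted_black = p_black >= 0.5
        if predicted_black == black_won:
            correct += 1
        # Probability the system assigned to what actually happened.
        p_actual = p_black if black_won else 1.0 - p_black
        total_log_loss -= math.log(p_actual)
    return correct / len(games), total_log_loss / len(games)


# Hypothetical game records, not real server data.
games = [
    (1600, 1500, True),   # favourite (black) wins
    (1500, 1700, False),  # favourite (white) wins
    (1550, 1500, False),  # upset
    (1400, 1450, True),   # upset
]

accuracy, log_loss = evaluate(games)
print("accuracy:", accuracy)
print("mean log loss:", log_loss)
```

To compare two rating systems this way, each would be run over the same game history and the one with higher accuracy or lower log loss would count as the better predictor.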

Some good rating systems (like WHR) might even change your rating without you needing to play a game. Such rating systems can be better since they can use more information to estimate a player’s strength compared to the other players in the pool.

richyfourtytwo · January 20, 2021, 5:26pm

I held this purist’s view for some time. But after seeing so many complaints here, I now think that acceptance is also an important criterion. If many people reject it (even if that’s for reasons I don’t agree with at all), it might be better to change it, even if the outcome is a system with less predictive power.

triangle_fuseki · January 20, 2021, 5:28pm

But that’s not the only thing that OGS users need.
A rating system is also a way of punishing a loss and rewarding a win.

Most people don’t like it when their rating goes down after a win.
A rating can’t be 100% precise by its nature; I play much worse when I’m sleepy.
Currently I have ±1.1 after my rank. If I had ±1.7 I would not care, but I would be much happier if rank did not increase after a loss.

benjito · January 20, 2021, 6:02pm

I imagine there is a system that does both.

It’s not so hard to imagine a system with high predictive power and rank that doesn’t go the opposite direction of your win/loss

BHydden · January 20, 2021, 6:03pm

Ideally, ours in a week or so

dragon-devourer · January 20, 2021, 7:10pm

You don’t have to imagine - it exists! See my earlier posts Re dragongoserver.net

dragon-devourer · January 20, 2021, 7:26pm

Exactly!

Plus, rank / rating also provides a measure of progress for those working on improving their skills.

I’m not too bothered if the DGS system is slightly less good at predicting the outcome of a game (and it will be only slightly as it’s pretty good). The point is that the DGS system is good enough that I know an even game against someone within a stone or two should be a good match, larger differences are going to be pretty difficult / easy. Conventional handicap (1 stone per rank difference) should even things up. Job done!

Sadly the OGS system currently fails on all of the above. Hopefully the update will address these issues.

Groin · January 20, 2021, 7:57pm

But DGS is only for correspondence games, the task of OGS is a bit harder

teapoweredrobot · January 20, 2021, 8:57pm

I’m pained by this idea. I hope it’s not only me that does not see this as anything to do with the ranking system. Surely ranking systems are entirely about predicting outcomes and hence about finding fair match ups.
A reward/punishment system is a different thing altogether. I’ve not used it much but this is what Go Quest has, right? A rank of some sort and points not particularly limited to the rank.