Unstable ranks?

buntspecht · November 27, 2017, 7:19pm

I am a new player here. Before I played on KGS, IGS and DGS. Although I like the design and features of OGS there is one thing which really disturbs me. Why is the ranking system of OGS producing these highly unstable ranks? I mean, every player profile I look up shows a volatile ranking graph. E.g. even players with a lot of played games seem to go up to 5dan and go down to 1kyu in a matter of weeks before they go up again. That’s weird and differs greatly from my experience with other servers.

And it seems that people don’t like handicap games here, too. How on earth is it possible to play balanced games without handicaps and these unstable ranks? Am I missing something? Just asking

Mogadeet · November 27, 2017, 8:06pm

Using the “Find a Game” feature I get paired with someone within 3 levels of my own rank, so all my opponents are relatively close. All the games I’ve played using that feature have been extremely competitive and no one has swept me off the board like an annoying fly, nor have I blown through an opponent like France of 1940. Every game, whether I win or lose, has been a tough, and well-matched fight.
The graph, and the movable rank, seems to me like a good indication of one’s play. It gives an opportunity for a player to gauge themselves and get instant feedback, and feedback over time, to either improve or stagnate, as the player desires.

The graph seems to illustrate that the player’s rating is a moving target (which it is) and gives the player valuable information about their game. I would look at it like a production graph from a factory…a good indication of the state of the “business.”

But like all human endeavors, it will be volatile. It is our nature to move forwards and backwards as we ourselves change, and our opponents change. And the more we mix it up (i.e., the greater the number of inputs) the more changes will be represented.

Mogadeet

Farraway · November 27, 2017, 8:27pm

I agree. I’d prefer a culture where handicap games were more common and ranks were more stable. Handicap games have a wonderful side effect of ensuring that 1 kyu approximates one handicap stone in strength. I have also found handicap go (against either stronger or weaker players) incredibly instructive.

But that is not what OGS is. Here we have ratings and ranks are mapped onto them. The rank is therefore somewhat accidental. Furthermore, handicap games are rare - so each kyu of rank may not map neatly to 1 handicap stone.

The rating, however, does seem to be an accurate measure of some sort. One 10 kyu player does seem to be similar in skill to another. So something still works. It’s just different to the way that other servers operate.

SanDiego · November 27, 2017, 8:35pm

It disturbs me too, and seems to be tied to the new ranking system deployed earlier this year. I agree with you that the rank of a dan player with lots of games should not fluctuate that much, and that’s the case on other servers.

I would argue that handicap games are not balanced games
That surprised me too at first. Now I mainly play in tournaments, where people are paired by strength.

Great point.

BHydden · November 27, 2017, 9:19pm

I think your two points are connected. Because we don’t play much handicap go on this server, when two people who are a few ranks apart play, the result has a greater effect on their ratings. Thus we see bigger swings.

Mogadeet · November 27, 2017, 9:57pm

I have played games where I’ve been given a handicap (my settings have it as “optional” I believe). Not that it’s done me much good.

Mogadeet

meili_yinhua · November 28, 2017, 3:05am

It makes sense that the ratings graph has a certain volatility where it goes up and down, and this is a common occurrence among pretty much all rating systems. This is why statisticians, when creating their ratings models, will say that a person’s rating follows a Brownian motion pattern (which is essentially random-step motion).

Glicko-2 (the server’s current rating system, for those who don’t know) actually has variables for both Ratings Deviation (uncertainty) and volatility, and the higher these are, the more that player’s own rating will change.

But I do see how handicap could affect this. Because of it, the games are further apart in skill, and (as BHydden noted) result in bigger swings, but I have another couple of theories.

1. It could be the result of newcomers:
I find this one to be somewhat likely – new players come in at ~13kyu, and if they’re completely new, they get thrashed and sent to the bottom, resulting in a ratings spike for the people above them, whereas if they’re experienced, they thrash the 13 kyus and cause a ratings dip for the people they play against (which can be minimized if we use a system like WHR or if we have people suggest their start rating, but that’s a topic for another day).

2. It could just be people dealing with their work cycles:
Most of the people here aren’t professionals or children, so they have to deal with their work routine (or university or school), which means that there are certain periods where they are more busy and play more casually, thinking a bit less in the process. Considering this is also true for the other servers, this one is much less likely.

But I was actually considering starting to move towards handicap games so as to try to match my IGS experience (which I am enjoying a bit more right now) and would certainly be supportive of a general community move towards them.

Farraway · November 28, 2017, 11:38am

I recently did the same. I just started creating challenges with no rank restriction but with handicap set to auto. So far so good

Nathan · November 28, 2017, 7:18pm

This may be off topic but I have a question about the ranks. I frequently set up matches with handicap and restricted ranks to reasonable handicaps. Are the automatch handicaps based on overall ranks, or time and board size sensitive ranks?

I just matched with someone who had a similar rank overall, but a much weaker rank in the 19x19 time-specific rank. Shouldn’t the pairing be based on the rank in the specific game type? Overall I feel the ranking does a reasonable job; it is just a little confusing to understand what is actually going on. If anyone could provide a link to an explanation, that would be helpful.

Thanks,
Nathan

pie.or.paj · November 28, 2017, 7:46pm

With these unstable ranks, handicap games are impossible. Someone 3 ranks above or below me may very well be equal to my strength, and adding handicap stones to that would ruin the game.

richyfourtytwo · November 28, 2017, 7:52pm

Hmmm, but someone 3 ranks above you might equally likely be actually 6 ranks better than you, so a 3 stone handicap may save a game that otherwise would have been ruined?

Even if they fluctuate, the ranks should have a meaning on average. (Or is anyone claiming even that isn’t the case?) I fail to see how such fluctuations would harm handicap games more than even games.

pie.or.paj · November 28, 2017, 7:58pm

They may be, but I would say it’s less bad to have an even game ruined by uneven ranks than to have what would have been a nice game ruined by an invalid handicap.
I’d be totally pro handicap if the ranks were more trustworthy.

To me it feels mostly random. I have recently both climbed and fallen two ranks in a single day, so I have a hard time putting a lot of trust in the ranking system below, say, steps of 5 kyu.

BHydden · November 28, 2017, 11:40pm

The official statement is that EVERYTHING is determined by a player’s overall rating; the others are simply informational, to show where their strengths may lie.

My understanding is that these ranks fluctuate, in part, because we don’t play much handicap go. I believe a large shift in mentality of players on this server over to handicap games would at first be chaotic but would quickly reorganise players’ ranks into a more stable configuration.

Maharani · November 29, 2017, 9:15am

I still doubt this is true. Still hoping for an official response in this thread: Which rank decides on automatic handicap/komi?

Lys · November 29, 2017, 11:43am

I’m a newbie, so I know very little about rating, ranking and handicap.
So I have a question: does the rating system know if a game is played with handicap? Does it have a specific formula for handicap games?
If not, I would imagine that handicap games are more likely to ruin the ranking than to improve it.

StephenC20XX · November 29, 2017, 11:50am

A handicap game (with the correct amount of handicap as determined by the players’ ranks) is rated as if it were a game between players of the same rank. So when you have lots of handicap games happening you can be more sure that 1kyu = 1 stone.
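A rough sketch of that idea, purely for illustration: the handicap is subtracted from the rank gap before the expected result is computed, so a correctly handicapped game looks like an even game to the rating system. The function names, the numeric rank scale and the `points_per_rank` value are all made up here and are not taken from how OGS actually does it.

```python
def effective_rank_gap(stronger_rank, weaker_rank, handicap_stones):
    """Rank gap left over once the handicap stones are accounted for.

    Ranks are plain numbers where +1 means one rank stronger
    (a hypothetical scale chosen only for this sketch).
    """
    return (stronger_rank - weaker_rank) - handicap_stones


def expected_score(rank_gap, points_per_rank=100.0):
    """Elo-style expected score for the stronger player.

    points_per_rank is an assumed conversion, not a documented OGS value.
    """
    return 1.0 / (1.0 + 10 ** (-rank_gap * points_per_rank / 400.0))


# Three ranks apart with three stones: treated as an even game.
even_with_handicap = expected_score(effective_rank_gap(5, 2, 3))
# Same players with no handicap: the stronger player is the clear favourite.
no_handicap = expected_score(effective_rank_gap(5, 2, 0))
print(even_with_handicap, no_handicap)
```

Under these assumptions, a win in the handicapped game moves both ratings about as much as a win between equals would, which is why lots of handicap games would tend to pull one rank towards one stone.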

Conrad_Melville · November 30, 2017, 4:40am

I agree with your theory #1: that the main reason is new players entering at 13 kyu, which I think is a bad idea. Another possible factor may be how frequently the ratings are batched. I may have misunderstood the explanation, but I thought the volatility reflects greater accuracy because the ranking is more up-to-date. In any case, the accompanying standard deviation (if it is an SD) gives some sense of the rank’s accuracy.

meili_yinhua · November 30, 2017, 5:22am

It was my understanding that higher volatility reflected greater inconsistency in play, thus making it harder to rate, reflecting lower confidence in accuracy.

It also has the effect of slowing (or even reversing) the decrease of RD (which is indeed intended as an SD) each game and increasing the rise of RD when not playing. This is so that people who are inconsistent or play fewer games don’t affect other people’s ratings so much.
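To make that concrete, here is a small sketch of the RD part of the published Glicko-2 update, on the internal scale where phi = RD / 173.7178. It is a simplification: sigma (the volatility) is held fixed instead of running the iterative volatility update, and the ratings, RDs and sigma values below are made-up inputs, not OGS settings.

```python
import math

SCALE = 173.7178  # Glicko-2 factor between rating points and internal units


def g(phi):
    """Glicko-2 weighting that shrinks the impact of an uncertain opponent."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def rd_after_idle_period(rd, sigma):
    """RD after one rating period without games: phi grows by sigma."""
    phi = rd / SCALE
    return SCALE * math.sqrt(phi ** 2 + sigma ** 2)


def rd_after_one_game(rd, sigma, rating, opp_rating, opp_rd):
    """RD after one rated game, with sigma held fixed (simplification)."""
    phi = rd / SCALE
    mu = (rating - 1500.0) / SCALE
    mu_j = (opp_rating - 1500.0) / SCALE
    phi_j = opp_rd / SCALE
    e = 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))
    v = 1.0 / (g(phi_j) ** 2 * e * (1.0 - e))  # variance from the game outcome
    phi_star = math.sqrt(phi ** 2 + sigma ** 2)  # volatility inflates RD first
    return SCALE / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)


# Same player (RD 60), same opponent, two assumed volatility values.
steady = rd_after_one_game(60.0, 0.03, 1500.0, 1500.0, 60.0)
erratic = rd_after_one_game(60.0, 0.10, 1500.0, 1500.0, 60.0)
print(steady, erratic)
```

With these inputs the low-volatility player's RD shrinks a little after the game, while the high-volatility player's RD actually grows, which is the "slowing or even reversing" effect. `rd_after_idle_period` shows the other half: the larger sigma is, the faster RD rises while not playing.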

BHydden · November 30, 2017, 8:21am

Think of the whole system as zero-sum. 1500 is considered “average” and, for our purposes, is essentially “0”. If you lose games you go down, if you win you go up, but the system always stays balanced at a total of 0 (or in other words, if you take the sum of the rating points of every player and divide by the total number of players, you’ll always get 1500).

meili_yinhua · December 1, 2017, 1:45am

Isn’t one of the notable qualities of Glicko and Glicko-2 (which is not necessarily a pro or a con) that it is not necessarily zero-sum?
(like, isn’t the only way to get a zero-sum match to have two people with the same RD and volatility to play?)
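A quick way to check this is to apply the published single-game Glicko (Glicko-1) update to both players and add up the two rating changes. This sketch deliberately uses the simpler Glicko-1 formulas, so it only covers the RD part of the question; Glicko-2 adds volatility on top. The ratings and RDs are made-up inputs.

```python
import math

Q = math.log(10) / 400.0


def g(rd):
    """Glicko-1 weighting that discounts an opponent with a large RD."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * rd ** 2 / math.pi ** 2)


def rating_change(r, rd, r_opp, rd_opp, score):
    """Rating change for one player after one game (score: 1 win, 0 loss)."""
    e = 1.0 / (1.0 + 10 ** (-g(rd_opp) * (r - r_opp) / 400.0))
    d2 = 1.0 / (Q ** 2 * g(rd_opp) ** 2 * e * (1.0 - e))
    return Q / (1.0 / rd ** 2 + 1.0 / d2) * g(rd_opp) * (score - e)


def net_change(r_a, rd_a, r_b, rd_b):
    """Sum of both players' rating changes when A beats B."""
    gain_a = rating_change(r_a, rd_a, r_b, rd_b, 1.0)
    loss_b = rating_change(r_b, rd_b, r_a, rd_a, 0.0)
    return gain_a + loss_b


same_rd = net_change(1500.0, 80.0, 1600.0, 80.0)       # equal RDs
different_rd = net_change(1500.0, 200.0, 1500.0, 60.0)  # uncertain A beats settled B
print(same_rd, different_rd)
```

With equal RDs the two changes cancel (up to floating-point rounding), even though the ratings differ. When the winner has the much larger RD, they gain far more points than the loser gives up, so the total number of rating points in the pool goes up.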

But yeah, 1500 is supposed to be the expected average, and that’s why the statistical models all seed at 1500.

However, this simultaneously hurts the experience of new players and more experienced players (as well as the people around 13k who have to face them), and it would arguably be more efficient to have people estimate their skill like we used to.

The problem with that is sandbaggers, although sandbaggers are less of a problem than putting new players in the thick of everything, because then it’s like EVERYBODY is sandbagging them, and now I need to give the new players I recommend OGS to a warning that they’re going to lose ~7 games before they are playing players of equal rating.
(I may have said this was a discussion for another day, but I said that three days ago – it is another day now)