Unstable ranks?

BHydden · December 1, 2017, 4:11am

I could be wrong about it being zero sum, that was just my impression.

In my experience though, it’s more like 2-3 games before you’re more or less at the right level.

Animiral · December 2, 2017, 1:06pm

You all talk about the average rating without mentioning the confidence range of ±350 points.
In other words, the system is really uncertain about the player’s rank when they first join. These players are not 13k, they are of unknown rank. Hence, the [?].
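To make the ± concrete, here is a rough sketch of a single-game update using the original Glicko (Glicko-1) formulas. It is only an illustration: OGS runs Glicko-2 with its own settings, and the 1500 starting rating and the opponent's numbers below are assumptions picked for the example. The point is just how far a ±350 player moves after one game, and how quickly the uncertainty shrinks.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd):
    """Discount factor for the opponent's rating deviation."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected_score(r, r_opp, rd_opp):
    """Expected result (between 0 and 1) against one opponent."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def glicko1_update(r, rd, r_opp, rd_opp, score):
    """Return (new rating, new RD) after a single game.

    score is 1 for a win, 0.5 for a draw, 0 for a loss.
    """
    e = expected_score(r, r_opp, rd_opp)
    d2 = 1 / (Q**2 * g(rd_opp) ** 2 * e * (1 - e))
    precision = 1 / rd**2 + 1 / d2
    new_r = r + (Q / precision) * g(rd_opp) * (score - e)
    new_rd = math.sqrt(1 / precision)
    return new_r, new_rd


# Hypothetical numbers: a newcomer at 1500 +/- 350 beats an
# established 1500 +/- 60 player.
new_r, new_rd = glicko1_update(1500, 350, 1500, 60, score=1)
print(f"after one win: {new_r:.0f} +/- {new_rd:.0f}")
```

With these made-up inputs the newcomer gains roughly 175 points from one win and the deviation drops from 350 to about 250, which is why the first few games swing a [?] rank so hard.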

Now, it’s true that some newcomer might lose 7 games before their rank adjusts to their level. But the same thing will happen to any new Go player before they get used to it. This is why many of them get into the habit of playing bots instead, and now we’re talking of a different and much larger problem.

Even if we put the starting rank at 700±350, tomorrow some person just like you will find that it is too harsh and demand we start them at 25k instead. Then 30k, 50k and 70k.
Worse, when a new dan player joins the server, they will have to sandbag a long way up. They’ll quit our server from boredom before they get there.

meili_yinhua · December 2, 2017, 6:56pm

I’m not saying all people will need to start lower: that doesn’t fix anything. That just makes the player average slowly move to whatever that starting rank is, recreating the problem at a different number.

No, I’m saying we should go back to the old system where we estimate our rank (naturally with a bottom limit) and possibly lower the RD for it.

Eugene · December 2, 2017, 11:43pm

Is there any objective evidence our ranks are more unstable now?

My recollection of the old rank graphs is that the typical one looked even more sawtooth than the current ones…

Given that our current rank tells us specifically that there is an uncertainty in the rank - which is obvious when you think about it - then surely we would expect the samples to fluctuate up and down.

I do actually wonder whether the representation we have is correct.

For example, I show “13k +/-2k”.

This would seem to imply that the system thinks I could conceivably be as low as 15k and as high as 11k.

With no uncertainty I can tell you I am nothing like 11k. I wonder if in fact what this really is trying to say is “the current datapoint is 13k but there is a 2k uncertainty”.

That would make a lot more sense: I personally think that I am somewhere between 15k and 13k, and that’s what my graph would indicate too.

Anyhow, back to the original point: I don’t remember ranks being any more stable in the old system, and if they did appear that way, probably it was an illusion: the system smoothing out the actual uncertainty because it didn’t measure it.

Really, rank is a very approximate thing: it’s just a guess about who you might be likely to have an even game with.

Mogadeet · December 3, 2017, 12:06am

I like the way the system works now and would hate to see it messed with. I have seen players appear as “unranked,” so a player who doesn’t like the ranking system could go that way, and just ignore the ranking system.

Mogadeet

Farraway · December 3, 2017, 9:58pm

I’ve found two things:

There’s a Handicap Games group which I’ve joined. I may try starting a tournament there at the weekend.
You can require handicaps in your automatch settings:

Not sure what to expect, will see how it goes.

meili_yinhua · December 4, 2017, 1:42am

Personally I don’t think I want to require them, but I have started preferring them. I don’t get many handicap games that way, though…

Iideci · December 4, 2017, 11:35pm

E.g. even players with a lot of played games seem to go up to 5dan and go down to 1kyu in a matter of weeks before they go up again.

I’d like some examples, especially now that the new system has been in use for a while and things should have started to settle. From what I see around 5k my rank is very stable even though I mostly play blitz (I’ve been between 5-4k in the last couple months), and the opponents I face feel very stable (within +/-1 rank). I don’t know if dan ranks are more volatile due to OGS having few high dans in general, but at least I don’t see any major fluctuations.

I know on KGS ranks move extremely slowly, which may cause a feeling of stability, but often people can get stuck many stones above/below their real rank if they play a lot and improve quickly. I don’t see value in making ratings have high inertia just for the sake of making them look more stable. Personally I have some experience playing on IGS and feel their mid-SDK ranks are clearly less indicative of strength than on OGS.

As for handicaps, handicap go is a completely different variant of go, similarly to how 13x13 is different from 19x19, and I think it’s not a very good way to even the game. When the rating system consists solely of even go, players are rated by their relative proficiency at even go. When you mix in handicaps of varying sizes things get messy. Handicap go tests black’s ability to use thickness to attack and white’s ability to invade, reduce and settle efficiently. Being better (e.g. relatively good at middle game fighting) or worse (relatively good at opening) at these skills at varying handicap levels allows players to inadvertently win or lose against opponents who, judging by their even rank, they shouldn’t. Shoehorning these two variants into the same rating obviously amplifies rank fluctuations, although the effect is likely stronger in the kyu ranks where skill gaps between different areas of the game are more pronounced. That said, I like handicaps when used in casual games to make the game interesting for players with high skill difference, but as a measure of general skill it’s less than ideal.

gamesorry · December 5, 2017, 2:04am

Maybe not within weeks but here is a typical example, whose peak rank was 4.9d±1.0 on Aug 24, 2016 and current rank is 0.5k±1.1 without losing many games by timeout. I (currently 5.1d±0.9) have only won 2 out of 7 games against him/her.

Iideci · December 5, 2017, 2:25am

Thanks for the example. His rank does seem relatively stable after the Glicko rating system change though, staying around 1d~1k with a small deviation of +/-80 rating, similar to what I’m seeing at 5k with +/-80 deviation. So it would seem the new system has homed in on his actual rating with very little instability.

Although it’s odd that he dropped from 4d to 1k, the change did span a whole year and wasn’t really unstable, but a clear trend, which again is different from OP’s claim of “up – and down – in a matter of weeks”.

I wonder if there are any examples of unstable ranks under the new Glicko system.

gamesorry · December 5, 2017, 4:28am

AFAIK the Glicko rating was recalculated for the whole history so it doesn’t matter when Glicko was introduced. (For example, I remember my highest rank before the change was ~3.8d, but now the graph shows I reached 6d once around 2013)

I myself, who only play correspondence games, went from 5.3d (Oct 9) to 6.9d (Nov 7) and back to 5.0d (Nov 22). So fluctuation of 2 ranks seems to be quite normal within weeks.

That said, I personally am not frustrated about the Glicko system or the current level of rank instability.

Eugene · December 5, 2017, 9:40am

I think gamesorry’s record is very interesting for analysis of Glicko.

The interesting thing is that there are a large number of samples, and Glicko thinks that its measure of gamesorry’s rank is quite accurate (+/- 1.0 near enough), yet gamesorry’s rank varies +/- 2 ranks.

It’s as if rank is even more fluid than Glicko thinks: one could wonder why Glicko thinks it’s +/- 1 when really it’s clearly +/- 2 at least.

And for me it underpins the idea that rank is very approximate, and any good system trying to measure it will fluctuate, because of that.
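One possible reading of the ± puzzle, sketched under the assumption that the displayed figure is one standard deviation of a roughly normal estimate (nothing in this thread confirms how OGS derives the number it shows): a ±1 band would then cover only about two thirds of the plausible values, and swings of ±2 would still sit inside the usual 95% band.

```python
import math


def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2):
    print(f"within +/-{k} deviation(s): {coverage(k):.1%}")
```

If that assumption holds, “+/- 1” and “varies by +/- 2” are not in conflict; they are the one-deviation and two-deviation views of the same estimate.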

Virtuitous · December 5, 2017, 10:10am

There are several adjustable elements in Glicko2, here’s how it’s implemented in OGS;

System settings;
Initial Volatility     = 0.06     (Confirmed)
Volatility Constraints = 1.2      (Confirmed)
Convergence Tolerance  = 0.000001 (Assumed)
Volatility Algorithm   = Unknown

Recommended settings;
Initial Volatility     = 0.06 
Volatility Constraints = 0.2 - 1.2 
Convergence Tolerance  = 0.000001 
Volatility Algorithm   = Illinois Algorithm

http://www.glicko.net/glicko/glicko2.pdf
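For anyone wondering where those settings enter, below is a sketch of one Glicko-2 rating-period update following the steps in the linked PDF. It is not OGS's code; the function name and default arguments are mine. `tau` is what the list above calls Volatility Constraints, `eps` is the Convergence Tolerance, and the `while` loop is the Illinois-style root finder the PDF describes. The sample numbers are the worked example from the PDF, which uses tau = 0.5.

```python
import math

SCALE = 173.7178  # converts between the Glicko and Glicko-2 scales


def glicko2_update(r, rd, sigma, games, tau=0.5, eps=1e-6):
    """One rating-period update.

    games is a list of (opp_rating, opp_rd, score) tuples,
    with score 1 for a win, 0.5 for a draw and 0 for a loss.
    Returns (new_rating, new_rd, new_sigma).
    """
    mu, phi = (r - 1500) / SCALE, rd / SCALE

    def g(p):
        return 1 / math.sqrt(1 + 3 * p**2 / math.pi**2)

    # Estimated variance v and improvement delta from the game results.
    v_inv, total = 0.0, 0.0
    for opp_r, opp_rd, score in games:
        mu_j, phi_j = (opp_r - 1500) / SCALE, opp_rd / SCALE
        e = 1 / (1 + math.exp(-g(phi_j) * (mu - mu_j)))
        v_inv += g(phi_j) ** 2 * e * (1 - e)
        total += g(phi_j) * (score - e)
    v = 1 / v_inv
    delta = v * total

    # New volatility: solve f(x) = 0, with tau limiting how far it can move.
    a = math.log(sigma**2)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta**2 - phi**2 - v - ex)
        return num / (2 * (phi**2 + v + ex) ** 2) - (x - a) / tau**2

    A = a
    if delta**2 > phi**2 + v:
        B = math.log(delta**2 - phi**2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:  # eps is the convergence tolerance
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2
        B, fB = C, fC
    new_sigma = math.exp(A / 2)

    # Widen the deviation by the volatility, then shrink it with the results.
    phi_star = math.sqrt(phi**2 + new_sigma**2)
    new_phi = 1 / math.sqrt(1 / phi_star**2 + 1 / v)
    new_mu = mu + new_phi**2 * total
    return SCALE * new_mu + 1500, SCALE * new_phi, new_sigma


# Worked example from the PDF: a 1500 +/- 200 player with volatility 0.06
# who beats 1400 +/- 30 and loses to 1550 +/- 100 and 1700 +/- 300.
games = [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]
print(glicko2_update(1500, 200, 0.06, games, tau=0.5))
```

The PDF reports roughly 1464 ± 151.5 with volatility just under 0.06 for this example. Rerunning the sketch with a different `tau` shows how much the volatility is allowed to move in a single period.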

buntspecht · December 5, 2017, 10:20am

Thanks for all the replies and thoughts!

From the current leaderboards, just the ones I clicked:

As a new user here I didn’t know that the ranking system had changed. :-/ So maybe the graphs I looked at were “old data”. Is it correct that the new system came into place in August this year?

The graph does indeed seem more stable since then. Good news.

Virtuitous · December 6, 2017, 11:14am

my thoughts;
the ranks are more volatile, but more easily manipulated,
esp considering bot ranking was activated same time as Glicko2 was implemented.

conclusion;
It’s an everybody’s happy scenario, unless you’re on the upper dan level of the scale.

nb;
we should ask the high dan players’ opinion.

Wulfenia · December 10, 2017, 1:58pm

For me personally, the Glicko ranks are a very bad experience.

My retroactively calculated rank for the past 18 or so months makes little sense and oscillates over a huge range of ranks (which was not the case in the old system). It is impossible for me to predict how a game will impact my rank or how good my opponent is. I feel that I can no longer track my progress as I have done over the years I have learned and played go on this site, and all this makes me less inclined to play at all.

Eugene · December 10, 2017, 9:03pm

Are you Wulfenia.6k?

Your graph seems consistent with your rating, and is not outrageously unstable to my eyes.

Glicko says that you are 6k +/- 1.4k.

In the past 6 months the worst rating on the graph is 7.4k and the best is about 5.7k.

That’s moderately stable, and consistent with the uncertainty Glicko says that there is.

In the last 6 months you have also played substantially more games against stronger opponents than weaker opponents (43 vs 24). That imbalance contributes to uncertainty in your rank.

I’d be curious about which games produced “unpredictable” results?

To me, the biggest source of this feeling is the fact that the graph has one dot per day, so you can’t actually see the effect of a single game on days where you finished more than one. Knowing this can help explain some otherwise curious looking effects on the graph.

GaJ

Wulfenia · December 11, 2017, 9:51am

@GreenAsJade
Yes, it is “not outrageously unstable” during the last 6 months because I mostly stopped playing during the last months. Any rank will be stable without new games, so a rating that holds still once someone stops playing is not evidence that Glicko ratings are stable. Also, the only reason that I have played more games against stronger opponents after the Glicko change is that after the Glicko change my ongoing correspondence games were suddenly against stronger opponents.

The rank is very unstable if you look further back, to autumn 2016, and at the extreme values in both directions, which were not like that before the change.

Eugene · December 11, 2017, 10:09am

In the period I looked at, the most recent 6 months game history, you played 60+ games.

That seems like more than enough to get a good sample for a ranking, and during that most recent time your ranking does seem to have stabilised. Your “mostly stopped” rate is faster than my normal rate of games.

In your earlier history you stopped playing between June and October 2016. When you resumed, the results do look unstable for the next month after that. I guess that is not so surprising after a period of inactivity, that you would play less predictably?

I could write more - there are other interesting observations in your record, but it would become an “argument of opinions”.

It’s only my opinion, but I didn’t see anything that made me personally think “this is broken”.

GaJ

Wulfenia · December 11, 2017, 10:45am

May 7: 2.9 ± 1.3 (a value not attained in the old rating; there is an even higher one a year back)
Sep 23: 7.4 ± 1.5 (a value not attained in the old rating for quite some time)

This deviates strongly from the old rating in both directions during a time where my playing skill did not change a lot.

And of course, there are reasons why Glicko calculates these ratings. I am not suggesting that these are random numbers; the point is that it is a bad calculation because it fluctuates wildly, and my opponents have also been of correspondingly fluctuating skill after the change. Any answer that basically says “Oh, but it would not fluctuate if you would but play against other people.” misses the point that I am finding my games in exactly the same way as before (putting out challenges with rank bounds and playing against anyone who accepts). If Glicko means that I have to select opponents in a special Glicko-approved way, it means that Glicko does not work for me. And it is not probable that I am the only person concerned.

I really get the impression that people here base their argument on the assumption that Glicko is fine. “Graph looks unstable?” Well, let’s see: you played too many stronger opponents, too few stronger opponents, too many games, too few games; you did not play on OGS for three months, so obviously you would play go less predictably; you played a lot, so obviously your rank would change a lot. It is quite easy to invent reasons if you know the desired outcome of the argument.

So, let me ask you a question: What exactly would convince anyone here that Glicko ratings do not work well? Is there anything at all?