Unstable ranks?

BHydden · December 11, 2017, 11:12pm

People having a rank that doesn’t appear to line up with their skill over a period of several ranked games, with no outstanding features (such as consecutive wins or losses).

To briefly address the fluctuations caused by the back propagation: keep in mind that everybody’s rating history was back-propagated (recalculated under the new system), but who you played and when didn’t change. So at the time, a game might have appeared to be a 4k playing another 4k, but under the new system those same players may have been 2k vs 6k at that point in time. The result of such a game can therefore affect both players’ ranks very differently than it did the first time around (much more strongly if the weaker player won).
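To make this concrete, here is a minimal Python sketch of a single Glicko-2 game update. It is a simplification (the volatility step is skipped), and the ratings 1500/1300/1700 with RD 100 are illustrative assumptions, not OGS’s actual rank-to-rating mapping or code.

```python
import math

SCALE = 173.7178  # Glicko-2 factor converting rating points to the internal scale

def g(phi: float) -> float:
    """Down-weights an opponent whose rating is uncertain (large phi)."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)

def single_game_update(r: float, rd: float, r_opp: float, rd_opp: float, score: float) -> float:
    """New rating after one game treated as its own rating period.

    Simplified Glicko-2: the volatility step is skipped, so RD is not
    inflated before the update. score is 1 for a win, 0 for a loss.
    """
    mu, phi = (r - 1500) / SCALE, rd / SCALE
    mu_j, phi_j = (r_opp - 1500) / SCALE, rd_opp / SCALE
    e = 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))  # expected score
    inv_v = g(phi_j) ** 2 * e * (1.0 - e)                  # 1 / estimated variance
    phi_new_sq = 1.0 / (1.0 / phi ** 2 + inv_v)
    mu_new = mu + phi_new_sq * g(phi_j) * (score - e)
    return 1500 + SCALE * mu_new

# The same recorded win, seen through different rating histories.
# 1500 vs 1500 and 1300 vs 1700 are illustrative numbers, not OGS ranks.
gain_as_equal = single_game_update(1500, 100, 1500, 100, 1.0) - 1500
gain_as_upset = single_game_update(1300, 100, 1700, 100, 1.0) - 1300
gain_as_expected = single_game_update(1700, 100, 1300, 100, 1.0) - 1700
print(f"equal: {gain_as_equal:+.1f}, upset: {gain_as_upset:+.1f}, expected: {gain_as_expected:+.1f}")
```

A win that looked like an even 4k vs 4k game moves the rating moderately; once the history says the winner was really the weaker player, the same win moves it much more, and if the winner was the stronger player it moves it much less.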

Likewise, the system now recognises and distinguishes between a 4k with ± 150 confidence and a 4k with ± 60 confidence, which also affects how the ratings of both players change after a game.
(A player who is currently 4k but working his way up to his true rank of 2d is not treated the same as a player who has been 4k for his last 1000 games.)
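A small Python sketch of why the ± value matters, assuming it corresponds to the Glicko-2 rating deviation (RD) in rating points. Against an equally rated opponent the expected score is 0.5, which simplifies the update; the function name is hypothetical and volatility is ignored.

```python
import math

SCALE = 173.7178  # Glicko-2 conversion factor between rating points and the internal scale

def change_vs_equal_opponent(rd: float, rd_opp: float, won: bool) -> float:
    """Rating-point change after one game against an equally rated opponent.

    Simplified Glicko-2 step: with equal ratings the expected score E is 0.5,
    so 1/v = g^2 * 0.25. Volatility is ignored (assumption).
    """
    phi, phi_opp = rd / SCALE, rd_opp / SCALE
    g = 1.0 / math.sqrt(1.0 + 3.0 * phi_opp ** 2 / math.pi ** 2)  # opponent's RD damps the step
    phi_new_sq = 1.0 / (1.0 / phi ** 2 + g ** 2 * 0.25)           # own RD sets the step size
    score = 1.0 if won else 0.0
    return SCALE * phi_new_sq * g * (score - 0.5)

unsure = change_vs_equal_opponent(150, 60, won=False)   # the 4k with ± 150
settled = change_vs_equal_opponent(60, 60, won=False)   # the 4k with ± 60
print(f"± 150 loses {unsure:.1f} points, ± 60 loses {settled:.1f} points")
```

The player whose rating is still uncertain moves several times further on the same result, which is how a climbing 4k can reach 2d quickly while a long-settled 4k barely moves.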

Conrad_Melville · December 12, 2017, 3:03am

I’m jumping into this again because there are certain aspects of Glicko-2 that some of us, including me, don’t understand, or that others don’t understand. I really don’t know which it is. If I am mistaken in what I say here, I am very eager to be corrected, and I expect that others would likewise benefit. If I am correct, that would directly benefit others. The very fact that there is so much discussion of this topic indicates widespread confusion on the part of some of us, whoever that may be.

Background: Others may recall that OGS used to lag other servers in rankings. I think there was a page on this in Sensei’s Library (SL), which showed that OGS was about two ranks tougher than KGS, for example (i.e., a 1k on KGS would be about 3k on OGS). I recall hearing that one purpose of G-2 was to bring us more into line with the other servers, and I have heard people say that this has been achieved. More explicitly, the G-2 announcement speaks of wanting to overcome the problem of “slow-moving ratings” under the Elo system. This suggests that the issue of unstable or volatile ratings may simply be the natural flip side of moving away from the former, slow-moving system.

The heart of the issue seems to me to be how the batching and averaging of the ratings works (I alluded to this in my previous post). I may be completely misunderstanding this, because the G-2 announcement was very unclear about it, so if I am, please be kind. As I interpret the last paragraph of the G-2 announcement, one’s rating is an average of the “base” rating and the “current” rating. The current rating is based on the most recent 10-15 games (later it says simply 15, and so will I for convenience), with each game tallied immediately. This leads to computing “your new ‘current’ rating by applying the results of those games to your base rating.” This is what I understand to mean an averaging of the current batch with the base batch. Furthermore, “Once 15 games [have] been played, the ‘current’ rating is finalized and the list is reset.” I understand this to mean that the current batch then becomes the new base while a new 15-game batch begins to accumulate. So, once a new user gets beyond the arbitrary initial base rating of 1500, the rating reflects about 30 games: a 15-game base plus a current batch that accumulates to 15 games, with each current batch becoming the new base when it is finalized. In short, the rating is effectively a rolling window of roughly 30 games, which makes it very sensitive to one’s current strength.
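Here is a rough Python sketch of the batching as I read the quoted announcement: each new game is added to the list, the “current” rating is recomputed by applying all listed games to the base rating as one Glicko-2 rating period, and after 15 games the current rating becomes the new base and the list is reset. This is not OGS’s actual code; `apply_period`, `PlayerRating`, and the simplified update without the volatility step are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

SCALE = 173.7178   # Glicko-2 factor converting rating points to the internal scale
BATCH_SIZE = 15    # batch length quoted in the announcement

Game = Tuple[float, float, float]  # (opponent rating, opponent RD, score)

def apply_period(base: float, base_rd: float, games: List[Game]) -> Tuple[float, float]:
    """Apply every game in the batch to the base rating as one rating period.

    Simplified Glicko-2: the volatility update is skipped (assumption).
    """
    if not games:
        return base, base_rd
    mu, phi = (base - 1500) / SCALE, base_rd / SCALE
    inv_v, delta = 0.0, 0.0
    for r_opp, rd_opp, score in games:
        mu_j, phi_j = (r_opp - 1500) / SCALE, rd_opp / SCALE
        g = 1.0 / math.sqrt(1.0 + 3.0 * phi_j ** 2 / math.pi ** 2)
        e = 1.0 / (1.0 + math.exp(-g * (mu - mu_j)))  # expected score vs this opponent
        inv_v += g ** 2 * e * (1.0 - e)
        delta += g * (score - e)
    phi_new = 1.0 / math.sqrt(1.0 / phi ** 2 + inv_v)
    return 1500 + SCALE * (mu + phi_new ** 2 * delta), SCALE * phi_new

@dataclass
class PlayerRating:
    base: float
    base_rd: float
    batch: List[Game] = field(default_factory=list)

    def record(self, game: Game) -> Tuple[float, float]:
        """Add a game, recompute 'current' from the base, and roll over when full."""
        self.batch.append(game)
        current = apply_period(self.base, self.base_rd, self.batch)
        if len(self.batch) >= BATCH_SIZE:
            self.base, self.base_rd = current  # finalized: current becomes the new base
            self.batch.clear()
        return current

player = PlayerRating(base=1500.0, base_rd=100.0)
for _ in range(14):
    current, _rd = player.record((1500.0, 100.0, 1.0))
print(player.base, round(current, 1), len(player.batch))  # base unchanged, current has moved
player.record((1500.0, 100.0, 1.0))
print(round(player.base, 1), len(player.batch))  # batch finalized and reset
```

In this sketch the base itself carries the effect of every earlier batch, so older games are not literally dropped; they just stop being re-evaluated once their batch is finalized.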

Omitting games that are very old (when one is usually weaker) obviously increases the speed of ranking up. There can, however, certainly be disagreement about the proper size of the batches. Is 30 games the magic number? I have no idea.

Also, I think it is very important in this discussion to remember that there is a difference between the theory of the system and the implementation. Most of the discussion I’ve seen focuses on the theory, as if it were the source of problems, but the problems might just as well lie in the implementation (i.e., bugs). This distinction is true even if my entire understanding outlined above is wrong.

Finally, I am not arguing pro or con G-2. I do not know enough about this subject to make that evaluation. However, I do know linguistic ambiguity, which is one of the problems in understanding all this.

BHydden · December 12, 2017, 3:24am

To my understanding, everything you have said is more or less accurate. I would add that alongside the 15-game batching process, there is also a complementary monthly batching process (whose distinct purpose is keeping both rating and confidence accurate for all players who complete fewer than 15 games per month).

In case this has been unclear: each person’s ‘current’ rating is updated game by game, based on the performance of both themselves and their opponents in every game of the current batch, and after either 15 games or the passing of a calendar month (presumably on some arbitrarily assigned date each month, site-wide), this ‘current’ rating becomes the new ‘base’ rating and the process begins again.
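The two triggers can be written as a single check. The actual monthly cutoff isn’t known (the post only guesses at an arbitrarily assigned date), so `reset_day` below is a hypothetical parameter defaulting to the 1st of the month, and both function names are made up for illustration.

```python
from datetime import date

def next_reset(after: date, reset_day: int) -> date:
    """First site-wide reset date strictly after the given day (hypothetical schedule)."""
    if after.day < reset_day:
        return after.replace(day=reset_day)
    year, month = (after.year + 1, 1) if after.month == 12 else (after.year, after.month + 1)
    return date(year, month, reset_day)

def batch_should_close(games_in_batch: int, batch_started: date, today: date,
                       batch_size: int = 15, reset_day: int = 1) -> bool:
    """Finalize the batch after batch_size games or once a monthly reset date has passed."""
    if not 1 <= reset_day <= 28:  # keep the reset date valid in every month
        raise ValueError("reset_day must be between 1 and 28")
    return games_in_batch >= batch_size or today >= next_reset(batch_started, reset_day)

# A player with only 3 games in December keeps the batch open until the January reset.
print(batch_should_close(3, date(2017, 12, 2), date(2017, 12, 31)))
print(batch_should_close(3, date(2017, 12, 2), date(2018, 1, 1)))
```

The monthly trigger is what keeps infrequent players’ ratings and confidence moving even though they never fill a 15-game batch.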

Hopefully I too have assisted in clarifying the issue for all concerned.

system · March 13, 2018, 11:24am

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.