I’m jumping into this again because there are certain aspects of Glicko-2 that either some of us, including me, don’t understand, or that others don’t understand. I really don’t know which it is. If I am mistaken in what I say here, I am eager to be corrected, and I expect others would benefit from the correction too. If I am correct, that should help others directly. The very fact that there is so much discussion of this topic indicates widespread confusion on the part of some of us, whoever that may be.
Background: Others may recall that OGS used to lag other servers in rankings. I think there was a page in SL on this, which showed that OGS was about two ranks tougher than KGS, for example (i.e., a 1k on KGS would be about 3k on OGS). I recall hearing that one purpose of G-2 was to bring us more into line with the other servers, and I have heard people say that this has been achieved. More explicitly, the G-2 announcement speaks of wanting to overcome the problem of “slow-moving ratings” under the Elo system. This suggests that the unstable or volatile ratings people are complaining about may just be a natural contrast to the former, slow-moving system.
The heart of the issue seems to me to be how the batching and averaging of the ratings works (I alluded to this in my previous post). Now I may be completely misunderstanding this, because the G-2 announcement was very unclear about it, so if I am, please be kind. As I interpret the last paragraph of the G-2 announcement, one’s rating is an average of the “base” rating and the “current” rating. The current rating is built from the most recent 10-15 games (later the announcement says simply 15, and so will I for convenience), with each game tallied immediately. This leads to computing “your new ‘current’ rating by applying the results of those games to your base rating.” I understand this to mean that the current batch is averaged with the base batch. Furthermore, “Once 15 games [have] been played, the ‘current’ rating is finalized and the list is reset.” I understand this to mean that the current batch then becomes the new base while a new 15-game batch begins to accumulate. So, once a new user gets beyond the arbitrary 1500 initial base rating, the rating reflects about 30 games: a 15-game base and a current batch that accumulates to 15 games, with each current batch becoming the new base when it is finalized. In short, the ratings roll over approximately every 30 games, meaning that they will be very sensitive to one’s current strength.
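To make my reading concrete, here is a minimal sketch of the base/current bookkeeping as I interpret it. It only models the batching. The update rule is a simple Elo-style stand-in and not the real Glicko-2 math (no rating deviation, no volatility), and the names `Player`, `apply_batch`, `expected_score`, and the step size `K` are my own inventions for illustration, not anything taken from the OGS code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

BATCH_SIZE = 15  # batch length, per the announcement as quoted above
K = 24.0         # hypothetical step size for the stand-in update rule


def expected_score(rating: float, opponent: float) -> float:
    """Elo-style win expectancy; a stand-in, NOT the Glicko-2 formula."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def apply_batch(base: float, games: List[Tuple[float, float]]) -> float:
    """Apply every (opponent_rating, score) pair in the batch to the base rating."""
    surprise = sum(score - expected_score(base, opp) for opp, score in games)
    return base + K * surprise


@dataclass
class Player:
    base: float = 1500.0  # the arbitrary initial base rating
    games: List[Tuple[float, float]] = field(default_factory=list)

    @property
    def current(self) -> float:
        """The displayed rating: the base with the open batch applied to it."""
        return apply_batch(self.base, self.games)

    def record(self, opponent: float, score: float) -> None:
        """Tally one game immediately; finalize the batch once it is full."""
        self.games.append((opponent, score))
        if len(self.games) == BATCH_SIZE:
            self.base = self.current  # the finalized current rating becomes the new base
            self.games = []           # the list is reset


p = Player()
p.record(1500.0, 1.0)     # one win against an equal opponent
print(p.base, p.current)  # the base is unchanged; the current rating has moved
```

In this sketch, older games affect the displayed rating only through the base value, so whether the "about 30 games" description above is the right way to think about it is exactly the kind of thing I would like someone who knows the implementation to confirm.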
Omitting games that are very old (when one is usually weaker) obviously increases the speed of ranking up. There can, however, certainly be disagreement about the proper size of the batches. Is 30 games the magic number? I have no idea.
Also, I think it is very important in this discussion to remember that there is a difference between the theory of the system and the implementation. Most of the discussion I’ve seen focuses on the theory, as if it were the source of problems, but the problems might just as well lie in the implementation (i.e., bugs). This distinction is true even if my entire understanding outlined above is wrong.
Finally, I am not arguing for or against G-2. I do not know enough about this subject to make that evaluation. However, I do know linguistic ambiguity, which is one of the problems in understanding all this.