Old discussion about OGS deviation

This puzzled me for a long time. But has anything come out of it? Was there any explanation why there’s a difference? From the thread it seems the conversation simply fizzled out.

4 Likes

It would appear that @flovo discovered that OGS’s calculation of the Glicko2 rating is inconsistent with both @flovo’s own implementation and another open-source implementation of Glicko2. In particular, the deviation that OGS calculates, which represents rating uncertainty and controls the magnitude of rating adjustments, appears to be noticeably larger than the deviation the reference implementations produce. Deviations that are too large lead to adjustments that are larger than necessary, which may add to the perception that ratings fluctuate more than they should.
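To illustrate why a larger deviation means larger adjustments, here is a minimal sketch of a single-game Glicko-2 update. It is simplified: the volatility is held fixed at an assumed 0.06 instead of being re-estimated, and the function name `rating_change` and the example ratings are made up for illustration, not taken from OGS.

```python
import math

SCALE = 173.7178  # conversion factor between rating points and the Glicko-2 scale


def g(phi):
    """Weight that discounts a result by the opponent's uncertainty."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def rating_change(rating, rd, opp_rating, opp_rd, score, sigma=0.06):
    """Rating change (in points) after one game, with volatility held fixed.

    Simplification: the full algorithm also re-estimates sigma; here it is
    an assumed constant so the effect of the deviation is easy to see.
    """
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE

    expected = 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))
    v = 1.0 / (g(phi_j) ** 2 * expected * (1.0 - expected))

    phi_star_sq = phi ** 2 + sigma ** 2              # deviation before the game
    phi_new_sq = 1.0 / (1.0 / phi_star_sq + 1.0 / v)  # deviation after the game
    return SCALE * phi_new_sq * g(phi_j) * (score - expected)


# Same win against the same opponent, only the player's own deviation differs.
low = rating_change(1500, 60, 1500, 60, score=1.0)
high = rating_change(1500, 90, 1500, 60, score=1.0)
print(f"RD 60: {low:+.1f} points, RD 90: {high:+.1f} points")
```

The player with the larger deviation gains more from the identical result, which is why an inflated deviation would show up as bigger rating swings.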

I would guess that there are two likely explanations for these observations:

  1. There is a bug in OGS’s implementation of Glicko2, which, given the above observations, might specifically occur in the calculation of the deviation values.
  2. There is an error in @flovo’s experiments or analysis.

Either way, the observed discrepancy seems to warrant further investigation. If it is the first case, then fixing the issue might increase the stability of and user satisfaction with the rating system.

Maybe @flovo has already worked with @anoek behind the scenes to investigate and resolve the issue?

4 Likes

We do batch processing (as described by Prof. Glickman), which accelerates rank changes a bit - for better and worse. Because of that, it’s not going to match up with a non-batch Glicko2 system.
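For anyone who wants to compare the two approaches themselves, here is a sketch of the Glicko-2 update following the steps in Glickman's paper, run once over a whole batch and once game by game. This is not OGS's code: the function names, `TAU = 0.5`, and the input data (the worked example from the paper) are assumptions for illustration.

```python
import math

SCALE = 173.7178  # rating points per Glicko-2 unit
TAU = 0.5         # system constant; value taken from the paper's example


def g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def new_volatility(phi, sigma, delta, v, tau=TAU, eps=1e-6):
    """Volatility update via the iterative procedure in the paper."""
    a = math.log(sigma ** 2)

    def f(x):
        ex = math.exp(x)
        top = ex * (delta ** 2 - phi ** 2 - v - ex)
        return top / (2.0 * (phi ** 2 + v + ex) ** 2) - (x - a) / tau ** 2

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    return math.exp(A / 2.0)


def update(rating, rd, sigma, games):
    """One rating period. games: list of (opp_rating, opp_rd, score)."""
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not games:  # no games: only the deviation grows
        return rating, math.sqrt(phi ** 2 + sigma ** 2) * SCALE, sigma
    v_inv, total = 0.0, 0.0
    for opp_rating, opp_rd, score in games:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        e = 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))
        v_inv += g(phi_j) ** 2 * e * (1.0 - e)
        total += g(phi_j) * (score - e)
    v = 1.0 / v_inv
    sigma_new = new_volatility(phi, sigma, v * total, v)
    phi_star_sq = phi ** 2 + sigma_new ** 2
    phi_new = 1.0 / math.sqrt(1.0 / phi_star_sq + 1.0 / v)
    mu_new = mu + phi_new ** 2 * total
    return mu_new * SCALE + 1500.0, phi_new * SCALE, sigma_new


# Worked example from the paper: three games in one rating period.
games = [(1400, 30, 1.0), (1550, 100, 0.0), (1700, 300, 0.0)]

batch = update(1500, 200, 0.06, games)

sequential = (1500, 200, 0.06)
for game in games:  # non-batch: every game is its own rating period
    sequential = update(*sequential, [game])

print("batch:      rating %.2f, RD %.2f" % batch[:2])
print("sequential: rating %.2f, RD %.2f" % sequential[:2])
```

The batch call should reproduce the paper's example (rating about 1464.06, deviation about 151.52). How far the game-by-game result drifts from that depends on the games, so it is worth running both on the same history before attributing a discrepancy to batching alone.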

6 Likes

I used the same batching process, but got much lower deviations for second and later batches.

A batch consists of 15 games (but spans at most a month). A rating update uses the player’s rating at the end of the last batch and the current opponent ratings.
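A rough sketch of how such batching could work is below. The exact rule is an assumption on my part: a batch is closed once it holds 15 games, or once the next game falls more than 30 days after the first game in the batch. The function name `split_into_batches` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def split_into_batches(game_times, max_games=15, max_span=timedelta(days=30)):
    """Group game timestamps into batches.

    Assumed rule: start a new batch when the current one already has
    max_games games, or when the next game is more than max_span after
    the first game of the current batch.
    """
    batches = []
    current = []
    for played_at in sorted(game_times):
        full = len(current) >= max_games
        too_old = bool(current) and played_at - current[0] > max_span
        if current and (full or too_old):
            batches.append(current)
            current = []
        current.append(played_at)
    if current:
        batches.append(current)
    return batches


start = datetime(2019, 1, 1)

# 16 games within one day: the 16th game opens a second batch.
busy = [start + timedelta(hours=i) for i in range(16)]
print([len(b) for b in split_into_batches(busy)])

# 3 games spaced 20 days apart: the third is more than a month after the first.
slow = [start + timedelta(days=20 * i) for i in range(3)]
print([len(b) for b in split_into_batches(slow)])
```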

The deviation at the start of a new batch on OGS seems to be inflated compared with Glickman’s Glicko-2 description in http://www.glicko.net/glicko/glicko2.pdf (by >= 30 points, depending on factors I have not pinned down).

6 Likes

Interesting. I promise to delve into it in more detail and grok what you’ve got as I revisit the rating stuff, which should happen in Q1 or Q2 of 2020, I think.

7 Likes