Testing the Volatility: Summary

anoek · January 18, 2023, 1:04pm

Yes, we use the code in goratings/goratings/math/glicko2.py at master · online-go/goratings · GitHub almost verbatim. The only differences are around bindings to internal data structures and whatnot. If it’s a concern, I’d be happy to send along a diff of the two versions. (I can confirm there are no algorithmic differences in the diff, just things like naming: we use “v5_Glick2Entry” instead of just “Glicko2Entry”, stuff like that.)

Note that we don’t use the gor implementation in that repository at all; that was just something we tried during testing to see how things stacked up against gor.

First off, I think it’s awesome you’re taking a stab at a better rating system. Glicko2 is better than my prior attempts, but as you’ve identified, some of its behaviors are not particularly desirable. I’d welcome a better system if one is presented, noting though that the player experience is very important too (a technically better system that offers a poor experience probably isn’t going to be used, unless the benefits are just so astounding that it makes sense).

Here are some notes to consider when thinking about different systems, at least as they relate to any possible integration into the OGS system:

- There can be a difference between technically better and fair, as perceived by a player. Most players in practice prefer “fair” over marginally better but “unfair” ratings, specifically:
  - In general, when this topic has come up before, players don’t like their rank drifting when they’re not playing.
  - When you lose, your rank should go down; when you win, your rank should go up. Staying neutral is fine too for cases where the rank difference is high.
- Players want their ratings and ranks to be adjusted immediately or, at worst, after a small delay. (We have a small delay, but it’s usually sub-second unless things are backed up, which is plenty fast enough.) This is a better experience than doing one big rating update at the end of a window, whether the window is measured in time or in games played.
- We can’t count on users setting a good rating/rank when they join, so the system needs to quickly find a suitable rating and rank for them.
- The target of having a rank difference be a good estimate for handicap is important.
- It needs to scale to many millions of games.
- Computing a rating update can look back a few games if that’s useful, but it can’t consider the whole history, as that’s too heavy a thing to compute for some players and bots.
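
To make a few of these constraints concrete, here is a minimal Python sketch of the shape such an update could take: one update per finished game, a fixed-size lookback window instead of the full history, and a larger step for new players so they settle quickly. Everything in it is an assumption for illustration. The names (`Player`, `update_after_game`), the starting rating, the window size, and the step sizes are hypothetical, and the update rule is a plain Elo-style formula swapped in for brevity, not Glicko2 and not what OGS runs.

```python
from collections import deque
from dataclasses import dataclass, field

WINDOW = 5  # hypothetical: how many recent games an update may look back on


@dataclass
class Player:
    rating: float = 1500.0  # hypothetical starting rating
    # Bounded lookback: older games fall off automatically, so memory and
    # per-update work stay constant no matter how many games are played.
    recent: deque = field(default_factory=lambda: deque(maxlen=WINDOW))


def expected_score(rating: float, opponent_rating: float) -> float:
    """Elo-style win expectancy (a stand-in, not the Glicko2 formula)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def update_after_game(player: Player, opponent_rating: float, score: float) -> float:
    """Apply one immediate update; score is 1.0 for a win, 0.0 for a loss."""
    # Hypothetical rule: take bigger steps until the window is full, so a
    # new account converges quickly even with a poor self-reported rank.
    k = 64.0 if len(player.recent) < WINDOW else 32.0
    player.rating += k * (score - expected_score(player.rating, opponent_rating))
    player.recent.append((opponent_rating, score))
    return player.rating
```

With this rule a win always moves the rating up and a loss always moves it down, the rating never changes while the player is idle, and each update costs the same regardless of how long the player’s history is.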

That’s what comes to mind at the moment. Good luck, let me know what you find, and PRs are welcome on goratings!