Am I alone in feeling like OGS is the hardest place to find an even game?

(Sorry to bump this old thread. I didn’t reply at the time, but a newer thread reminded me of it. I wanted to reply here because not many people agreed with the OP back then, and the OP’s experience isn’t too far off from what I’ve seen myself.)

Anecdotally, in the last three months, my overall rank has varied between 0.7k and 5.5k (mostly getting weaker, though it was getting stronger in the three-month period before that). I think my “wins/losses against stronger/weaker opponents” have looked reasonable throughout, but that doesn’t mean my opponents have been getting good games against me.

I think it’s a combination of factors:

  • The overall rank is aggregated across board sizes. Coming back to Go after a long period away, I focused on 9x9 to help with my reading, and improved a bit. I’ve recently started playing 19x19 again (lots of atrophy there…) and 13x13 (which I basically never played before).
  • My own playing strength is volatile (even within a game) because I play mostly correspondence. Sometimes I take some time to read. Sometimes life demands I treat every game like blitz.
  • The current OGS ranking system is just inherently volatile. It’s an implementation of Glicko-2, which is period-based (it adjusts ratings based on average playing strength in a given time period), but the OGS version has stripped out time. I believe this makes OGS ratings bounce around a lot more than Glicko-2 ratings should.
  • Handicap games are relatively uncommon on OGS. Typically the rating system can learn more from a game with auto-handicap than from an even game.

All that to say, I think the current metrics of “black win percentage” (what currently gets tracked for goratings experiments) and “wins/losses against stronger/weaker opponents” (the pie chart), while useful, are insufficient and don’t fully describe how well the rating system is working. It can be simultaneously true that (1) a player’s average rating over a 3-month period is correct and (2) the rating was usually multiple stones off from the correct rating at any given moment.
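
Here is a toy illustration of how (1) and (2) can both hold. The numbers are completely made up (they are not my rating history or anyone else’s), and “true rating” is a hypothetical quantity that nobody can observe directly. The point is only that the error of the average and the average size of the error are different measurements.

```python
from statistics import fmean

# Hypothetical values, chosen by hand for illustration only.
true_rating = 1500.0
observed_ratings = [1300.0, 1700.0, 1250.0, 1750.0, 1400.0, 1600.0]

# (1) Error of the average rating over the whole period.
bias = fmean(observed_ratings) - true_rating

# (2) Average size of the error at each individual moment.
typical_error = fmean(abs(r - true_rating) for r in observed_ratings)

print(f"error of the average: {bias:.1f}")
print(f"average absolute error: {typical_error:.1f}")
```

With these made-up numbers the average is exactly right, yet the rating is about 183 points off at a typical moment. A metric that only checks averages would report that everything is fine.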

Also, I have some thoughts about how to improve it in the other thread.
