2021 Rating and rank adjustments

tl;dr - Expect a notable bump in your rank. The lower your rank, the larger the increase. This transition is planned to take effect around 2021-01-28T01:30:00Z.


Summary of the update:

  1. We are attempting to align our low dan ranks to be comparable to the EGF and AGA low dan ranks. Currently we are projecting that a 1d OGS rank will be about 0.7 stones weaker than an EGF 1d and about 0.8 stones stronger than an AGA 1d. These numbers will be re-evaluated once the dust settles, but we believe we should end up fairly close to the goal of being within a stone of each system, on average.

  2. We fixed a volatility bug that was making ranks jump around more than they should.

  3. Because of #2, we removed our sliding window rating system, which means things will be a bit more intuitive now: if you win, your rating will go up; if you lose, it’ll go down.

  4. The rating to rank formula has been updated to be ln(rating / 525) * 23.15. This update retargets our dan range to align roughly with the AGA and EGF ranks and widens our rank bands a bit, which markedly improves handicap consistency across ranks and eliminates the “forever 25k” problem we had.

  5. Speed and size specific ratings should be notably more meaningful now and should be more or less on the same scale as the overall ratings.
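
As a rough illustration of the formula in item 4, here is a minimal Python sketch. The function names are made up for this example, and the mapping from numeric rank to a kyu/dan label is an assumption (numeric rank 30 taken as the start of 1d, 29 as the start of 1k), since the post itself only gives the logarithmic formula.

```python
import math


def rating_to_rank(rating: float) -> float:
    """Numeric rank from a rating: ln(rating / 525) * 23.15."""
    return math.log(rating / 525) * 23.15


def rank_to_rating(rank: float) -> float:
    """Inverse of rating_to_rank, handy for finding rank boundaries."""
    return 525 * math.exp(rank / 23.15)


def rank_label(rank: float) -> str:
    """Kyu/dan label for a numeric rank.

    ASSUMPTION (not stated in the post): numeric rank 30 is the start
    of 1d and 29 is the start of 1k, with one band per whole number.
    """
    whole = math.floor(rank)
    if whole >= 30:
        return f"{whole - 29}d"
    return f"{30 - whole}k"


if __name__ == "__main__":
    for rating in (800, 1500, 2558):
        rank = rating_to_rank(rating)
        print(rating, round(rank, 2), rank_label(rank))
```

For a rating of 2558 the formula gives a numeric rank of about 36.66. Under the labeling assumption above that would read as 7d, but treat the label step as a guess until the official mapping is confirmed.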


AGA and EGF low dan alignment

One major focus of this improvement was to align ourselves in some way with the AGA and EGF. Generally speaking, an EGF rank is about 1.5 stones stronger than the equivalent AGA rank, so we aimed for the middle, targeting a spot in between in the 1d - 3d range. So, if you’re within that range on OGS, you should be within about a stone or so of your rank in either organization.

However, our SDK, DDK, and high dan ranks diverge a bit more from both organizations. Generally speaking, you will have a higher rank on OGS than in either organization in the low SDK and DDK ranks. We feel this is justified and more correct for us given the handicap win rate analysis for games both here and at the organizations.

Once the update is live and the dust settles, we’ll follow up with a more complete mapping between OGS and the various organizations and servers we have data for. If you haven’t already, or if your ranks have changed recently, please take a moment to provide your rank information so we can build the best possible rank conversion table.


Volatility bug and ditching windowed rating calculations

Thanks to the due diligence and persistence of @flovo, it was discovered that the OGS Glicko2 implementation had a notable bug in how we used the volatility parameter. In short, we had a line of code that was phi_star = sqrt(player.phi ** 2 + new_volatility * 2) and should have been phi_star = sqrt(player.phi ** 2 + new_volatility ** 2). The difference is that ** means “raise to the power of”, whereas * is just “multiply by”. So, since new_volatility usually has values somewhere around 0.06, the value we were supposed to be adding was somewhere around 0.0036, and instead we were adding somewhere around 0.12, so we were off by a factor of about 30.
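
To make the size of the error concrete, here is a small Python sketch of the two expressions quoted above. The function names and the phi value of 0.35 are hypothetical and chosen only for illustration; the 0.06 volatility is the typical value mentioned in the post.

```python
import math


def phi_star_buggy(phi: float, new_volatility: float) -> float:
    # Bugged version: multiplies the volatility by 2 instead of squaring it.
    return math.sqrt(phi ** 2 + new_volatility * 2)


def phi_star_fixed(phi: float, new_volatility: float) -> float:
    # Corrected version: adds the squared volatility.
    return math.sqrt(phi ** 2 + new_volatility ** 2)


new_volatility = 0.06              # typical value mentioned in the post
added_buggy = new_volatility * 2   # about 0.12
added_fixed = new_volatility ** 2  # about 0.0036
ratio = added_buggy / added_fixed  # a bit over 33

phi = 0.35  # ASSUMPTION: hypothetical deviation on the Glicko2 internal scale

if __name__ == "__main__":
    print("extra term, buggy:", added_buggy)
    print("extra term, fixed:", added_fixed)
    print("ratio:", ratio)
    print("phi_star buggy:", phi_star_buggy(phi, new_volatility))
    print("phi_star fixed:", phi_star_fixed(phi, new_volatility))
```

With the bugged term, the deviation is inflated far more between games than intended, which is consistent with ranks jumping around more than they should.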

Now mind you, I tested this code once upon a time and it was correct once upon a time. However, at some point I accidentally deleted a character (which is sort of easy with my editor of choice, vim) and because nothing exploded and generally speaking, looking at one game change, the results aren’t that crazy, I didn’t notice it. For years. That bug had been with us since we introduced Glicko2.

Instead of noticing this, I unknowingly compensated for it by implementing fancy windowing systems: first the 15 game window system, later a sliding window system. These systems helped combat the volatility bug to produce results that seemed reasonable, and by and large worked out. However, as it turns out, without that bug, dropping windowing altogether produces really good results too (better than the bugged windowed versions) and is a lot more intuitive. Bugless windowed versions work well too, but there are some drawbacks, especially with things like sliding windows, where sometimes you win and your rating goes down, or vice versa. They also have a tendency to vault players up or down in ranks during win or loss streaks, which can be problematic with auto-matching. So, now that the volatility bug is fixed, we’re ditching windowed ratings altogether. I think folks will be pleased with the results, and in general the system should be much more intuitive.

For fun, here are some before/after screenshots that highlight the effect of the volatility bug:

Before:

After:


Updated ranking formula

Along with aligning our low dan ranks, we want our ranks to be as useful as possible. A one-rank difference should equate to a one-stone handicap difference on a 19x19 board, regardless of what rank you are or how large the rank difference is. While our old rating to rank formula achieved this for a decent percentage of the population, it was not adequate in the lower ranks, and instead placed a notable percentage of our population into a perpetual 25k rank. So, we ran a whole lot of simulations with our entire game history to come up with a better mapping that aligns our ranks well, gets rid of the 25k problem, and produces fairly consistent handicap win rates across the board.

The key things to note in this data are:

  • The new system was able to consider 25% more games (labeled Samples) as being suitable for analysis than the old system. The criteria were that the game was played between two non-provisional players and that the game’s handicap was set appropriately. What I inferred from this is that we found a suitable rank for a lot more players, and I don’t believe it came at the expense of good placements for players that didn’t have a problem.
  • The black handicap win rates hover pretty consistently around 43.3% for the vast majority of the games, which is pretty good I think (the EGF’s win rate for 1 handicap games is somewhere between 42 and 43%, so comparable).

(note, gray areas are where we don’t have enough data for anything meaningful, the cutoff here was a somewhat arbitrary 100 games)
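
The filtering and cutoff described above could be sketched roughly as follows in Python. This is not the actual analysis code: the Game record, its field names, grouping by handicap, and the reading of “set appropriately” as “handicap equals the rounded rank difference” are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

MIN_SAMPLES = 100  # the somewhat arbitrary cutoff mentioned in the note


class Game(NamedTuple):
    # Hypothetical record layout; numeric ranks, higher means stronger.
    black_rank: float
    white_rank: float
    handicap: int
    black_won: bool
    black_provisional: bool
    white_provisional: bool


def is_suitable(game: Game) -> bool:
    """Keep games between non-provisional players with a fitting handicap."""
    if game.black_provisional or game.white_provisional:
        return False
    # ASSUMPTION: "set appropriately" means the handicap matches the
    # rounded rank difference between white and black.
    return game.handicap == round(game.white_rank - game.black_rank)


def black_win_rates(games: list[Game]) -> dict[int, Optional[float]]:
    """Black win rate per handicap, or None below the sample cutoff."""
    wins: dict[int, int] = defaultdict(int)
    samples: dict[int, int] = defaultdict(int)
    for game in games:
        if not is_suitable(game):
            continue
        samples[game.handicap] += 1
        wins[game.handicap] += int(game.black_won)
    return {
        handicap: (wins[handicap] / count if count >= MIN_SAMPLES else None)
        for handicap, count in samples.items()
    }
```

Buckets that come back as None correspond to the gray areas: there are too few suitable games to say anything meaningful.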







Speed and size specific ratings

It’s been a known issue that you can’t really compare the different ratings on your profile. Your overall rating had pretty much nothing to do with your live 13x13 rating for instance. This is because they were computed using completely independent rating pools.

@flovo came up with the great idea of using your opponent’s overall rating when updating a sub-rating. This has the effect of the rating pools converging onto the same scale as the overall rating pool, so the individual ratings should be somewhat comparable now.
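
Here is a minimal sketch of that idea in Python. To keep it short it uses a simple Elo-style update as a stand-in for Glicko2, so the numbers are not what OGS would produce; the Player class, the category keys, the K factor of 32, and the 1500 starting value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Player:
    overall: float = 1500.0
    # Sub-ratings keyed by (speed, board size), e.g. ("live", 13).
    sub: dict = field(default_factory=dict)


def expected_score(own: float, opponent: float) -> float:
    """Elo-style expected score (a stand-in for the Glicko2 math)."""
    return 1.0 / (1.0 + 10 ** ((opponent - own) / 400.0))


def elo_update(own: float, opponent: float, score: float, k: float = 32.0) -> float:
    """New rating after one game; score is 1.0 for a win, 0.0 for a loss."""
    return own + k * (score - expected_score(own, opponent))


def update_sub_rating(player: Player, opponent: Player, category, score: float) -> float:
    """Update one sub-rating using the opponent's OVERALL rating.

    The opponent's own sub-rating in this category is deliberately ignored,
    which is what ties every sub-rating pool to the overall scale.
    """
    own_sub = player.sub.get(category, 1500.0)  # hypothetical starting value
    player.sub[category] = elo_update(own_sub, opponent.overall, score)
    return player.sub[category]


if __name__ == "__main__":
    alice = Player(overall=1800.0, sub={("live", 13): 1500.0})
    bob = Player(overall=1800.0, sub={("live", 13): 2100.0})
    # Alice beats Bob at live 13x13: her sub-rating is measured against
    # Bob's overall 1800, not his live 13x13 rating of 2100.
    print(update_sub_rating(alice, bob, ("live", 13), 1.0))
```

Because every sub-rating is measured against opponents’ overall ratings, the sub-ratings are pulled toward the same scale instead of drifting in isolated pools.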

107 Likes

Yay! New ranks!

14 Likes

Woo hoo!

2 Likes

Go team Vim!!! :muscle: :wink:

Thanks for the great job!!!

19 Likes

Exciting. Thanks for all the work on this and other OGS dev. (But maybe use VS Code? :laughing:)

One thing that’s always confused me is the ranks by size and game length. I have only ever played 19x19 correspondence and yet my rank is different in different categories. Does that change/simplify with this update?

4 Likes

Thanks for informing us so thoroughly.

14 Likes

Awesome work and excellent explanation! Thank you @anoek! :muscle:

5 Likes

+1 :wink:

5 Likes

There will still be some variation, but they should converge to be more reasonable

7 Likes

Big thanks to @anoek and @flovo for all the hard work!

6 Likes

Thanks for the detailed and interesting update, sounds like this will be a big improvement. Will the whole rating history be updated for each player or will there be a discontinuity?

4 Likes

Yep the updates are retroactive

11 Likes

Thank you so much! Such great news that the volatility bug was found and fixed. Also respect for being forward about the cause of the bug. :heart:

10 Likes

Hi! Could you please explain how the numerical rank calculated from your logarithmic formula converts to kyu / dan numbers? I have a rating of 2558, which translates to 36.66 in the new ranking; where does that put me? Thanks!

1 Like

great work

1 Like

I’m just curious, but how did you come up with these values? By ‘trial and error’, or was there some other reason for using these?

2 Likes

Looks like we are going to have lots more Dan level players after this adjustment.

Easier to find live games?

3 Likes

Great work, and great detailed update thanks!

2 Likes

You’re asking for feedback to align with AGA/EGF, but I’d imagine there hasn’t been much in terms of rated games in the AGA (or maybe there are and I just haven’t figured out how to do them :stuck_out_tongue: ), so how would you like us to fill this out? I’d imagine best guesses, but I personally (and I’d imagine I’m not the only one) self-promote based upon my success on OGS. I could put in my last tournament rank, but it’s been a while and it might have been considered a bit suspect even when I had it.

3 Likes