2021 Rating and rank adjustments

anoek · January 26, 2021, 10:10pm

tl;dr - Expect a notable bump in your rank. The lower your rank, the larger the increase. This transition is planned to take effect around 2021-01-28T01:30:00Z.

Summary of the update:

1. We are attempting to align our low dan ranks with the EGF and AGA low dan ranks. Currently, we project that an OGS 1d will be about 0.7 stones weaker than an EGF 1d and about 0.8 stones stronger than an AGA 1d. These numbers will be re-evaluated once the dust settles, but we believe we should end up fairly close to the goal of being within a stone of each system, on average.
2. We fixed a volatility bug that was making ranks jump around more than they should.
3. Because of #2, we removed our sliding-window rating system, which means things will be a bit more intuitive now: if you win, your rating will go up; if you lose, it’ll go down.
4. The rating-to-rank formula has been updated to ln(rating / 525) * 23.15. This update retargets our dan range to align roughly with the AGA and EGF ranks and widens our rank bands a bit, which markedly improves handicap consistency across ranks and eliminates the “forever 25k” problem we had.
5. Speed- and size-specific ratings should be notably more meaningful now and should be more or less on the same scale as the overall ratings.
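As a rough illustration, the new rating-to-rank formula and its inverse fit in a few lines of Python. This is a minimal sketch, not OGS’s actual code: the function names are hypothetical, and `rank_label` assumes the usual OGS numbering where 30.0 is the start of 1d and each whole number below it is one kyu, with ranks floored and no clamping at either end of the scale.

```python
import math

# Constants from the announcement: rank = ln(rating / 525) * 23.15
RATING_SCALE = 525.0
RANK_FACTOR = 23.15


def rating_to_rank(rating: float) -> float:
    """Convert a positive rating to the continuous rank number."""
    if rating <= 0:
        raise ValueError("rating must be positive")
    return math.log(rating / RATING_SCALE) * RANK_FACTOR


def rank_to_rating(rank: float) -> float:
    """Inverse of rating_to_rank: rating = 525 * e^(rank / 23.15)."""
    return RATING_SCALE * math.exp(rank / RANK_FACTOR)


def rank_label(rank: float) -> str:
    """Hypothetical kyu/dan label.

    ASSUMPTION: 30.0 starts 1d, 29.x is 1k, 24.x is 6k, and so on.
    Ranks are floored; no upper or lower cap is applied.
    """
    whole = math.floor(rank)
    if whole >= 30:
        return f"{whole - 29}d"
    return f"{30 - whole}k"
```

For example, `rating_to_rank(2558)` returns about 36.66, the same figure Sistina asks about further down this thread; under the assumed labeling convention that number would display as 7d.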

AGA and EGF low dan alignment

One major focus of this update was to align ourselves in some way with the AGA and EGF. Generally speaking, an EGF rank is about 1.5 stones stronger than the equivalent AGA rank, so we aimed for the middle, targeting a point in between the two in the 1d to 3d range. So, if you’re within that range on OGS, you should be within about a stone of your rank in either organization.

However, our SDK, DDK, and high dan ranks diverge a bit more from both organizations. Generally speaking, you will have a higher rank on OGS than in either organization in the low SDK and DDK ranges. We feel this is justified and more correct for us, given the handicap win rate analysis of games both here and at those organizations.

After the update, once the dust settles, we’ll follow up with a more complete mapping between OGS and the various organizations and servers we have data for. If you haven’t already, or if your ranks have changed recently, please take a moment to provide your rank information so we can build the best possible rank conversion table.

Volatility bug and ditching windowed rating calculations

Thanks to the due diligence and persistence of @flovo, we discovered that the OGS Glicko2 implementation had a notable bug in how we used the volatility parameter. In short, we had a line of code that read phi_star = sqrt(player.phi ** 2 + new_volatility * 2) when it should have been phi_star = sqrt(player.phi ** 2 + new_volatility ** 2). The difference is that ** means “raise to the power of”, whereas * just means “multiply by”. Since new_volatility usually has a value somewhere around 0.06, the term we were supposed to be adding was around 0.0036, but we were adding around 0.12 instead, so we were off by a factor of about 30.
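To make the size of the mistake concrete, here is a small sketch comparing the two lines. The Glicko-2 step itself, phi* = sqrt(phi^2 + sigma'^2), and the 173.7178 factor between rating points and the Glicko-2 internal scale come from Glickman’s Glicko-2 description; the starting deviation of 60 rating points is an arbitrary illustrative value, not an OGS figure, and the function names are hypothetical.

```python
import math

GLICKO2_SCALE = 173.7178  # rating points per Glicko-2 internal unit


def phi_star_buggy(phi: float, new_volatility: float) -> float:
    # The accidental version: sigma' * 2 (multiply) instead of sigma' ** 2.
    return math.sqrt(phi ** 2 + new_volatility * 2)


def phi_star_fixed(phi: float, new_volatility: float) -> float:
    # Glicko-2 pre-period deviation: phi* = sqrt(phi^2 + sigma'^2).
    return math.sqrt(phi ** 2 + new_volatility ** 2)


sigma = 0.06              # typical volatility quoted in the post
phi = 60 / GLICKO2_SCALE  # illustrative deviation of 60 rating points

print("added term, buggy:", sigma * 2)
print("added term, fixed:", sigma ** 2)
print("ratio:", (sigma * 2) / (sigma ** 2))
print("deviation after step, buggy:", phi_star_buggy(phi, sigma) * GLICKO2_SCALE)
print("deviation after step, fixed:", phi_star_fixed(phi, sigma) * GLICKO2_SCALE)
```

With these inputs the fixed line nudges the deviation from 60 to roughly 61 rating points, while the buggy line inflates it to roughly 85. A larger phi* lets each game move the rating further, which matches the symptom of ranks jumping around more than they should.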

Mind you, I tested this code once upon a time, and it was correct then. However, at some point I accidentally deleted a character (which is sort of easy to do with my editor of choice, vim), and because nothing exploded and the rating change from any single game didn’t look that crazy, I didn’t notice. For years. That bug had been with us since we introduced Glicko2.

Instead of noticing this, I unknowingly compensated for it by implementing fancy windowing systems: first the 15-game window system, and later a sliding window system. These systems counteracted the volatility bug well enough to produce results that seemed reasonable, and by and large they worked. However, as it turns out, without that bug, dropping windowing altogether produces really good results too (better than the bugged windowed versions) and is a lot more intuitive. Bug-free windowed versions work well too, but they have some drawbacks, especially sliding windows, where sometimes you win and your rating goes down, or vice versa. They also tend to vault players up or down in rank during win or loss streaks, which can be problematic for auto-matching. So, now that the volatility bug is fixed, we’re ditching windowed ratings altogether. I think folks will be pleased with the results, and in general the system should be much more intuitive.

For fun, here are some before/after screenshots that highlight the effect of the volatility bug:

Before:

After:

Updated ranking formula

Along with aligning our low dan ranks, we want our ranks to be as useful as possible. One rank of difference should equate to one stone of handicap on a 19x19 board, regardless of what rank you are or how large the rank difference is. While our old rating-to-rank formula achieved this for a decent percentage of the population, it was not adequate in the lower ranks and instead placed a notable percentage of our players into a perpetual 25k rank. So, we ran a whole lot of simulations over our entire game history to come up with a better mapping that aligns our ranks well, gets rid of the 25k problem, and produces fairly consistent handicap win rates across the board.
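Because the new rating-to-rank formula is logarithmic, the 525 cancels out of any rank difference, so a fixed rank gap corresponds to a fixed rating ratio rather than a fixed number of rating points. Working it through from the formula in the summary (r is a rating):

```latex
\begin{aligned}
\text{rank}(r) &= 23.15\,\ln\frac{r}{525} \\
\text{rank}(r_a) - \text{rank}(r_b)
  &= 23.15\left(\ln\frac{r_a}{525} - \ln\frac{r_b}{525}\right)
   = 23.15\,\ln\frac{r_a}{r_b} \\
\text{rank}(r_a) - \text{rank}(r_b) = 1
  &\iff \frac{r_a}{r_b} = e^{1/23.15} \approx 1.0441 \\
\Delta r_{\text{band}}(r) &= r\left(e^{1/23.15} - 1\right) \approx 0.0441\,r
\end{aligned}
```

In other words, one rank (intended to be one stone of handicap) corresponds to a rating ratio of roughly 1.044 at every level, and a single rank band at a rating of 1500 spans about 66 rating points.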

The key things to note in this data are:

- The new system was able to consider 25% more games (labeled Samples) suitable for analysis than the old system. The criteria were that the game was played between two non-provisional players and that the game’s handicap was set appropriately. What I infer from this is that we found a suitable rank for a lot more players, and I don’t believe it came at the expense of good placements for players who didn’t have a problem.
- The black handicap win rates hover pretty consistently around 43.3% for the vast majority of games, which I think is pretty good (in the EGF, black’s win rate in 1-handicap games is somewhere between 42% and 43%, so this is comparable).

(Note: gray areas are where we don’t have enough data for anything meaningful; the cutoff here was a somewhat arbitrary 100 games.)
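The filtering and bucketing described above might look roughly like the following sketch. Most of it is assumed for illustration: the hypothetical `Game` record, treating an “appropriate” handicap as the whole-rank gap between the players, and bucketing by black’s whole rank and the handicap. Only the non-provisional requirement and the 100-game cutoff come from the post.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

MIN_SAMPLES = 100  # buckets with fewer games are treated as "gray"


@dataclass
class Game:  # hypothetical record, not an OGS data structure
    black_rank: float
    white_rank: float
    handicap: int
    black_won: bool
    black_provisional: bool = False
    white_provisional: bool = False


def handicap_is_appropriate(g: Game) -> bool:
    # ASSUMPTION: "appropriate" means handicap equals the whole-rank gap.
    return g.handicap == math.floor(g.white_rank) - math.floor(g.black_rank)


def black_win_rates(games):
    """Return {(black whole rank, handicap): (win rate or None, sample count)}."""
    tally = defaultdict(lambda: [0, 0])  # bucket -> [black wins, games]
    for g in games:
        if g.black_provisional or g.white_provisional:
            continue  # only games between two non-provisional players
        if not handicap_is_appropriate(g):
            continue
        bucket = (math.floor(g.black_rank), g.handicap)
        tally[bucket][0] += int(g.black_won)
        tally[bucket][1] += 1
    # Buckets under the cutoff get None instead of a win rate.
    return {
        bucket: (wins / n if n >= MIN_SAMPLES else None, n)
        for bucket, (wins, n) in tally.items()
    }
```

Counting the games that survive both filters gives the Samples figure; comparing that count under the old and new rank mappings is one way to read the “25% more games” claim.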

Speed and size specific ratings

It’s been a known issue that you can’t really compare the different ratings on your profile. Your overall rating had pretty much nothing to do with your live 13x13 rating, for instance. This is because they were computed using completely independent rating pools.

@flovo came up with the great idea of using your opponent’s overall rating when updating a sub-rating. This has the effect of pulling the sub-rating pools onto the same scale as the overall rating pool, so the individual ratings should be somewhat comparable now.
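The idea can be sketched in a few lines. To keep it short, a plain Elo update stands in for OGS’s actual Glicko-2 math, and the `Player` record, the K-factor, the category names, and the rule that a missing sub-rating starts at the overall rating are all hypothetical. The only part taken from the post is the key choice: the opponent’s overall rating is used, not their rating in the same sub-pool.

```python
from dataclasses import dataclass, field

K_FACTOR = 32.0  # hypothetical Elo K-factor (OGS uses Glicko-2, not Elo)


@dataclass
class Player:
    overall: float
    sub_ratings: dict = field(default_factory=dict)  # e.g. {"live-13x13": 1400.0}


def expected_score(rating: float, opponent_rating: float) -> float:
    """Standard Elo expected score on a 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def update_sub_rating(player: Player, opponent: Player, category: str, score: float) -> None:
    """Update one sub-rating after a game; score is 1.0 for a win, 0.0 for a loss."""
    # ASSUMPTION: a missing sub-rating starts at the player's overall rating.
    own = player.sub_ratings.get(category, player.overall)
    # The key idea: measure against the opponent's OVERALL rating,
    # not their rating in the same isolated sub-pool.
    expected = expected_score(own, opponent.overall)
    player.sub_ratings[category] = own + K_FACTOR * (score - expected)


p = Player(overall=1500.0, sub_ratings={"live-13x13": 800.0})
opp = Player(overall=1500.0, sub_ratings={"live-13x13": 2000.0})
for i in range(400):
    update_sub_rating(p, opp, "live-13x13", 1.0 if i % 2 == 0 else 0.0)
```

In this toy run, a 13x13 sub-rating that starts on a disconnected scale (800) and scores 50% against an opponent rated 1500 overall drifts back to within a few points of 1500, regardless of the opponent’s own 13x13 number.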

DVbS78rkR7NVe · January 26, 2021, 10:14pm

Yay! New ranks!

seminyoon11 · January 26, 2021, 10:20pm

Woo hoo!

n0w3l · January 26, 2021, 10:23pm

Go team Vim!!!

Thanks for the great job!!!

CitrusRootWeevil · January 26, 2021, 10:27pm

Exciting. Thanks for all the work on this and other OGS dev. (But maybe use VS Code?)

One thing that’s always confused me is the ranks by size and game length. I have only ever played 19x19 correspondence and yet my rank is different in different categories. Does that change/simplify with this update?

snakesss · January 26, 2021, 10:30pm

Thanks for informing us so thoroughly.

mlopezviedma · January 26, 2021, 10:31pm

Awesome work and excellent explanation! Thank you @anoek!

BHydden · January 26, 2021, 10:33pm

+1

anoek · January 26, 2021, 10:33pm

There will still be some variation, but they should converge to more reasonable values.

shinuito · January 26, 2021, 10:45pm

Big thanks to @anoek and @flovo for all the hard work!

alemitrani · January 26, 2021, 10:49pm

Thanks for the detailed and interesting update, sounds like this will be a big improvement. Will the whole rating history be updated for each player or will there be a discontinuity?

anoek · January 26, 2021, 10:50pm

Yep, the updates are retroactive.

Keep_Strong_Ukraine · January 26, 2021, 10:53pm

Thank you so much! Such great news that the volatility bug was found and fixed. Also, respect for being upfront about the cause of the bug.

Sistina · January 26, 2021, 10:57pm

Hi! Could you please explain how the numerical rank calculated from your logarithmic formula converts to kyu/dan ranks? I have a rating of 2558, which translates to 36.66 in the new ranking. Where does that put me? Thanks!

FlipFlop · January 26, 2021, 10:58pm

great work

_KoBa · January 26, 2021, 11:00pm

I’m just curious: how did you come up with these values? By trial and error, or was there some other reason for using these?

claire_yang · January 26, 2021, 11:09pm

Looks like we are going to have lots more Dan level players after this adjustment.

Easier to find live games?

Eugene · January 26, 2021, 11:12pm

Great work, and a great, detailed update. Thanks!

meili_yinhua · January 26, 2021, 11:19pm

You’re asking for rank information to align with the AGA/EGF, but I’d imagine there haven’t been many rated games in the AGA lately (or maybe there are and I just haven’t figured out how to play them). How would you like us to fill this out? Best guesses, I’d imagine. But I personally (and I’d imagine I’m not the only one) self-promote based on my success on OGS, so I could put in my last tournament rank, but it’s been a while, and that rank might have been considered a bit suspect even when I had it.