tl;dr - Expect a notable bump in your rank. The lower your rank, the larger the increase. This transition is planned to take effect around 2021-01-28T01:30:00Z.
Summary of the update:
1. We are attempting to align our low dan ranks to be comparable to the EGF and AGA low dan ranks. Currently we are projecting that a 1d OGS rank will be about 0.7 stones weaker than an EGF 1d and about 0.8 stones stronger than an AGA 1d. These numbers will be re-evaluated once the dust settles, but we believe we should end up fairly close to the goal of being within a stone of each system, on average.
2. We fixed a volatility bug that was making ranks jump around more than they should.
3. Because of #2, we removed our sliding windowed rating system, which means things will be a bit more intuitive now: if you win, your rating will go up; if you lose, it’ll go down.
4. The rating to rank formula has been updated to be ln(rating / 525) * 23.15. This update retargets our dan range to align roughly with the AGA and EGF ranks, and widens our rank bands a bit, which markedly improves handicap consistency across ranks and eliminates the “forever 25k” problem we had.
5. Speed and size specific ratings should be notably more meaningful now and should be more or less on the same scale as the overall ratings.
AGA and EGF low dan alignment
One major focus of this update was to align ourselves in some way with the AGA and EGF. Generally speaking, an EGF rank is about 1.5 stones stronger than the equivalent AGA rank, so we aimed for the middle, targeting a spot in between in the 1d - 3d range. So, if you’re within that range on OGS, you should be within about a stone or so of your rank in either organization.
However, our SDK, DDK, and high dan ranks diverge a bit more from both organizations. Generally speaking, you will have a higher rank on OGS than in either organization in the low SDK and DDK ranks. We feel this is justified and more correct for us given the handicap win rate analysis of games both here and at the organizations.
After the update goes live and the dust settles, we’ll follow up with a more complete mapping between OGS and the various organizations and servers we have data for. If you haven’t already, or if your ranks have changed recently, please take a moment to provide your rank information so we can build the best possible rank conversion table.
Volatility bug and ditching windows rating calculations
Thanks to the due diligence and persistence of @flovo, it was discovered that the OGS glicko2 implementation had a notable bug in how we used the volatility parameter. In short, we had a line of code that was
phi_star = sqrt(player.phi ** 2 + new_volatility * 2)
and should have been
phi_star = sqrt(player.phi ** 2 + new_volatility ** 2)
The difference is that ** means “raise to the power of”, whereas * is just “multiply by”. So, since new_volatility usually has values somewhere around 0.06, the value we were supposed to be adding was somewhere around 0.0036, and instead we were using somewhere around 0.12, so we were off by a factor of about 33.
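To make the size of that error concrete, here is a minimal sketch that evaluates both versions of the line side by side. The function names, the starting deviation of 65, and the use of the standard Glicko-2 scale factor to convert it are illustrative assumptions, not values taken from the OGS code base.

```python
import math

# Standard Glicko-2 conversion factor between the display scale and the internal scale.
GLICKO2_SCALE = 173.7178


def phi_star_buggy(phi, new_volatility):
    # Bugged line: the volatility is multiplied by 2 instead of being squared.
    return math.sqrt(phi ** 2 + new_volatility * 2)


def phi_star_fixed(phi, new_volatility):
    # Corrected line: the volatility is squared.
    return math.sqrt(phi ** 2 + new_volatility ** 2)


# Illustrative inputs (assumptions): a deviation of 65 on the display scale
# and the typical volatility of 0.06 mentioned in the post.
phi = 65 / GLICKO2_SCALE
new_volatility = 0.06

added_buggy = new_volatility * 2    # about 0.12
added_fixed = new_volatility ** 2   # about 0.0036
inflation = added_buggy / added_fixed

print(f"term added (buggy): {added_buggy:.4f}")
print(f"term added (fixed): {added_fixed:.4f}")
print(f"ratio: {inflation:.1f}")
print(f"deviation after step (buggy): {phi_star_buggy(phi, new_volatility) * GLICKO2_SCALE:.1f}")
print(f"deviation after step (fixed): {phi_star_fixed(phi, new_volatility) * GLICKO2_SCALE:.1f}")
```

With these inputs the bugged line inflates the deviation far more than the fixed one on every update, which is consistent with ranks jumping around more than they should.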
Now mind you, I tested this code once upon a time and it was correct once upon a time. However, at some point I accidentally deleted a character (which is sort of easy with my editor of choice, vim) and because nothing exploded and generally speaking, looking at one game change, the results aren’t that crazy, I didn’t notice it. For years. That bug had been with us since we introduced Glicko2.
Instead of noticing this, I unknowingly compensated for it by implementing fancy windowing systems, first the 15 game window system, later a sliding window system. These systems helped combat the volatility bug to produce results that seemed reasonable, and by and large worked out. However, as it turns out, without that bug, dropping windowing altogether produces really good results too (better than the bugged windowed versions) and is a lot more intuitive. Bugless windowed versions work well too, but there are some drawbacks, especially with things like sliding windows, where sometimes you win and your rating goes down, or vice versa. They also have a tendency to vault players up or down in ranks during win or loss streaks, which can be problematic with auto-matching. So, now that the volatility bug is fixed, we’re ditching windowed ratings altogether. I think folks will be pleased with the results, and in general the system should be much more intuitive.
For fun, here are some before/after screenshots that highlight the effect of the volatility bug:
Updated ranking formula
Along with aligning our low dan ranks, we want our ranks to be as useful as possible. One rank difference should equate to a one stone handicap difference on a 19x19 board, regardless of what rank you are or how large the rank difference is. While our old rating to rank formula achieved this for a decent percentage of the population, it was not adequate in the lower ranks, and instead placed a notable percentage of our population into a perpetual 25k rank. So, we ran a whole lot of simulations with our entire game history to come up with a better mapping that aligns our ranks well, gets rid of the 25k problem, and produces fairly consistent handicap win rates across the board.
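As a rough illustration of the new mapping, the sketch below implements the formula from the summary, ln(rating / 525) * 23.15, along with its algebraic inverse. The function names are hypothetical, and turning the resulting rank number into a kyu/dan label is deliberately left out, since the formula itself does not define that step.

```python
import math

BASE_RATING = 525   # rating that maps to rank 0 in the formula
SCALE = 23.15       # multiplier from the formula


def rating_to_rank(rating):
    # Direct transcription of ln(rating / 525) * 23.15.
    return math.log(rating / BASE_RATING) * SCALE


def rank_to_rating(rank):
    # Algebraic inverse of rating_to_rank.
    return BASE_RATING * math.exp(rank / SCALE)


# Because the mapping is logarithmic, moving up one rank always multiplies
# the rating by the same factor, whatever the starting rank.
rating_factor_per_rank = math.exp(1 / SCALE)

for rating in (800, 1200, 1500, 2000):
    print(rating, round(rating_to_rank(rating), 2))
print("rating factor per rank:", round(rating_factor_per_rank, 4))
```

One consequence of the logarithmic shape is that a rank band covers more rating points at the top of the scale than at the bottom, while each band still corresponds to the same rating ratio.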
The key things to note in the handicap win rate data are:
- The new system was able to consider 25% more games (labeled Samples) as being suitable for analysis than the old system. The criteria for this were that the game was played between two non-provisional players and that the game’s handicap was set appropriately. What I inferred from this is that we found a suitable rank for a lot more players, and I don’t believe it came at the expense of good placements for players who didn’t have a problem.
- The black handicap win rates hover pretty consistently around 43.3% for the vast majority of the games, which is pretty good I think (the EGF’s win rate for 1 handicap games is somewhere between 42% and 43%, so comparable).
(note, gray areas are where we don’t have enough data for anything meaningful, the cutoff here was a somewhat arbitrary 100 games)
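The selection criteria and the 100 game cutoff described above could be expressed along these lines. This is only a sketch: the record fields, the bucketing by black's rank and handicap, and the reading of “set appropriately” as “handicap equals the rounded rank difference” are all assumptions, not the actual analysis code.

```python
from collections import defaultdict

MIN_SAMPLES = 100  # cutoff below which a cell is treated as "not enough data"


def black_win_rates(games, min_samples=MIN_SAMPLES):
    """Return {(black_rank, handicap): win rate or None} for suitable games.

    Each game is a dict with hypothetical keys: black_rank, white_rank,
    handicap, black_won, black_provisional, white_provisional.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for game in games:
        # Criterion 1: both players must be non-provisional.
        if game["black_provisional"] or game["white_provisional"]:
            continue
        # Criterion 2 (assumed meaning): handicap matches the rank difference.
        rank_diff = round(game["white_rank"] - game["black_rank"])
        if game["handicap"] != rank_diff:
            continue
        key = (int(game["black_rank"]), game["handicap"])
        totals[key] += 1
        if game["black_won"]:
            wins[key] += 1
    # Cells with too few samples get None, like the gray areas in the chart.
    return {
        key: (wins[key] / count if count >= min_samples else None)
        for key, count in totals.items()
    }
```

Calling `black_win_rates(games)` on a list of such records yields one entry per rank and handicap combination, with `None` marking cells that fall under the cutoff.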
Speed and size specific ratings
It’s been a known issue that you can’t really compare the different ratings on your profile. Your overall rating had pretty much nothing to do with your live 13x13 rating for instance. This is because they were computed using completely independent rating pools.
@flovo came up with the great idea of using your opponent’s overall rating when updating a sub-rating. This has the effect of the rating pools converging onto the same scale as the overall rating pool, so the individual ratings should be somewhat comparable now.
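The idea can be sketched in a few lines. To keep the example short, a plain Elo update stands in for the Glicko-2 update that OGS actually uses. The pool names, the default rating of 1500, and the K factor are illustrative assumptions. The part that matters is that both the overall rating and the sub-rating are updated against the opponent's overall rating.

```python
DEFAULT_RATING = 1500.0  # assumed starting rating for a pool with no games yet
K = 32                   # Elo step size, used here only as a stand-in


def expected_score(rating, opponent_rating):
    # Standard Elo expected score of a player against an opponent.
    return 1 / (1 + 10 ** ((opponent_rating - rating) / 400))


def update_player(ratings, opponent_overall, pool, score):
    """Return new ratings after one game.

    ratings: dict of pool name -> rating, e.g. {"overall": 1500.0}
    opponent_overall: the opponent's pre-game OVERALL rating
    pool: the speed/size pool the game belongs to, e.g. "live-13x13"
    score: 1.0 for a win, 0.0 for a loss
    """
    updated = dict(ratings)
    for name in ("overall", pool):
        current = ratings.get(name, DEFAULT_RATING)
        # Both pools are measured against the opponent's overall rating,
        # never against the opponent's rating in the sub-pool.
        updated[name] = current + K * (score - expected_score(current, opponent_overall))
    return updated


before = {"overall": 1500.0}
after = update_player(before, opponent_overall=1500.0, pool="live-13x13", score=1.0)
print(after)
```

Because every pool is anchored to the same set of opponent ratings, the sub-ratings no longer drift on independent scales.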