Thanks to @flovo we’ve discovered a couple of issues with the rating system. The first: in the production system I royally messed up and forgot to account for handicaps. Big whoops. So, everyone who was noting that ranks were feeling really strong, yeah, that’s why. Even if you yourself don’t play handicap games, your peers who do are going to get pulled up or down by it, which tugs your rating in the same direction. @S_Alexander even pulled some stats showing that a lot of dans had lost rank; this is the cause. What about those pretty graphs I made, you say? Well, I broke the cardinal NASA rule: test what you fly and fly what you test. I had my test code that I would run all of the data through for quick analysis, and then the production system which ties into the database and all of that, and when I ported things over I somehow missed the handicap handling. My spot checks seemed a-ok, but I didn’t do a proper test to make sure the data was aligning in all cases.
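To give a rough idea of what "account for handicaps" means, here's a minimal sketch. This is not the actual OGS code; the function name, the 100-points-per-stone conversion, and the whole shape of the adjustment are illustrative assumptions. The common approach in Elo/Glicko-style systems is to shift a player's effective rating by the handicap before doing the update:

```python
# Hypothetical illustration, not the real OGS implementation.
# Assumption: one handicap stone is worth roughly one rank,
# modeled here as 100 rating points.
POINTS_PER_STONE = 100

def effective_rating(rating: float, handicap_stones_received: int) -> float:
    """Rating to feed into the update, adjusted for handicap.

    A player who receives handicap stones plays effectively stronger,
    so their rating is shifted up for the purpose of this one game.
    """
    return rating + handicap_stones_received * POINTS_PER_STONE
```

If the production system skips this step, every handicap game is scored as if it were even, so the weaker player looks like they beat (or nearly beat) someone far above them, and ratings across the whole pool drift as a result.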
The second failure he found was a long-standing issue that was causing our deviation not to converge as much as it should, which increased rating volatility. So, when you play a few games and your rating changes too much, this is why. The rating still bounced around the right value, so it still functioned as an adequate metric for matchmaking, but it was too volatile.
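For those curious why a stuck deviation causes volatility: in a Glicko-style system, the rating deviation (RD) is supposed to shrink with every rated game, and the size of each rating step scales with RD. A quick sketch of the standard Glicko RD update (textbook formulas, not our production code, with made-up example numbers) shows the intended behavior:

```python
import math

# Standard Glicko constants/formulas (Glickman), illustrative values.
Q = math.log(10) / 400

def g(rd: float) -> float:
    """Attenuation factor for an opponent's rating uncertainty."""
    return 1.0 / math.sqrt(1.0 + 3.0 * (Q ** 2) * (rd ** 2) / (math.pi ** 2))

def expected_score(r: float, r_opp: float, rd_opp: float) -> float:
    return 1.0 / (1.0 + 10 ** (-g(rd_opp) * (r - r_opp) / 400.0))

def updated_rd(rd: float, r: float, r_opp: float, rd_opp: float) -> float:
    """RD after one game: always smaller than before the game."""
    e = expected_score(r, r_opp, rd_opp)
    d_squared = 1.0 / ((Q ** 2) * (g(rd_opp) ** 2) * e * (1.0 - e))
    return math.sqrt(1.0 / (1.0 / rd ** 2 + 1.0 / d_squared))

# RD shrinks game after game, so each game moves the rating less.
rd = 350.0
for _ in range(10):
    rd = updated_rd(rd, 1500.0, 1500.0, 60.0)
```

If a bug keeps RD from converging like this, every game keeps moving the rating as if the player were brand new, which is exactly the "changes too much after a few games" symptom.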
As of a few hours ago the handicap issue has been fixed going forward; however, the effects are not applied retroactively yet. I’m not going to alter the deviation issue quite yet because I want to fully understand how that affects things before moving forward. We will be repairing the ratings retroactively once again, but it’ll be a couple of weeks.
There’s the other user-experience issue of the sliding window system allowing your rating to drop even after a win. This seems to happen somewhat frequently and is both confusing and demoralizing, so we’re going to go back to the drawing board a bit to come up with an alternative system that is more intuitive: your rating should always go up on a win and down on a loss, even if only by a little.
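To see how a win can lower a sliding-window rating at all, here's a deliberately simplified toy (not the actual OGS window algorithm): suppose the rating were just a performance average over your last N games. When a new win pushes an older, more valuable win out of the window, the average can fall:

```python
# Toy model, not the real system: rate a player by averaging
# "performance" over the last WINDOW games, where a win counts as
# opponent rating + OFFSET and a loss as opponent rating - OFFSET.
WINDOW = 3
OFFSET = 400  # assumed win/loss performance offset

def window_rating(games: list[tuple[float, bool]]) -> float:
    """games: (opponent_rating, won) pairs, oldest first."""
    recent = games[-WINDOW:]
    return sum(opp + (OFFSET if won else -OFFSET) for opp, won in recent) / len(recent)

history = [(1900, True), (1500, True), (1500, True)]
before = window_rating(history)   # the win over the 1900 player is still in the window
history.append((1400, True))      # a new win, against a weaker opponent
after = window_rating(history)    # ...which pushed the 1900 win out of the window
# after < before: the rating dropped despite a win
```

Any scheme that recomputes over a fixed window of recent games has this property: the result depends not only on the game you just played but also on the game that just aged out, which is what makes it unintuitive.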
The plan now is to separate out the ratings code into an open-source module that I’ll be putting up on GitHub in the next day or two (I’ll reply with a link to it in this thread for those who are interested). There we’ll work on the next iteration of the rating system in the open; anyone who wants to help us figure out the best way to do things, or play spot-the-bug, is welcome to help out. It’ll have a full dump of the production data so we can analyze how different approaches perform against the approximately 12M rated games we have. Once we have a solid strategy, we’ll validate the implementation thoroughly and then it’ll be exactly what we use in production.
Sorry for the churn, and the yet-another-rating-blunder.