2020 Rating and rank tweaks and analysis

I hear you, and I understand that some established players will turn their nose up at the thought of a combined rank. We will of course continue to maintain all 16 ratings (we do track “pure live 19x19” ratings, after all), but the question is which one to use. I think the data supports the view that, for online ratings, the combined rank is on average more useful for setting handicaps than any of the individual ratings.

At the end of the day, dans are important, but getting good, fun games between players is vastly more important. And for dan-level players who are interested in per-speed and per-board-size stats, we have those readily available to look up for any player. The data is there; we just don’t use it to pick handicaps automatically.

So, a short summary of how the OGS rating system works:

Every time you finish a ranked game, the system looks at your last 15 ranked games (or your ranked games from the last 90 days, if that’s fewer than 15), the current ratings of your opponents, and who won, then calculates a completely new rating for you from that information.
But if you won your last game AND the recalculated rating would be lower than it was before that game, your rating stays the same for the time being.

Did I understand it right? I assume a ton of people are going to come into chat and ask “why does my rank drop so much when I lose / why didn’t my rank change even though I won a game / why does the ranking system behave like it does?”, and that feels like a good, simple explanation xD

Yes

No. We used to do this, and yes, it caused some commotion when people’s ratings went down after they won a game. In the end it didn’t have much effect on the ratings, less than I thought it would. It wasn’t particularly harmful mathematically speaking, but the user experience was not great, and that’s the primary reason we dropped it.

Now, we store each opponent’s rating as it was when the game was played. Your rating is computed by taking what your rating was 15 games ago, plugging it into Glicko-2 together with all of those opponent ratings and wins/losses, and using the result as your new rating. This seems to perform quite nicely.
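
For anyone curious what “plugging it into Glicko-2” involves, here is a minimal Python sketch of one standard Glicko-2 rating-period update, following the steps in Mark Glickman’s published description of the algorithm. It is not OGS’s code: treating the whole 15-game window as a single rating period, the system constant TAU = 0.5, and the helper name glicko2_update are all assumptions for illustration.

```python
import math

# Sketch of one Glicko-2 rating period (after Glickman's description).
# SCALE converts between the familiar 1500-centred scale and the internal one.
# TAU is an assumed value; OGS's actual constant may differ.
SCALE = 173.7178
TAU = 0.5
EPS = 1e-6


def _g(phi):
    # Down-weights results against opponents whose rating is uncertain.
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / (math.pi * math.pi))


def _expected(mu, mu_j, phi_j):
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def glicko2_update(rating, rd, vol, games, tau=TAU):
    """Return (rating, rd, vol) after one rating period.

    games: list of (opp_rating, opp_rd, score), score 1.0 for a win, 0.0 for a loss.
    """
    mu, phi = (rating - 1500.0) / SCALE, rd / SCALE
    if not games:
        # No games in the period: only the deviation grows.
        return rating, math.sqrt(phi * phi + vol * vol) * SCALE, vol

    v_inv, score_sum = 0.0, 0.0
    for opp_rating, opp_rd, score in games:
        mu_j, phi_j = (opp_rating - 1500.0) / SCALE, opp_rd / SCALE
        g, e = _g(phi_j), _expected(mu, mu_j, phi_j)
        v_inv += g * g * e * (1.0 - e)
        score_sum += g * (score - e)
    v = 1.0 / v_inv  # estimated variance of the rating from these games alone
    delta = v * score_sum  # estimated improvement

    # New volatility: root of f(x), found with the Illinois variant of regula falsi.
    a = math.log(vol * vol)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta * delta - phi * phi - v - ex)
                / (2.0 * (phi * phi + v + ex) ** 2)) - (x - a) / (tau * tau)

    x_a = a
    if delta * delta > phi * phi + v:
        x_b = math.log(delta * delta - phi * phi - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        x_b = a - k * tau
    f_a, f_b = f(x_a), f(x_b)
    while abs(x_b - x_a) > EPS:
        x_c = x_a + (x_a - x_b) * f_a / (f_b - f_a)
        f_c = f(x_c)
        if f_c * f_b <= 0:
            x_a, f_a = x_b, f_b
        else:
            f_a /= 2.0
        x_b, f_b = x_c, f_c
    new_vol = math.exp(x_a / 2.0)

    # New deviation, then new rating, converted back to the 1500-centred scale.
    phi_star = math.sqrt(phi * phi + new_vol * new_vol)
    new_phi = 1.0 / math.sqrt(1.0 / (phi_star * phi_star) + 1.0 / v)
    new_mu = mu + new_phi * new_phi * score_sum
    return 1500.0 + SCALE * new_mu, SCALE * new_phi, new_vol


# Glickman's worked example: a 1500/200/0.06 player with one win and two losses.
print(glicko2_update(1500, 200, 0.06, [(1400, 30, 1.0), (1550, 100, 0.0), (1700, 300, 0.0)]))
```

That example should land at roughly 1464 with a deviation of about 151.5. For a recompute in the style described above, you would pass your rating, deviation and volatility from 15 games ago together with the last 15 games; how OGS actually batches those games isn’t stated in the thread.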

Oh, so in a way the new system actually resembles traditional Elo ratings more than what we had in 2017-20? The major difference being that where Elo uses your rating from 1 game ago as the base and adjusts it based on your last game’s outcome and your opponent’s rating, the OGS system uses your rating from 15 games ago as the base and adjusts it based on your last 15 games.

(And I guess the non-linear rating-to-rank conversion, and having a range with an uncertainty/deviation (thanks Alex for reminding me) instead of a single exact value, are also major differences from Elo.)
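
To make the Elo side of that comparison concrete, here is a minimal Python sketch of classic Elo, where every update starts from your rating one game ago. K = 32 is a common textbook choice used here as an assumption, and the function names are hypothetical; none of this is OGS code.

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of A against B on the standard 400-point logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score: float, k: float = 32.0) -> float:
    """One Elo step: the base is your rating just before this single game."""
    return r_a + k * (score - elo_expected(r_a, r_b))


def elo_sequence(rating: float, games, k: float = 32.0) -> float:
    """Apply (opp_rating, score) games one at a time, each starting from the last result."""
    for opp_rating, score in games:
        rating = elo_update(rating, opp_rating, score, k)
    return rating


print(elo_update(1500, 1500, 1.0))  # 1516.0
print(elo_sequence(1500, [(1550, 1.0), (1450, 0.0)]))
```

Contrast this with recomputing from your rating 15 games ago over the whole window, as described earlier in the thread: in Elo, the newest game is the only thing that can move the number.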

But was I correct about the part “But in the case of you won your last game AND if that recalculated rating would be lower than it was before your last game, your rating will stay the same for the time being”?

Glicko also has deviation and volatility, so unlike Elo it adjusts quicker. If I recall correctly, one of the major reasons we introduced Glicko was to lift the burden of adjusting ratings from moderators.

Also, your last sentence is brain-breaking.
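
A small illustration of what the deviation does, using the g() weighting and the idle-period deviation growth from Glickman’s Glicko-2 description. This is not OGS code, and the helper names are hypothetical. Results against opponents with a large rating deviation count for less, and your own deviation grows while you don’t play, which lets later results move your rating faster.

```python
import math

SCALE = 173.7178  # Glicko-2 conversion factor between rating points and its internal scale


def g(rd: float) -> float:
    """Weight given to a result against an opponent with rating deviation rd."""
    phi = rd / SCALE
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def rd_after_idle(rd: float, vol: float, periods: int) -> float:
    """Deviation after `periods` rating periods with no games (volatility held fixed)."""
    phi = rd / SCALE
    return math.sqrt(phi ** 2 + periods * vol ** 2) * SCALE


for opp_rd in (30, 100, 300):
    print(f"opponent RD {opp_rd}: weight {g(opp_rd):.4f}")

print(f"RD 60 after 12 idle periods: {rd_after_idle(60, 0.06, 12):.1f}")
```

How OGS maps real time onto rating periods isn’t covered in the thread, so the idle-period part is only a simplified model.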

It’s still irritating. The graph seems to “go with” the game result; the actual rating is now updated per game, but when I win it falls and when I lose it rises?!

When you win, your rating is expected to increase; when you lose, it’s expected to decrease. I’m pretty sure that holds.

Edit: actually, I think it can go down even on a win if there’s a downward trend in your recent history. Glicko-2 is complicated (but really good).

I sometimes (and especially since yesterday :slight_smile:) check game by game, and it’s definitely true:
I (12k) just won against a 10k and my rating dropped, then I lost against a 13k and it rose.

The new rating system placed me way too high, 3 ranks higher than I was before. As a result, the games just got noticeably harder (°_°’)

Don’t get me wrong, I don’t care at all :slight_smile: it’s just hard to follow hehe

Hello. Just to clarify: are these termination-api tables now obsolete?
https://online-go.com/termination-api/player/109488/rating-history

If the data shows that the 19x19 rating is less suited to calculating 19x19 handicaps than a rating that also includes 13x13 and 9x9 performance, it sounds like either the data or the rating system is broken. So I assume you are saying the data shows that a single combined rating/rank is better for setting all 19x19, 13x13 and 9x9 handicaps than any of the ‘pure’ ratings. Am I correct?

If I’m correct, then it seems to me the more reasonable solution (for a server that tracks results for all 3 board sizes and all 3 time settings separately) would be to use the 19x19 rating for 19x19 handicaps, the 13x13 rating for 13x13 handicaps, and the 9x9 rating for 9x9 handicaps, no?

What I am saying is that a “combined rank” is abnormal in the Go world and very unusual for most uses of a rating system, so its value should be discussed from a marketing point of view, as S_Alexander argued above. I see his points as valid, but part of me still murmurs, “Isn’t this like combining Chess and Shogi ratings for the sake of being different?” :stuck_out_tongue:

The discussion is interesting; I’ll try pitching it in the Russian group to see what they think.

The different board sizes aren’t fundamentally different. They all follow the same rules and use the same skills. A dan player who has never played 9x9 is still expected to win against a mid-SDK player who’s used to all board sizes.

For a player who plays all board sizes, the individual rating for each size is expected to adjust more slowly than it would for an equally skilled player who plays only one board size. With a combined rating we can adjust the rating faster, leading to better matchmaking and handicaps.

Use /glicko2-history instead. The format is almost identical: two more fields at the end, iirc, and we use 0/1 instead of 0/2 for the outcome.

Current example (overall rating):

1511 before these two games

1509 after a win against an 11k (-2)
1539 after a win against a 12k (+30)

Nope, the data indicates that overall it’s better to use your current overall rating. I think this makes sense intuitively too: your overall rank tracks your strength whether you go on a 9x9 binge or stick to 19x19 for a while, and whether you play a lot of blitz or correspondence. It tracks your current strength better than a per-size rating would if you played a bunch of 9x9, got stronger, then went back to 19x19 after a hiatus.

The per-size and per-speed ratings were fine for the most part too, so no, I don’t think there’s a flaw there. It’s just that the overall rating was slightly better, and since it is also convenient and easy for users to see and use, it seems like the right thing to do.
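
The thread doesn’t say exactly how the ratings were compared. One plausible way to check a claim like “the overall was slightly better” is to score how well each set of ratings predicts the actual results of the same games, for example with mean log loss. The sketch below assumes a plain logistic expected score on the 400-point scale and ignores rating deviation, handicap stones and komi, so treat it as an illustration only; the names are hypothetical.

```python
import math
from typing import Iterable, Tuple

# (rating of player A at game time, rating of player B at game time, did A win?)
Game = Tuple[float, float, bool]


def expected_score(r_a: float, r_b: float) -> float:
    # Logistic curve on the usual 400-point scale (rating deviation ignored).
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def log_loss(games: Iterable[Game]) -> float:
    """Mean negative log-likelihood of the actual results; lower is better."""
    total, n = 0.0, 0
    for r_a, r_b, a_won in games:
        p = expected_score(r_a, r_b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= math.log(p if a_won else 1.0 - p)
        n += 1
    if n == 0:
        raise ValueError("no games to score")
    return total / n


overall = [(1600.0, 1500.0, True), (1450.0, 1500.0, False)]
per_size = [(1520.0, 1500.0, True), (1500.0, 1480.0, False)]
print(log_loss(overall), log_loss(per_size))
```

You would feed it the same games twice, once with both players’ overall ratings and once with their per-size ratings; whichever set scores the lower log loss predicted the results better.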

Yep, so basically what’s happening is a lot of fancy Glicko-2 math that tries to place you correctly given your past 15 games. Your most recent game is only one data point in the trend we’re computing.

Also, I was wrong when I said that you shouldn’t go down after a win. Since we use your rating 15 games ago as the starting point, if your rating at that point was notably lower than it was one game earlier (16 games ago), you might very well still go down after a win… unfortunate, but I think that’s just Glicko-2 for ya. Likewise, two successive wins are going to amplify the gain a bit, because Glicko-2 math is cool like that: it tries to adapt to your current strength pretty quickly. If you keep winning game after game, it’s going to push your rank up pretty fast to get you to opponents who can finally beat you.
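
A minimal sketch of that sliding window, assuming the base is your rating just before the oldest of your last 15 games (one reading of “your rating 15 games ago”). It only shows which inputs change from one recompute to the next, not the Glicko-2 math itself; the bookkeeping, names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

# (opponent rating at the time of the game, score: 1.0 win / 0.0 loss)
Game = Tuple[float, float]
WINDOW = 15  # number of recent games mentioned in the thread


def recompute_inputs(ratings_before: Sequence[float], games: Sequence[Game],
                     window: int = WINDOW) -> Tuple[float, List[Game]]:
    """Return (base rating, games in the window) for the recompute after the latest game.

    ratings_before[i] is the player's rating just before games[i] was played
    (hypothetical bookkeeping; OGS's storage may differ).
    """
    if len(ratings_before) != len(games):
        raise ValueError("need exactly one 'rating before' per game")
    start = max(0, len(games) - window)
    return ratings_before[start], list(games[start:])


# Hypothetical history: 17 games, the 17th of which is a win.
ratings_before = [1520.0, 1530.0, 1480.0] + [1500.0] * 14
games = [(1500.0, 0.0)] * 16 + [(1550.0, 1.0)]

base_16, window_16 = recompute_inputs(ratings_before[:16], games[:16])
base_17, window_17 = recompute_inputs(ratings_before, games)
print(base_16, base_17)  # the base moves from 1530.0 down to 1480.0
```

So even though the newest game is a win, the recompute after it starts from a base 50 points lower than the previous one, which is how a win can still end with a lower rating.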

There are many OGS players who play only one of the 3 sizes. For them, the combined rating does not adjust any faster. So while the first half of your statement is true “on average”, the existence of this group makes anoek’s effort to tune how quickly ratings reflect recent results more difficult, or less accurate, and the latter half of your statement may be untrue for this reason.

Current example (overall rating):

1511 before these two games

1509 after a win against an 11k (-2)
1539 after a win against a 12k (+30)

Although this feels weird, the principle is much the same as things like the entry system in tennis. Take an imaginary player, Bob Racquet, currently world number one, favourite to win Wimbledon, and 600 points ahead of his nearest rival Johnny Strings in the entry system. Johnny is injured and can’t take part in this Wimbledon, so Bob is the clear favourite. Sadly, Bob only makes the final, where he loses, and suddenly ends up world #2. How on earth can this happen?

Essentially, the entry system is a rolling 12 months. In the same week the previous year, Bob won Wimbledon and poor Johnny missed out then too. So even though reaching the final gained Bob 1200 points for 2nd place, he lost 2000 points as the previous year’s result dropped off the system: a net loss of 800 points despite coming second.
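
The Bob Racquet arithmetic as a tiny Python sketch: a rolling 52-week sum, where the week numbers and the window length are simplifying assumptions rather than the real tennis entry-system rules.

```python
from typing import List, Tuple

# (week number, points earned that week)
Result = Tuple[int, int]


def ranking_points(results: List[Result], current_week: int, window_weeks: int = 52) -> int:
    """Sum the points earned in the rolling window ending at current_week."""
    return sum(points for week, points in results
               if current_week - window_weeks < week <= current_week)


# Bob's Wimbledon results from the story: title in week 0, lost final in week 52.
bob = [(0, 2000), (52, 1200)]
before = ranking_points(bob, 51)  # last year's title still counts
after = ranking_points(bob, 52)   # title drops out, runner-up points come in
print(after - before)  # -800
```

Johnny has no Wimbledon points dropping off or coming in, so Bob’s 600-point lead turns into a 200-point deficit.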

Rolling systems are actually very reliable over time, but the change after any individual result can look weird.
