Rating system issues

Foxy has simplicity, but it has a cost too: an absolutely insane number of games you need to play to change anything.

15-20 games is way too much for anyone who doesn’t play Go 24/7. As a result the ranks adjust very slowly (and are therefore less accurate).

If I recall correctly we switched to Glicko precisely because our ranks were slow to adjust and mods had to change them.

And it can be discouraging too. When you have one game deciding whether you go up or stay on the same level, and you lose, you realize you’ll have to win like 15 games in a row to even get to this position again. And I assume people do this thing where they pick who they play because of that, because every opponent is worth the same, whether they have a 20-0 game record or 0-20.

Plus, let’s not forget: on Foxy you play only against your own level (or ±1 without komi) for ranking. As far as I understand, there are no ranked games between different levels. They can do that because they have a very large player base. We don’t.

In conclusion, regarding Foxy-like ranking on OGS: “No, God, please, no”. I love Foxy but in a very special way.

I really believe a rating-based system, where you win and get some points or you lose and lose some points, is the way to go.
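To make the “win some points, lose some points” idea concrete, here is a minimal sketch using a plain Elo update. This is not the Glicko variant OGS runs; the function name and the K-factor of 32 are assumptions for illustration. The point is only that, unlike a count of recent wins, the size of the change depends on who you played.

```python
def elo_update(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """Return the new rating after one game.

    score is 1.0 for a win and 0.0 for a loss.
    k (hypothetical value) controls how far one game can move the rating.
    """
    # Probability of winning implied by the rating gap.
    expected = 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))
    # Gain or lose points in proportion to how surprising the result was.
    return rating + k * (score - expected)


# Beating a stronger opponent is worth more than beating a weaker one.
gain_vs_strong = elo_update(1500, 1800, 1.0) - 1500
gain_vs_weak = elo_update(1500, 1200, 1.0) - 1500
```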

22 Likes

@anoek, I just want to say that I continue to marvel at the work you do for all of us. I am profoundly amazed by the new tweaks and features you continue to add as time goes on. The AI addition was absolutely spectacular and now we are getting an improvement to the rating system which is even better.

I enjoy playing on other servers such as Fox and Tygem, but I have to say that I take pride in the OGS rating system. I agree with @S_Alexander that this Glicko-based system should be kept, as it gives OGS a uniquely sophisticated feature that other servers can’t brag about.

Every server has several features that make it unique and desirable and OGS is no different. I wish I could dive deeper into the idea of this rating system but the math is unfortunately too deep for me.

Thanks for everything as always! You have some real talent!

14 Likes

In over 1000 years of Go history, OGS may be in a unique position to do what nobody, and no organization, has ever been able to do.

When the weakest and the strongest, and everybody in between, are placed in data-proven strength bands of kyus and dans spaced logically, and the logic and the data backing it up are disclosed for every statistician in the world to examine, OGS Rank will have a good chance to become THE gold standard in the entire Go world.

Please keep in mind that what Clossius said is very true: mid to high dans have less than 1 handicap stone separating each rank. It means that mid to high amateur dans can feel an undeniable difference in strength of much less than 1 handicap stone, just as kyus can feel the difference between 9k and 10k. High dans play far fewer serious games in real life than kyus do, and real-life ranking systems have almost never had enough game results to assess accurately (statistically) which high dan is stronger than which.

So the fact that real-life high dans have (and prefer) less than 1 handicap stone separating each rank, “despite this difficulty”, may be strong evidence that an ‘ideal’ Go ranking system should also have this characteristic.

ps. Hiring Mark Glickman may be too expensive. But luring him into playing Go on OGS may not be that difficult. Anybody taking his class at Harvard?

15 Likes

I would agree with Tokumoto on the “less than one stone”, as even in tradition under the four schools system, dan ranks were usually 1/2 a stone (or less) apart: 1 rank apart meant sen-ai-sen, 2 ranks josen (no komi), 3 ranks sen-ni-sen (alternating no komi and two stones), and so on.

11 Likes

Since strength and pairing are primarily controlled by rating points, and ranks are simply mapped onto them at the closest point to represent integer handicap stones… maybe we can fix the issues of the past by altering the distance of dan mappings such that, as far as it is possible, they are all integer handicap stones apart (or as close as we can make it)?

9 Likes

I don’t know if this would work, it’s just an idea.

  • Normally consider the last 15 games.
  • If you win a game and your rating would go down, then consider the last 16 games (the last 15 plus the previous win, which must result in a higher or equal rating), and so on. If it would not go down (which will happen eventually if you are on a winning streak), reset to 15.
  • If you lose a game, reset to 15.
  • And the other way around for losses.
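
One possible reading of these rules, as a rough sketch. `toy_rate` is a hypothetical stand-in for the real rating calculation (a batch Elo-style sum, chosen only because adding a win to the same set of games can never lower it), and `step_window` and its parameters are made-up names, not anything from OGS.

```python
from typing import Callable, List, Tuple

Game = Tuple[float, bool]  # (opponent rating, True if we won)


def toy_rate(games: List[Game], seed: float = 1500.0, k: float = 32.0) -> float:
    """Hypothetical stand-in for the real rating function over a window of games."""
    total = seed
    for opponent, won in games:
        expected = 1.0 / (1.0 + 10.0 ** ((opponent - seed) / 400.0))
        total += k * ((1.0 if won else 0.0) - expected)
    return total


def step_window(
    history: List[Game],
    prev_rating: float,
    prev_window: int,
    rate: Callable[[List[Game]], float] = toy_rate,
    base: int = 15,
) -> Tuple[int, float]:
    """Pick the window size after the newest game (history[-1]) and rate it.

    If the normal window would move the rating the wrong way (down after a
    win, up after a loss), keep the game that was about to drop out by
    growing the previous window by one. Otherwise reset to the base window.
    """
    won = history[-1][1]
    base_rating = rate(history[-base:])
    wrong_way = base_rating < prev_rating if won else base_rating > prev_rating
    if not wrong_way:
        return base, base_rating
    window = min(prev_window + 1, len(history))
    return window, rate(history[-window:])
```

With a history where an old win against a much stronger player is about to fall out of the 15-game window, a new win against a weak player makes `step_window` return 16 instead of letting the rating drop.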

1 Like

Don’t worry too much about breaking things, by the way. It will always happen unless you have a million unit tests, hundreds of acceptance testers and are extremely afraid of breaking things. OGS is still better than all other Go servers by far.

9 Likes

It is certainly worth considering that that might very well be the desired arrangement.

I’ll certainly run that through the number grinder. Instinctively it seems like that would either create a lot of 9d’s or very few 1d’s, depending on where you anchor the rating system. It would also seem to imply that you’d have this progression effect where the last few SDK ranks would be comparatively brutal to get through, then once you hit 1d it’d be suddenly easier and quicker to rank up (or down), as the ranks would be tighter and perhaps more volatile because of that.

4 Likes

I think what matters most is to have better matchmaking for even games. It doesn’t make sense to me to focus on ensuring that a 10k and a 1k can play “the most even ranked game possible”.

7 Likes

I’m happy you are considering this perspective. A few additional notes.

Don’t anchor it at 9D. According to a conversation I had with yoonyoung, getting past 6D should be HARD. Getting to 7D can be done in a year or 2 from 6D, but 1D-6D can be done in 6 months or so by a very studious student. Then 8D is apparently as hard to get as 30k to 7D, or roughly thereabouts. 9D is professional level. Granted, this is on Asian servers, but I want to stress: it’s better to have too many 1D than too many 9D. 7D+ is supposed to be grandmaster level for amateurs. The HARD ranks to get.

On another note, 2k/1k is actually supposed to have a barrier. It’s the last kyu barrier, where you have to make every move have positive, semi-efficient value. It’s true dans don’t play perfectly, but from my experience dans play a positive-value move 99% of the time. The difference is how efficient we are. This is not possible to compute, but the reason I mention it is that there may be a mathematical dilemma around promoting at 2k/1k/1d, and I want you to understand that it may not just be the system but also the mental barrier there. While I want them to be able to promote, if you make it too easy that barrier will be pushed to 1-3D or, worse, blend with a higher rank where they get crushed. This barrier is why 1D has some recognition and why it’s such a huge milestone.

Just food for thought.

6 Likes

I’m also a fan of more emphasis on even games. I feel like too much thought is going into high handicap games and it is hurting the even games. Even games are where competitive tournament play happens and where the official rules apply. Handicap is meant to support a skill difference, but it should not negatively impact the even games, which are what the players are working towards.

4 Likes

Can we also mention the rank display on OGS?

[Two screenshots of the OGS rank display]

While this is a bit of a side topic, I think it is important. Even if you create a fantastic ranking system, when players like me look at their rank and have no idea what they are looking at, that creates a negative user experience. Making this easier for those of us who don’t understand the math is, I think, important for getting everyone to accept and even love the rank system that you implement.

11 Likes

We already have that. ALL matchmaking is done with RATINGS, not RANK. Rank only decides handicap stones and therefore does not impact even games.

It has literally no effect on even games; matchmaking is done by comparing rating points, not rank.

The server is confident, within 64 rating points up or down, that you are playing at a 2168 level. anoek’s rating-to-rank mapping says that 2168 rating points corresponds with 0.8 kyu (~1 kyu), give or take a rank.
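
For those who find the display confusing, here is a rough sketch of how a rating and its ±deviation could turn into a rank range. The logarithmic shape and the constants `a=850` and `c=0.032` are my assumptions, not necessarily what OGS uses, so it may not reproduce the 0.8 kyu figure exactly. The function names are made up for illustration.

```python
import math


def rating_to_rank(rating: float, a: float = 850.0, c: float = 0.032) -> float:
    """Map rating points to a numeric rank (assumed scale: 0 = 30k, 30 = 1d)."""
    # a and c are illustrative constants, not confirmed OGS values.
    return math.log(rating / a) / c


def format_rank(rank: float) -> str:
    """Render a numeric rank as a kyu or dan label with one decimal."""
    if rank < 30.0:
        return f"{30.0 - rank:.1f}k"
    return f"{rank - 29.0:.1f}d"


def rank_interval(rating: float, deviation: float) -> tuple:
    """Labels for the low end, the middle, and the high end of the rating band."""
    points = (rating - deviation, rating, rating + deviation)
    return tuple(format_rank(rating_to_rank(r)) for r in points)


low, mid, high = rank_interval(2168, 64)
```

Under these assumed constants the band runs from a kyu label at the low end to a dan label at the high end, which is the “give or take a rank” part.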

5 Likes

OGS isn’t handing out some kind of dan certificates though. It’s not like OGS is some highly accredited server by the International Go Federation, where 7d-9d has to mean professional strength, even if it’d be nice if that were the case.

I’d rather we abolish the dan ranks altogether than pretend some newfound wisdom or inner strength brought people across the arbitrarily set threshold from 1kyu to 1dan.

All the in-person tournaments I’ve been to have used handicap stones. And why do you think handicap games are negatively impacting even games?

3 Likes

Not entirely, because “restrict rank” does affect things.

1 Like

It refers back to rating points to set the bounds, but even within those bounds, the system will try to match you with the player with the closest rating points available.

4 Likes

For anyone interested in observing or participating in the next rating system evolution, or just adding alternate rating systems for comparison.

We’ve got a 900MB dump of all the game results up until a couple of days ago in there as well for people to play with.

15 Likes

I expected to see handicap compensation somewhere around this area:

Did I miss it? Or is it not added yet? I do see it in expected_win_probability, but that doesn’t seem to be used by the actual glicko2 code. Also, expected_win_probability seems to be Glicko, not Glicko-2?
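
For reference, a minimal sketch of the two expected-score formulas being compared here. This is not the code from the repo: the function names are made up, and treating handicap as a flat rating offset for Black (`handicap_points`) is an assumption for illustration. As far as I can tell, the Glicko and Glicko-2 forms are the same curve on different scales, since 173.7178 is about 400 / ln(10).

```python
import math

Q = math.log(10) / 400.0    # Glicko scale constant
GLICKO2_SCALE = 173.7178    # Glicko-2 divides ratings and deviations by this


def g(rd: float) -> float:
    """Glicko attenuation factor: shrinks the rating gap when uncertainty is high."""
    return 1.0 / math.sqrt(1.0 + 3.0 * (Q * rd) ** 2 / math.pi ** 2)


def glicko_expected_score(
    black: float,
    black_rd: float,
    white: float,
    white_rd: float,
    handicap_points: float = 0.0,
) -> float:
    """Expected score for Black on the Glicko scale.

    handicap_points is a hypothetical rating offset credited to Black.
    """
    combined_rd = math.hypot(black_rd, white_rd)
    diff = black + handicap_points - white
    return 1.0 / (1.0 + 10.0 ** (-g(combined_rd) * diff / 400.0))


def glicko2_expected_score(mu: float, mu_j: float, phi_j: float) -> float:
    """Expected score on the Glicko-2 internal scale (mu, phi)."""
    g2 = 1.0 / math.sqrt(1.0 + 3.0 * phi_j ** 2 / math.pi ** 2)
    return 1.0 / (1.0 + math.exp(-g2 * (mu - mu_j)))
```

If the handicap offset only appears in a helper like this and never in the update step, then it would affect predictions but not the ratings themselves, which seems to be what the question is getting at.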

2 Likes

The way I implemented it was to add it here: https://github.com/online-go/goratings/blob/master/analysis/analyze_glicko2_one_game_at_a_time.py#L43

2 Likes

On the effect of the sliding window: I’ve seen my rating decrease after a victory against a weak opponent, most likely because this new victory caused a victory against a strong opponent to go out of memory. To avoid this, one could think of a mechanism where past games fade out of memory rather than disappear abruptly. This could be done with a de-weighting mechanism, implemented using the rating uncertainty which is already there. One could have a heuristic like: use the last 10 games as usual, but for the 10 games before that, increase the uncertainty of the opponent by 10/(21-i) (where i means it is the ith game in the past). This should smooth out weird changes and help with the volatility. (And if it sounds like an epicycle, yes, it does!)
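
A quick sketch of that heuristic, reading “increase by 10/(21-i)” as a multiplier on the opponent’s deviation (that reading, and the function names, are assumptions):

```python
from typing import Optional

FULL_WEIGHT_GAMES = 10  # most recent games, used as usual
FADING_GAMES = 10       # the games before those, gradually de-weighted


def uncertainty_scale(i: int) -> Optional[float]:
    """Multiplier for the opponent's deviation for the i-th most recent game.

    i = 1 is the latest game. Returns None once the game is out of memory.
    """
    if i < 1:
        raise ValueError("i counts from 1 (the most recent game)")
    if i <= FULL_WEIGHT_GAMES:
        return 1.0
    if i <= FULL_WEIGHT_GAMES + FADING_GAMES:
        return 10.0 / (21 - i)
    return None


def faded_opponent_rd(opponent_rd: float, i: int) -> Optional[float]:
    """Opponent deviation after fading, or None if the game is dropped."""
    scale = uncertainty_scale(i)
    return None if scale is None else opponent_rd * scale
```

The multiplier stays at 1 through game 11, doubles by game 16, and reaches 10 at game 20, so an old result loses influence gradually before it drops out at game 21.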

3 Likes