Rating system issues

I’m also a fan of more emphasis on even games. I feel like too much thought is going into high-handicap games and it is hurting the even games. Even games are where competitive tournament play happens and what the official rules are built around. Handicap is meant to compensate for a skill difference, but it should not negatively impact even games, which are what players are working towards.

5 Likes

Can we also mention the rank display on OGS?

image

image

While this is a bit of a side topic, I think it is important. Even if you create a fantastic ranking system, it is a negative user experience when players like me look at their rank and have no idea what they are looking at. Making this easier for those of us who don’t understand the math is, I think, important for getting everyone to accept and even love the rank system that you implement.

15 Likes

We already have that. ALL matchmaking is done with RATINGS, not RANK. Rank only decides handicap stones and therefore does not impact even games.

It has literally no effect on even games; matchmaking is done by comparing rating points, not rank.

The server is confident, within 64 rating points up or down, that you are playing at a 2168 level. anoek’s rating-to-rank mapping says that 2168 rating points corresponds to 0.8 kyu (~1 kyu), give or take a rank.
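To make the "2168 give or take 64" reading concrete, here is a minimal sketch that converts a rating and its deviation into a low/mid/high rank. It assumes the mapping in use is the `rating_to_rank` formula quoted later in this thread (constants 850 and 0.032). The `formatRank` and `describeRating` helpers are hypothetical, and the convention that numeric rank 30 is the 1 kyu / 1 dan boundary is an assumption that happens to put 2168 a little under 1 kyu.

```typescript
// Clamp limits and constants taken from the snippet quoted later in this thread.
const MIN_RATING = 100;
const MAX_RATING = 6000;

// Same formula as the quoted rating_to_rank, renamed for this sketch.
function ratingToRank(rating: number): number {
    const clamped = Math.min(MAX_RATING, Math.max(MIN_RATING, rating));
    return Math.log(clamped / 850.0) / 0.032;
}

// Hypothetical display helper.
// Assumption: numeric rank 30 is where kyu ends and dan begins,
// so rank 29 reads as 1k and rank 30 reads as 1d.
function formatRank(rank: number): string {
    if (rank < 30) {
        return `${(30 - rank).toFixed(1)}k`;
    }
    return `${(rank - 30 + 1).toFixed(1)}d`;
}

// Hypothetical helper: rank at rating - deviation, rating, rating + deviation.
function describeRating(rating: number, deviation: number) {
    return {
        low: formatRank(ratingToRank(rating - deviation)),
        mid: formatRank(ratingToRank(rating)),
        high: formatRank(ratingToRank(rating + deviation)),
    };
}

console.log(describeRating(2168, 64));
```

Under these assumptions the low end comes out as a kyu rank and the high end as a dan rank, which is one way to read "give or take a rank".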

5 Likes

OGS isn’t handing out some kind of dan certificates though. It’s not like OGS is some highly accredited server recognised by the International Go Federation, where 7d-9d has to mean professional strength, even if it’d be nice if that were the case.

I’d rather we abolish the dan ranks altogether than pretend some newfound wisdom or inner strength brought people across the arbitrarily set threshold from 1 kyu to 1 dan.

All the in-person tournaments I’ve been to have used handicap stones. And why do you think handicap games are negatively impacting even games?

3 Likes

Not entirely, because "restrict rank" does affect things.

2 Likes

It refers back to rating points to set the bounds, but even within those bounds, the system will try to match you with the available player whose rating points are closest to yours.
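As a rough illustration of that description (bounds first, then closest rating), here is a minimal sketch. It is not OGS's actual matchmaking code: the `Candidate` type, the `pickOpponent` function, and the idea that the bounds arrive as a lower and upper rating are all assumptions made for the example.

```typescript
// Hypothetical shape of a player waiting for a game.
interface Candidate {
    name: string;
    rating: number;
}

// Hypothetical helper: keep only candidates inside [lower, upper] rating points,
// then return the one whose rating is closest to myRating.
// Returns undefined when nobody falls inside the bounds.
function pickOpponent(
    myRating: number,
    lower: number,
    upper: number,
    pool: Candidate[],
): Candidate | undefined {
    let best: Candidate | undefined;
    for (const c of pool) {
        if (c.rating < lower || c.rating > upper) {
            continue; // outside the bounds derived from the rank restriction
        }
        if (
            best === undefined ||
            Math.abs(c.rating - myRating) < Math.abs(best.rating - myRating)
        ) {
            best = c;
        }
    }
    return best;
}

const pool: Candidate[] = [
    { name: "a", rating: 1500 },
    { name: "b", rating: 1710 },
    { name: "c", rating: 1900 },
    { name: "d", rating: 2500 },
];
console.log(pickOpponent(1700, 1400, 2000, pool));
```

The point of the sketch is only that the bounds filter the pool, while the final choice is made on rating points.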

4 Likes

For anyone interested in observing or participating in the next rating system evolution, or just adding alternate rating systems for comparison.

We’ve got a 900MB dump of all the game results up until a couple of days ago in there as well for people to play with.

16 Likes

I expected to see handicap compensation somewhere around this area:

Did I miss it? Or is it not added yet? I do see it in expected_win_probability, but that doesn’t seem to be used by the actual glicko2 code. Also, expected_win_probability seems to be glicko, not glicko2?

2 Likes

The way I implemented it was to add it here: https://github.com/online-go/goratings/blob/master/analysis/analyze_glicko2_one_game_at_a_time.py#L43

2 Likes

On the effect of the sliding window: I’ve seen my rating decrease after a victory against a weak opponent, most likely because this new victory caused a victory against a strong opponent to drop out of memory. To avoid this, one could think of a mechanism where past games fade out of memory rather than disappear abruptly. This could be done with a de-weighting mechanism implemented using the rating uncertainty, which is already there. One could have some heuristic like: use the last 10 games as usual, but for the 10 games before that, increase the uncertainty of the opponent by 10/(21-i) (where i means it is the ith game in the past). This should smooth out weird changes and help with the volatility. (And if it sounds like an epicycle, yes it does!)
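The fade-out heuristic above can be sketched as follows. This is one reading of the proposal, not tested rating code: it assumes "increase the uncertainty by 10/(21-i)" means multiplying the opponent's deviation by that factor, that i = 1 is the most recent game, and that games older than the 20th are dropped. The function names are hypothetical.

```typescript
// Factor applied to the opponent's deviation for the i-th most recent game.
// Games 1..10: used as usual (factor 1).
// Games 11..20: factor 10 / (21 - i), which rises from 1 at i = 11 to 10 at i = 20.
// Games older than 20: outside the window, signalled with null.
function deviationFactor(i: number): number | null {
    if (!Number.isInteger(i) || i < 1) {
        throw new RangeError("i must be a positive integer");
    }
    if (i <= 10) {
        return 1;
    }
    if (i <= 20) {
        return 10 / (21 - i);
    }
    return null;
}

// Assumption: the factor is multiplicative on the opponent's rating deviation.
function fadedDeviation(opponentDeviation: number, i: number): number | null {
    const factor = deviationFactor(i);
    return factor === null ? null : opponentDeviation * factor;
}

for (const i of [1, 10, 11, 16, 20, 21]) {
    console.log(i, fadedDeviation(60, i));
}
```

With this reading the factor is exactly 1 at i = 11, so there is no jump between the "as usual" games and the faded ones; the only remaining discontinuity is where the 20th game finally drops out.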

3 Likes

Where are _rank_to_rating and _rating_to_rank? I’m curious to see them, but also, do they work above 9d and below 25k? Weird things could happen when a very strong player gives an 8d, e.g., a 3-stone handicap. If these routines cap the ranks at 9d, the handicap code might treat this like a 1-stone handicap instead of 3 stones.

Similar bugs could happen for 25k players where one is actually stronger and giving stones to the other.

Edit: I suppose the 9d/25k limits are only applied when a client wants to display a string? And behind the scenes there is no limit?

1 Like

Rank, and therefore automatic handicap, is limited to 25k and 9d. Rating points, however, while also limited, extend well beyond both of these limits in both directions.

3 Likes

So… WHR?

2 Likes

Technically Elo/Glicko is supposed to do this through the Expectation vs Actual system. When a game leaves your ratings period in glicko, it doesn’t just disappear; it leaves its effects behind for the next ratings period to do its thing on.
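The "expectation vs actual" idea can be shown with a minimal sketch using the plain Elo update. This is deliberately not Glicko-2 (it ignores rating deviation and volatility), and the K-factor of 32 is an illustrative assumption, not a value used by OGS.

```typescript
// Standard Elo expected score for a player against an opponent.
function expectedScore(rating: number, opponentRating: number): number {
    return 1 / (1 + Math.pow(10, (opponentRating - rating) / 400));
}

// One Elo update: move the rating by K * (actual - expected).
// score is 1 for a win, 0 for a loss. K = 32 is an assumption for illustration.
function eloUpdate(
    rating: number,
    opponentRating: number,
    score: number,
    k: number = 32,
): number {
    return rating + k * (score - expectedScore(rating, opponentRating));
}

// Apply a list of results in order. Each update starts from the rating left
// behind by the previous ones, so old games keep influencing the current
// rating even though they are never looked at again.
function applyResults(
    start: number,
    games: Array<{ opponentRating: number; score: number }>,
): number {
    return games.reduce(
        (rating, g) => eloUpdate(rating, g.opponentRating, g.score),
        start,
    );
}

console.log(applyResults(1500, [
    { opponentRating: 1900, score: 1 }, // upset win against a strong opponent
    { opponentRating: 1200, score: 1 }, // expected win against a weak opponent
]));
```

In this scheme a win against a weak opponent can only add a small amount; it never removes the gain from an earlier win, which is the behaviour the post describes.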

3 Likes

BHydden, yes, I understand the difference between the cosmetic 25k/9d boundaries and the number behind them, but there is room for mistakes in this area, so I was just checking. What I see looks like it’s probably ok:

return rank_to_rating(rating_to_rank(rating) + handicap) - rating

export function rating_to_rank(rating:number) {
    return Math.log(Math.min(MAX_RATING, Math.max(MIN_RATING, rating)) / 850.0) / 0.032;
}
const MIN_RATING = 100;
const MAX_RATING = 6000;

You can see rating_to_rank does have hard limits, but I think they are outside the cosmetic 25k/9d boundary. So this seems more like a safety in case something pathological is happening to ratings.
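To check that claim numerically, here is a minimal sketch built around the quoted snippet. `rating_to_rank` and the two limits are copied from above. `rank_to_rating` is not shown in the thread, so the version here is an assumption (the algebraic inverse of `rating_to_rank`), as is the convention that numeric rank 5 is 25k and numeric rank 38 is 9d. `handicapAdjustment` is a hypothetical wrapper around the quoted `return` line.

```typescript
const MIN_RATING = 100;
const MAX_RATING = 6000;

// Copied from the snippet quoted above.
function rating_to_rank(rating: number): number {
    return Math.log(Math.min(MAX_RATING, Math.max(MIN_RATING, rating)) / 850.0) / 0.032;
}

// Assumption: the real rank_to_rating is the inverse of rating_to_rank.
function rank_to_rating(rank: number): number {
    return 850.0 * Math.exp(0.032 * rank);
}

// Hypothetical wrapper around the quoted return statement:
// how many rating points a given number of handicap stones is worth at this rating.
function handicapAdjustment(rating: number, handicap: number): number {
    return rank_to_rating(rating_to_rank(rating) + handicap) - rating;
}

// Assumed numeric ranks for the cosmetic display limits.
const RANK_25K = 5;
const RANK_9D = 38;

console.log("rank at MIN_RATING:", rating_to_rank(MIN_RATING));
console.log("rank at MAX_RATING:", rating_to_rank(MAX_RATING));
console.log("3 stones at 9d:", handicapAdjustment(rank_to_rating(RANK_9D), 3));
```

Under these assumptions the clamp only kicks in far below 25k and far above 9d, so a 3-stone handicap given to a player at the 9d boundary still produces a 3-rank adjustment rather than being flattened.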

2 Likes

Correct. The min rating corresponds to something like 100 kyu and the max to something like 30 dan lol

3 Likes

This “less than one stone in the four schools” was for pro ranks, not for amateur ranks (which didn’t even exist back then).
Because of this, the EGF system puts pro ranks 30 GoR apart, while amateur ranks are 100 GoR apart. Note that a 100 GoR point difference does not mean a 100 Elo point difference. A 100 GoR point difference stands for one full stone of handicap (~13-14 komi points difference in skill) over the whole range of ranks. So pro ranks would be about 0.3 full stones apart, ~4-5 komi points difference.

Of course, there is variation in skill between players of a specific pro rank, but by these estimates the difference between a top pro (let’s say 9p) and a marginal pro (let’s say 1p) would be about 8 x 4.5 points ~ 36 komi points difference in skill.

This is very close to the komi value of a traditional handicap of 3 stones according to KataGo.
A 3 stone handicap was also the handicap a 1p would get from a 9p (Meijin) in the Edo period (the era of those four go schools).
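The arithmetic in this post can be laid out as a small sketch. All figures are the post's own estimates (30 GoR per pro rank, 100 GoR per full stone, 4 to 5 komi points per pro rank with 4.5 as the working value); the function and field names are hypothetical.

```typescript
// Figures taken from the post above, not from any official EGF source.
const GOR_PER_PRO_RANK = 30;
const GOR_PER_FULL_STONE = 100;
const KOMI_POINTS_PER_PRO_RANK = { low: 4, mid: 4.5, high: 5 };

// Hypothetical helper: skill gap between two pro ranks, e.g. 1p and 9p.
function proRankGap(lowerRank: number, higherRank: number) {
    const ranks = higherRank - lowerRank;
    const gor = ranks * GOR_PER_PRO_RANK;
    return {
        ranks,
        gor,
        fullStones: gor / GOR_PER_FULL_STONE,
        komiLow: ranks * KOMI_POINTS_PER_PRO_RANK.low,
        komiMid: ranks * KOMI_POINTS_PER_PRO_RANK.mid,
        komiHigh: ranks * KOMI_POINTS_PER_PRO_RANK.high,
    };
}

console.log(proRankGap(1, 9));
```

For 1p to 9p this gives 8 ranks, 240 GoR, about 2.4 full stones, and the 36 komi points quoted above as the middle of a 32 to 40 range.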

3 Likes

If that is true then what do those ranks even mean?
Shouldn’t amateur ranks always be separated by a full stone handicap (= 2 stones handicap with black giving komi or no handicap stones with white giving komi)?

I think an average 9d amateur should be able to give an average 1d amateur (according to the same system) 8-9 stones handicap. If a 9d can only give a 1d in the same system a handicap of 6 stones or so, the 9d rank is nonsensical.

3 Likes

I think ranks are divided by how likely you are to win against an opponent. At equal rank you win about 50% of the time; one rank higher, you win about 70%; two ranks higher, 85%; etc.
This can stay true at dan level even if the one-stone-per-rank handicap relation does not. A handicap stone is worth more for strong players.

Suppose we have two DDKs where one wins about 85% of the time against the other, and two dan players where one wins 85% of the time against the other. Then two handicap stones will make the DDKs play equal games, but with the dan players the stronger player might start to lose more games on average, since the two handicap stones are worth more than the two ranks of difference.

6 Likes

Well, you may think so, but it isn’t true when one analyses the relation between winrates and handicaps. Only the 50% is true (obviously).

2 Likes