Rating system issues

Clossius1 · July 1, 2020, 4:40pm

I’m also a fan of more emphasis on even games. I feel like too much thought is going into high-handicap games and it is hurting the even games. Even games are where competitive tournament play happens and where the official rules apply. Handicap is meant to compensate for a skill difference, but it should not negatively impact even games, which are what players are working towards.

Clossius1 · July 1, 2020, 4:44pm

Can we also mention the rank display on OGS?

While this is a bit of a side topic, I think it is important. Even if you create a fantastic ranking system, it is a negative user experience when players like me look at their rank and have no idea what they are looking at. Making this easier for those of us who don’t understand the math is, I think, important for getting everyone to accept and even love the rank system that you implement.

BHydden · July 1, 2020, 11:02pm

We already have that. ALL matchmaking is done with RATINGS, not RANK. Rank only decides handicap stones and therefore does not impact even games.

It has literally no effect on even games; matchmaking is done by comparing rating points, not rank.

The server is confident, within 64 rating points up or down, that you are playing at a 2168 level. anoek’s rating-to-rank mapping says that 2168 rating points corresponds to 0.8 kyu (~1 kyu), give or take a rank.

shinuito · July 1, 2020, 11:40pm

Clossius1:

Don’t anchor it at 9D. According to a conversation I had with yoonyoung, getting past 6D should be HARD. Getting to 7D can be done in a year or 2 from 6D, but 1D-6D can be done in 6 months or so by a very studious student. Then 8D is apparently as hard to get as 30k to 7D, or roughly thereabouts. 9D is professional level. Granted, this is on Asian servers, but I want to stress that it’s better to have too many 1D than 9D. 7d+ is supposed to be grandmaster level for amateurs. The HARD ranks to get.

On another note, 2k/1k is actually supposed to have a barrier. It’s more the last kyu barrier, where you have to make every move have positive, semi-efficient value. It’s true dans don’t play perfectly, but from my experience dans play a positive-value move 99% of the time. The difference is how efficient we are. This is not possible to compute, but the reason I mention it is that there may be a mathematical dilemma around promotion at 2k/1k/1d, and I want you to understand that it may not just be the system but also the mental barrier there. While I want them to be able to promote, if you make it too easy that barrier will be pushed to 1-3D or, worse, blend with a higher rank where they get crushed. This barrier is why 1D has some recognition and why it’s such a huge milestone.

OGS isn’t handing out some kind of dan certificates though. It’s not like OGS is some highly accredited server by the International Go Federation, where 7d-9d has to mean professional strength, even if it’d be nice if that were the case.

I’d rather we abolish the dan ranks altogether than pretend some newfound wisdom or inner strength brought people across the arbitrarily set threshold from 1 kyu to 1 dan.

All the in person tournaments I’ve been to have used handicap stones. And why do you think handicap games are negatively impacting even games?

meili_yinhua · July 2, 2020, 12:46am

not entirely, because restrict rank does affect things

BHydden · July 2, 2020, 12:48am

it refers back to rating points to set the bounds, but even within those bounds, the system will try to match you with the player with the closest rating points available.

anoek · July 2, 2020, 4:39pm

For anyone interested in observing or participating in the next rating system evolution, or just adding alternate rating systems for comparison.

we’ve got a 900MB dump of all the game results up until a couple of days ago in there as well for people to play with.

KillerDucky · July 2, 2020, 7:15pm

I expected to see handicap compensation somewhere around this area:

online-go/goratings/blob/13c58b77de0cd8bab0b38f8ff0da1f2870e5d942/goratings/math/glicko2.py#L107




# step 1/2 implicitly done during Glicko2Entry construction


# step 3 / 4, compute 'v' and delta
v_sum = 0.0
delta_sum = 0.0
for m in matches:
    p = m[0]
    outcome = m[1]
    g_phi_j = 1 / sqrt(1 + (3 * p.phi ** 2) / (pi ** 2))
    E = 1 / (1 + exp(-g_phi_j * (player.mu - p.mu)))
    v_sum += g_phi_j ** 2 * E * (1 - E)
    delta_sum += g_phi_j * (outcome - E)


v = 1.0 / v_sum
delta = v * delta_sum


# step 5
a = log(player.volatility ** 2)


def f(x: float) -> float:

Did I miss it? Or is it not added yet? I do see it in expected_win_probability, but that doesn’t seem to be used by the actual glicko2 code. Also, expected_win_probability seems to be glicko, not glicko2?
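For illustration only, here is a minimal self-contained sketch of steps 3/4 with one possible place for handicap compensation: a per-game offset added to the opponent's mu before the expected score is computed. The `Entry` class, the `offset` parameter and the function name are hypothetical, and nothing here is taken from the goratings code beyond the formulas in the excerpt above.

```python
from dataclasses import dataclass
from math import exp, pi, sqrt
from typing import List, Tuple


@dataclass
class Entry:
    """Hypothetical stand-in for a Glicko-2 entry on the internal scale."""
    mu: float   # rating
    phi: float  # rating deviation


def v_and_delta(
    player: Entry, matches: List[Tuple[Entry, float, float]]
) -> Tuple[float, float]:
    """Glicko-2 steps 3/4 with an assumed per-game handicap offset.

    Each match is (opponent, outcome, offset). outcome is 1.0 for a win and
    0.0 for a loss. offset is an assumption of this sketch: it is added to
    the opponent's mu, so a positive value makes the opponent look stronger.
    """
    v_sum = 0.0
    delta_sum = 0.0
    for opponent, outcome, offset in matches:
        g = 1 / sqrt(1 + (3 * opponent.phi ** 2) / (pi ** 2))
        # Expected score against the handicap-adjusted opponent rating.
        e = 1 / (1 + exp(-g * (player.mu - (opponent.mu + offset))))
        v_sum += g ** 2 * e * (1 - e)
        delta_sum += g * (outcome - e)
    v = 1.0 / v_sum
    return v, v * delta_sum
```

With a zero offset this reduces to the loop in the excerpt. With a positive offset, a win against the same opponent yields a larger delta, which is the effect one would expect from giving stones.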

anoek · July 2, 2020, 9:18pm

The way I implemented it was to add it here: https://github.com/online-go/goratings/blob/master/analysis/analyze_glicko2_one_game_at_a_time.py#L43

dhpmrou · July 2, 2020, 9:21pm

On the effect of the sliding window: I’ve seen my rating decrease after a victory against a weak opponent, most likely because this new victory caused a victory against a strong opponent to go out of memory. To avoid this, one could think of a mechanism where past games would fade out of memory rather than disappear abruptly. This could be done with a de-weighting mechanism, which could be implemented using the rating uncertainty that is already there. One could have some heuristic like: use the last 10 games as usual, but for the 10 games before that, increase the uncertainty of the opponent by 10/(21-i) (where i means it is the ith game in the past). This should smooth out weird changes and help with the volatility. (And if it sounds like an epicycle, yes it does!)
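A rough sketch of that heuristic, under two assumptions that are not settled above: "increase by 10/(21-i)" is read as a multiplier on the opponent's deviation, and games older than the 20th are dropped. All function names are hypothetical.

```python
from math import pi, sqrt
from typing import Optional


def deviation_multiplier(i: int) -> Optional[float]:
    """Multiplier for the opponent's deviation in the i-th most recent game.

    Assumption: the factor 10/(21-i) is multiplicative, and games older
    than the 20th fall out of the window.
    """
    if i < 1:
        raise ValueError("i counts games back from the most recent, starting at 1")
    if i <= 10:
        return 1.0  # most recent 10 games: used as usual
    if i <= 20:
        return 10.0 / (21 - i)  # 1.0 at i=11, rising to 10.0 at i=20
    return None  # outside the window entirely


def g(phi: float) -> float:
    """Glicko-2 weighting term: shrinks as the opponent's deviation grows."""
    return 1 / sqrt(1 + (3 * phi ** 2) / (pi ** 2))


def game_weight(opponent_phi: float, i: int) -> float:
    """The g**2 factor scaling a game's contribution after inflating phi."""
    m = deviation_multiplier(i)
    if m is None:
        return 0.0
    return g(opponent_phi * m) ** 2
```

The multiplier is exactly 1 at i=11, so the transition from the "usual" games is continuous, and the weight then decays smoothly instead of dropping to zero in one step.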

KillerDucky · July 2, 2020, 9:39pm

Where are _rank_to_rating and _rating_to_rank? I’m curious to see them, but also, do they work above 9d and below 25k? Weird things could happen when a very strong player gives an 8d e.g. a 3-stone handi. If these routines cap the ranks at 9d, the handicap code might treat this like a 1-stone handi instead of 3 stones.

Similar bugs could happen for 25k players where one is actually stronger and giving stones to the other.

Edit: I suppose the 9d/25k limits are only done when a client wants to display a string? And behind the scenes there is no limit?

BHydden · July 2, 2020, 11:05pm

online-go/online-go.com/blob/devel/src/lib/rank_utils.ts

Rank, and therefore automatic handicap, is limited to 25k and 9d; however, rating points, while limited, extend well beyond both these limits in both directions.

Samraku · July 2, 2020, 11:06pm

So… WHR?

meili_yinhua · July 2, 2020, 11:15pm

Technically Elo/Glicko is supposed to do this through the Expectation vs Actual system. When a game leaves your ratings period in Glicko, it doesn’t just disappear; it leaves its effects behind for the next ratings period to do its thing on.

KillerDucky · July 2, 2020, 11:33pm

BHydden yes I understand the difference between the cosmetic 25k/9d boundaries and the number behind them, but there is room for mistakes in this area, so I was just checking. What I see looks like it’s probably ok:

return rank_to_rating(rating_to_rank(rating) + handicap) - rating

export function rating_to_rank(rating:number) {
    return Math.log(Math.min(MAX_RATING, Math.max(MIN_RATING, rating)) / 850.0) / 0.032;
}

const MIN_RATING = 100;
const MAX_RATING = 6000;

You can see rating_to_rank does have hard limits, but I think they are outside the cosmetic 25k/9d boundary. So this seems more like a safety in case something pathological is happening to ratings.
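To poke at the edge cases, here is a small Python port of the snippets above. `rating_to_rank` mirrors the TypeScript shown; `rank_to_rating` is not shown in this thread, so the version below is an assumption (the algebraic inverse of the same formula), and `handicap_rating_offset` is a hypothetical name for the one-line expression quoted above.

```python
from math import exp, log

MIN_RATING = 100
MAX_RATING = 6000


def rating_to_rank(rating: float) -> float:
    """Port of the TypeScript above: clamp the rating, then map to a rank number."""
    return log(min(MAX_RATING, max(MIN_RATING, rating)) / 850.0) / 0.032


def rank_to_rating(rank: float) -> float:
    """Assumed inverse of rating_to_rank (not taken from the OGS source)."""
    return 850.0 * exp(0.032 * rank)


def handicap_rating_offset(rating: float, handicap: int) -> float:
    """Rating points that `handicap` ranks are worth at the given rating."""
    return rank_to_rating(rating_to_rank(rating) + handicap) - rating
```

Inside the clamp the offset is simply `rating * (exp(0.032 * handicap) - 1)`. Under these assumptions it only misbehaves once a rating exceeds MAX_RATING: for example, `handicap_rating_offset(7000, 1)` comes out negative, because the rank is clamped before the handicap is added.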

BHydden · July 2, 2020, 11:45pm

Correct. The min rating corresponds to something like 100 kyu and the max to something like 30 dan lol

gennan · July 3, 2020, 2:56pm

This “less than one stone in the four schools” was for pro ranks, not for amateur ranks (which didn’t even exist back then).
Because of this, the EGF system puts pro ranks 30 GoR apart, while amateur ranks are 100 GoR apart. Note that 100 GoR points difference does not mean 100 Elo points difference. 100 GoR point difference stands for one full stone handicap (~ 13-14 komi points difference in skill) for the whole range of ranks. So pro ranks would be about 0.3 full stones ~ 4-5 komi points difference.

Of course, there is variation in skill between players of a specific pro rank, but by these estimates the difference between a top pro (let’s say 9p) and a marginal pro (let’s say 1p) would be about 8 x 4.5 points ~ 36 komi points difference in skill.

This is very close to the komi value of a traditional handicap of 3 stones according to KataGo.
A 3 stone handicap was also the handicap a 1p would get from a 9p (Meijin) in the Edo period (the era of those four go schools).

gennan · July 3, 2020, 3:05pm

If that is true then what do those ranks even mean?
Shouldn’t amateur ranks always be separated by a full stone handicap (= 2 stones handicap with black giving komi or no handicap stones with white giving komi)?

I think an average 9d amateur should be able to give an average 1d amateur (according to the same system) 8-9 stones handicap. If a 9d can only give a 1d in the same system a handicap of 6 stones or so, the 9d rank is nonsensical.

Vsotvep · July 3, 2020, 3:14pm

I think ranks are divided by how likely you are to win against an opponent. An equal rank will mean you win about 50% of the time; one rank higher, you win about 70%; two ranks, 85%; etc.
This can keep being true at dan levels even if the handicap stones do not. A handicap stone is worth more for strong players.

Suppose we have two DDKs where one wins about 85% of the time against the other, and two dans where one wins 85% of the time against the other. Then two handicap stones will make the DDKs play equal games, but with the dan players the stronger player might start to lose more games on average, since the two handicap stones are more valuable than the two ranks of difference.

gennan · July 3, 2020, 3:16pm

Well you may think so, but it isn’t true when one analyses the relation between winrates and handicaps. Only the 50% is true (obviously).