I meant that this might not be true, but it is how rank difference should be measured, since it gets rid of the arbitrariness of handicap stones (which are frankly a different game being played). In particular, handicap stones are not a good measure of rank difference between very strong players.
Using winrates, the difference between 15k and 13k should be as large as the difference between 5d and 7d.
We’re actually already using this system: it’s the Elo system, and with it the Glicko(2) system.
And the idea that dan ranks are traditionally based on handicap stones is not completely justified either. For instance, with professional players, dan rank does not correspond to handicap stone difference (even if we ignore weird promotion rules based on winning certain titles): the distinction is too fine to give whole stones. And, as I mentioned above, in Japan dan ranks have become a lot more spacious (meaning the difference in rank is sometimes less than the needed handicap stones), due to dan ranks being sold for money.
Well it isn’t, because OGS converts between Glicko2/Elo ratings and ranks. And the conversion formula is not 150 Elo = 1 rank (150 Elo means 70% winrate). It’s more like 50 Elo (roughly 60% winrate) = 1 rank (which corresponds to observed winrates between consecutive EGF ranks in the kyu range).
70% winrates (=150 Elo gaps) between consecutive ranks are typically observed around 5d EGF.
65% winrates (=100 Elo gaps) between consecutive ranks are typically observed around 1d EGF.
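For anyone who wants to check the winrate / Elo gap pairs quoted above, here is a minimal sketch of the standard Elo expectation formula and its inverse. The function names are my own; only the formula itself is standard.

```python
import math


def expected_score(elo_gap: float) -> float:
    """Win probability of the higher-rated player for a given Elo gap."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))


def elo_gap_for(winrate: float) -> float:
    """Inverse of expected_score: the Elo gap that yields this win probability."""
    return 400.0 * math.log10(winrate / (1.0 - winrate))


# Print the win probability for the three gaps discussed in the thread.
for gap in (50, 100, 150):
    print(f"{gap:>3} Elo -> {expected_score(gap):.1%}")
```

With this formula a 150 Elo gap comes out at about 70% and a 100 Elo gap at about 64%, while a 50 Elo gap gives about 57%, in the same ballpark as the rough 60% mentioned above.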
I used 70% / 85% pretty much as arbitrary numbers and I have no idea if it’s anywhere near what is measured in real games. Make it 55% or 90%, same difference for what I was trying to say.
My point is that Elo is something that is measurable regardless of the strength of a player. It’s possible to make a good prediction of the winning chances between two players by simply subtracting their Elo scores. It doesn’t depend on the actual strength of the players.
Ideally handicap stones correspond to a fixed difference in Elo. They don’t, though. That was what I was trying to say.
I mean could you come up with a rating system that achieved this?
I’m not sure what the requirement would be for this to work. You assign numbers to people to give them relative strengths and the difference between their numbers is some indicator of the likelihood one beats the other.
Like there should be some probability function p(R_B-R_A,x_1,…,x_n) that only depends on the difference in ratings and some extra tweakable parameters. Maybe one parameter (x_1) is the number of handicap stones (or komi if you wanted to do it that way), and ideally p(0,0,…)=0.5, p(100,1,…)=0.5, …, p(100*k,k,…)=0.5.
I might look into how some of these rating systems were devised if I get a chance.
So you consider 7d-9d amateur as overlapping with professional, as the EGF ranking system does. @Clossius1 considers only 9d amateur as overlapping with pro level, as is the case on Asian go servers. This is already quite a discrepancy.
I’m curious which amateur ranks the AGA sees as overlapping with professional strength. And what about OGS?
We could make an international standard by anchoring everybody to KataGo by the handicap they need to score 50% against it.
We may guesstimate that KataGo needs about 2.5 stones handicap (or 21 points reverse komi) against a hypothetical perfect player. I’m using 14 points komi as the distance between ranks (the value of a full handicap stone). This matches quite well with KataGo’s estimates of komi values of handicaps.
So when we say that the perfect player is level 0 (and higher levels are weaker), KataGo may be about level 2. Yeonwoo 1p may be about level 5 (she seems to need 3.5 stones handicap to win 50% against KataGo). A world top pro may be level 4 (needing about 2.5 stones handicap = 21 points reverse komi against KataGo?). An EGF 1d may then be about level 11 (needing about 6.5 stones handicap against Yeonwoo?) and a 29k may be about level 40.
With this ranking system there would be no arbitrary boundary between kyu and dan.
But single digit levels would still carry prestige (even though the decimal system is quite arbitrarily linked to the number of fingers we have). Single digit level would probably carry even more prestige than current “dan”, because level 9 would be about mid dan EGF. So reaching level 9 would be even tougher than reaching “dan”.
Determining if you are level 9 is as easy as firing up KataGo (giving KataGo at least 10k playouts per move, probably requiring a machine with a decent graphics card) and beating it about 50% of the time with a handicap alternating between 7 and 8 handicap stones (or 91 points reverse komi).
One would still have to use some maximum handicap of 9 stones (~ 120 points reverse komi) between players, because at some point the handicap system breaks down. Even the perfect player cannot win when giving 361 reverse komi under area scoring.
So this method can only be used directly to determine levels 10-0. Beyond that, levels would just be determined by handicaps against players with an established level (just as normal).
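The level arithmetic above can be sketched in a few lines. This is only my reading of the numbers in the post, under two assumptions that are not stated explicitly there: an n-stone handicap (without komi) is worth n - 0.5 ranks, and an even game has 7 points komi, so a gap of k ranks corresponds to 14*k - 7 points of reverse komi. All names are hypothetical.

```python
POINTS_PER_RANK = 14  # value of a full handicap stone, as used in the post
NORMAL_KOMI = 7       # assumption: komi of an even game, in points
KATAGO_LEVEL = 2      # the post's guess; the perfect player is level 0


def level_from_handicap(opponent_level: float, stones: float) -> float:
    """Level of a player who scores 50% when taking `stones` from the opponent.

    Assumes an n-stone handicap without komi is worth n - 0.5 ranks.
    """
    return opponent_level + stones - 0.5


def reverse_komi(rank_gap: float) -> float:
    """Reverse komi (in points) that would compensate a gap of `rank_gap` ranks."""
    return rank_gap * POINTS_PER_RANK - NORMAL_KOMI


# Alternating between 7 and 8 stones against KataGo is 7.5 stones on average.
print(level_from_handicap(KATAGO_LEVEL, 7.5))  # level of that player
print(reverse_komi(7))                         # matching reverse komi
```

Under these assumptions the sketch reproduces the figures in the post: 3.5 stones against KataGo gives level 5, 7.5 stones gives level 9, and a 7-rank gap corresponds to 91 points of reverse komi.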
I think you’d need to put a comma between R_A and R_B, like p(R_A, R_B,x_1,…,x_n), because it does not depend only on rating difference, but also on the ratings themselves.
I know Elo already does this, but how does it incorporate knowledge about the game itself? How do I add a parameter to it to say that games aren’t even games, that they have x handicap or komi? I suppose you could just define a certain set of ratings to be x kyu and y dan, and adjust a player’s rating by, say, m*100 points to account for m handicap stones (assuming rating gaps work in 100s) before inputting into the formula.
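A rough sketch of that last idea, shifting the handicap taker's rating before plugging it into the usual Elo expectation. The value of 100 points per stone is just the assumption from the paragraph above, not how any particular server does it, and the function name is made up.

```python
def expected_score_with_handicap(
    r_weaker: float,
    r_stronger: float,
    stones: float,
    points_per_stone: float = 100.0,  # assumption: one stone = one 100-point rank
) -> float:
    """Expected score of the weaker player after crediting the handicap stones."""
    adjusted = r_weaker + stones * points_per_stone
    return 1.0 / (1.0 + 10.0 ** ((r_stronger - adjusted) / 400.0))


# A 300-point gap with 3 stones is treated as an even game under this assumption.
print(expected_score_with_handicap(1500, 1800, stones=3))
# The same pairing without handicap favours the stronger player.
print(expected_score_with_handicap(1500, 1800, stones=0))
```

The whole scheme only works as intended if one stone really is worth the same number of rating points at every strength, which is exactly the assumption in question here.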
I’m just trying to understand @Vsotvep’s point. I will probably just sit down and look at these rating systems, slightly worried I’ll get sucked into playing with the github data.
It’s not like the Asian servers are filling up with bots pushing people out of 9d rating spots.
In general, gaps between handicaps would not be equal to 100 Elo in OGS (or any other go rating system AFAIK).
Yes, in the EGF system, gaps between handicaps are 100 GoR points (by definition), but GoR points are not Elo points. From my analysis of the EGF data, gaps between handicaps in terms of Elo vary from about 35 Elo around 20k EGF, via about 100 Elo around 1d EGF, to about 200 Elo around 8d EGF.
I can give the function that is predicted by the EGF system and a function that matches the actually observed data in the EGF tournament games (they are different), but I suppose you’re more interested in the OGS version.
well, there is a way to do this, but you’d probably wanna go back to the simpler version of the model: the Bradley-Terry Model
so the Bradley-Terry Model essentially says the chance of A being observed better than B is A/(A+B), interestingly enough this is where the elo formula comes from.
If you replace A with 10^(r1/400) and B with 10^(r2/400), you get 10^(r1/400)/(10^(r1/400)+10^(r2/400)), and then if you divide top and bottom by 10^(r1/400) you get 1/(1+10^(r2/400)/10^(r1/400)) which equals 1/(1+10^((r2-r1)/400))
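A quick numerical check of that equivalence, as a sketch with made-up function names: the Bradley-Terry probability with strengths 10^(r/400) should match the familiar Elo expectation.

```python
def strength(rating: float) -> float:
    """Bradley-Terry strength corresponding to an Elo-style rating."""
    return 10.0 ** (rating / 400.0)


def bradley_terry(a: float, b: float) -> float:
    """Chance that the player with strength a beats the player with strength b."""
    return a / (a + b)


def elo_expected(r1: float, r2: float) -> float:
    """Elo expectation for the player rated r1 against the player rated r2."""
    return 1.0 / (1.0 + 10.0 ** ((r2 - r1) / 400.0))


r1, r2 = 1700.0, 1500.0
# Both expressions should agree up to floating point rounding.
print(bradley_terry(strength(r1), strength(r2)), elo_expected(r1, r2))
```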
so let’s go back to the A & B units for simplicity, with the memory that A=10^(r1/400) and B the same for r2
and we’ll say that A is the one getting the handicap benefit h, which we will suppose is a function of rank as h(r1). Then the proper formula for the update is (A+h(r1))/(A+h(r1)+B), which can be done by having a similar function increasing the number of ratings points of r1 for the purpose of calculating expectation
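Here is a small sketch of that handicap-adjusted expectation, plus the equivalent "boosted rating" that gives the same result through the ordinary Elo formula. The bonus function h used in the example is purely hypothetical (a bonus proportional to the player's own strength); nothing above says what h should actually look like.

```python
import math
from typing import Callable


def expected_with_bonus(r1: float, r2: float, h: Callable[[float], float]) -> float:
    """(A + h(r1)) / (A + h(r1) + B), with A = 10^(r1/400) and B = 10^(r2/400)."""
    a = 10.0 ** (r1 / 400.0) + h(r1)
    b = 10.0 ** (r2 / 400.0)
    return a / (a + b)


def boosted_rating(r1: float, h: Callable[[float], float]) -> float:
    """Rating that yields the same expectation through the plain Elo formula."""
    return 400.0 * math.log10(10.0 ** (r1 / 400.0) + h(r1))


def example_h(r: float) -> float:
    """Hypothetical bonus: half of the player's own strength."""
    return 0.5 * 10.0 ** (r / 400.0)


print(expected_with_bonus(1500.0, 1500.0, example_h))  # above 0.5 thanks to the bonus
print(boosted_rating(1500.0, example_h) - 1500.0)      # the bonus expressed in rating points
```

With a bonus proportional to strength the rating boost is the same at every rating; any other choice of h makes the boost depend on r1, which is the point of letting h be a function of rank.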
I get a good fit for EGF game data with B( r ) = -6 * ln(3200-r) where r is rating expressed as GoR (with GoR 2100 = 1d and 100 GoR gaps between ranks / handicap stones).
A conversion between Elo and B would be
Elo( r ) = B ( r ) * 400 / ln(10) + C
C is an arbitrary constant. C = 9100 seems to give a fairly good match with Elo ratings from https://www.goratings.org/en/
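For reference, a small sketch of the two formulas above, using the fitted constants as given (6, 3200 and C = 9100). The function names are mine, and the formula is only defined for GoR below 3200.

```python
import math

C = 9100.0  # arbitrary offset from the post, chosen to roughly match goratings.org


def b_strength(gor: float) -> float:
    """B(r) = -6 * ln(3200 - r), with r expressed as GoR."""
    if gor >= 3200.0:
        raise ValueError("the fit is only defined for GoR below 3200")
    return -6.0 * math.log(3200.0 - gor)


def elo_from_gor(gor: float, c: float = C) -> float:
    """Elo(r) = B(r) * 400 / ln(10) + C."""
    return b_strength(gor) * 400.0 / math.log(10.0) + c


# Elo width of one rank (100 GoR) centred on 20k, 1d and 5d EGF.
for label, gor in (("20k", 100.0), ("1d", 2100.0), ("5d", 2500.0)):
    width = elo_from_gor(gor + 50.0) - elo_from_gor(gor - 50.0)
    print(f"{label}: {width:.0f} Elo per rank")
```

Under this fit one rank is worth a little under 35 Elo around 20k and a little under 100 Elo around 1d, and the gap keeps widening as r approaches 3200.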
Small mistakes like this are inevitable in any complex software. The big picture is that the OGS rating system is very good.
The impact of this minor error is limited in time - it will soon be fixed in the code, and in a few weeks from now, ranks will have adjusted and the glitch will be forgotten.