Huge disparity in the probability of a 1dan defeating a 1kyu between the OGS and AGA ranking systems

John_C · August 21, 2021, 2:25pm

The title says it all really. In an even game on OGS, a 1.00 dan against a 1.00 kyu should win 1.60 games for every game he loses, while in the AGA a 1.00 dan against a 1.00 kyu should win 4.81 games for every game he loses. These ratios correspond to Glicko rating differences of 81 and 273 points respectively.

How does the AGA think the rating gap between these two ranks is over three times larger than OGS does? This does not seem consistent. Can anyone explain this?

Calculations
Here is a simple Python script showing the calculations I did to get the above figures:

import math
e = math.e

#OGS Calculation
print("An OGS 1.00 dan has a rating of: ")
print(525*e**(30/23.15))

print("\nAn OGS 1.00 kyu has a rating of: ")
print(525*e**(29/23.15))

print("\nThis rating difference is: ")
OGS_diff = 525*e**(30/23.15)-525*e**(29/23.15)
print(OGS_diff)

print("\nThis corresponds to a win ratio of: ")
print(10**(OGS_diff/400))

#AGA Calculation

#These equations come directly from the official AGA PDF I linked
k = 7.5
sigma_px = 1.0649-0.0021976*k+0.00014984*k**2
RD = 1

print("\nThe AGA win probability of a 1dan vs a 1kyu: ")
win_prob = 0.5*math.erfc(-RD/(2**0.5)/sigma_px)
print(win_prob)

print("\nThis corresponds to a win ratio of: ")
win_ratio = win_prob/(1-win_prob)
print(win_ratio)

print("\nThis corresponds to a rating difference of: ")
AGA_diff = 400*math.log10(win_ratio)
print(AGA_diff)

#Compare OGS to AGA
disparity = AGA_diff/OGS_diff
print(f"\nThe AGA thinks the rating difference between a 1dan and a 1kyu is {disparity:.2f} times larger than OGS does")

Links
Formula of ln(rating / 525) * 23.15 for converting OGS ranks to ratings from here
Explanation of AGA win probabilities between ranks from here
Basic explanation of how rating difference relates to win probabilities can be found here

_KoBa · August 21, 2021, 2:40pm

No idea, AGA ranks are weird >___>
1.60 wins for every loss does sound (by pure gut feeling) like the stronger player taking white with 0.5 komi would be a reasonably appropriate handicap between them.

I’ve seen players with the same AGA rank having huge differences in skill. For me it’s a mystery how one 1k can be multiple stones stronger than another 1k >___>

jlt · August 21, 2021, 2:48pm

The indicated numbers are theoretical winning probabilities. Are there actual statistics on AGA-rated games?

Groin · August 21, 2021, 2:55pm

Isn’t this difference due to the frequency of playing a game in each system?

jlt · August 21, 2021, 3:21pm

Statistics on EGF rated games since 2006.

gennan · August 21, 2021, 3:49pm

So the AGA rating system predicts 83% winning probability for the 1d, while the OGS rating system predicts 61% winning probability.

The EGF historical data supports the OGS prediction better than the AGA prediction (as @jlt shows, about 60%).

I don’t know why the AGA chose such a high prediction.
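For anyone wanting to check the 83% and 61% figures: a win ratio r (wins per loss) converts to a winning probability p = r / (1 + r). A minimal Python sketch, where `ratio_to_prob` is just an illustrative helper name and the two ratios are the ones from the opening post:

```python
def ratio_to_prob(win_ratio):
    """Convert wins-per-loss into a winning probability."""
    return win_ratio / (1 + win_ratio)

ogs_prob = ratio_to_prob(1.60)  # OGS ratio from the opening post
aga_prob = ratio_to_prob(4.81)  # AGA ratio from the opening post

print(f"OGS: {ogs_prob:.1%}")
print(f"AGA: {aga_prob:.1%}")
```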

DVbS78rkR7NVe · August 21, 2021, 4:56pm

What about other ranks?

Uberdude · August 21, 2021, 7:35pm

Given the weakness of AGA dan ranks relative to EGF ones I’d actually expect them to be closer together so have smaller expected win probability between ranks.

jlt · August 21, 2021, 7:43pm

Why? If x dan EGF = (x+2) dan AGA (*) then the ranks are just shifted, and are not closer together.

(*) That formula is speculative, I don’t know the exact equivalence between the two systems.

gennan · August 21, 2021, 8:17pm

According to this site (caption Elo per stone), the AGA rating system uses about 270 Elo per rank (or stone).

In April 2021 the EGF changed from the green curve to something close to the blue curve. (the thin lines are derived from observed winrates similar to the table that @jlt posted).

So in terms of Elo, the AGA ranks are wider than EGF ranks (and OGS ranks). But the rating reset policies also play an important role. I think the AGA uses the policy that if you declare a higher rank in a tournament and win at least one game, your rating gets reset to your declared rank. So that would be a much easier way to rank up than waiting for your rating to gradually move up by tournament wins only.

The EGF also has a rating reset policy, but it is used quite conservatively in practice.

Uberdude · August 21, 2021, 8:50pm

Because I think an AGA 10k is closer to an EGF 10k than an AGA 5d is to an EGF 5d, so it’s not just a shift but a compression in the low kyus and dans. This is just my impression from interacting with people of various ranks in both systems rather than large scale statistical analysis though, so I could be wrong.

gennan · August 21, 2021, 9:20pm

I also think there is a slight compression in AGA ranks compared to EGF ranks for strong kyu and dan players. But I don’t think the statistics of the AGA would be very different from the EGF and OGS statistics. So because the Elo per stone used by the AGA is greater below 7d EGF, the opposite of compression would be expected.

I think the explanation is that rating resets are used more liberally in the AGA than in the EGF. That is more a cultural thing than having to do with the details of the rating system. The AGA does not allow declaring a rank below your rating, but declaring a rank above your rating is fine. The EGF seems to have more of a gate keeping culture, being conservative with promotions and often not allowing declaring a rank above your rating, especially for dan ranks.

So the AGA rating reset policy usage seems to aim at fighting sandbagging, while the EGF rating reset policy usage seems to aim at fighting airbagging. I think those cultural differences are sufficient to explain the 1-2 rank gap between AGA and EGF dan ranks.

John_C · August 21, 2021, 9:22pm

Thank you for your reply. I personally feel that that 60% win probability is not correct.

The problem is that we want to know the winrate of a 1 dan against a 1 kyu who are exactly one rank apart (a 1 dan could be anywhere from 0.001 to 1.999 ranks stronger than a 1kyu). What that 60% tells us is the winrate of a player with an EGF rank of 1dan against a player with an EGF rank of 1kyu.

Now one issue with the second statement is that because some of these players are not exactly one rank apart, and may even just be a small fraction of a rank apart, that might cause the winrate to be a bit smaller than the exact one rank difference. This effect is likely small though.

However, the really big problem with the second statement is that a player’s EGF rank could easily be inaccurate. There are tons of players who haven’t played a ranked tournament in a year or more and are vastly improved on their old rank, or have gotten rusty. That means that in many of the 40% of games where the player with a 1 kyu rank beat the player with a 1 dan rank, it could have been an upcoming player against a rusty player.

I think this effect is shown more clearly when looking at the four rank differences. Apparently, a 4kyu has a 6.5% chance to win against a 1dan. To me this seems totally off. If I could give someone a 4 stone handicap and reverse komi and still have a 50% chance to beat them, I would say their chance to beat me in an even game is only like 1%, not 6.5%.

Go tournaments are structured in a way that if you win many games, you play stronger ranked players, whereas if you lose many games you play weaker ranked players. This means that if a 1dan vs 4kyu match occurs in a real tournament, the pairing has almost selected for an upcoming player to play against a rusty player.

This is a big problem that I feel holds the EGF rating system back. However, I think the AGA system has found a reasonably good solution in using rating uncertainties and formalising the process of self-promotion. Players who are inactive or who have self-promoted have large uncertainties, and their ratings are designed to fluctuate more while their opponents’ ratings will not be as affected. This minimises the problem of players with inaccurate ranks messing up statistics. I think rating uncertainties are the big reason why AGA win probabilities are higher than EGF ones.

The AGA rating document I linked also talks about this in section 3, in how the estimate of a player’s rating has contributions from both their game results and the rating uncertainties of their opponents. I’d recommend reading section 4 of that report, “Other Details”, for more info on how the AGA deals with out-of-date ratings and self-promotions.

John_C · August 21, 2021, 9:28pm

Those numbers are theoretical. However, the actual results in practice would have to align with the theory. For example, if the strong players were unable to win 83% of their games against players one rank lower, the AGA would experience huge rank demotions at the higher ranks and huge rank promotions at lower ranks.

gennan · August 21, 2021, 9:28pm

I did a lot of analysis of EGF data and I found that the 60-65% still holds between players whose EGF rating is 100 GoR (1 full rank) apart around 1d. So that would be an average 1d against an average 1k, not a rusty 1d vs an upcoming 1k (although some of those could be included in the statistics).

Also, I (EGF 3d) have won quite a few games against EGF 6d and lost quite a few games against EGF 1k.

John_C · August 21, 2021, 9:44pm

I think you misunderstood what I was saying about an upcoming player. Let’s say there’s a 4kyu with an EGF rating of 1651. This player is one point away from demotion to 5kyu and 99 points away from promotion to 3kyu.

However, this player’s last tournament was 12 months ago, and they are greatly improved since then. They play as well as a 1kyu. This is what I mean when talking about an ‘upcoming player’.

Although it would be interesting if you won/lost against 6dan/1kyu’s who were reasonably active before you played them, and thus their rating uncertainty was low. Them being active with stable ratings means we can rule out the possibility of them being upcoming/rusty.

If it was shown that an active 1kyu with stable rank could win 40% of the time against an active 1dan with stable rank that would undermine my post.

gennan · August 21, 2021, 9:51pm

Selecting any particular combination of a 1d vs a 1k player would just be anecdotal data, so I think the only meaningful data has to be statistics from a large sample.

But you might want to check my game history as one data point (anecdotal data): Dave_de Vos | Player card | E.G.D. - European Go Database

By selecting “opponents” and sorting by opponent rank, you can verify my percentages against 6d and against 1k.

John_C · August 21, 2021, 9:55pm

Interesting question. For the case of a 1.00dan vs a 4.00kyu, exactly a 4 rank difference:

OGS thinks the 4kyu will win 14.8% of the time
AGA thinks the 4kyu will win 0.008% of the time

Obviously a huge difference here. A four rank difference means the stronger player can give a four stone handicap and reverse komi (people often forget about the reverse komi) and still win 50% of the time. In my opinion based on ‘gut-feeling’, if these players played an even game, I would expect the weaker player to win maybe 1% of the time.

14.8% seems way too high, but 0.008% seems to assume too much consistency from the stronger player.
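The two percentages above can be reproduced with a short sketch along the lines of the script in the opening post. It assumes that OGS rank numbers run 30 for 1 dan and 26 for 4 kyu (extending the 30 and 29 used there for 1 dan and 1 kyu), that the usual base-10, 400-point Elo curve applies, and that the AGA sigma_px formula can be reused with k = 7.5 unchanged for a four-rank gap. The function names are illustrative only.

```python
import math

def ogs_rating(rank_number):
    """OGS rating for a rank number (assumed: 30 = 1 dan, 26 = 4 kyu)."""
    return 525 * math.exp(rank_number / 23.15)

def elo_win_prob(rating_diff):
    """Win probability for a player rated rating_diff points above the opponent."""
    return 1 / (1 + 10 ** (-rating_diff / 400))

def aga_win_prob(rank_diff, k=7.5):
    """AGA-style win probability for a player rank_diff ranks above the opponent.

    Reuses the sigma_px formula and k = 7.5 from the opening post (an assumption
    for the four-rank case).
    """
    sigma_px = 1.0649 - 0.0021976 * k + 0.00014984 * k ** 2
    return 0.5 * math.erfc(-rank_diff / (math.sqrt(2) * sigma_px))

# Chance of the 4 kyu beating the 1 dan in an even game
ogs_4k = elo_win_prob(ogs_rating(26) - ogs_rating(30))
aga_4k = aga_win_prob(-4)

print(f"OGS: {ogs_4k:.1%}")
print(f"AGA: {aga_4k:.3%}")
```

With `rank_diff=1` the same `aga_win_prob` helper gives the 1dan vs 1kyu probability from the opening post.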

BHydden · August 21, 2021, 10:05pm

It’s no komi, not reverse komi.

gennan · August 21, 2021, 10:05pm

The updated EGF system predicts about 11.8% for the 4k in such a match. The historical data of the EGF since 2006 observed 9.6% (but probably not exactly 1.00 dan vs 4.00 kyu).

But either way, data for even games with 4 ranks difference around 1d is sparse in the EGF history. Such games don’t happen a lot in MacMahon tournaments. I think I have only played a handful of such games in my almost 700 tournament game history (vs 7d+ or 2k-).
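The 11.8% prediction is consistent with the sketch below. It assumes the EGF winning expectancy used since April 2021 is Se = 1 / (1 + exp(β(r2) - β(r1))) with β(r) = -7 ln(3300 - r), and that 1 dan, 1 kyu and 4 kyu sit at 2100, 2000 and 1700 GoR (100 GoR per rank, as mentioned earlier in the thread). The formula is quoted from memory rather than from anything linked here, so treat it as an assumption; the function names are hypothetical.

```python
import math

def egf_beta(rating):
    """Assumed EGF (2021) beta function: -7 * ln(3300 - rating)."""
    return -7 * math.log(3300 - rating)

def egf_win_prob(own_rating, opp_rating):
    """Assumed EGF winning expectancy for the player rated own_rating."""
    return 1 / (1 + math.exp(egf_beta(opp_rating) - egf_beta(own_rating)))

p_4k_vs_1d = egf_win_prob(1700, 2100)  # 4 kyu against 1 dan
p_1d_vs_1k = egf_win_prob(2100, 2000)  # 1 dan against 1 kyu

print(f"4k beats 1d: {p_4k_vs_1d:.1%}")
print(f"1d beats 1k: {p_1d_vs_1k:.1%}")
```

Under the same assumptions, the 1 dan vs 1 kyu value lands inside the 60-65% range observed in the EGF data.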