On OGS vs IGS rankings

This would only be true if there were no fixed or meaningfully recognizable points on an infinite/endless ranking scale.

In Go, since perfect play is a fixed distance of a few stones from current bots, if you specify ranks as X stones to perfect play (like in Asia, where I sometimes see “X stones to pro level”), you can have ranks that are “real” in some sense.

Shouldn’t it work similarly when counting from “total beginners”?

The fact that beginners have different ranks on different servers doesn’t help…

I think beginner level is too inconsistent for this; it is not an accurate reference point like perfect play. Beginner level may even drift over time (with cultural changes over decades, etc.).

I’d say “perfect play” also drifts with time. It’s just that we are so close to the first instances of perfect play that it’s hard to tell.

I don’t understand this. If it could drift even by 0.01 points, it was not perfect to begin with. :slight_smile:

True, but perfect play cannot actually be determined so it cannot be used to define ranks.

We could say “X stones from Katago” though.

On the other hand, I agree “beginner” is too inconsistent, but we could use “random play” as a basis.

Do we have some conversion of OGS ranks to X stones from a particular Katago bot? Does anybody have the data?

I mean for a specific instance of katago one could do something like

https://online-go.com/api/v1/players/902691/games?handicap__gt=0&white_lost=true

to filter games where White lost and the handicap was > 0, on the assumption that Katago will be White. One might have to grab them page by page, though I’m not sure.
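A minimal sketch of the page-by-page idea, in case it helps. It assumes the endpoint answers with a JSON object that has a `results` list and a `next` URL (that response shape is an assumption, not something confirmed in this thread). The helper names `games_url`, `http_get_json` and `iter_games` are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://online-go.com/api/v1/players/{player_id}/games"


def games_url(player_id, **filters):
    """Build the games URL with query filters such as handicap__gt=0."""
    # Booleans are written as lowercase true/false, like in the URL above.
    query = urlencode({k: str(v).lower() if isinstance(v, bool) else v
                       for k, v in filters.items()})
    return BASE.format(player_id=player_id) + ("?" + query if query else "")


def http_get_json(url):
    """Fetch one page and decode it as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_games(first_url, fetch_json=http_get_json, max_pages=None):
    """Yield games page by page, following the assumed 'next' link."""
    url, pages = first_url, 0
    while url and (max_pages is None or pages < max_pages):
        page = fetch_json(url)
        yield from page.get("results", [])
        url = page.get("next")  # assumed to be None on the last page
        pages += 1


# Example (hits the network, so left commented out):
# url = games_url(902691, handicap__gt=0, white_lost=True)
# games = list(iter_games(url, max_pages=3))
```

Passing `fetch_json` as a parameter makes it easy to try the loop on canned pages before sending real requests, and `max_pages` keeps a first experiment small.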

On the other hand…

One could ask here for someone to do this with the big data download

Yes, you can subtract your rank from Katago’s rank. :slight_smile:

This of course implies a bot that mostly plays handicap games and a rating system which correctly handles those games.
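As a rough sketch of that subtraction: the snippet below converts rank labels to the internal scale quoted later in this thread (30k = 0, 1d = 30) and takes the difference. The function names are hypothetical, and the result is only meaningful under the caveat just mentioned, namely that the rating system handles handicap games correctly.

```python
import re


def rank_to_number(label):
    """Convert a label like '5k' or '3d' to the scale where 30k = 0 and 1d = 30."""
    m = re.fullmatch(r"(\d+)\s*([kd])", label.strip().lower())
    if not m:
        raise ValueError(f"unrecognized rank: {label!r}")
    n, kind = int(m.group(1)), m.group(2)
    return 30 - n if kind == "k" else 29 + n


def stones_from(reference_rank, player_rank):
    """Rank difference, read as 'X stones from the reference player'."""
    return rank_to_number(reference_rank) - rank_to_number(player_rank)


print(stones_from("9d", "5k"))  # 9d is 38 and 5k is 25 on this scale
```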

IIRC you don’t need an actual perfect player to be able to estimate your distance to it. This is because as strength increases, so does consistency. So you cannot directly measure the strength of a player (only compare him to others), but you can still get an idea of the consistency of his play.

I can’t prove it of course, but I don’t think a perfect player can give KataGo running at 10 million playouts/move 9 stones handicap. My gut feeling is that the proper handicap would be close to 2 stones.

I’d be happy to help, but I don’t understand what I should do. :slightly_smiling_face:

Sorry, I may be dumb but I still don’t get how you would estimate your distance from a perfect player without knowing what perfect play is. How can you evaluate someone being “two stones from perfect play”?

I agree with @gennan that you could go by a gut feeling that KataGo is somewhere around 2 stones from perfect play, but that’s just a guess.

By measuring consistency (instead of strength) and its changes.

For example, you can compare how consistent the play of 1d, 3d and 5d players is, and how fast that consistency changes in this range. If it changes by, say, 10% between 1d and 5d, this means more distance to perfect play than if it changes by 20% (just random numbers as an example).

How do you define “consistency” here? How is it measured?

It should probably be inconsistency, and it is not easy to quantify. One measure could be how a player performs against opponents 1 stone weaker (the lower his inconsistency, the higher his winrate). Or ask a strong bot.

I don’t think you can use consistency as a synonym for strength though.

Even assuming they are indeed statistically correlated in a meaningful way, knowing the consistency of a given player will not be enough to deduce their strength. You can make a guess based on the general trends you observe (people with this level of consistency tend to be around this level), but an individual will always deviate more or less significantly from the statistical norm.

Also, since again the notion of perfect play is purely theoretical, I don’t see how you can measure the gap in strength between, say, Katago and perfection. You could measure a gap in consistency but how much stronger in stones would you need to be to bridge this gap?

In my view a ranking built on Go consistency would be just that, and cannot be treated as equivalent to a ranking built on Go strength.

One question that is interesting to ask: does Katago with 2 handicap stones sometimes lose against itself? If the answer is yes, then Katago is more than 2 stones away from perfect play.

Basically, what is the win rate as a function of player strength at a specific handicap when playing against, for example, katago-micro?
I guess one would need to filter out some games with premature resignations, or maybe filter for ranked games.

So we could deduce what handicap a player with rank x needs for an even game. And see if it is indeed (9 - x).
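Something like the following could do that tabulation. It is only a sketch: the input format (tuples of player rank, handicap, and whether the player won) and the function names are assumptions for illustration, and any filtering of premature resignations would have to happen before the games are passed in.

```python
from collections import defaultdict


def win_rate_table(games):
    """games: iterable of (player_rank, handicap, player_won) tuples.

    Returns {(rank, handicap): (win_rate, number_of_games)}.
    """
    counts = defaultdict(lambda: [0, 0])  # (rank, handicap) -> [wins, total]
    for rank, handicap, won in games:
        cell = counts[(rank, handicap)]
        cell[0] += int(won)
        cell[1] += 1
    return {key: (wins / total, total) for key, (wins, total) in counts.items()}


def even_handicap(table, rank, min_games=1):
    """Handicap whose observed win rate is closest to 50% for this rank."""
    candidates = [(abs(rate - 0.5), h) for (r, h), (rate, n) in table.items()
                  if r == rank and n >= min_games]
    return min(candidates)[1] if candidates else None


# Tiny made-up sample, only to show the call pattern.
sample = [(25, 9, True), (25, 9, False), (25, 5, False), (25, 5, False)]
print(even_handicap(win_rate_table(sample), rank=25))
```

With real data, raising `min_games` avoids reading too much into rank and handicap combinations that only have a handful of games.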

You could define consistency at different levels of play by the Elo width of ranks (where ranks are assumed to be determined by handicap, i.e. a difference of n ranks can be compensated for by Black getting to play n moves before the game proper starts with Black’s turn and White getting komi, up to about n = 15).

Determined from EGF historical winning statistics, ranks around 15k are about 50 Elo wide, ranks around 1d EGF are about 100 Elo wide and ranks around 7d EGF are about 250 Elo wide. Going into the pro range, you get ranks (not pro ranks, but handicap ranks as defined above) of about 300+ Elo wide.
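To get a feel for what those widths mean in even games, here is a small sketch that turns an Elo gap into an expected win rate against an opponent one rank weaker. It assumes the standard logistic Elo formula with a 400-point scale; the EGF system may use a different curve, so treat the numbers as illustrative only.

```python
def expected_score(elo_diff):
    """Expected win rate of the stronger player under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Approximate rank widths quoted above (Elo points per rank).
for label, width in [("15k", 50), ("1d", 100), ("7d", 250)]:
    print(f"{label}: {expected_score(width):.3f}")
```

Under that assumption, the win rate against a one-rank-weaker opponent comes out at roughly 57% around 15k, 64% around 1d and 81% around 7d, which is one way to express the growing consistency.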

At perfect play, the Elo width per rank would approach infinity (or perhaps a finite, but very large value, due to the fact that score is an integer value instead of a continuous value, so komi handicap increments of less than 0.5 points are meaningless).

So you can fit an asymptotic curve through those Elo widths derived from winning statistics at different levels of play, to get an estimate for the highest rank possible = perfect play.

Using this method on EGF historical data, I arrived at an estimate of 13d EGF for perfect play, from the blue curve in this Elo width per rank graph, which has been used by the EGF rating system since April 4th 2021 (the red curve is what OGS uses):

[Image: graph of Elo width per rank; blue curve is EGF, red curve is OGS]
Vertical axis is the Elo width per rank.
Horizontal axis is EGF rank expressed like internal OGS rank scale:

  • 0 = 30k
  • 10 = 20k
  • 20 = 10k
  • 30 = 1d
  • 39 = 10d = more or less highest level achieved by humans like Go Seigen and Shin Jinseo
  • 42 = 13d = max rank in EGF rating system ~ perfect play?

(2021 Rating and rank adjustments - #59 by gennan)
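For illustration, here is a toy version of such an asymptotic fit. It is not the curve from the graph: it assumes a simple hyperbola, width = a / (R - rank), chosen only because it blows up at a finite rank R, and it uses just the three rounded widths quoted in this post (15k, 1d, 7d) placed on the 30k = 0 scale. Under that model 1 / width is a straight line in rank, so an ordinary least-squares line gives R as the point where the line crosses zero.

```python
def fit_max_rank(points):
    """Fit width = a / (R - rank) by a least-squares line through (rank, 1/width).

    Returns (R, a), where R is the rank at which the width diverges.
    """
    xs = [r for r, _ in points]
    ys = [1.0 / w for _, w in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx                 # equals -1 / a in this model
    intercept = my - slope * mx       # equals R / a in this model
    return -intercept / slope, -1.0 / slope


# (rank on the 30k = 0 scale, approximate Elo width): 15k, 1d, 7d
points = [(15, 50), (30, 100), (36, 250)]
max_rank, scale = fit_max_rank(points)
print(f"estimated maximum rank: {max_rank:.1f}")
```

With these three points the estimate lands near 42 on this scale, in the same neighborhood as the 13d figure, but with so few rounded data points and an arbitrarily chosen model that agreement should not be read as confirmation.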

I looked for katago-micro in the 27M games dump.

On 19x19, we have only 86132 finished games.
51 of them are ranked and they are all against other bots.

| Opponent | Games |
|---|---|
| 12bTurboSai | 17 |
| 20bTurboLz-Elf-v1 | 2 |
| 60b Katago 1 playout | 3 |
| doge_bot_2 | 2 |
| IntuitiveSAI | 12 |
| Spectral-1k | 14 |
| Spectral-4k | 1 |
| Total | 51 |

Katago-micro’s rank is mostly 38 (9d): 59038 games out of 86132.

Out of 86081 unranked games, 21709 are even games.
Katago-micro won 20522 of them.
Strongest opponents are:

  • hjkl123, won 207 games out of 227 (91.6 %)
  • kkxxcake, won 86 games out of 409 (21%)

Out of 64372 handicap games, here is the distribution of handicaps:

Here is the breakdown of opponents’ ranks, which span from -8 (25k) to 44 (9d+)

I’m starting to feel disoriented. :smiley:

What’s the next step?
