Strength of OGS players

I used the height example because it’s a much more obvious case of two metrics whose numbers you can’t compare without converting them first.

Ranks across servers are the same — we just don’t know the conversion rates, so it’s not as obvious. Still, we shouldn’t think that because a 1k on OGS is stronger than a 1k on KGS, this tells us anything about the relative strength of the players on the different servers. It just means the numbers aren’t directly comparable without conversion.

This is exactly how the myth that Napoleon was really short got started: French feet were longer than American feet, so comparing the numbers directly made him sound shorter than he actually was. For his time period he was about average in height.
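The Napoleon example is just unit-conversion arithmetic. A minimal sketch, using the commonly cited conversion factors (the French *pied* of roughly 32.48 cm versus the 30.48 cm English/American foot — these figures are my assumption, not from the thread):

```python
# Illustrative arithmetic for the Napoleon example.
# Assumed conversion factors (not from the thread):
# 1 French pied ≈ 32.48 cm; 1 English/American foot = 30.48 cm.
FRENCH_FOOT_CM = 32.48
ENGLISH_FOOT_CM = 30.48

# Napoleon was recorded at 5 pieds 2 pouces ("5 ft 2 in" in French units).
height_french_units = 5 + 2 / 12                    # 5.1667 "feet"
height_cm = height_french_units * FRENCH_FOOT_CM    # about 168 cm: average for the era

# Naively reading the same "5'2\"" figure as English feet shrinks him:
naive_cm = height_french_units * ENGLISH_FOOT_CM    # about 157 cm

print(f"converted: {height_cm:.1f} cm, naive reading: {naive_cm:.1f} cm")
```

Same digits, different scales — which is the whole point of the rank comparison above.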

1 Like

What could you mean by “ranks across the servers are the same” if you need to convert between them?

Ranks across the servers are different … and we don’t know the conversion rates.

A KGS 1d is different in skill to an OGS 1d: that’s the observation.

2 Likes

We’ve all trailed off from the topic. If OP’s forum nick matches their OGS one, then we can look at the game history. We have ~7 ranked games that are not against bots and not weird, and the games look OK. Maybe you, @wilhelm4, simply haven’t adjusted yet. When I changed servers, my first games on the new server usually weren’t good.

I don’t think discussion about how ranks are bad or weird is warranted here.

2 Likes

Everybody here is concerned about the size of their… rank! :laughing:
And the issue is exactly that: “How could it be that I am 3’ on one server and 5’ on another one?”
All servers use the same scale, kyus and dans, but the values are different because the scales aren’t comparable. That’s nonsense.
It’s just as BHydden said about Napoleon: same names, different metrics. That’s confusing.
Only reasonable answer seems to be: don’t stress about your rank, just play and have fun! :wink:

5 Likes

The purpose of a rank is not to give you a sense of identity, but rather to aid you in finding a roughly even game in a given environment. I think OGS does that well.
If you adamantly hold to being a 3k, but lose most of your even games, then maybe your “rank” is more pride than function.
If you adamantly hold to being 10k, because there’s just no way you’re strong enough to be SDK yet, and yet you win almost all your even games, then maybe your “rank” is more fear than function.
I think this thread may have a lot of behind-the-scenes overlap with The mind game - opponent's rank?

Though outdated, the message of this graph still rings true… attempting to compare ratings between unrelated pools is folly.

10 Likes

I think the OGS way of combining 9x9 and 19x19 ranks is a little off. On 9x9, wins and losses carry significantly more noise than on 19x19, yet the results are combined as if they were equivalent. A win or loss streak on 9x9 seems to make the following 19x19 games unevenly matched.
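One way to see the complaint: if noisier 9x9 results were simply given less weight, a 9x9 streak would move the shared rating less. This is not OGS’s actual algorithm (OGS uses Glicko-2); it’s a hedged Elo-style sketch with invented K-factors to illustrate the idea of down-weighting by board size:

```python
# Hypothetical sketch (NOT OGS's actual rating system): an Elo-style
# update where noisier 9x9 results get a smaller K-factor than 19x19,
# so a 9x9 streak perturbs the shared rating less.
def expected_score(rating: float, opp_rating: float) -> float:
    """Standard Elo expected score against one opponent."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / 400))

# Assumed weights, chosen only for illustration.
K = {"9x9": 10, "19x19": 32}

def update(rating: float, opp_rating: float, score: float, board: str) -> float:
    return rating + K[board] * (score - expected_score(rating, opp_rating))

r = 1500.0
for _ in range(5):                  # a five-game 9x9 winning streak
    r = update(r, 1500.0, 1.0, "9x9")
print(round(r, 1))                  # drifts up far less than five 19x19 wins would
```

With a scheme like this, the next 19x19 pairing would be less distorted by the streak, at the cost of the sluggish cross-category adjustment the next post mentions.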

3 Likes

In the past we had unique ranks for all combinations of board size and time control, but we got complaints about that too. Being strong in one and then playing in a different category was perceived as a form of sandbagging.

2 Likes

The OP’s question is directly related to OGS ranks and ranks being weird is the explanation. I don’t see how it is derailing from the topic.

Checking OP’s games is not enough to see why he had hard games. I don’t think anyone claimed that playing bots is the reason for his rank. As I mentioned, it’s him running into certain types of opponents (people who play bots, people with curvy graphs, blitz throwers, or even people who play those people). There is just a big gap between the strength of blitzers and correspondence players who don’t use support. A 4 kyu 3-second blitzer is roughly equal to a 10 kyu who plays mostly correspondence and doesn’t use support. That is not providing even matchups, not consistently anyway. And if you don’t know how to avoid the smooth-sandbagging crowd, you will be ranked lower than your actual strength.

OGS makes it as easy as can be for cowards to lower their rank and for proud people to airbag theirs. Sure, joining them if you can’t beat them is an easy option, but I personally like that some of the many people who notice the weirdness in the ranks are speaking out.

Well, look at OP’s opponents, tell us what you think about them.

Haha, you’re giving me hope here.

The root cause of the difference in ranks for the same player on different servers is the result-based ranking system. Flattened ranks occur when you play on a server with many stronger opponents. This problem can be solved by using a new ranking system called ‘performance’ (%). Your performance is 100% if you are as strong as AlphaGo Zero (AGZ), and 50% if your performance is half of AGZ’s. With this move-quality-based system, you can compare the performance of Go players across countries and tournaments, or throughout the history of Go.

1 Like

Who is this AlphaGo Zero and why can he still play when torn in half?


More seriously, AlphaGo Zero was never released, so we cannot let users play against it. Also how do I determine the relative performance?

4 Likes

SPA requires a paradigm shift in Go ranking. Although AlphaGo Zero (AGZ) has never been released, its game records are widely available. Using SPA, you can assess the quality of the moves played by AGZ and calculate AGZ’s performance (p-value) from the strength of its moves. Then take AGZ’s p-value as 100% and determine your own performance in the best game you played. SPA stands for ‘Superhuman Performance Analysis’, a new method for ranking Go players and determining who is actually the strongest player in the history of Go. (He is Cho Chikun; neither Shusaku nor Dosaku is the strongest human player in history.)

Four questions:

  1. Is there a paper about SPA?
  2. If I understand you right, only one of my games is used to calculate my strength?
  3. If I understand you right, this is a test of how similar my playing style is to AlphaGo Zero’s?
  4. Do you mean this p-value? https://en.m.wikipedia.org/wiki/P-value

4 Likes

  1. There is a book, not a paper.
  2. One best game is more than enough, because SPA deals with the moves in the peak game, not with game results. If you want to assess your current performance, use the last game you played.
  3. No, not at all. It is a test of how strong your moves are. You can play your own style and get the same performance as AGZ if your moves and AGZ’s moves are equally strong. You can even get a greater performance value if you played differently but better than AGZ.
  4. No. I mean p-value in a new domain, not the statistical one. Here p stands for performance (%), not probability (dimensionless).

I’m not convinced.

The best™ game is a statistical outlier and not representative of the player’s strength. Additionally, I don’t see how using the moves of one game, without comparing them to a reference (AGZ), leads to any estimate.

As an additional thought, we want to compare the strength of different AIs as well. What happens to the performance value when we feed the algorithm a game by a player stronger than the reference player (AGZ)?

4 Likes

I don’t want to convince you either. You need to understand the fundamental philosophy of SPA before you can reason about it. I gained many insights from SPA. For example, the Shusaku fuseki is almost, but not yet, perfect. Controlled sample games had plausible p-values: for example, the p-value for AGZ is a little greater than the p-value for AGM and far greater than the p-value for Michael Redmond.

I’m struggling with how to perform superhuman performance analysis. In your self-authored 37-page book, you write that superhuman performance analysis "can be conducted in five steps:

  1. Top Professionals and Benchmarks Selection
  2. Performance Analysis
  3. Strongest Go Players in History Listing, and
  4. Multiple Tests for Building Confidence." (these are four steps, but I digress)

Since choosing the players (step 1), listing the results (step 3), and repeating the tests (step 4) are trivial, I’m interested in step 2. But the free sample of the book ends one page before you describe the process.

Your book summary is more helpful: “Two simple equations of Go skill, in terms of performance (%) and move strength, were created and the well-validated superhuman Go bot was employed to assess the top professionals’ performance.” Have you got any more details than this? What superhuman Go bot do you use? How many visits do you give it? How do you calculate the performance and move strength variables? Why is AlphaGo Zero the benchmark and not the superhuman Go bot itself? Without seeing the details that I assume are on page 11 (and only page 11) of your book, your analysis appears vulnerable to @flovo’s criticism that this is a test about how similar the players’ playing styles are to the superhuman Go bot.
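For what it’s worth, here is a toy sketch of what such an analysis collapses into if it only scores agreement with the reference engine’s top choice — which is exactly the style-similarity worry above. Every player, move, and number here is invented for illustration; this is not the book’s method:

```python
# Toy illustration of the style-similarity criticism: if "move quality"
# is measured as agreement with a reference engine's #1 move, a player
# with a different but equally strong style scores worse.
def agreement_rate(player_moves, engine_top_moves):
    """Fraction of positions where the player chose the engine's top move."""
    matches = sum(p == e for p, e in zip(player_moves, engine_top_moves))
    return matches / len(player_moves)

# Two hypothetical, equally strong players: one mimics the engine's
# style, one plays different (not necessarily worse) moves.
engine  = ["d4", "q16", "c3", "r4"]
mimic   = ["d4", "q16", "c3", "r4"]
stylist = ["d4", "q16", "c4", "r3"]   # deviates in style, not strength

print(agreement_rate(mimic, engine))    # 1.0
print(agreement_rate(stylist, engine))  # 0.5 — penalized for style
```

A defensible move-quality metric would instead need the engine’s evaluation of *each* candidate move (e.g. points or win-rate lost per move), not just whether the player matched the top choice.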

P.S.: The ELF OpenGo team also used AI to rate professional Go players, but they were much more cautious in their conclusions and did not make sweeping claims like “Player 1 is stronger than Player 2.”

P.P.S.: You know what, never mind. I see that you’ve decided to conceal your process. But unless the process is known, it’s of no use as a ranking system.

11 Likes

I will just pop in to say that the people on Sensei’s Library have a thread going right now about this, and they are very sceptical. Also, thank you Mark for taking a sharp knife to this.

4 Likes

It’s a shame that the go community has to waste time being distracted by this ridiculous pseudoscientific nonsense. Perhaps this is all just some elaborate trolling?

Besides, I think there are far better ways to measure the relative strengths between different servers and communities of players:

  1. Thorough and plentiful polling.
  2. Quality and volume of memes generated.
  3. Proliferation of games involving werewolves, vampires and raptors.
  4. The general degree of tomfoolery in their public discourse.
13 Likes