Unofficial OGS rank histogram 2021

DVbS78rkR7NVe · April 13, 2021, 3:35am

[Outdated]

Following a good tradition, I remade the histogram for the new ranks.

The percentages along the bottom are essentially percentiles. They’re calculated from the green part of the players as “everyone weaker than the given rank” plus “half of the players in the given rank”, rounded to the nearest whole percent. Thus, each one shows the approximate percentile of a median player in the given rank.
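The rule just described (everyone weaker plus half of the players in the rank, rounded) can be sketched in a few lines of Python. The per-rank counts in the example are made-up values for illustration, not OGS data.

```python
# Sketch of the histogram's percentile rule:
# percentile = everyone weaker + half of the players in this rank.
# The counts used below are made-up, not OGS data.


def midrank_percentiles(counts):
    """Return a rounded percentile for each rank bucket.

    `counts` holds the number of players per rank, ordered from the
    weakest rank to the strongest.
    """
    total = sum(counts)
    percentiles = []
    weaker = 0  # players in all weaker ranks processed so far
    for c in counts:
        percentiles.append(round(100 * (weaker + c / 2) / total))
        weaker += c
    return percentiles


if __name__ == "__main__":
    hypothetical_counts = [10, 30, 40, 20]  # weakest -> strongest
    print(midrank_percentiles(hypothetical_counts))  # [5, 25, 60, 90]
```

Counting only half of a rank's own players places each rank at its midpoint, so a rank holding many players does not jump straight to its upper edge.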

Accurate percentiles (percentile: rank)

  0: 68.4k
  1: 29.6k
  2: 26.9k
  3: 25.1k
  4: 24.0k
  5: 22.9k
  6: 22.1k
  7: 21.3k
  8: 20.7k
  9: 20.0k
 10: 19.4k
 11: 18.8k
 12: 18.3k
 13: 17.8k
 14: 17.3k
 15: 16.9k
 16: 16.4k
 17: 16.0k
 18: 15.6k
 19: 15.2k
 20: 14.9k
 21: 14.5k
 22: 14.2k
 23: 13.9k
 24: 13.7k
 25: 13.4k
 26: 13.1k
 27: 12.9k
 28: 12.6k
 29: 12.4k
 30: 12.1k
 31: 11.9k
 32: 11.7k
 33: 11.5k
 34: 11.3k
 35: 11.1k
 36: 10.9k
 37: 10.7k
 38: 10.5k
 39: 10.3k
 40: 10.1k
 41: 9.9k
 42: 9.7k
 43: 9.5k
 44: 9.3k
 45: 9.1k
 46: 9.0k
 47: 8.8k
 48: 8.6k
 49: 8.4k
 50: 8.2k
 51: 8.0k
 52: 7.9k
 53: 7.7k
 54: 7.5k
 55: 7.3k
 56: 7.2k
 57: 7.0k
 58: 6.8k
 59: 6.6k
 60: 6.5k
 61: 6.3k
 62: 6.1k
 63: 5.9k
 64: 5.8k
 65: 5.6k
 66: 5.4k
 67: 5.2k
 68: 5.1k
 69: 4.9k
 70: 4.7k
 71: 4.5k
 72: 4.4k
 73: 4.2k
 74: 4.0k
 75: 3.8k
 76: 3.6k
 77: 3.4k
 78: 3.3k
 79: 3.1k
 80: 2.9k
 81: 2.7k
 82: 2.4k
 83: 2.2k
 84: 2.0k
 85: 1.8k
 86: 1.5k
 87: 1.3k
 88: 1.1k
 89: 0.8k
 90: 0.6k
 91: 0.3k
 92: 1.1d
 93: 1.4d
 94: 1.8d
 95: 2.2d
 96: 2.6d
 97: 3.2d
 98: 3.9d
 99: 4.9d
100: 15.0d
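The table above lists the rank found at each exact percentile rather than one percentile per rank. The thread does not say how it was produced; one simple way, sketched here under the assumptions of a nearest-rank rule and hypothetical numeric rank values where higher means stronger, is to sort every player's rank and index into the sorted list.

```python
# Sketch: read exact percentiles off a list of per-player rank values.
# Assumptions: values are on a numeric scale where higher is stronger,
# and a simple "nearest rank" rule picks the value at each percentile.


def percentile_table(values, steps=range(101)):
    """Map each percentile p in `steps` to a value from `values`."""
    ordered = sorted(values)
    last = len(ordered) - 1
    table = {}
    for p in steps:
        # Index p% of the way from the weakest to the strongest player.
        # Note: Python's round() sends exact .5 ties to the even index.
        table[p] = ordered[round(p / 100 * last)]
    return table
```

Other quantile conventions (interpolating between neighbouring players, for instance) would shift individual rows slightly but not the overall shape.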

MrEntropy · April 13, 2021, 6:13am

That’s not normal (statistics humor).

Atorrante · April 13, 2021, 6:36am

Correct, its standard is deviated.
Just my feeble attempt at some more statistics humor.

n0w3l · April 13, 2021, 12:43pm

Cool graph!
May I ask where you got the data from?

Vsotvep · April 13, 2021, 12:51pm

But who claimed that the horizontal axis is linear?

MrEntropy · April 13, 2021, 8:12pm

Good point. “Kyu” and “dan” are not well behaved (linear) units.

Since there are a huge number of factors that affect playing strength, the Central Limit Theorem will work to make the distribution more “normalish”. But because becoming a “strong” player requires that a person fall on the “stronger” parts of the distributions of a majority of the input factors (time, effort, motivation, enjoyment, quality of teachers, strength of opponents, books read, problems solved, games played, age, etc.), and people’s resources are limited, it is reasonable to assume the distribution of “player strength” will be skewed towards lower ranks. The histogram supports that assumption (heavy tail on the lower-ranking side).

I would be interested to know whether there is a way to test for normality in player strength even with a non-linear rating scale; I don’t know of any, but I’m not a statistician either.
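One caveat and one simple check. Any normality check is tied to the scale it is run on: a monotone but non-linear change of axis (for example from a rating to a log-based kyu/dan scale) changes the skewness, so a test can only say whether strength looks normal on that particular scale. A rough stdlib-only sketch of a moment-based check (sample skewness and excess kurtosis, both near 0 for normal data) follows; the input values are illustrative only.

```python
import math


def skew_and_excess_kurtosis(xs):
    """Moment-based sample skewness and excess kurtosis.

    Both are close to 0 for data that is roughly normal on the scale
    given. These are the simple (biased) moment estimators.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


if __name__ == "__main__":
    symmetric = [1, 2, 3, 4, 5]
    # The same points after a monotone, non-linear change of scale.
    stretched = [math.exp(x) for x in symmetric]
    print(skew_and_excess_kurtosis(symmetric))  # skewness is 0.0
    print(skew_and_excess_kurtosis(stretched))  # skewness is positive
```

The same five points are perfectly symmetric on one axis and clearly skewed on the other, which is exactly why the choice of rating scale matters before testing.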

dragon-devourer · April 13, 2021, 8:15pm

Oh wow! Yeah! If you assume a normal distribution and re-scale the x-axis accordingly, then you might find out something interesting about the expected rate of improvement at different levels, i.e. the usual law of diminishing returns. I wonder if that would even reveal ranks at which people commonly plateau…
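Rescaling the axis under a normality assumption amounts to replacing each rank's percentile with the matching z-score of a standard normal distribution. A minimal sketch using Python's `statistics.NormalDist`, with made-up per-rank counts (weakest first), assuming every rank has at least one player so that all percentiles stay strictly between 0 and 1:

```python
from statistics import NormalDist


def implied_rank_positions(counts):
    """Place each rank on a standard-normal axis.

    `counts` holds players per rank, weakest first. Each rank is placed
    at the z-score of its mid-rank percentile (everyone weaker plus half
    of the rank). Every count must be positive so that each percentile
    lies strictly between 0 and 1, as inv_cdf requires.
    """
    total = sum(counts)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    positions = []
    weaker = 0
    for c in counts:
        positions.append(std_normal.inv_cdf((weaker + c / 2) / total))
        weaker += c
    return positions


def implied_gaps(positions):
    """Distance between neighbouring ranks, in standard deviations."""
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    hypothetical_counts = [5, 20, 40, 25, 10]  # weakest -> strongest
    z = implied_rank_positions(hypothetical_counts)
    print([round(v, 2) for v in z])
    print([round(g, 2) for g in implied_gaps(z)])
```

Under that assumption, a wide gap between two neighbouring ranks would mean they are far apart in "strength standard deviations"; the result is only as trustworthy as the normality assumption itself.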

Vsotvep · April 13, 2021, 8:17pm

That sounds like a reverse application of statistics, though: assuming it’s normally distributed and then drawing conclusions from that. Sounds dangerous.

yebellz · April 13, 2021, 8:50pm

What does the shape of the curve in raw Glicko rating look like?

What’s the conversion between that and kyu/dan?
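On the conversion question: as far as I know, OGS's 2021 rating system maps a Glicko rating to a continuous rank value logarithmically, rank = C · ln(rating / A), where 30 means 1d and 29 means 1k. The constants in this sketch (A = 525, C = 23.15, ratings clamped to the range 100 to 6000) are recalled from OGS's open-source client code and should be treated as an assumption. Because rank is a logarithm of rating, the kyu/dan axis is effectively a log-rating axis, which bears on the earlier question about linearity.

```python
import math

# ASSUMED constants, recalled from OGS's client code; not confirmed here.
MIN_RATING = 100
MAX_RATING = 6000
A = 525      # rating at which the rank value is 0
C = 23.15    # rank units per natural-log unit of rating


def rating_to_rank(rating):
    """Continuous rank value: 30 corresponds to 1d, 29 to 1k."""
    clamped = min(MAX_RATING, max(MIN_RATING, rating))
    return math.log(clamped / A) * C


def rank_label(rating):
    """Format a rating as a kyu/dan string like '5.7k' or '1.1d'."""
    rank = rating_to_rank(rating)
    if rank < 30:
        return f"{30 - rank:.1f}k"
    return f"{rank - 29:.1f}d"


def rank_to_rating(rank):
    """Inverse of rating_to_rank (ignoring the clamp)."""
    return A * math.exp(rank / C)


if __name__ == "__main__":
    for r in (100, 1000, 1500, 2000, 2500):
        print(r, rank_label(r))
```

Under these assumed constants, a rating of 100 (the clamp floor) comes out at about 68.4k, which is consistent with the 0th-percentile entry in the table above.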

square.defender · April 13, 2021, 8:59pm

Both beginners and experienced players are in this histogram. I wonder at which rank most players get stuck. If you plotted the histogram only for players who have at least 2000 games, it would become visible.
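Filtering by experience before counting is straightforward once each player's rank and game count are known. A sketch with hypothetical (rank_value, games_played) pairs on a continuous rank scale where 30 is 1d; this record layout is an assumption for illustration, not an OGS export format.

```python
import math
from collections import Counter


def rank_histogram(players, min_games=0):
    """Count players per whole rank, keeping only experienced players.

    `players` is an iterable of (rank_value, games_played) pairs, where
    rank_value uses a continuous scale (30 = 1d). Both fields are
    hypothetical; they are not taken from any real OGS export.
    """
    counts = Counter()
    for rank_value, games in players:
        if games >= min_games:
            counts[math.floor(rank_value)] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [(24.3, 2500), (24.9, 100), (18.2, 3000), (24.1, 2100)]
    print(rank_histogram(sample, min_games=2000))  # {18: 1, 24: 2}
```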

martin3141 · April 14, 2021, 2:02pm

I totally agree, but I’d like to point out (even though I’m not an expert in statistics) that there may be a way to (sort of) justify such a reverse application.

There is a reason why the observed distribution of measured data is very often close to a normal distribution: the Central Limit Theorem. The basic idea is that a complex random variable (like IQ or go playing strength) depends on many different factors, each contributing a little bit. The distributions of the individual factors may look different, but when their sum is properly normalized (and given certain assumptions about the parts, such as independence), the distribution of the result resembles a normal distribution.
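The Central Limit Theorem argument can be illustrated with a toy simulation. Each hypothetical player's "strength" is the sum of k independent, strongly skewed contributions (exponential, skewness 2), and the skewness of the sum shrinks like 2 / sqrt(k) as k grows. This is a generic CLT demonstration under those assumptions, not a model of actual go strength.

```python
import math
import random


def skewness(xs):
    """Moment-based sample skewness."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def simulate_strength(num_factors, num_players=20000, seed=0):
    """Toy model: each player's 'strength' is a sum of independent,
    strongly skewed (exponential) contributions."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(num_factors))
        for _ in range(num_players)
    ]


if __name__ == "__main__":
    for k in (1, 4, 16, 64):
        # Theory for a sum of k exponentials: skewness = 2 / sqrt(k).
        observed = skewness(simulate_strength(k))
        print(k, round(observed, 2), round(2 / math.sqrt(k), 2))
```

The more small factors are added together, the closer the total gets to symmetric, even though every single factor is lopsided.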

n0w3l · April 14, 2021, 4:35pm

Scraping the web in search of every player’s rank? Or is there another way to access the data?

BHydden · April 14, 2021, 5:15pm

We have an API.

Vsotvep · April 14, 2021, 10:44pm

That first one looks a lot closer to normal, though.

gennan · April 15, 2021, 6:04am

Or perhaps a logistic distribution? The tails look a bit heavy for a normal distribution.
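To make "heavier tails" concrete, one can compare a normal and a logistic distribution with the same standard deviation. A logistic with scale s has standard deviation s · π / √3, so choosing s = √3 / π gives unit variance. This sketch uses only the standard library.

```python
import math
from statistics import NormalDist


def normal_tail(k):
    """P(|X - mean| > k standard deviations) for a normal distribution."""
    return 2 * (1 - NormalDist().cdf(k))


def logistic_tail(k):
    """Same probability for a logistic distribution with equal variance.

    A logistic with scale s has standard deviation s * pi / sqrt(3), so
    s = sqrt(3) / pi gives unit standard deviation, and
    P(X > x) = 1 / (1 + exp(x / s)).
    """
    s = math.sqrt(3) / math.pi
    return 2 / (1 + math.exp(k / s))


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, f"normal={normal_tail(k):.4f}", f"logistic={logistic_tail(k):.4f}")
```

Beyond 3 standard deviations the logistic holds about 0.86% of its mass versus about 0.27% for the normal, while beyond 1 standard deviation it holds less (about 28% versus 32%): a taller centre with heavier tails.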

Nickj · April 15, 2021, 10:21am

Thank you for assembling the data and getting it into a nice chart! Appreciated.

n0w3l · April 15, 2021, 2:23pm

Awesome!! Thanks!!!
(I knew there was an API, but I thought it was intended for other things… when I get some free time I’ll check it out.)
Thanks!!!

DVbS78rkR7NVe · April 18, 2021, 9:39am

The highest-rated account is Handibot [9d+], by the way. It looks like it got that rank by beating Sai repeatedly.

ronin3 · April 18, 2021, 10:02am

Random shower thought: I wonder if we could get somebody to do the same for other servers? I do realise it would be a mistake to try to combine the data, as some players may have accounts on more than one server, but still.

DVbS78rkR7NVe · April 18, 2021, 4:41pm

It’s a natural thought, but who knows the APIs of all those other servers?