Unofficial OGS rank histogram 2021

Following a good tradition, I remade the histogram for the new ranks.

The percentages along the bottom are basically percentiles. They’re calculated from the green part of the players as “everyone weaker than the given rank” plus “half of the players in the given rank”, rounded to the nearest integer. Thus, each one shows the approximate percentile of the median player in that rank.
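
For anyone who wants to reproduce these numbers, here is a minimal Python sketch of the midpoint rule described above. It assumes you already have per-rank player counts ordered from weakest to strongest. The counts in the example are made up, not the real OGS data.

```python
def midpoint_percentiles(counts):
    """Midpoint percentile of each rank bucket.

    counts: number of players per rank, ordered weakest to strongest
    (hypothetical input for illustration).
    """
    total = sum(counts)
    result = []
    weaker = 0  # players in all buckets below the current one
    for n in counts:
        # "everyone weaker" plus "half of the players in this rank"
        pct = 100 * (weaker + n / 2) / total
        # round() goes to the nearest integer; exact .5 ties go to the even neighbour
        result.append(round(pct))
        weaker += n
    return result


# Hypothetical counts for four rank buckets, weakest first.
print(midpoint_percentiles([10, 30, 40, 20]))
```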

Accurate percentiles (percentile: rank)
  0: 68.4k
  1: 29.6k
  2: 26.9k
  3: 25.1k
  4: 24.0k
  5: 22.9k
  6: 22.1k
  7: 21.3k
  8: 20.7k
  9: 20.0k
 10: 19.4k
 11: 18.8k
 12: 18.3k
 13: 17.8k
 14: 17.3k
 15: 16.9k
 16: 16.4k
 17: 16.0k
 18: 15.6k
 19: 15.2k
 20: 14.9k
 21: 14.5k
 22: 14.2k
 23: 13.9k
 24: 13.7k
 25: 13.4k
 26: 13.1k
 27: 12.9k
 28: 12.6k
 29: 12.4k
 30: 12.1k
 31: 11.9k
 32: 11.7k
 33: 11.5k
 34: 11.3k
 35: 11.1k
 36: 10.9k
 37: 10.7k
 38: 10.5k
 39: 10.3k
 40: 10.1k
 41: 9.9k
 42: 9.7k
 43: 9.5k
 44: 9.3k
 45: 9.1k
 46: 9.0k
 47: 8.8k
 48: 8.6k
 49: 8.4k
 50: 8.2k
 51: 8.0k
 52: 7.9k
 53: 7.7k
 54: 7.5k
 55: 7.3k
 56: 7.2k
 57: 7.0k
 58: 6.8k
 59: 6.6k
 60: 6.5k
 61: 6.3k
 62: 6.1k
 63: 5.9k
 64: 5.8k
 65: 5.6k
 66: 5.4k
 67: 5.2k
 68: 5.1k
 69: 4.9k
 70: 4.7k
 71: 4.5k
 72: 4.4k
 73: 4.2k
 74: 4.0k
 75: 3.8k
 76: 3.6k
 77: 3.4k
 78: 3.3k
 79: 3.1k
 80: 2.9k
 81: 2.7k
 82: 2.4k
 83: 2.2k
 84: 2.0k
 85: 1.8k
 86: 1.5k
 87: 1.3k
 88: 1.1k
 89: 0.8k
 90: 0.6k
 91: 0.3k
 92: 1.1d
 93: 1.4d
 94: 1.8d
 95: 2.2d
 96: 2.6d
 97: 3.2d
 98: 3.9d
 99: 4.9d
100: 15.0d
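
One possible way to build a table like this from a list of per-player rank values is to take the minimum, the 99 interior percentile cut points, and the maximum. This is a sketch, not necessarily how the table above was produced. It assumes ranks are on a continuous scale where 30.0 is 1d and 29.0 is 1k (kyu = 30 - rank, dan = rank - 29); that convention is an assumption to check against OGS’s own code.

```python
import statistics


def rank_label(rank):
    """Format a continuous rank value as kyu/dan.

    ASSUMPTION: rank 30.0 == 1d and 29.0 == 1k, so kyu = 30 - rank
    and dan = rank - 29. Verify against OGS before relying on it.
    """
    if rank < 30:
        return f"{30 - rank:.1f}k"
    return f"{rank - 29:.1f}d"


def percentile_table(ranks):
    """Map percentiles 0..100 to rank labels (0 = weakest, 100 = strongest)."""
    ranks = sorted(ranks)
    # 99 interior cut points; "inclusive" treats the data as the whole
    # population, so the 0th and 100th percentiles are the min and max.
    cuts = statistics.quantiles(ranks, n=100, method="inclusive")
    points = [ranks[0]] + cuts + [ranks[-1]]
    return {p: rank_label(r) for p, r in enumerate(points)}


# Hypothetical, evenly spread rank values purely for illustration.
table = percentile_table([float(i) for i in range(101)])
print(table[0], table[50], table[100])
```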

That’s not normal (statistics humor).

Correct, its standard is deviated.
Just my feeble attempt at some more statistics humor :grin:

Cool graph!
May I ask where you got the data from?

But who claimed that the horizontal axis is linear?

Good point. “Kyu” and “dan” are not well-behaved (linear) units.

Since a huge number of factors affect playing strength, the Central Limit Theorem will tend to make the distribution more “normalish”. But becoming a “strong” player requires a person to fall on the “stronger” side of the distributions of most of the input factors (time, effort, motivation, enjoyment, quality of teachers, strength of opponents, books read, problems solved, games played, age, etc.), and people’s resources are limited, so it is reasonable to assume the distribution of “player strength” will be skewed towards lower ranks. The histogram supports that assumption (heavy tail on the lower-ranking side).
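
A “heavy tail on the lower-ranking side” can be summarized in one number with the sample skewness: with strength increasing along the x-axis, a negative value means a longer tail towards weaker players. Below is a minimal sketch; the input values are hypothetical. Skewness is unchanged by shifting and positive scaling, but not by non-linear transforms, so the answer depends on whether you measure in ranks or in raw ratings.

```python
def sample_skewness(values):
    """Fisher-Pearson moment coefficient of skewness, g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical rank values with a long tail on the low (weak) side:
print(sample_skewness([1, 2, 3, 3, 3, 3]))  # negative
```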

I would be interested to know whether there is a way to test for normality in player strength even with a non-linear rating scale; I don’t know of any, but I’m not a statistician either.

Oh wow! Yeah! If you assume a normal distribution and re-scale the x-axis accordingly, then you might find out something interesting about the expected rate of improvement at different levels, i.e. the usual law of diminishing returns. I wonder if that would even reveal ranks at which people commonly plateau…
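
The rescaling idea can be sketched in a few lines: if strength were normally distributed, each rank’s midpoint percentile would map to a z-score through the inverse normal CDF, and the gaps between consecutive ranks’ z-scores would show where ranks are “wide” or “narrow” on that assumed scale. Python’s `statistics.NormalDist` provides `inv_cdf`. The percentiles below are illustrative, not taken from the histogram, and the whole exercise inherits the normality assumption.

```python
from statistics import NormalDist


def normal_scores(percentiles):
    """Map percentiles (strictly between 0 and 100) to standard-normal z-scores.

    Use unrounded midpoint percentiles: inv_cdf rejects exactly 0 and 100,
    which rounding can produce for very small buckets at either end.
    """
    std = NormalDist()  # mean 0, standard deviation 1
    return [std.inv_cdf(p / 100) for p in percentiles]


def gaps(zs):
    """Width of each step between consecutive ranks on the z-scale."""
    return [b - a for a, b in zip(zs, zs[1:])]


# Illustrative midpoint percentiles of five consecutive ranks.
zs = normal_scores([5, 25, 50, 75, 95])
print(gaps(zs))
```

Under this assumption, a rank whose gap is larger than its neighbours’ would be one that covers more “normal-scale” strength.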

That sounds like a reverse application of statistics, though: assuming it’s normally distributed and then drawing conclusions from that. Sounds dangerous.

What does the shape of the curve in raw Glicko rating look like?

What’s the conversion between that and kyu/dan?
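
As far as I know, OGS’s 2021 rating system converts between Glicko rating and rank logarithmically, rank = C·ln(rating / A), with A = 525 and C = 23.15. Treat those constants as an assumption and verify them against OGS’s source. If they hold, equally spaced ranks correspond to geometrically spaced ratings, which is one reason a histogram of raw ratings looks different from one of ranks.

```python
import math

# ASSUMED constants (OGS 2021 rating system); verify against OGS's source.
A = 525.0
C = 23.15


def rating_to_rank(rating):
    """Continuous rank from a Glicko rating (logarithmic mapping)."""
    return C * math.log(rating / A)


def rank_to_rating(rank):
    """Inverse of rating_to_rank."""
    return A * math.exp(rank / C)


print(rating_to_rank(1500.0))
print(rank_to_rating(30.0))
```

With these constants, each rank step multiplies the rating by exp(1/23.15), about 1.044.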

Both beginners and experienced players are in this histogram. I wonder at which rank most players get stuck. If you plot the histogram only for players with at least 2000 games, it should become visible.
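
Filtering by experience before binning is straightforward. A minimal sketch, assuming each player is available as a hypothetical `(rank, games)` pair (this is not the OGS API’s actual response format) and bucketing by whole rank:

```python
import math
from collections import Counter


def rank_histogram(players, min_games=0):
    """Count players per whole-rank bucket, keeping only those with >= min_games.

    players: iterable of (rank, games) pairs (hypothetical format).
    """
    counts = Counter()
    for rank, games in players:
        if games >= min_games:
            counts[math.floor(rank)] += 1
    return dict(sorted(counts.items()))


# Hypothetical players: (continuous rank, number of games played).
players = [(10.2, 50), (10.8, 3000), (25.5, 2500), (25.1, 10)]
print(rank_histogram(players, min_games=2000))
```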

I got the data from… OGS.


Reminder: the percentiles along the bottom are calculated from the green part of the players as “everyone weaker than the given rank” plus “half of the players in the given rank”, rounded to the nearest integer. Thus, each one shows the approximate percentile of the median player in that rank.

Is this way of calculating alright with everyone?


Let’s add this clarification to the top post.

I totally agree, but I’d like to point out (even though I’m not an expert in statistics) that there may be a way to (sort of) justify such a reverse application.

There is a reason why the observed distribution of measured data is very often close to a normal distribution: the Central Limit Theorem. The basic idea is that a complex random variable (like IQ or go playing strength) depends on many different factors, each contributing a little. The distributions of the individual factors may look different, but when properly normalized (and given certain assumed properties of the factors, such as independence), the distribution of their sum resembles a normal distribution.
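
A quick simulation illustrates the argument: sum several independent, individually skewed factors (exponential variables here, an arbitrary choice) and the skewness of the sum shrinks as more factors are added. For a sum of k exponential variables the theoretical skewness is 2/√k, so the printed values should land near 2.0, 0.89 and 0.28. This only covers the additive case; it says nothing about whether go strength actually combines its factors that way.

```python
import random


def skewness(values):
    """Fisher-Pearson moment coefficient of skewness."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def simulate_sums(num_factors, samples=10_000, seed=0):
    """Each sample is the sum of num_factors independent exponential factors."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [
        sum(rng.expovariate(1.0) for _ in range(num_factors))
        for _ in range(samples)
    ]


for k in (1, 5, 50):
    print(k, round(skewness(simulate_sums(k)), 2))
```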

Scraping the web for every player’s rank? Or is there another way to access the data?

We have an API.

@yebellz raw ratings still look ugly

@stone_defender for players with at least 2000 games, it looks like this:

Looks a lot closer to normal, though, that first one.

Or perhaps a logistic distribution? The tails look a bit heavy for a normal distribution.
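
One way to see the “heavier tails” difference: give a normal and a logistic distribution the same standard deviation and compare the probability of landing more than a few standard deviations out. A logistic distribution with scale s has variance π²s²/3, so s = σ√3/π matches standard deviation σ. At 3σ the logistic tail is roughly three times heavier (about 0.0043 versus 0.0013).

```python
import math
from statistics import NormalDist


def normal_tail(x, sigma=1.0):
    """P(X > x) for a normal distribution with mean 0 and std sigma."""
    return 1 - NormalDist(0.0, sigma).cdf(x)


def logistic_tail(x, sigma=1.0):
    """P(X > x) for a logistic distribution with mean 0 and std sigma."""
    s = sigma * math.sqrt(3) / math.pi  # scale that gives standard deviation sigma
    return 1 / (1 + math.exp(x / s))


for x in (1, 2, 3):
    print(x, normal_tail(x), logistic_tail(x))
```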

Thank you for assembling the data and getting it into a nice chart! Appreciated.

:scream: awesome!! thanks!!!
(I knew there was an API, but I thought it was intended for other things… When I get some free time I’ll check it out.)
Thanks!!!

The highest-rated account is Handibot [9d+], by the way. Looks like it got that rank from beating Sai repeatedly.
