Keeping up the good tradition, I remade the histogram for the new ranks.
The percentages along the bottom are basically percentiles. They’re calculated from the green part of the players as “everyone weaker than the given rank” plus “half of the players in the given rank”, rounded to the nearest percent. So each number shows the approximate percentile of the median player in that rank.
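For anyone who wants to reproduce the numbers, here is a minimal sketch of that calculation. The function name and the example counts are made up for illustration, and rounding to a whole percent is my reading of “rounded to nearest”.

```python
def midpoint_percentiles(counts):
    """Return the rounded percentile of the median player in each rank.

    `counts` lists the number of players per rank, weakest rank first.
    For each rank: everyone weaker, plus half of the rank itself,
    as a percentage of all players counted.
    """
    total = sum(counts)
    below = 0  # players in all weaker ranks seen so far
    result = []
    for n in counts:
        result.append(round(100 * (below + n / 2) / total))
        below += n
    return result


# Hypothetical counts for four consecutive ranks (not real server data).
example_counts = [10, 30, 40, 20]
print(midpoint_percentiles(example_counts))  # [5, 25, 60, 90]
```

Only the players you want to include (the green part, in the histogram's case) should go into `counts`.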
Good point. “Kyu” and “dan” are not well behaved (linear) units.
Since a huge number of factors affect playing strength, the Central Limit Theorem will work to make the distribution more “normalish”. But becoming a “strong” player requires a person to fall on the “stronger” parts of the distributions of a majority of the input factors (time, effort, motivation, enjoyment, quality of teachers, strength of opponents, books read, problems solved, games played, age, etc.), and people’s resources are limited, so it is reasonable to assume the distribution of “player strength” will be skewed towards lower ranks. The histogram supports that assumption (heavy tail on the lower-ranking side).
I would be interested to know if there is a way to test for normality in player strength even with a non-linear rating scale; I don’t know of any, but I’m not a statistician either.
Oh wow! Yeah! If you assume a normal distribution and re-scale the x-axis accordingly, then you might find out something interesting about the expected rate of improvement at different levels, i.e. the usual law of diminishing returns. I wonder if that would even reveal ranks at which people commonly plateau…
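A rough sketch of what that re-scaling could look like: take each rank's percentile and map it through the inverse normal CDF to get a position in standard deviations. This leans entirely on the assumption that strength is normally distributed, and the percentiles below are made up, not read off the histogram.

```python
from statistics import NormalDist


def normal_scores(percentiles):
    """Map percentiles (strictly between 0 and 100) to standard-normal z-scores."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(p / 100) for p in percentiles]


# Hypothetical percentiles of the median player in consecutive ranks.
example_percentiles = [5, 25, 60, 90]
z_scores = normal_scores(example_percentiles)

# Width of each rank step on the re-scaled axis, in standard deviations.
rank_gaps = [upper - lower for lower, upper in zip(z_scores, z_scores[1:])]
print(z_scores)
print(rank_gaps)
```

If the gaps between neighbouring ranks came out very uneven, that would say something about how "wide" each rank is under the normal assumption, nothing more.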
That sounds like a reverse application of statistics though… Assuming it’s normally distributed and then trying to draw conclusions from that. Sounds dangerous.
Both beginners and experienced players are on this histogram. I wonder at which rank most players get stuck? If you plot that histogram only for players who have at least 2000 games, you will see it.
I totally agree, but I’d like to point out (even though I’m not an expert in statistics) that there may be a way to (sort of) justify such a reverse application.
There is a reason why the observed distribution of measured data is very often close to a normal distribution. I’m talking about the Central Limit Theorem. The basic idea is that a complex random variable (like IQ or go playing strength, etc.) is dependent on many different factors, each contributing a little bit. The distributions of the smaller factors may look different, but when properly normalized (and given certain assumed properties of the smaller parts, such as independence), the distribution of the result resembles a normal distribution.
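A toy simulation of that idea, as a sketch only: every "factor" here is an independent, heavily skewed (exponential) random number, and a player's "strength" is simply the sum of the factors. Both the additive model and the choice of distribution are assumptions for illustration, not a claim about how real playing strength works.

```python
import random
from statistics import mean, pstdev


def skewness(values):
    """Sample skewness: about 0 for a symmetric (e.g. normal) distribution."""
    m = mean(values)
    s = pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


def simulate_strength(n_players, n_factors, rng):
    """Sum n_factors independent exponential draws for each simulated player."""
    return [
        sum(rng.expovariate(1.0) for _ in range(n_factors))
        for _ in range(n_players)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
one_factor = simulate_strength(5000, 1, rng)
many_factors = simulate_strength(5000, 30, rng)

print("skewness with 1 factor:  ", skewness(one_factor))
print("skewness with 30 factors:", skewness(many_factors))
```

With one factor the result stays strongly skewed; with thirty the skewness shrinks a lot, which is the "normalish" effect. It does not vanish, and dropping the independence assumption can change the picture.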
Random shower thought: I wonder if we can get somebody to do the same for other servers? I do realise it would be a mistake to try and combine the data - as some players may have accounts on both servers - but still.