Show Kyu/Dan instead of Glicko Rating on Player Profile

Thank you @GreenAsJade for the explanation. I apologize to you and anyone else who reads this for not understanding the question :grinning: If I understand @GreenAsJade properly now, there are just more number sections (19x19, 13x13, 9x9) that have been split into Elo rating sections. I do understand that the kyu/dan ranks are under the Elo numbers (it is pretty obvious), but I think that the numbers just need to be hidden and should continue to be used for the kyu/dan system, which should be visible to other users. I feel that the sections shown under your profile should just be changed into kyu/dan sections. Again, I am sincerely sorry for confusing this further, and anyone is more than welcome to correct me again if I still don’t understand this properly.

Thanks from The “GoBoard” (@_GoBoard)

Well it allows you to actually compare something! That’s a start in the right direction. This is important because comparison is the key point here. Comparison to what? Well, there are two things we’d like to compare:

  • Comparing your rating to others’, i.e. your performance relative to others within a rating pool.
  • Comparing your rating to your other ratings, i.e. your relative strengths and weaknesses.

A rank doesn’t actually mean anything. If I told you that I was 4 elephants at 9x9 and 10 mongoose at 19x19, our minds would immediately jump to the ridiculous comparison of elephants and mongoose. We actually start trying to figure out how many mongoose go into one elephant.

My point is that people are always going to tend towards comparing their rating to their own other ratings. With that in mind, the problem we want to solve is how to compare elephants to mongoose. Well, we can do that to a certain extent using percentiles:

A nice property of percentiles is they have a universal interpretation: Being at the 95th percentile means the same thing no matter if you are looking at exam scores or weights of packages sent through the postal service; the 95th percentile always means 95% of the other values lie below yours, and 5% lie above it. This also allows you to fairly compare two data sets that have different means and standard deviations (like ACT scores in reading versus math). It evens the playing field and gives you a way to compare apples to oranges, so to speak.

From Statistics For Dummies, Second Edition
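To make the quoted definition concrete, here is a minimal sketch of a percentile calculation. The function name `percentile_rank` and both rating lists are made up for illustration; they are not OGS data.

```python
from bisect import bisect_left

def percentile_rank(value, pool):
    """Percentage of values in `pool` that lie strictly below `value`."""
    ordered = sorted(pool)
    below = bisect_left(ordered, value)  # number of entries smaller than value
    return 100.0 * below / len(ordered)

# Hypothetical ratings for two pools that sit on different scales.
pool_9x9 = [1200, 1350, 1400, 1500, 1650, 1700, 1800, 1900, 2000, 2100]
pool_19x19 = [900, 1000, 1100, 1150, 1250, 1300, 1400, 1450, 1500, 1600]

print(percentile_rank(1800, pool_9x9))    # 60.0
print(percentile_rank(1450, pool_19x19))  # 70.0
```

The raw numbers 1800 and 1450 are not comparable, but "60th percentile" and "70th percentile" at least share an interpretation, each relative to its own pool.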

This reply was better than 68% of replies.

I think that comparing percentiles across different bins could also be misleading.

For the comparison to be meaningful, we have to assume that the population within each bin is similarly distributed in strength. However, I think it’s quite possible that strengths are skewed across the various bins. Perhaps there are more beginners focusing on 9x9, while more of the stronger players prefer 19x19. Maybe the population of live players is largely independent of, and at a different level of skill than, the correspondence players. Of course, these are just speculative examples for the sake of making a point, and they may well be false.

So, the perception of being “stronger than a larger proportion of 9x9 players than 19x19 players” (based on percentiles), might just be reflective of the skewed strength across the groups of players in different bins, rather than the individual player in question.

Edit: added below text for further clarification

I agree that percentiles can objectively determine that a player is “stronger than a larger proportion of 9x9 players than 19x19 players” (with respect to the corresponding subsets at OGS), however, I think that this objective statement itself can be a bit misleading, since it is ambiguous as to the root cause, which may be:

  1. The player in question is “stronger” at 9x9 than 19x19 (relative to the global population of go players for each board size).
  2. The population of 19x19 players at OGS is “stronger” than the population of 9x9 players at OGS.

It is quite possible that it is due to a mixture of both causes, and it’s impossible to rule out one over the other without more information about the populations of players. However, I believe that many people, when viewing the percentile comparison, might erroneously interpret it as only the first possibility listed above.
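A small sketch of the second cause, using invented numbers: the player has exactly the same strength on both board sizes, yet the percentiles differ purely because the two pools are made up of different people. The shared strength scale is an assumption for the sake of the example; nothing like it exists in the actual ratings.

```python
from bisect import bisect_left

def percentile(value, pool):
    """Percentage of the pool that lies strictly below `value`."""
    ordered = sorted(pool)
    return 100.0 * bisect_left(ordered, value) / len(ordered)

# Hypothetical strengths on one shared scale (higher is stronger).
pool_9x9 = [1, 2, 2, 3, 3, 4, 5, 6, 7, 9]    # many beginners
pool_19x19 = [3, 4, 5, 5, 6, 6, 7, 8, 8, 9]  # mostly stronger players

player_strength = 6  # identical on both board sizes in this scenario

print(percentile(player_strength, pool_9x9))    # 70.0
print(percentile(player_strength, pool_19x19))  # 40.0
```

Someone who sees only "70% at 9x9, 40% at 19x19" cannot tell this scenario apart from one where the player really is stronger at 9x9.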

2 Likes

Great discussion all around, thank you to everyone! :heart:

I would like to strongly suggest that those who are coming back to this discussion please consider contributing to/voting on the Kialo discussion linked here. It’s a much better way to keep track of a complicated issue like this one!

Much of what we’re talking about could be a part of the “Show percentiles on the Rating Table section”. Please at least take a look at it!
Sorry if you’re already in there, it’s difficult to keep track.
(@BHydden, @opuss, @_GoBoard, @flovo, @yebellz)

I’m a little frustrated by the percentile system idea, because I feel like there are a lot of assumptions going into it that just aren’t clear to me.

  • It seems to assume some kind of inherent ladder for each of the rating pools.
  • It also seems to assume that your rating is compared either to the entire population of OGS, or only to those players you’ve played against. It also seems to assume the user will know which one, so they can interpret their percentile correctly.
  • In any case I get the feeling that for this to work, these percentiles would have to be calculated in a way that is different enough from how the Glicko scores are calculated that it would pose a non-trivial development effort.

@yebellz hit on why this can be so muddy much better than I could. In short, I feel the percentile system could be really good, but there are so many fiddly issues with it that I’m not sure it would make a good change.

I apologize if that didn’t make too much sense. I’ve been trying to respond to the percentile idea for a while, but it’s just too confusing to me to make an eloquent response. :stuck_out_tongue_closed_eyes:

2 Likes

Wall of text to hopefully clarify MystWalker’s questions:

Yes. The percentiles give you something like a ladder. They show you how many players in this pool have a lower rating/rank than you have. They don’t show how good you are, only how many players are weaker (in this pool).

Your rating is compared to all players in the corresponding pool.

Users won’t know this any better than they now know what the numbers in the breakdown tables mean.

There are 2 ways to do that.

  1. The proper way would be to recalculate the percentiles each time a rating is calculated. This would require the server to look up all of the roughly 250000 users, sort them by rating, and then count cumulatively from weak → strong.
    This is more an issue of load on the server than a programming one (but it has to be done).
    To lower the server load, one can update the percentiles only once a day/week …
  2. You can do it in an approximate way. Just as we map rating → rank, we can find a formula which maps the rating to an assumed percentile.
    One would do this by calculating the percentiles now (for each breakdown separately). Then one would look for a function which reflects the current rating → percentile mapping. This function would be used to calculate the percentiles from then on.
    Please note again: this would be only an approximation, but users looking at their breakdown chart could hardly tell the difference.
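The two ways above can be sketched roughly as follows. This is only an illustration under stated assumptions: the snapshot is randomly generated rather than real OGS ratings, and a normal distribution is just one possible choice for the fitted function, picked here because the Python standard library provides it. The names `exact_percentile` and `approx_percentile` are hypothetical.

```python
import random
from bisect import bisect_left
from statistics import NormalDist

random.seed(0)
# Hypothetical snapshot of one pool's ratings (made-up numbers, not OGS data).
snapshot = sorted(random.gauss(1500, 350) for _ in range(10_000))

def exact_percentile(rating, ordered):
    """Way 1: share of the sorted snapshot rated strictly below `rating`."""
    return 100.0 * bisect_left(ordered, rating) / len(ordered)

# Way 2: fit a formula once (assumed here: a normal distribution),
# then reuse it without sorting the user table again.
fitted = NormalDist.from_samples(snapshot)

def approx_percentile(rating):
    """Assumed percentile from the fitted formula."""
    return 100.0 * fitted.cdf(rating)

for r in (1000, 1500, 2000):
    print(r, round(exact_percentile(r, snapshot), 1), round(approx_percentile(r), 1))
```

If the real rating distribution is clearly not bell-shaped, a different function (or a lookup table built from the snapshot) would be the better fit; the point is only that the expensive sort happens once, not on every rating update.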

S_Alexander made a histogram showing both rank and percentiles for the OGS user base (only overall rank, I guess). You can use it to estimate what the percentiles would look like for different players.

4 Likes

As far as I can see, there are 2 ways one could (and does) use the breakdown table:

  1. To compare the playing strength between different players for the same pool. (comparing me to others / “I’m better at 9x9 than Michael”)
  2. To compare one player’s individual strength between different board sizes / speeds (comparing me to me / “I’m better in blitz than in correspondence games”, “Is my performance in blitz as expected for a player of my rank?”)

At the moment, the breakdown table can be used for the 1st but not for the 2nd use case.
And I don’t see how any of the proposed changes would allow us to do the 2nd, since they all operate independently on each pool. Therefore there is no relation to the other pools, and therefore no comparability.

The proposed changes only map rating -> (rank, percentiles, elephants, … (I don’t care much what you call the numbers if I can use them for 1 and 2 :wink: ))

Maybe it is possible to get all the pools on the same scale, making them comparable to each other.
(Maybe I will take a look later at whether this is possible, but don’t wait for it. It would take a lot of time and I’m not sure I’ll find the time for it.)

@MystWalker I don’t know where the right place for this is on Kialo, therefore I’m posting it here.

EDIT: clarified, because @GreenAsJade pointed out that my choice of words could be misleading. Hope you get my point now.
EDIT 2: trying again :blush:

1 Like

Have you overlooked, or are you disputing, the comparison of percentile-in-the-pool?

AFAIK it is entirely meaningful to say “I am top 10% in blitz 9x9 but median in overall” and to use this as a comparison that says “I perform better in the pool of blitz 9x9 than overall”.

And

  3. To track your individual progress in each pool.

1 Like

I have issues with this line of reasoning, not because I think it’s wrong, but because I think it’s misleading to assume that the percentile system is better than the other systems at the comparisons for number 2 (compare [a player’s] strength between different board sizes / speeds).
If all the smaller pool ratings were in the same units, it would be easy to make comparisons. 8 cows in 9x9 is better than 4 cows in 19x19. I don’t understand why comparing percentiles is better. Granted, you would know how many people you are better than on the server, but that doesn’t address point number 2.

Is there something I’m missing?

I think so.

I think you are missing the fact that saying that I’m “the best” at 9x9 blitz is equal in some sense to saying I’m “the best” at 19x19 live, and equal in that same sense to saying I’m “the best” Overall.

In each case, I am the best at something, in a given pool. Sure, it’s understood that it’s easier to be best in a small pool (or arguably it might be), but nonetheless “best” is a transferable concept.

Similarly “I am median” is comparable. “I am median overall, but I’m the best 9x9 blitz player” says something meaningful and understandable about your playing ability relative to others.

That is why percentiles are a candidate solution, targeting those who are trying to derive some comparative meaning from the table.

Okay, this is where I’m getting frustrated.
We’re conflating comparing the ranks of an individual’s differing pools, and comparing one person’s rank in one pool with the rank of another person in the same pool.
I’m trying to make it clear that these are two separate use cases, but they keep getting mixed up.

Case A:
Segue is a user on OGS.
As a user on OGS Segue wants to see if they are better when playing Live 9x9 or Live 19x19.

Case B:
Jackie and Brovien are users on OGS.
As a user on OGS Jackie wants to see if they are better than Brovien in the category of correspondence 19x19.

I don’t see how percentiles are better in either of these cases. Maybe someone could make a different example case?

I think these are valid observations that one could make. However, the question of “why do I perform better in the 9x9 blitz pool?” is still ambiguous.

Is it because “I am stronger at 9x9 blitz than other game types” or is it “the 9x9 blitz pool is mostly made up of weaker players” (which could be a possibility if a lot of novices are looking for quick games, but stronger players disdain that game type).

3 Likes

What I want to know is:
“Is my performance in 13x13 what one would expect for a player of my overall rank, or am I worse on 13x13?”

Since the average strength of players in each pool is different, I would expect something like “9x9 65%, 13x13 55%, 19x19 45%” even if I’m performing equally well on each board size.
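As a rough sketch of that expectation: if we knew how many players of each overall rank are in each pool, we could compute the percentile a player of a given overall rank would be expected to have in every pool, and compare the actual percentile against that baseline. The counts below are invented for illustration and the function name `expected_percentile` is hypothetical.

```python
# Hypothetical counts of players per overall rank in each pool (not OGS data).
# Keys are overall ranks as kyu numbers: 20 = 20k (weaker) ... 5 = 5k (stronger).
pool_counts = {
    "9x9":   {20: 50, 15: 30, 10: 15, 5: 5},
    "13x13": {20: 35, 15: 30, 10: 25, 5: 10},
    "19x19": {20: 20, 15: 30, 10: 30, 5: 20},
}

def expected_percentile(overall_kyu, counts):
    """Share of the pool whose overall rank is weaker than `overall_kyu`."""
    total = sum(counts.values())
    weaker = sum(n for kyu, n in counts.items() if kyu > overall_kyu)
    return 100.0 * weaker / total

# Baseline for a 10k player who performs equally well on every board size.
for pool, counts in pool_counts.items():
    print(pool, expected_percentile(10, counts))
```

With these made-up counts the same 10k player lands at 80%, 65% and 50%, so only a deviation from that baseline would say something about relative strength on a board size.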

To get my point clear, this is how many players of each overall rank are in each pool (thanks S_Alexander ):

1 Like

What if most of the beginners are only playing blitz 9x9, while being too intimidated to even try live 19x19 games?

If you found someone who was a median player overall, but happened to be one of the best blitz 9x9 players on the site, I would think it more likely that the above hypothesis played a part in this observation.

1 Like

I agree that this is a potential complexity in interpreting comparison of percentile across pools.

I was just trying to show why comparing percentile (which is a transferable measurement) is more legitimate than comparing rating from pool to pool.

I’m not seeing this. WHY is it more legitimate?

You could say the same in Glicko or Kyu/Dan. “I am a 12kyu overall, but I am a 9Dan 9x9 blitz player.” The only thing lost in translation is the position in the pecking-order of the whole OGS community. This is more information, it is not necessarily better or more legitimate information. Frankly, I’m not crazy about having a top-down ladder for the whole site at all.

If we want to add this functionality to the table, fine, but it is not information exclusive to the percentile suggestion. We could have the median (or average, or mean, or mode, you statistics nerd :wink:) rank displayed for each group. I’m not saying that’s a great idea, but it’s something we could also do.

Because “best” and “median” are terms that have a universal meaning.

Whereas “1500 points” does not.

It’s granted that just because I am “best” in my club doesn’t mean I’m “best” in the world: best is relative to the pool it talks about, but at least the meaning is clear.

If I tell you “I am best in 9x9 blitz but median in overall” we can have a conversation about what that means. It might mean that I studied hard at 9x9 and think fast, or it might mean that all the beginners are in 9x9 blitz. But for sure it means that I’m the best at 9x9 Blitz, but not the best overall. At least we understand the meaning and the comparison that is being made.

If I tell you “I am 1500+/-10 in 9x9 blitz and 1500+/-10 overall” the only thing it means is that I am pretty certainly 13k Overall. It tells you nothing at all about 9x9 blitz.

GaJ

Kyu/dan ranks by themselves tell us absolutely nothing. They can tell us something if we have a particular opponent to compare with. The only reason we’re comfortable with rank is our experience with the system. We sorta know how hard each rank is. But when it comes to categories it’s harder, because we hardly ever see categorical ranks except in that table. And when we do see them in the table, most of us have no idea what the numbers could mean. With percentiles, at least we get info about our position in that specific pool. We still can’t really compare ourselves between the pools, but it’s better than rank.

And we need to remember that percentiles usually come in with a whole graph. And graphs are very cool and pretty. I like looking at graphs.

3 Likes

Check lichess out.


https://lichess.org/stat/rating/distribution/classical

3 Likes