Show Kyu/Dan instead of Glicko Rating on Player Profile

I’m confused by this statement. While the percentiles may be easy to understand, you can’t make any inferences between pools. A 90% rating in one area has no bearing on your performance in another area. The example you give here would work the same way as the Glicko and Kyu/Dan methods. If I am a 20kyu 19x19 player and a 9kyu 9x9 player, I know that I am a better 9x9 player.

I must be confused, can you explain why this is a better option?

Is there a way to make a simple vote on the forum? That might be a good idea once we have a solid idea of our options.

Yes. In a reply, click on the “options” (cog) icon and select the “Build Poll” option.

2 Likes

TL;DR

Comparing ranks and ratings across pools gives no useful information. You don’t know whether you are comparing the rating pool or your own playing strength. Comparing percentiles allows you to compare your position within the pool. Whilst that doesn’t entirely match your playing strength, most pools are sufficiently large that you can infer your playing strength from your position within the pool.

The most extreme example is to compare performance across different games:

  • On most Chess websites I am stronger than 95% of players.
  • On most Go websites I am stronger than 50% of players.
  • Therefore, I am stronger than a higher proportion of Chess players than Go players.

Comparing ranks and ratings

Sure thing! But first, let’s just clear this up:

If I am a 20kyu 19x19 player and a 9kyu 9x9 player, I know that I am a better 9x9 player.

Actually, that’s not quite correct. How do you know you’re not equally strong at 19x19 and 9x9? The difference in rank might be because of the difference in rating pools rather than a difference in playing strength. We need more information before we can infer how much playing strength was a factor.

Let’s restate this by introducing different Go servers to demonstrate where it goes wrong:

If I am a 15k on KGS and a 10k on OGS, then I know that I am a better OGS player.

That’s not the conclusion most people would reach. Instead, they’d probably suggest that 15k on KGS is about equal to 10k on OGS. You’re the same player! It’s the rating pools that are different.

But of course with different board sizes we introduce an extra factor:

If I am a 15k at 19x19 on KGS and a 10k at 9x9 on OGS, then I know that I am a better 9x9 on OGS player.

Comparing KGS ranks to OGS ranks was difficult enough already. Really, this should be:

If I am a 15k at 19x19 on KGS and a 10k at 9x9 on OGS, then a 19x19 15k on KGS equates to a 9x9 10k on OGS.

We have learned nothing new from our comparison. Now we can make sense of the original quote:

If I am a 20kyu 19x19 player and a 9kyu 9x9 player, then 20kyu at 19x19 equates to 9kyu at 9x9.

Therefore there is no benefit to showing multiple ranks or ratings across different board sizes or game speeds.

Percentiles are different

When I join a new Go server I cannot predict in advance what my rank will be. But I can predict what my percentile will be:

  • I am stronger than 50% of OGS players.
  • KGS players are as diverse as OGS players.
  • I am therefore stronger than ~50% of KGS players.

This is not perfect, as it relies on KGS players being as diverse as OGS players. They probably are, I’m not sure, but the difference is not going to be significant when enough players are in the pool.

The result is a slightly different conclusion to just being stronger:

  • I am stronger than 50% of 19x19 players.
  • I am stronger than 80% of 9x9 players.
  • Therefore, I am stronger than a larger proportion of 9x9 players than 19x19 players.

This is not the most amazing conclusion in the world. But it is at least a conclusion - which is more than we got comparing ratings and ranks!

Conclusion

Comparing ranks or ratings gives us no information across pools:

If I am a 20kyu 19x19 player and a 9kyu 9x9 player, then 20kyu players at 19x19 equate to 9kyu players at 9x9.

Comparing percentiles gives us a lot of information across pools:

If I am stronger than 50% of 19x19 players and stronger than 75% of 9x9 players, then I am stronger than a larger proportion of 9x9 players than 19x19 players.

8 Likes

I skipped a lot of the walls of text in the last third of this thread. But my 2c are that the best options are either removing the table because the information doesn’t outweigh the confusion, or as @Farraway is suggesting, switch them to percentile display, so that the table provides more information and less confusion.

4 Likes

After some consideration, I believe that the developers probably made a good decision to display ratings instead of ranks in the table. The average ability of players on OGS is different for different board sizes as shown in the rank histogram posted by @S_Alexander. Using the same formula used for the overall rank would be incorrect.

However, the ratings can be useful. They can be used to check that a game is even on a given board size.

If the volatility were also displayed, it would be possible to calculate the expectation value for the result of a match. The volatility is currently available via the API. In the future, perhaps we could specify that we would like a game within a given range of expectation values?

4 Likes

I feel the same way as Ptro about the rating system. I think that the traditional dan/kyu system is easier and more important than the Glicko rating. I think that the Glicko was a good idea but I feel that it is overall unnecessary because of how precise the system tries to be and how difficult it is to predict the precise rating even in the kyu/dan system. Of course I would also like to point out that a rating is just a number and is never perfect!! :grinning:

Thanks for bringing this topic up and I would like to thank OGS for having such a good system anyway:)

2 Likes

It’s not either Glicko OR Dan/Kyu.

We’ve always had Dan/Kyu ranking in the same way we do now - based on an underlying rating system.

Before it was ELO. And we had an ELO number just like our Glicko number, and we had a graph showing it.

The only thing that wasn’t shown to us before was our ELO number for all the different game types!

The only difference now is that we’re being shown some numbers we weren’t shown before, and finding them confusing.

GaJ

4 Likes

You need only the rating (and deviation) to calculate the probability to win the game.

I wanted to link S_Alexander’s post as well. :+1:

2 Likes

Quite right. Only the two ratings and the opponent’s deviation are needed.
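As a rough illustration of that point, here is a minimal sketch of the expected-score formula from the original Glicko system (Glicko-1). It is an assumption that this form is close enough to whatever OGS actually runs; the function names `g` and `expected_score` are made up for this example and are not part of any OGS API.

```python
import math

# Glicko-1 scaling constant q = ln(10) / 400
Q = math.log(10) / 400


def g(rd: float) -> float:
    """Attenuation factor: the less certain the opponent's rating, the smaller g."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def expected_score(rating: float, opp_rating: float, opp_rd: float) -> float:
    """Expected score (roughly, win probability) against one opponent.

    Uses only the two ratings and the opponent's deviation.
    """
    exponent = -g(opp_rd) * (rating - opp_rating) / 400.0
    return 1.0 / (1.0 + 10.0**exponent)


# Hypothetical numbers: a 1500 player against a 1400 player with deviation 30.
print(round(expected_score(1500, 1400, 30), 3))
```

Equal ratings always give 0.5, and a larger opponent deviation pulls the result towards 0.5.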

I don’t see what information I would get out of that.
The reasons for that could be:

  • I’m stronger on 9x9 than 19x19
  • The 9x9 pool contains a bigger proportion of weaker players than the 19x19 pool (this seems to be true)
  • Maybe both apply, leading to no relevant information at all.

Side note:

  1. is only true if 2. is true.

The difference could be significant, regardless of how many players are in the pools, if there is a bias driving strong players to one pool and weaker to the other. (Pull factors could be accessibility and how easy it is to find even games)

2 Likes

Thank you @GreenAsJade for the explanation. I apologize to you and anyone else who reads this for not understanding the question :grinning: If I understand @GreenAsJade properly now, there are just more number sections (19x19, 13x13, 9x9) that have been split into ELO rating sections. I do understand the fact that the kyu/dan ranks are based on the ELO numbers (it is pretty obvious), but I think that the numbers just need to be hidden and still used for the kyu/dan system, which should be visible to other users. I feel that the sections they have under your profile should just be changed into kyu/dan sections. Again, I am sincerely sorry for confusing this more, and anyone is more than welcome to correct me again if I still don’t understand this properly.

Thanks from The “GoBoard” (@_GoBoard)

Well it allows you to actually compare something! That’s a start in the right direction. This is important because comparison is the key point here. Comparison to what? Well, there are two things we’d like to compare:

  • Comparing your rating to others’, i.e. your performance relative to others within a rating pool.
  • Comparing your rating to your other ratings, i.e. your relative strengths and weaknesses.

A rank doesn’t actually mean anything. If I told you that I was 4 elephants at 9x9 and 10 mongoose at 19x19, our minds would immediately jump to the ridiculous comparison of elephants and mongoose. We would actually start trying to figure out how many mongoose go into one elephant!

My point is that people are always going to tend towards comparing their rating to their own ratings. That in mind, the problem we want to solve is how to compare elephants to mongoose - well, we can do that to a certain extent using percentiles:

A nice property of percentiles is they have a universal interpretation: Being at the 95th percentile means the same thing no matter if you are looking at exam scores or weights of packages sent through the postal service; the 95th percentile always means 95% of the other values lie below yours, and 5% lie above it. This also allows you to fairly compare two data sets that have different means and standard deviations (like ACT scores in reading versus math). It evens the playing field and gives you a way to compare apples to oranges, so to speak.

From Statistics For Dummies, Second Edition

This reply was better than 68% of replies.

I think that comparing percentiles across different bins could also be misleading.

For the comparison to be meaningful, we have to assume that the population within each bin is similarly distributed in strength. However, I think it’s quite possible that we might have skewed strengths across various bins. Perhaps there are more beginners focusing on 9x9, while more of the stronger players prefer 19x19. Maybe the population of live players is largely independent of, and at a different level of skill than, the correspondence players. Of course, these are just speculative examples for the sake of making a point, and they may well be false.

So, the perception of being “stronger than a larger proportion of 9x9 players than 19x19 players” (based on percentiles), might just be reflective of the skewed strength across the groups of players in different bins, rather than the individual player in question.

Edit: added below text for further clarification

I agree that percentiles can objectively determine that a player is “stronger than a larger proportion of 9x9 players than 19x19 players” (with respect to the corresponding subsets at OGS), however, I think that this objective statement itself can be a bit misleading, since it is ambiguous as to the root cause, which may be:

  1. The player in question is “stronger” at 9x9 than 19x19 (relative to the global population of go players for each board size).
  2. The population of 19x19 players at OGS is “stronger” than the population of 9x9 players at OGS.

It is quite possible that it is due to a mixture of both causes, and it’s impossible to rule out one over the other without more information about the populations of players. However, I believe that many people, when viewing the percentile comparison, might erroneously interpret it as only the first possibility listed above.
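To make the second cause concrete, here is a small sketch with entirely made-up numbers. It assumes a hypothetical common strength scale (which we cannot actually observe) and assumes each pool is normally distributed on it; neither assumption comes from OGS data.

```python
from statistics import NormalDist


def percentile_in_pool(strength: float, pool: NormalDist) -> float:
    """Percentage of the pool that is weaker than the given strength."""
    return 100.0 * pool.cdf(strength)


# Hypothetical pools on an imagined common scale:
# the 9x9 pool is assumed to be weaker on average than the 19x19 pool.
pool_19x19 = NormalDist(mu=1500, sigma=300)
pool_9x9 = NormalDist(mu=1300, sigma=300)

# One player who is, by construction, equally strong at both board sizes.
player_strength = 1500

p19 = percentile_in_pool(player_strength, pool_19x19)
p9 = percentile_in_pool(player_strength, pool_9x9)
print(f"19x19: stronger than {p19:.0f}% of the pool")
print(f"9x9:   stronger than {p9:.0f}% of the pool")
```

The player’s strength is identical in both lines, yet the 9x9 percentile comes out higher purely because the assumed 9x9 population is weaker. The percentile alone cannot distinguish this from the player genuinely being better at 9x9.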

2 Likes

Great discussion all around, thank you to everyone! :heart:

I would like to strongly suggest that those who are coming back to this discussion please consider contributing to/voting on the Kialo discussion linked here. It’s a much better way to keep track of a complicated issue like this one!

Much of what we’re talking about could be a part of the “Show percentiles on the Rating Table section”. Please at least take a look at it!
Sorry if you’re already in there, it’s difficult to keep track.
(@BHydden, @opuss, @_GoBoard, @flovo, @yebellz)

I’m a little frustrated by the percentile system idea, because I feel like there are a lot of assumptions going into it that just aren’t clear to me.

  • It seems to assume some kind of inherent ladder for each of the rating pools.
  • It also seems to assume that your rating is compared to either the entire population of OGS, or only those players you’ve played against. It also seems to assume the user will know which one, so they can interpret their percentile correctly.
  • In any case I get the feeling that for this to work, these percentiles would have to be calculated in a way that is different enough from how the Glicko scores are calculated that it would pose a non-trivial development effort.

@yebellz hit on why this can be so muddy much better than I could. In short, I feel the percentile system could be really good, but there are so many fiddly issues with it that I’m not sure it would make a good change.

I apologize if that didn’t make too much sense. I’ve been trying to respond to the percentile idea for a while, but it’s just too confusing to me to make an eloquent response. :stuck_out_tongue_closed_eyes:

2 Likes

Wall of text to hopefully clarify MystWalker’s questions:

Yes. The percentiles give you something like a ladder. They show you how many players in this pool have a lower rating/rank than you have. They don’t show how good you are, only how many players are weaker (in this pool).

Your rating is compared to all players in the corresponding pool.

They won’t know any better than they know now what the meaning of the numbers in the breakdown tables is.

There are 2 ways to do that.

  1. The proper way would be to recalculate the percentiles each time a rating is calculated. This would require the server to look up all of the roughly 250000 users, sort them by rating, and then count from weak → strong.
    This is more an issue of load on the server than a programming one (but it has to be done).
    To lower the server load, one could update the percentiles only once a day/week …
  2. You can do it in an approximate way. Just as we map rating → rank, we can find a formula which maps the rating to an assumed percentile.
    One would do this by calculating the percentiles now (for each breakdown separately). Then one would look for a function which reflects the current rating → percentile mapping. This function would be used to calculate the percentiles from now on.
    Please note again: this would only be an approximation, but users looking at their breakdown chart could hardly tell the difference.
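A minimal sketch of both approaches, under some assumptions: the pool is just a list of ratings, and for the approximate variant a snapshot lookup table with linear interpolation stands in for a fitted formula. All function names here are hypothetical and nothing is taken from the OGS code base.

```python
from bisect import bisect_left


def exact_percentile(sorted_ratings, rating):
    """Way 1: percentage of players in the pool rated strictly below `rating`."""
    if not sorted_ratings:
        return 0.0
    return 100.0 * bisect_left(sorted_ratings, rating) / len(sorted_ratings)


def build_lookup(sorted_ratings, steps=20):
    """Way 2, step one: snapshot (rating, percentile) pairs from a non-empty pool."""
    n = len(sorted_ratings)
    indices = sorted({min(n - 1, i * n // steps) for i in range(steps + 1)})
    return [(sorted_ratings[i], 100.0 * i / n) for i in indices]


def approx_percentile(table, rating):
    """Way 2, step two: interpolate linearly between the snapshot points."""
    if rating <= table[0][0]:
        return table[0][1]
    if rating >= table[-1][0]:
        return table[-1][1]
    for (r0, p0), (r1, p1) in zip(table, table[1:]):
        if r0 <= rating <= r1:
            if r1 == r0:  # tied ratings at neighbouring snapshot points
                return p1
            return p0 + (p1 - p0) * (rating - r0) / (r1 - r0)
    return table[-1][1]  # not reached for a sorted table; kept as a safe fallback


# Made-up pool: 1000 players rated 1000..1999, already sorted.
pool = list(range(1000, 2000))
table = build_lookup(pool, steps=10)
print(exact_percentile(pool, 1550), approx_percentile(table, 1550))
```

The exact version needs the whole sorted pool every time, while the approximate one only needs the small table, which could be rebuilt once a day or once a week.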

S_Alexander made a histogram showing both rank and percentiles for the OGS user base (only overall rank, I guess). You can use it to estimate what the percentiles would look like for different players.

4 Likes

As far as I can see, there are 2 ways one could (and people do) use the breakdown table:

  1. To compare the playing strength between different players for the same pool. (comparing me to others / “I’m better at 9x9 than Michael”)
  2. To compare one player’s individual strength between different board sizes / speeds (comparing me to me / “I’m better in blitz than in correspondence games”, “Is my performance in blitz as expected for a player of my rank?”)

At the moment, the breakdown table can be used for the 1st but not for the 2nd use case.
And I don’t see how any of the proposed changes would allow us to do the 2nd, since they all operate independently on each pool. Therefore there is no relation to the other pools and therefore no comparability.

The proposed changes only map rating -> (rank, percentiles, elephants, … (I don’t care much what you call the numbers if I can use them for 1 and 2 :wink: ))

Maybe it is possible to get all the pools on the same scale, making them comparable to each other.
(Maybe I will take a look later at whether this is possible, but don’t wait for it. It will take a lot of time and I’m not sure I will find the time for it.)

@MystWalker I don’t know where the right place for this is on Kialo, therefore I am posting it here.

EDIT: clarified because @GreenAsJade pointed out that my choice of words could be misleading. Hope you get my point now.
EDIT 2: trying again :blush:

1 Like

Have you overlooked the comparison of percentile-in-the-pool, or do you dispute it?

AFAIK it is entirely meaningful to say “I am top 10% in blitz 9x9 but median in overall” and to use this as a comparison that says “I perform better in the pool of blitz 9x9 than overall”.

And

  1. To track your individual progress in each pool.
1 Like