Show Kyu/Dan instead of Glicko Rating on Player Profile

Indeed, if a simple 2 by 2 comparison wasn’t possible, then it would be a “Catastrophic Problem”, not a “Serious Problem”. The issue with “Serious Problems” is that you can still achieve what you want, but you must work around the problem. The user is led to expect a direct path to the result, but none is available.

Again, let me use another example: imagine if, to log in to OGS, you first had to go to the About page. You never expect to go through the About page to log in to any website. It does work perfectly: if you enter the correct information, you are successfully logged in. But do you really think that is expected behavior from a user’s standpoint?

The problem with showing Glicko instead of kyu/dan in the user profile is that a Go player isn’t expected to understand Glicko (again, not just read it, but fully make use of it), while a user is very much expected to understand kyu/dan (same definition as above). Because of that, the user can’t use the information in the way that is expected of him. Again, how can this not be a “Serious Problem”?

It was never about the complexity of Glicko, but rather about how it doesn’t allow such intuitive use as kyu/dan does. If OGS at least provided the user with enough information about the Glicko system (an info box, for instance) so that he could use it the same way he currently uses kyu/dan, then there would be no reason for this topic to exist in the first place.

I will use another example to illustrate my point, in hopes that this time it can be fully understood: the metric and imperial systems. Can you use both of them, even if you are versed in only one? Sure. Can you easily understand that 20 feet is bigger than 10 feet, or that 100 meters is bigger than 50 meters? Sure. Can you convert measurements between the two? With some help, but sure.

Now, here is the catch. If you were born and grew up using only the metric system (my case, for instance), do you have a correct sense of what 80 feet represents? No, I don’t have a clue. Is it bigger than a football stadium or smaller than a car? I don’t know. Could I search for the information and then figure it out? Sure. But what if it is no longer 80 feet, but 135 feet instead? Do I truly have a sense of what 135 feet means? No. I could guess based on the 80 feet information, but even then it would not be very accurate. Now, if you come to me and say “135 feet is equal to 41.148 meters,” then I would, without a doubt, have an extremely good sense of what you are talking about.

You know what, I think I have come to agree with this statement:

“The table of rating numbers for different game types should be displayed as rank.”

Can anyone succinctly summarise why this should not be the case?

I vaguely recall in the past an argument why it isn’t that way, but I can’t remember what that argument is.

From what I can quickly recall, someone said that the devs think it’s misleading, since it uses different scales. That, to me, doesn’t make any sense, since they are already showing the same information, just in a less friendly manner. Someone versed in Glicko should be able to reach the very conclusions that they are trying to prevent users from reaching in the first place.

The official statements about that.

Well, I’m glad that finally someone has cleared that up for me. However, I would like to use the same reasons I listed within this topic to again discuss changes in the player rating table.

To recap (all points have been discussed in more depth throughout this topic):

  1. The current player rating table, as it is, doesn’t allow the user to use it as expected, since the user can’t correctly interpret the information it is supposed to convey (especially considering that this information can’t be correctly mapped to kyu/dan)

  2. It may mislead the user into thinking that a specific ranking is used for a specific type of match (examples have been shown throughout this topic)

  3. Its only current functional use is as a filter for the graph (which, again, only shows Glicko ratings, which are unreadable for the end user, but let’s leave that aside for now)

That said, my suggestion for now would be to remove the player rating table and integrate it with the graph. That way no working functionality is removed, and the only information removed is information that already serves only to confuse users.

:+1:

My reasons for agreeing have already been covered in depth here: Rating anomaly? Is this a bug?

Thank you very much, @flovo, for providing this information. As the first quote from @flovo indicates:

there is nothing sacred about keeping the breakouts! :latin_cross:

What I find strange is the latter part of that quote: “… at this point they are strictly informational.” Informational? Quite the contrary! They are confusing and pretty much useless given that “they are on different scales” which no one understands or can convert into an understandable form – as is made clear in the second quote:

Isn’t it time to remove these breakouts? I read the forum regularly, and questions keep coming up, often not directly but in essence, about the use and interpretation of these breakouts. And no one can provide a clear and succinct reply/explanation. :thinking:

The experimental phase has run long enough, and the results are quite clear:

Conclusion: the breakouts are worthless. :x:

Let me try.

Each breakout measures your performance against the other people and games in this pool.

You can use the raw number you see there to:

  • Compare your performance in this pool to that of others, for the same pool.
    ◦ Bigger numbers mean you performed better
  • Track your performance over time in this pool
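GreenAsJade’s two uses can be sketched as below. This is a minimal illustration under assumptions: the `PoolRating` record, its `size`/`speed` labels and both functions are hypothetical, not OGS code. The only rule it encodes is the one stated above: a raw number is compared only with numbers from the same pool, or with its own earlier values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolRating:
    """One row of a hypothetical breakout table: a pool and a rating in it."""
    size: str      # illustrative label, e.g. "9x9" or "19x19"
    speed: str     # illustrative label, e.g. "blitz", "live", "correspondence"
    rating: float


def compare_in_pool(mine: PoolRating, theirs: PoolRating) -> float:
    """Return how many rating points `mine` is ahead of `theirs`.

    Only meaningful when both ratings come from the same pool; ratings from
    different pools are on different scales, so the comparison is refused.
    """
    if (mine.size, mine.speed) != (theirs.size, theirs.speed):
        raise ValueError("ratings from different pools are not comparable")
    return mine.rating - theirs.rating


def trend(history: list[float]) -> float:
    """Change in one pool's rating between the first and last snapshot."""
    if len(history) < 2:
        return 0.0
    return history[-1] - history[0]
```

Calling `compare_in_pool` on a 9x9 live rating and a 19x19 blitz rating raises an error by design, since that is exactly the comparison the table tempts readers into.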

It’s really not that hard to understand.

It only gets hard if you start making it hard by asking “So, am I a 10k 9x9 correspondence player?”.

The answer to this question is

“That question can’t be answered by this table, nor by any mechanism we have. We only rank overall performance.”

GaJ

Not quite though I agree with the sentiment.

This would appear to be completely true as best I can tell, but I don’t feel that this limited use for the data justifies its pride of place on the Profile page.

Also true for anyone that has read a few forum posts but it is not easy to understand from the OGS site itself.

Upon seeing such a table at the top of my Profile page, I would instinctively like to be able to use it to judge which board sizes I am performing better on and by how much, and also which time settings I’m okay with and where I might be weaker. None of this is possible with the data provided, whereas the possibilities for being misled by the data are enormous, as various posts attest.

I would love it if the numbers on the Ratings Table could be mathematically normalised to be comparable but I judge this to be extremely unlikely in the near future so I return to my earlier assertion:

I started using the 19x19 live ratings when I participated in smurph’s DDK experiment.

They are useful if you want to find an opponent of roughly equal ability.

First convert the 19x19 rating into a rank. Use this rank to create a custom game. Unfortunately you can only specify a rank and not a rating in the custom games dialog.

Then when someone accepts a game, check their profile to see if their 19x19 rating is within a suitable range. If not, just cancel the game before your first move and apologise in chat. This avoids playing people who are good at 9x9 but haven’t played much 19x19. The deviation can also be useful in such cases.

This method works but is slightly cumbersome. It also eats into your time if you are white, which is a minor disadvantage in live games.
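The check opuss describes in the second step can be written as a small predicate. The thresholds `max_gap` and `max_deviation` are hypothetical values chosen for illustration, not anything OGS exposes. The first step (turning a 19x19 rating into a rank) is left out, because, as discussed later in this thread, there is no calibration for sub-ratings.

```python
def acceptable_opponent(my_pool_rating: float,
                        opp_pool_rating: float,
                        opp_pool_deviation: float,
                        max_gap: float = 100.0,
                        max_deviation: float = 150.0) -> bool:
    """Decide whether to keep a game after someone accepts it.

    my_pool_rating / opp_pool_rating: both players' 19x19 live ratings,
    i.e. numbers from the same pool, so they can be compared directly.
    opp_pool_deviation: the opponent's deviation in that pool; a large value
    suggests few 19x19 live games (e.g. someone who mostly plays 9x9).
    max_gap / max_deviation: hypothetical thresholds, not OGS settings.
    """
    if opp_pool_deviation > max_deviation:
        return False  # rating too uncertain to trust for this pool
    return abs(my_pool_rating - opp_pool_rating) <= max_gap
```

If this returns `False`, the manual equivalent is what opuss does: cancel before the first move and apologise in chat.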

Everything you said could, and would, be more easily interpreted by the user if the table were integrated into the graph. Right now I can’t see any use that justifies keeping it separate from the graph.

Well, to me what you just said seems like a reason to remove the table, not the opposite. It’s impossible for anyone (at least I hope so) to dictate what someone can or cannot think. If an interface allows the user to be misled, then that interface has problems that must be solved.

Totally agree. As I said before, the way it is right now leads the user into error, as has been shown many times. The use GreenAsJade sees for it is already covered by the graph; keeping the filter functionality would not only keep the useful functionality in place but also remove the parts that only add confusion for the user.

Well, sorry to say this, but what you just did is mathematically wrong, as shown by an anoek quote. It’s impossible to correctly map ratings onto ranks, which is, again, another error that the current rating table leads the user into. Needless to say, the usage described in your post isn’t the one expected from an OGS user. I’m all for different ways of using the tools at our disposal, but since this one is grounded in a false idea of skill, I can’t justify its continued use.

If GreenAsJade is correct, and the live 19x19 rating just uses a different pool of games, then playing an opponent with a live 19x19 rating similar to mine should lead to an even live 19x19 game.

It should also provide a better indication of progress in live 19x19 games than my overall rating. This is why smurph chose to use it as a measure in his experiment.

The problem is that I can’t easily get a game with a player with similar live 19x19 rating to mine. I have to create a custom game with an overall rank restriction.

This (alone) is not enough. It only gets you opponents whose overall ranking matches your live ranking.

So then you have to look in their profile when they accept, and cancel the game if they don’t match your live ranking.

Unless flovo was mistaken somehow, and actually our sub-ranking is used for game matching.

It really would be good if @anoek or @matburt could clarify this point.

GaJ

I don’t think this is true. From what I understood from anoek’s quote, it isn’t a matter of the number of games played; it’s a much more complex issue than that. Your 1500 19x19 live rating isn’t necessarily equal to another player’s 1500 19x19 live rating. The only thing that really matters, and that can truly be used as a means of comparison, is the overall rating.

I think you meant to reply to @opuss instead of me, since he was the one who said that. Anyway, I agree with you: the sub-ratings aren’t really used (and actually can’t be used) for matchmaking. Again, some changes are really needed in that area; this type of confusion shouldn’t even be happening in the first place.

Exactly! I also have to look at the 19x19 rating.

This is where we really need better documentation.

As I understand it, the 19x19 live rating can be used as a comparison between players. What you can’t do is make an accurate comparison between different ratings.

The overall rating can give a bad indication of a player’s 19x19 ability.

What gives you this idea? I don’t think that’s true. If it were true, then your overall 1500 rating would not be equal to another player’s overall 1500 rating.

The only difference between the sub-rating is the pool of games that they come from.
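That point can be made concrete: the same rating procedure runs over every game for the overall rating, and over a filtered subset for each sub-rating. The sketch below assumes the original Glicko-1 update, treats each game as its own rating period and ignores how deviation grows between periods; OGS’s actual implementation, starting values and pools may differ, and the `Game` record and pool labels are hypothetical.

```python
import math
from dataclasses import dataclass

Q = math.log(10) / 400  # Glicko-1 scaling constant


@dataclass
class Game:
    pool: tuple[str, str]  # hypothetical (size, speed) label, e.g. ("19x19", "live")
    opp_rating: float
    opp_rd: float          # opponent's rating deviation
    score: float           # 1.0 win, 0.5 draw, 0.0 loss


def g(rd: float) -> float:
    """Glicko-1 factor that discounts games against uncertain opponents."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r: float, opp_r: float, opp_rd: float) -> float:
    """Expected score against one opponent."""
    return 1 / (1 + 10 ** (-g(opp_rd) * (r - opp_r) / 400))


def update(r: float, rd: float, game: Game) -> tuple[float, float]:
    """One Glicko-1 update, treating a single game as a rating period."""
    gj = g(game.opp_rd)
    e = expected(r, game.opp_rating, game.opp_rd)
    d2 = 1 / (Q**2 * gj**2 * e * (1 - e))
    denom = 1 / rd**2 + 1 / d2
    new_r = r + (Q / denom) * gj * (game.score - e)
    new_rd = math.sqrt(1 / denom)
    return new_r, new_rd


def rate(games, pool=None, start=(1500.0, 350.0)):
    """Same update over all games (pool=None) or only one pool's games."""
    r, rd = start
    for game in games:
        if pool is None or game.pool == pool:
            r, rd = update(r, rd, game)
    return r, rd
```

Because the 19x19 live number is computed only from 19x19 live games, it tells you how you do against that pool, but carries no information about where that pool sits relative to the overall scale, which is the calibration gap explained next.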

The reason we don’t convert this to rank is because we don’t have a calibration for that.

The calibration from rating to rank is designed to give us ranks that match our old ranks best, and also to match other sites as best as possible.

That calibration is done for the overall pool. It would be different for sub-types. Even if we had data to calibrate sub-types, which we don’t.

That is the reason why sub-type rating is not converted to rank: because it can’t be calibrated.

Personally I would argue “just use the overall rating calibration - it’d be close enough”.

What we know is that right now anoek and mattburt don’t agree with that.

Correct (as I understand it :wink: )

By the way I reached out to anoek, and he said:

“What is used in player matching for auto matches”

“[1:39] anoek: Overall rank”

GaJ.

No, I’m not talking about the overall rating, just the sub-ratings. Also, that’s what I understood from these quotes:

To me, they say that since the sub-ratings come from different scales and different rating pools, they are only related to the overall rating and not on a global scale (one user’s 19x19 live rating versus another user’s 19x19 live rating), and since the data from which each user’s ratings are obtained is different, the true meaning of each user’s sub-ratings is different.

Put simply, when reading the information above I understood that since each user has a different history, what each sub-rating means is different (think relative to one player’s data instead of absolute across all players’ 19x19 live data).

=============================================================================

Oh, by the way @GreenAsJade, I wrote that before you edited. I’m going to post it anyway to show what I was thinking at the time. Right now, I think I agree with your interpretation, but I would really like some sort of final clarification about this from the devs.

Took a while to catch up on the discussion.

I probably haven’t been on this site as long as any of you, so I probably won’t have as nuanced a perspective. That said, I have been here a while and this is the first time I’ve ever heard of the Glicko system. I know all about the kyu/dan system, since it has (I feel) an inseparable connection to the game and its handicap system.

If we want to keep the user-visible Glicko scores in their prominent position (and it sounds like we may not), there needs to be some easy-to-access explanation for them. As I mentioned, I’ve never heard of the Glicko system, and based on the numbers alone I would have had nothing to search for on Wikipedia or the Internet in general.

I’d like to know why it’s not possible to have useful user ranks for matchmaking in the smaller game pools. It seems like that could be a very useful set of data. It would be great to be able to match myself with someone of an equal strength on that board size easily. Maybe that could be a matchmaking option, and general score would be the default.
Speaking as someone who is stronger on 9x9 than on larger boards, it would be really nice to be matched automatically with someone at my level on that board size instead of getting beaten enough that my overall rank falls to compensate.
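The opt-in option suggested above could look roughly like this. It is a hypothetical sketch, not how OGS matches players (per anoek, quoted later in this thread, automatch uses overall rank): the overall rating is the default, and a board-size rating replaces it only when the player opts in and that rating’s deviation is at most an assumed threshold `max_rd`, so a pool with few games does not produce wild matches. The matching window in `is_match` is likewise an assumption.

```python
from typing import Optional


def matching_rating(overall: float,
                    pool_rating: Optional[float] = None,
                    pool_rd: Optional[float] = None,
                    use_pool: bool = False,
                    max_rd: float = 100.0) -> float:
    """Return the rating to match on.

    Default: the overall rating. With use_pool=True, the board-size/speed
    rating is used instead, but only if it exists and its deviation is at
    most max_rd (hypothetical threshold); otherwise fall back to overall.
    """
    if use_pool and pool_rating is not None and pool_rd is not None and pool_rd <= max_rd:
        return pool_rating
    return overall


def is_match(a: float, b: float, window: float = 100.0) -> bool:
    """Hypothetical rule: two players match if their ratings are within `window`."""
    return abs(a - b) <= window
```

Two players who both opted in would then be compared on the same pool’s numbers, which is the only comparison those numbers support.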

In brief: I’d vote for the Glicko numbers to go, and have them replaced with kyu/dan numbers if possible.
My questions are as follows:

  1. Why are the smaller pool ranks not shown in kyu/dan numbers?
  2. Why can’t we use the smaller pool numbers for matchmaking?
  3. If we can’t use them for matchmaking, can we make them more viable for comparison between players somehow?
  4. If the smaller pool scores aren’t usable for matchmaking or comparison, why have them at all?

Are there technical/mathematical roadblocks?


Edit: GreenAsJade’s last response was not out when I started writing.

So the main hurdle to having kyu/dan numbers is this calibration? What does this calibration do exactly? Why is not having it for the smaller pools a deal-breaker?

As I understand it (an interested person like you all, not an expert or someone with the code)…

… the calibration is simply a formula to get from the rating to the rank.

You can think of it as X here: Rank = Rating divided by X

So simplistically we know that 13k = 1500 divided by X

so X is about 1500/13 = 115.

It’s not that simple, but that is the idea.
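Taking the post’s arithmetic at face value (a rank number equals the rating divided by X, and 1500 corresponds to 13), the toy calibration fits in two small functions. This is only the illustration from the post, not OGS’s real formula, which the post itself says is not that simple; note also that kyu numbers go down as strength goes up, which a plain division ignores.

```python
def solve_x(anchor_rating: float, anchor_rank: float) -> float:
    """Choose X so that anchor_rating maps exactly to anchor_rank (toy model)."""
    return anchor_rating / anchor_rank


def toy_rank_number(rating: float, x: float) -> float:
    """Toy calibration from the post: rank number = rating / X."""
    return rating / x


x = solve_x(1500.0, 13.0)
print(round(x, 1))  # 115.4
```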

The calibration was done for overall by first:

  • Applying Glicko rating to all of our games from the beginning of time

  • Finding X (the calibration) so that as many people as possible end up with a new rank close to their old one from the old system.
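The fitting step can be sketched with the same toy model. The post only says X is chosen so that people land as close as possible to their old ranks; least squares is one assumed way to make “closest” precise (minimising absolute error, for example, would give a slightly different X), and the `(rating, old_rank)` pairs are hypothetical inputs.

```python
def fit_x(pairs):
    """Least-squares fit of X in the toy model old_rank ≈ rating / X.

    pairs: iterable of (glicko_rating, old_rank_number), one per player.
    With k = 1/X the model is old_rank ≈ k * rating, and minimising
    sum((k * r - o) ** 2) over k gives k = sum(r * o) / sum(r * r).
    """
    pairs = list(pairs)
    num = sum(r * o for r, o in pairs)
    den = sum(r * r for r, _ in pairs)
    if num == 0:
        raise ValueError("cannot fit X: ratings and ranks give a zero cross-product")
    return den / num  # X = 1 / k
```

Running the same fit on only one sub-pool’s players would generally give a different X, and, per the thread, there is no data to fit the sub-pools against in the first place.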

The trick is that you can’t properly compare numbers from a different pool of games. What this means is that:

  • you can’t say 1500 in live 9x9 is the same as 1500 in blitz 19x19

  • therefore you can’t say 13k live 9x9 compares to 13k blitz 19x19

If we applied the calibration blindly, I might turn out to be 10k live 9x9 and 12k overall, but actually be worse at live 9x9.

Once again, my understanding (not authoritative) is that the Devs didn’t want this kind of comparison accidentally being made. They don’t want someone saying “I’m 1 dan live 9x9” because they know that this doesn’t mean anything other than you are a better live 9x9 player than an OGS 1k live 9x9 person.

I believe that they fear the risk that OGS would have Dan 13x13 blitz players, drawing ridicule because that is not a valid concept, and could result in people having airs and graces that they don’t “deserve”.

I believe they made the number table because the data is available and it is interesting for each of us to be able to know how our performance in the sub-types goes over time.

This statement backs up @Ptro ’s assertion that there shouldn’t be a table, there should just be separate graphs available for each sub-type.

I can’t think of any reason why the sub-type ratings should not be used for game matching. It seems like doing so would improve skill match in games…

GaJ
