Show Kyu/Dan instead of Glicko Rating on Player Profile

Ptro · August 19, 2018, 7:34am

Sure, this is debatable. I argued more than once that this is more than just a cosmetic issue. Say what you disagree with and from there we can discuss, rather than just disregarding everything I said about this topic by saying: “feels like it’s about a bunch of nothing”.

Nomenclature was never the point. OGS previously used a system called ELO+ (if my memory isn’t failing me), and then switched to Glicko-2. It’s natural that there are a few users who aren’t fully accustomed to it yet (despite how much time has passed since then) and who may confuse the way the current system works with how the previous one worked. It was from that premise that I pointed that out, which could explain your “misconceptions about how the system truly worked”.

About the lower being worse: Indeed, you are right. I was the one who made a mistake. Here is the post I was referring to:

Since GreenAsJade is currently my main channel to the “dev truth about Glicko”, I’m constantly changing my perspective about Glicko as he updates this thread with his findings, even when things aren’t very intuitive (as many things turned out not to be). In that post, I hadn’t considered that he was still talking about how we can’t compare different pools of ratings (which still isn’t a 100% clear topic for me) rather than just the numerical values being compared. I apologize in that case.

Sorry, but you are still completely missing my point. Handicap stones have literally nothing to do with the UX concept being exemplified (which, again, as I mentioned, is just an example for didactic purposes). Kyu/Glicko, English/some weird symbols, it doesn’t matter at all. The user expects information to be shown in a specific way. This isn’t happening, and the current way doesn’t add anything for him that could justify keeping it as it is. However, it is impossible to show that information in a way that can effectively be used by him. Therefore, the removal of the unusable information is desired, in order to avoid unneeded confusion during the interaction between user and system.

You seem to be very interested in putting out your opinion about this thread, which I really do appreciate a lot (even if it may not seem so), but simply disregarding and devaluing someone else’s views and opinions on a purely subjective basis isn’t how constructive discussion and criticism should occur. If you prefer not to comment on how I simply do not accept you calling all my recurring use of UX knowledge obtained through research (indeed, I’m not an expert either) “lazy arguments”, that’s fine. But please keep in mind that I’m not making this up, and if it keeps recurring then I don’t see a reason to continue to engage in a discussion with someone who doesn’t actually listen to what others have to say.

opuss · August 19, 2018, 4:15pm

Yes. I calculate the rank to be 15.34 kyu with the range between 13.50 and 17.28 kyu.

So this would equate to 15.3 kyu +1.9 or -1.8 kyu.
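
The arithmetic behind that "+1.9 / -1.8" can be checked with a short sketch. It only uses the three kyu values quoted above; the function name `deviation_offsets` is made up for illustration and is not part of any OGS code. The fact that the two offsets differ suggests the rating-to-kyu mapping is not linear, since a symmetric rating ± deviation would otherwise give symmetric kyu offsets.

```python
def deviation_offsets(center: float, strongest: float, weakest: float) -> tuple[float, float]:
    """Return (offset toward weaker, offset toward stronger) in kyu.

    Kyu numbers grow as a player gets weaker, so the "+" side is the weak side.
    """
    toward_weaker = round(weakest - center, 2)
    toward_stronger = round(center - strongest, 2)
    return toward_weaker, toward_stronger


# Values taken from the post above: 15.34 kyu, range 13.50 to 17.28 kyu.
weaker, stronger = deviation_offsets(15.34, 13.50, 17.28)
print(f"{15.34:.1f} kyu +{weaker:.1f} / -{stronger:.1f}")
```

Rounded to one decimal, this gives the same figures as the post.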

Maharani · August 19, 2018, 5:01pm

Oh yeah, I got + and - mixed up, since + means worse in the case of kyu. Duh…

Ptro · August 19, 2018, 6:02pm

I know this is just a side note, but I would like to point out something. Maybe you didn’t notice, but this is actually pro-removal, not the opposite. Trying to apply Glicko knowledge to translate ratings into rankings (as I had even predicted would happen at some point) actually reinforces the need for the table removal. There are many reasons for this (all of them already discussed in this topic):

It leads to a mathematically wrong result when applied to the sub-ratings (as shown by the anoek quote)
It leads the user into comparing his sub-ratings with each other, which is also mathematically wrong (as also shown by the anoek quote)
You can’t expect every OGS user to have this knowledge, which basically means that the only useful function the ratings table provides for him is to act as a filter for the graph (which is actually not being removed)

Again, the removal of the table is an all-win situation. Everyone will continue to have the same amount of information (since the sub-graphs already provide the same information displayed in the table), and the confusion created by the many wrong uses of the table is gone.

Maharani · August 19, 2018, 6:36pm

Now hold up a second. All it means is that the +/- deviation is a little off when displayed as kyu/dan, which means that the Glicko2 display is slightly more accurate. If you really think this in itself is a good enough reason to get rid of the ratings table, then even the precise overall rank (for example 15.3k +/- 1.8 in your first picture) should not be displayed in kyu/dan format (or simply as 16k instead, to signal imprecision).

I have no idea what you mean by “mathematically wrong”. Let’s take a look at the anoek quotes again:

To sum up: The individual category ratings are based on different rating pools (i. e. “all live 19 x 19 games”, etc.). I assume that when anoek says “the results are humorous”, he refers to funny results such as the overall rating being higher than any of the individual category ratings, which is the very thing that is confusing to you and me and probably most people here.

However, that doesn’t mean that the individual category ratings (as displayed in Glicko2) are completely useless or “mathematically wrong”. If you look at anoek’s quote again, he says that the ratings table is “strictly informational”, so even according to your “trustworthy and academic source”, there IS information that can be obtained from it. For example, I use that information to determine which board sizes I want to improve at, whether I’m better at live or correspondence, etc. I’d hate to see it go.

Currently, the sub graphs still provide your Glicko2 rating when you hover your mouse over them at the right-most point of the graph. Do you feel like this functionality should be removed, since it enables the same “wrong” comparisons as the rating table does? Because if you do, then no, the sub graph would NOT provide the same information as currently displayed in the ratings table anymore. And if you DON’T think this function of the sub graphs should be removed, then there is no good reason to remove the ratings table, which summarizes that same information nicely.

To sum up my own thoughts on the overall issue:

The ratings table acts “weird” in some ways already demonstrated in this thread (overall rating better than all individual ratings, etc.).
Displaying it in a kyu/dan format would lead to some slight imprecision in the display of the rating deviation.
All the same, I feel like the ratings table is informational and interesting enough to keep it, and I don’t really care whether it’s displayed in Glicko2 or kyu/dan format. The former leaves no imprecision, the latter is probably easier to understand for most users and more consistent.
If you want to get rid of the ratings table so that no one can “wrongly” compare their sub-categories anymore, you’d also have to remove the “hover” function on the sub graphs that reveals the Glicko2 ratings on which these graphs are based.

orbitaleccentric · August 19, 2018, 7:04pm

Let’s all just ignore the god forsaken table and play Go like I do!

orbitaleccentric · August 19, 2018, 7:06pm

that is… ignore the table like I do

Ptro · August 19, 2018, 9:08pm

No, I wasn’t referencing your subsequent post, and not even the specifics of your calculation. Just the fact that this type of situation cannot be used as an anti-removal argument.

I never said that the way Glicko calculates ratings is mathematically wrong. Again, I said that according to that anoek quote (and one interpretation, which is also shared by some users in this thread), you cannot compare one sub-rating with another sub-rating, since doing so leads to a mathematically wrong conclusion.

Again, you keep trying to devalue and downplay my knowledge. Sorry, but I don’t consider this the correct way to conduct a productive discussion. I will continue to engage with you for the time being just because I consider that anyone deserves a second chance (which actually is a third one, but never mind). As I mentioned before, the use you are describing is leading you to an incorrect conclusion. According to my interpretation (which, again, is shared by some users in this thread), it isn’t possible to consider this a correct conclusion.

I wrote that sentence exactly because I knew that you would jump on this point to construct your case, which finally would lead to an actual productive discussion between you and me. Let’s discuss something worth discussing, at last.

Yes, it does enable the same wrong comparisons. It’s pretty obvious actually, since it displays similar information. No, you are completely wrong in your conclusion. Let’s point out why that is.

There are many documented cases of users using the table (and not the graph) to convert that information into a ranking (which, according to anoek, it can’t be used for). Because of this, I’m focusing the discussion on the main source of the UX problems. (Note that I’m not saying that the graph can’t cause any UX problems; I will get to that.)
You also have to consider what the main function of each one is (and again, this isn’t what I think, but what the user interprets them as being. In UX we call that “affordances”)
- Table: comparison between different data that allows comparisons between the other elements on the same table (which currently isn’t possible)
- Line graph: comparison between the same data using a coordinate system (mostly used to display some data over time)
- So yes, they must be considered as different things, even when the data that they show is similar.

In the same post you extracted that quote from, I gave you 3 good reasons why the table must be removed. I can’t see how you can say that “there is no good reason to remove the ratings table” when I just gave you not one, but 3 good reasons. And you just completely ignored them when quoting my post. Which isn’t even the first time that this has happened. Yeah, sorry, I actually do notice that.
Returning to the confusion that the graph may lead the user into: Yes, the graph may cause UX problems (in fact one can say that it may already have caused some). But here is the thing: the implementation model of integrating the table with the sub-graphs isn’t set yet. When building it we can ensure that it doesn’t allow any UX errors. And trust me, if it has even a single one I’ll be pretty vocal about correcting it, as I have been about all UX errors presented in this thread. I would also suggest (once we see that it is more beneficial to all users in OGS to remove the table) that we discuss this implementation model in a new thread, since this one has derailed a lot from its original purpose (which actually is clearly proven by the quote below).

Sarah_Lisa:

To sum up my own thoughts on the overall issue:

The ratings table acts “weird” in some ways already demonstrated in this thread (overall rating better than all individual ratings, etc.).

Displaying it in a kyu/dan format would lead to some slight imprecision in the display of the rating deviation.

All the same, I feel like the ratings table is informational and interesting enough to keep it, and I don’t really care whether it’s displayed in Glicko2 or kyu/dan format. The former leaves no imprecision, the latter is probably easier to understand for most users and more consistent.

If you want to get rid of the ratings table so that no one can “wrongly” compare their sub-categories anymore, you’d also have to remove the “hover” function on the sub graphs that reveals the Glicko2 ratings on which these graphs are based.

I still can’t see how you can be against removing the table when you acknowledge this. This gives me the feeling that sometimes you are arguing with me just for the sake of arguing, since you are not actually taking all users’ UX, and what was previously discussed about this topic, into consideration when you are presenting your position.
See, this even reinforces this feeling. The discussion that you are citing here is already over. That’s exactly why I am considering (and now even more) changing the thread name from “Show Kyu/Dan instead of Glicko Rating on Player Profile” to “Remove the player rating table, but keep the sub-graphs functionality”, which you decided to object to. Given the anoek quote about it being impossible to correctly map Glicko sub-ratings into rankings, there is no point in showing kyu/dan instead of Glicko rating on the player profile. Right now, this thread is only discussing how the player rating table should be removed in order to avoid UX errors. Not that you shouldn’t talk about the kyu/dan aspect of this, but please consider that it’s an already discussed matter. When the discussion about the main topic of a thread is finished and it continues with another topic that surfaced along the way, the thread name should change to reflect that (changing the first post in the thread to indicate that is also a good measure), in order to keep users from thinking it is about an already decided topic (you, for instance).
Sorry, but if you can’t provide any further explanation of why you feel like this, then it’s impossible for us to discuss it. Please notice that here you are also discussing an already discussed and concluded matter. Again, this isn’t the topic of this thread anymore. You may talk about this, but please provide a different view that wasn’t previously shown in this thread. I really don’t understand why you keep doing this. Right now, I can only see 2 possibilities for this: One, through a lack of attention, you didn’t notice that the discussion about this topic was already over (which is fine, but please consider my time and everyone else’s, spent reading and writing about a topic on which everyone has agreed and moved on). Two, you are trying to “win the discussion” by trying to make me “contradict” myself using a topic on which I already said that my first impressions were wrong and then moved on. Hoping that the first one is the correct one, this is why I say that you haven’t put enough time into reading and clearly understanding (which relates a lot to my personal concept of what understanding means, which was also laid out in this thread) what is being discussed here.
Simply not true, and it was already shown within this post why this isn’t a valid argument. So, restating my position, this conclusion is clearly wrong.

Ptro · August 19, 2018, 9:18pm

This gave me a good laugh actually, so thank you for that

The thing is, what you are asking is pretty similar to asking a medical student not to try to help a relative who has a small medical problem. I just can’t ignore it and pretend that everything is fine. I do know how to improve the UX in OGS, and I really want to do this as long as you all allow me to do it. I love using OGS and, as I tried before, want to contribute to allow it to become the best it can be.

Maharani · August 19, 2018, 9:40pm

Point taken regarding the bolded part. Sorry for misconstruing what you said. However, as you acknowledge, the position that comparing sub-ratings with each other is “mathematically wrong” is just one interpretation of what anoek said. As I have explained, based on my experience, sub-ratings can absolutely be compared with each other, a little wonkiness notwithstanding. The more games you play in each category, the more reliable and comparable they become. Just HOW reliable any given sub-rating is can be clearly told from the associated deviation. Even a user who might initially be confused about wildly disparate sub-ratings can quickly recognize the reason for the disparity by looking at the rating deviations.

[quote="Ptro"]You also have to consider what is the main function of each one (and again this isn’t what I think, but what they are interpreted by the user as being. In UX we call that “affordances”)

Table: comparison between different data that allows comparisons between the other elements on the same table (which currently isn’t possible)
Line graph: comparison between the same data using a coordinate system (mostly used to display some data over time)
So yes, they must be considered as different things, even when the data that they shown is similar.[/quote]

Fair enough. I take your point that removing the ratings table doesn’t automatically mean that the sub-graphs should lose their function of displaying the Glicko2 rating when you hover over them.

I’m not saying that there are no good reasons to remove the ratings table. I’m saying (and explaining why) in my opinion and experience, having the ratings table is still better than not having it.

That is your opinion, and I disagree with it. Just because anoek thinks that “the results are humorous”, it doesn’t mean that the “discussion […] is already over”. As I’ve explained, I think consistency would be a pretty good reason to change the ratings table display from Glicko2 to kyu/dan. (However, as I’ve also stated, I don’t really have strong feelings on the matter.)

Again, with all due respect, you don’t get to decide that this matter is concluded. You’re already operating under the assumption that the ratings table should best be removed, which is why I’m explaining my viewpoint that it shouldn’t be removed.

As another side note, I would value your input into the aforementioned shade-of-colour UX issue discussed in this thread… Stretch rating graph

Eugene · August 19, 2018, 10:38pm

Nooooo!

Ptro · August 19, 2018, 11:07pm

Sure, this is debatable. I agree with your assertion that the more games you play the more accurate the sub-ratings can be, but I still believe that my interpretation can be correct. In that case, and since it’s interpretation x interpretation, I think it can only be solved with word from the devs to clear things up.

Not quite, everything you presented until this post was based on the assumption that we were considering changing from Glicko to kyu/dan (which in fact was shown not to be viable). In this post you managed to be way more informative and explained your position much better to everyone.

This just makes me confused again. Yes, it was presented by me in this very thread that consistency is one of the main problems when showing Glicko instead of kyu/dan. If it is currently possible (and can display accurate information) then, without a doubt, it should be changed to kyu/dan. From what was discussed here, however, this doesn’t seem to be possible (or at least the devs don’t have the intention to do it for some unknown reason).

Sure, I don’t get to dictate when a discussion is or isn’t over. The thing is, and now it’s pretty clear to me, you are really misunderstanding my true intentions in changing the thread name. No, I never operated under the assumption that the ratings table removal is already certain (even when it’s pretty difficult for me to understand why this shouldn’t happen). I merely explained my position that, from a UX standpoint, the table is doing more harm than good. Quoting myself from this very thread (which is a thing that I really don’t feel good doing, but let’s make this very clear once and for all):

(Of course, I was discussing another subject, but my position about this type of situation remains the same.)

When I said that you were discussing an already concluded matter, I was referring to the change from Glicko to kyu/dan. According to what has been discussed until now about this, we mostly reached a consensus that:

Displaying the Glicko rating instead of kyu/dan (which is what users truly expect) isn’t desired. It must be corrected, if everything in the technical part allows that.
Apparently, the technical aspect of Glicko doesn’t allow it to be correctly displayed as kyu/dan in this specific situation. In that case, changes must be made in the table, in order to sort out existing UX problems with this current form of display.
All the currently displayed data in the table is already present on the sub-graphs, and the only possible use for a user who doesn’t understand Glicko is to use it as a filter for the sub-graphs.
Considering all this, I formulated a suggestion that the table should be integrated with the sub-graphs. This would result in a 0 loss of information and existing functionalities and would solve some existing UX problems. We can also collectively discuss and solve the remaining UX problems when discussing and shaping the implementation model of this integration.

Now, returning to the thread name change proposal. I did that because, and you will probably even agree with me, we are discussing the removal (or not) of the ratings table, not the kyu/dan aspect (on which we managed to mostly reach a consensus). That’s why I suggested the thread name change, in order to invite users to discuss whether the table should be removed or not. That’s my true intention when making this suggestion, not to consider that the table removal is certain.

Sure, I will hop on that as well. Can’t see a reason not to.

Ptro · August 19, 2018, 11:08pm

Brace yourself @GreenAsJade, more walls of text are coming

Maharani · August 20, 2018, 12:26am

Honestly, I doubt @anoek himself could offer much more than an interpretation regarding his statement that “the results look humorous”. We can already see in which way the results look “humorous” from your example picture in the first post. Plus, anoek rarely ever replies to forum posts anymore in general, and has refused to discuss any details of how Glicko2 has been implemented on OGS in particular, so I wouldn’t be very hopeful for an in-depth analysis of anything from him.

I would like to add the point that if the sub-ratings look “humorous” when converted to kyu/dan, then that calls into question the entirety of the OGS-Glicko2 rating system. The only difference (supposedly) between how the sub-ratings and overall ratings are calculated is the different “game pools” from which they draw. Given that, experientially, the overall ratings seem fairly reliable and OGS-Glicko2 working more or less as desired, I don’t see why the sub-ratings should be any less reliable (taking the rating deviations into account).

If it can be correctly displayed as kyu/dan for the overall rating, it can be correctly displayed as kyu/dan in the ratings table (see my point above) except for some slight imprecision regarding the deviation. Yes, the results can look “humorous”, but look less and less humorous the more games you play in each category.

Even if you argue that the sub-graphs serve a different purpose than the ratings table, they are built from the same data. Either that data is unreliable and shouldn’t be used, even for the purpose of the sub-graphs, or it’s reliable enough and does no harm being displayed in the ratings table. Otherwise, you would have to explain why the data can be used for plotting ratings-over-time graphs, but cannot be used for sub-ranks that may look humorous at first, but taking deviations into account, are accurate enough (in my experience).

Only if you argue that comparing sub-category ratings is not an existing functionality of the ratings table. It may not be as precise as we might like, but I hope you would agree that saying that it’s not AT ALL a current function of the ratings table is an overstatement.

I wouldn’t mind the thread title being changed as per your suggestion. Case in point, “stretch rating graph” was an accurate title for the other thread for a duration of like four posts… @AdamR?

Ptro · August 20, 2018, 1:24am

Sarah_Lisa:

Honestly, I doubt @anoek himself could offer much more than an interpretation regarding his statement that “the results look humorous”. We can already see in which way the results look “humorous” from your example picture in the first post. Plus, anoek rarely ever replies to forum posts anymore in general, and has refused to discuss any details of how Glicko2 has been implemented on OGS in particular, so I wouldn’t be very hopeful for an in-depth analysis of anything from him.

I would like to add the point that if the sub-ratings look “humorous” when converted to kyu/dan, then that calls into question the entirety of the OGS-Glicko2 rating system. The only difference (supposedly) between how the sub-ratings and overall ratings are calculated is the different “game pools” from which they draw. Given that, experientially, the overall ratings seem fairly reliable and OGS-Glicko2 working more or less as desired, I don’t see why the sub-ratings should be any less reliable (taking the rating deviations into account).

I’m, as I said before, taking GreenAsJade’s word as gospel. To show my point on this situation, I’ll quote exactly what he said about this:

GreenAsJade:

As I understand it (an interested person like you all, not an expert or someone with the code)…

… the calibration is simply a formula to get from the rating to the rank.

You can think of it as X here: Rank = Rating divided by X

So simplistically we know that 13k = 1500 divided by X

so X is about 1500/13 = 115.

It’s not that simple, but that is the idea.

The calibration was done for overall by first:

Applying Glicko rating to all of our games from the beginning of time

Finding X (the calibration) so that the most people end up with a new rank that is closest to their old one from the old system.

The trick is that you can’t properly compare numbers from a different pool of games. What this means is that:

you can’t say 1500 in live 9x9 is the same as 1500 in blitz 19x19

therefore you can’t say 13k live 9x9 compares to 13k blitz 19x19

If we applied the calibration blindly, I might turn out to be 10k live 9x9 and 12k overall, but actually be worse at live 9x9.

Once again, my understanding (not authoritative) is that the Devs didn’t want this kind of comparison accidentally being made. They don’t want someone saying “I’m 1 dan live 9x9” because they know that this doesn’t mean anything other than you are a better live 9x9 player than an OGS 1k live 9x9 person.

I believe that they fear the risk that OGS would have Dan 13x13 blitz players, drawing ridicule because that is not a valid concept, and could result in people having airs and graces that they don’t “deserve”.
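
The pool argument in the quoted post can be made concrete with a toy sketch. It follows the deliberately simplistic arithmetic from the quote (X ≈ 1500/13 ≈ 115) and is not the real OGS calibration. The second pool and its anchor rating of 1650 are purely hypothetical numbers chosen for illustration, and the function names are made up.

```python
def calibrate(anchor_rating: float, anchor_rank: float) -> float:
    """Toy calibration constant X such that rank = rating / X."""
    return anchor_rating / anchor_rank


def to_rank(rating: float, x: float) -> float:
    """Convert a rating to a rank number using the toy formula."""
    return rating / x


# Overall pool, using the numbers from the quoted post: 1500 <-> 13k.
x_overall = calibrate(1500, 13)

# HYPOTHETICAL live 9x9 pool: assume the same 13k-strength player sits at 1650 there.
x_live_9x9 = calibrate(1650, 13)

own_pool_rank = to_rank(1650, x_live_9x9)  # constant fitted on the 9x9 pool itself
blind_rank = to_rank(1650, x_overall)      # overall constant applied blindly

print(f"X overall: {x_overall:.0f}")
print(f"own-pool rank: {own_pool_rank:.1f}, blind rank: {blind_rank:.1f}")
```

Under these assumptions the same player gets two different rank numbers depending on which pool the constant was fitted on, which is the kind of mismatch the quote warns about when a calibration is applied blindly across pools.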

Moving on…

Again, this can’t be confirmed (which actually isn’t that different from GaJ’s position), so what we have here is supposition x supposition. Which, of course, leads this situation to an impasse.

Well, not exactly. You can use the graph as comparison over time even if the sub-ratings can’t be compared between themselves. Also, I don’t consider anecdotal evidence by itself as valid to ensure that a system is working in a specific way.

Again, it was discussed here that it isn’t possible to compare different sub-ratings with each other. Even if you state that you believe otherwise, we will need more proof than your anecdotal evidence to be sure that what you are saying is valid.

Well, it truly wasn’t necessary to disturb a mod to change that, since I can do this myself. Anyway, since this situation (Glicko over kyu/dan) doesn’t appear to be totally solved, let’s wait a bit more until we set these things straight.

MystWalker · August 20, 2018, 2:26am

Okay, this conversation has grown in complexity to the point that I’m unable to keep up with it at this time. I would like to make the suggestion that we move this discussion to a place that is better formatted to handle it.

I have started a Kialo discussion to continue this thread here. I urge all interested parties to add their claims to the appropriate thesis or create their own. You can also vote on the claims that you feel impact the topic the most!

I think this is an important discussion that as many people as possible should be a part of! Please share your thoughts there!

The options I started with are:

Keep the table as it is
Change the Glicko ratings to Kyu/Dan
Change the Kyu/Dan ratings to Glicko
Remove the Rating Table and roll its filters into the Graph

Ptro · August 20, 2018, 2:56am

Oh, I loved that site. The UX + information architecture aspect of this site is soooo good. Thanks @MystWalker for introducing me. I’m a bit short on time right now, but as soon as possible I will add all my information there.

MystWalker · August 20, 2018, 3:03am

Thanks @Ptro! Hopefully people find it useful and we can use it for more discussions!
Voting is now available on each of the claims, so even if you can’t add to the discussion you can still weigh in!

Maharani · August 20, 2018, 3:09am

How about the fact that anoek (a “trustworthy and academic source”) created and implemented the ratings table and has stated its purpose to be “informational”?

Ptro · August 20, 2018, 3:13am

He didn’t specify, as far as I know, which kind of information we should obtain from it. Therefore, you (and me too) can’t precisely say what anoek is talking about when he says it’s “for informational purposes”.

We can, however, use UX knowledge to say what a user expects from it instead, and how this expectation is being (or not) met.