Show Kyu/Dan instead of Glicko Rating on Player Profile

opuss · August 17, 2018, 5:02pm

If GreenAsJade is correct, and the live 19x19 rating just uses a different pool of games, then playing an opponent with a live 19x19 rating similar to mine should lead to an even live 19x19 game.

It should also provide a better indication of progress in live 19x19 games than my overall rating. This is why smurph chose to use it as a measure in his experiment.

The problem is that I can’t easily get a game with a player with similar live 19x19 rating to mine. I have to create a custom game with an overall rank restriction.

Eugene · August 18, 2018, 1:05am

This (alone) is not enough. This only gets you opponents whose overall ranking matches your live ranking.

So then you have to look in their profile when they accept, and cancel the game if they don’t match your live ranking.

Unless flovo was mistaken somehow, and actually our sub-ranking is used for game matching.

It really would be good if @anoek or @matburt could clarify this point.

GaJ

Ptro · August 18, 2018, 7:49pm

I don’t think this is true. From what I understood of anoek’s quote, it isn’t a matter of the number of games played; it’s a far more complex issue than that. Your 1500 19x19 live rating isn’t necessarily equal to another player’s 1500 19x19 live rating. The only thing that really matters, and that can truly be used as a means of comparison, is the overall rating.

I think you replied to @opuss instead of me, since he was the one who said that. Anyway, I agree with you: the sub-ratings aren’t really used (and actually can’t be used) for matchmaking. Again, some changes are really needed in that area; this type of confusion shouldn’t even be happening in the first place.

opuss · August 18, 2018, 10:18pm

Exactly! I also have to look at the 19x19 rating.

opuss · August 18, 2018, 10:23pm

This is where we really need better documentation.

As I understand it, the 19x19 live rating can be used as a comparison between players. What you can’t do is make an accurate comparison between different ratings.

The overall rating can give a bad indication of a player’s 19x19 ability.

Eugene · August 18, 2018, 11:20pm

What gives you this idea? I don’t think that’s true. If it were true, then your overall 1500 rating would not be equal to another player’s overall 1500 rating.

The only difference between the sub-ratings is the pool of games that they come from.

The reason we don’t convert this to rank is because we don’t have a calibration for that.

The calibration from rating to rank is designed to give us ranks that match our old ranks best, and also to match other sites as best as possible.

That calibration is done for the overall pool. It would be different for sub-types. Even if we had data to calibrate sub-types, which we don’t.

That is the reason why sub-type rating is not converted to rank: because it can’t be calibrated.

Personally I would argue “just use the overall rating calibration - it’d be close enough”.

What we know is that right now anoek and matburt don’t agree with that.

Correct (as I understand it)

By the way I reached out to anoek, and he said:

“What is used in player matching for auto matches”

“[1:39] anoek: Overall rank”

GaJ.

Ptro · August 18, 2018, 11:38pm

No, I’m not talking about the overall rating, just about the sub-ratings. Also, that’s what I could understand from these quotes:

To me, they say that since the sub-ratings come from different scales and different rating pools, they are only related to the overall rating and not to a global scale (19x19 live from one user to 19x19 live from another user), and since the data from which each user obtains their ratings is different, the true meaning of each user’s sub-ratings is different.

Simplifying, when reading the information above I understood that since each user has a different history, what each sub-rating means is different (think of it as relative to one player’s data rather than absolute across all players’ 19x19 live data).

=============================================================================

Oh, by the way @GreenAsJade, I wrote that before you edited. I am just going to post it anyway to show what I was thinking at the time. Right now, I think I agree with your interpretation, but I would really like some sort of final clarification about this from the devs.

MystWalker · August 18, 2018, 11:43pm

Took a while to catch up on the discussion.

I probably haven’t been on this site as long as any of you, so I probably won’t have as nuanced a perspective. That said, I have been here a while and this is the first time I’ve ever heard of the Glicko system. I know all about the kyu/dan system, since it has (I feel) an inseparable connection to the game and its handicap system.
If we want to keep the user-visible Glicko scores in their prominent position (and it sounds like we may not), there needs to be some easy-to-access explanation for it. As I mentioned, I’ve never heard of the Glicko system, and based on only the numbers I would have had nothing to search for in Wikipedia or the Internet in general.

I’d like to know why it’s not possible to have useful user ranks for matchmaking in the smaller game pools. It seems like that could be a very useful set of data. It would be great to be able to match myself with someone of an equal strength on that board size easily. Maybe that could be a matchmaking option, and general score would be the default.
Speaking as someone who is stronger in 9x9 than larger boards, it would be really nice to be able to match automatically with someone my speed on that board size instead of getting beat down enough that my total rank falls to compensate.

In brief: I’d vote for the Glicko numbers to go, and have them replaced with kyu/dan numbers if possible.
My questions are as follows:

Why are the smaller pool ranks not shown in kyu/dan numbers?
Why can’t we use the smaller pool numbers for matchmaking?
If we can’t use them for matchmaking, can we make them more viable for comparison between players somehow?
If the smaller pool scores aren’t usable for matchmaking or comparison, why have them at all?

Are there technical/mathematical roadblocks?

Edit: GreenAsJade’s last response was not out when I started writing.

So the main hurdle to having kyu/dan numbers is this calibration? What does this calibration do exactly? Why is not having it for the smaller pools a deal-breaker?

Eugene · August 19, 2018, 1:21am

As I understand it (an interested person like you all, not an expert or someone with the code)…

… the calibration is simply a formula to get from the rating to the rank.

You can think of it as X here: Rank = Rating divided by X

So simplistically we know that 13k = 1500 divided by X

so X is about 1500/13 = 115.

It’s not that simple, but that is the idea.

The calibration was done for overall by first:

Applying Glicko rating to all of our games from the beginning of time
Finding X (the calibration) so that the most people end up with a new rank that is closest to their old one from the old system.
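
The two steps above can be sketched in a few lines. This is only a toy under stated assumptions: it takes the simplified relation as rank = rating / X (the form that gives X of about 115), it uses least squares as a stand-in for "closest to the old rank", and the player list is invented purely for illustration. It is not the actual OGS calibration.

```python
# Hypothetical (rating, old_rank) pairs, invented purely for illustration.
players = [(1500, 13.0), (1380, 12.0), (1725, 15.0), (1150, 10.0)]

def fit_x(pairs):
    """Least-squares fit of the toy model rank = rating / X.

    Writing rank = k * rating with k = 1 / X, the best k is
    sum(rating * rank) / sum(rating ** 2), so X is the inverse of that.
    """
    num = sum(rating * rank for rating, rank in pairs)
    den = sum(rating * rating for rating, _ in pairs)
    return den / num

x = fit_x(players)
print(f"fitted X = {x:.1f}")
```

A calibration fitted this way only describes the pool of games it was fitted on, which is the point made next about sub-type ratings.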

The trick is that you can’t properly compare numbers from a different pool of games. What this means is that:

you can’t say 1500 in live 9x9 is the same as 1500 in blitz 19x19
therefore you can’t say 13k live 9x9 compares to 13k blitz 19x19

If we applied the calibration blindly, I might turn out to be 10k live 9x9 and 12k overall, but actually be worse at live 9x9.

Once again, my understanding (not authoritative) is that the Devs didn’t want this kind of comparison accidentally being made. They don’t want someone saying “I’m 1 dan live 9x9” because they know that this doesn’t mean anything other than you are a better live 9x9 player than an OGS 1k live 9x9 person.

I believe that they fear the risk that OGS would have Dan 13x13 blitz players, drawing ridicule because that is not a valid concept, and could result in people having airs and graces that they don’t “deserve”.

I believe they made the number table because the data is available and it is interesting for each of us to be able to know how our performance in the sub-types goes over time.

This statement backs up @Ptro’s assertion that there shouldn’t be a table, there should just be separate graphs available for each sub-type.

I can’t think of any reason why the sub-type ratings should not be used for game matching. It seems like doing so would improve skill match in games…

GaJ

Ptro · August 19, 2018, 2:02am

Me neither. When you get this information from the devs, can you please tell us why? We’re all really confused about the specifics of this situation.

Kosh · August 19, 2018, 3:13am

I can think of one reason. While the rating for a particular group might be more targeted for matching purposes, the ± element is often larger so the overall ranking might still be a better bet simply because it has more data put into it.
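
A rough way to see the effect of a larger ± value is the expected-score formula from the original Glicko (Glicko-1) system, used here as a stand-in since the thread does not show what OGS computes. The ratings and deviations below are hypothetical. The sketch shows that the same 100-point gap predicts a result closer to 50/50 when both deviations are large.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant

def g(rd):
    """Glicko-1 attenuation factor: shrinks toward 0 as the deviation grows."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)

def expected_score(r_a, rd_a, r_b, rd_b):
    """Predicted score of player A against B, using both players' deviations."""
    combined = math.sqrt(rd_a**2 + rd_b**2)
    return 1 / (1 + 10 ** (-g(combined) * (r_a - r_b) / 400))

# Same 100-point gap, small versus large deviations (hypothetical numbers).
confident = expected_score(1600, 60, 1500, 60)
uncertain = expected_score(1600, 200, 1500, 200)
print(f"{confident:.3f} vs {uncertain:.3f}")
```

With the larger deviations the prediction moves toward 0.5, so a rating gap in a thinly played sub-type says less about who will win.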

To use myself as an example. I don’t want the unfair advantage that would apply in a live19x19 match up. I would rather have the small disadvantage from using overall rankings:

Ptro · August 19, 2018, 4:04am

Yes, but as you said yourself, as soon as you play more it’s expected that this advantage disappears, so I don’t see that as an actually good reason for not having more precise matchmaking.

Eugene · August 19, 2018, 4:13am

…(big deletion)

I nearly fell into the trap of comparing 1394 with 1962 and concluding that Kosh is weaker at 19x19 than 9x9.

This is exactly the trap that we are not supposed to fall into.

It seems to back up the assertion that this table causes more trouble than it’s worth.

I think it would cause this trouble - maybe even more so - if it were in Kyu/Dan.

Therefore I disagree with the premise of this thread, but continue to agree that the table should be removed.

GaJ

Ptro · August 19, 2018, 4:25am

Yeah, about this. Considering the information discussed within this topic, I was thinking that it’s probably a good idea to change the thread name into “Remove the player ratings table, but keep the sub-graphs functionality”. Any objections?

Maharani · August 19, 2018, 5:09am

Yeah, me. I use the board size overall ratings to figure out which one I’m weakest at and which ones I should play more to improve.

This whole thread feels like it’s about a whole bunch of nothing to me. It’s almost like complaining that the shades of red in the graphs should be reversed.

I don’t understand how ELO would be any harder to understand than kyu/dan even to people already familiar with the latter. It’s incredibly simple: Higher value = better, lower = worse. The bigger the difference in rating, the bigger the difference in strength.

I don’t feel like the imperial/metric analogy really works, either. Sure, the difference between 2k and 3k is that a 2k can give the 3k one handicap stone in a 19 x 19 game and expect a 50 % chance of winning, and ELO cannot readily convey this. However, not very many people play handicap games to begin with, and “one stone per rank” only really holds up at small rank differences. When you are a 15k, can you really tell any meaningful difference between playing a 4k and a 7d?

What I’m saying is that I feel like the kyu/dan system does not hold a lot more information than ELO even to people unfamiliar with ELO. All that really matters in everyday go is whether someone is better or worse than you, and whether they are so by a small or big margin. Both those things can be readily understood from ELO ratings after three or four games. What’s more, they don’t have the confusing distinction of kyu, amateur dan and pro dan, with kyu ranks going in a different direction than amateur dan or pro dan ranks…

I wouldn’t mind the ELO ratings in the ratings table converted to kyu/dan (I feel like the separate category ratings are comparable enough despite drawing from different “game pools”), but I strongly object to getting rid of the table altogether just because ELO is supposedly super-unintuitive. I feel like that is a lazy argument, and it would prevent me from figuring out which board size I want to be improving at.

Eugene · August 19, 2018, 5:42am

Here’s the thing.

Ostensibly (i.e. according to people who supposedly know the maths), you can’t compare between the ratings.

So you can’t use the table to determine which “type” you are weaker at.

I don’t know the maths, nor how real this is - it’s just The Argument.

If it’s true, then the above reason for having the table is misguided.

GaJ

Ptro · August 19, 2018, 5:45am

I’m not sure if you are citing an existing discussion in these forums or just supposing one, but in some cases it’s extremely necessary to swap currently used colors to improve the UX. Accessibility and the psychology of colors, for instance, dictate a lot about which colors should or shouldn’t be used in certain situations.

Also, if it feels like a bunch of nothing, then you clearly have not put enough time into reading it. The objective of this thread is to improve the UX for all OGS users; various points have been raised by many users about how things are currently working and how they should ideally work.

See, it’s such an irony. You’re stating how simple to understand it is, but you are deceived and wrong in your affirmation. In OGS, which uses Glicko (mind you, not ELO, which is a completely different rating system), lower is actually better. Again, this just proves that you didn’t put enough time into reading the thread to contribute to it, because if you did you would notice that just 2 posts above yours GreenAsJade fell into the same trap.

You just completely missed my point. What you are talking about has really nothing to do with what I was talking about. To begin with, handicap games were never even in discussion. My point was about what the user expects versus what is shown to him.

No, it really does matter. Again, this point was already discussed.

Again, not true. You clearly haven’t read what was previously discussed. Many reasons have been put in discussion to show why changes in the table are needed. Pretty much all of them using very trustworthy and academic sources.

Maharani · August 19, 2018, 5:47am

@Kosh and @GreenAsJade I doubt that the numbers are so far off that general comparisons are impossible… If I lose a 13 x 13 game, my 13 x 13 rating goes down, etc. As long as the uncertainties are similar, I have no experience-based reason for a deep distrust in the ratings table…

In fact, I agree with the earlier suggestion that individual category ranks should be used for matchmaking, if they aren’t already (I do distrust anoek’s various statements on this since he hasn’t addressed the evidence for this).

Maharani · August 19, 2018, 6:04am

The former. It was a tongue-in-cheek remark, since I’m the one asking for the colour swap, whilst acknowledging that it is a purely cosmetic issue, as I feel is the use of Glicko ratings instead of kyu/dan in a few spots.

I’ve read the entire thread. I simply disagree that the use of Glicko numbers in a few spots is more than a cosmetic issue.

I was under the false impression that while the rating system is called Glicko2, the rating as a number was still referred to as ELO (i. e. “under Glicko2, my rating is ELO 1500”). Apparently, this is not the case and I apologize for misusing the term ELO in my post above.

Even still, a higher Glicko2 rating is better than a lower Glicko2 rating. I have no idea where you got the opposite impression and you are simply wrong on this point. When I started playing on OGS, my Glicko2 rating was roughly 620, which is equivalent to about 40k (although the site displays everything “below” 24k as 25k, which is a UX issue that bothers me a lot more). My current Glicko2 rating is 1507, which is equivalent to 13k. Professional players have Glicko2 ratings above 2000. You can easily prove this to yourself by comparing the Glicko2 and kyu/dan ranks in the picture in your first post…

I took your point to be that displaying a kyu/dan rating was preferable to Glicko2 for reasons beyond “most players are used to kyu/dan”, and that you thought that kyu/dan was inherently more useful than Glicko2 for practical reasons. I brought up handicap stone differences because that is not just the only practical purpose of comparing kyu/dan ranks with each other, but because it is in fact the parameter that DEFINES the kyu/dan ranking system. I then proceeded to explain that despite this, I don’t feel like kyu/dan is inherently more practical than Glicko2 since almost no one plays handicap games anyway and this principle falls apart at massive differences in rank.

No comment

Maharani · August 19, 2018, 6:13am

Side note:

In the userscript demonstrated in the first post, 1359 +/- 82 is translated to 15.3k +/- 1.8.

From what I understand, a deviation of 82 Glicko2 points should translate to a different deviation in stones in the + direction than it does in the - direction since kyu/dan ratings progress logarithmically, correct? 82 Glicko2 points should be fewer stones in the + direction (e. g. 1.7) than in the - direction (e. g. 2.0). Probably not to a degree that it really matters, I’m just curious.
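
The asymmetry can be checked with a small sketch. It assumes a logarithmic conversion of the form kyu = 30 - ln(rating / 850) / 0.032. The constants 850 and 0.032 are an assumption, chosen because they reproduce the 15.3k figure quoted above for 1359; nothing in this thread confirms that they are what the site or the userscript uses.

```python
import math

def rating_to_kyu(rating, a=850.0, c=0.032):
    """Assumed conversion: decimal kyu = 30 - ln(rating / a) / c.

    The constants a and c are assumptions, not taken from this thread.
    """
    return 30.0 - math.log(rating / a) / c

rating, deviation = 1359, 82
centre = rating_to_kyu(rating)

# Stones gained going up by the deviation, and lost going down by it.
stones_up = centre - rating_to_kyu(rating + deviation)
stones_down = rating_to_kyu(rating - deviation) - centre

print(f"{centre:.1f}k, +{stones_up:.2f} / -{stones_down:.2f} stones")
```

Under this assumption the + direction comes out at about 1.8 stones and the - direction at about 1.9, so the gap exists but is only around a tenth of a stone.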