Show Kyu/Dan instead of Glicko Rating on Player Profile

And

  1. To track your individual progress in each pool.

I have issues with this line of reasoning, not because I think it’s wrong, but because I think it’s misleading to assume that the percentile system is better than the other systems for the comparisons in number 2 (compare [a player’s] strength between different board sizes / speeds).
If all the smaller pool ratings were in the same units, it would be easy to make comparisons: 8 cows in 9x9 is better than 4 cows in 19x19. I don’t understand why comparing percentiles is better. Granted, you would know how many people you are better than on the server, but that doesn’t address point number 2.

Is there something I’m missing?

I think so.

I think you are missing the fact that saying that I’m “the best” at 9x9 blitz is equal in some sense to saying I’m “the best” at 19x19 live, and equal in that same sense to saying I’m “the best” Overall.

In each case, I am the best at something, in a given pool. Sure, it’s understood that it’s easier to be best in a small pool (or arguably it might be), but nonetheless “best” is a transferable concept.

Similarly “I am median” is comparable. “I am median overall, but I’m the best 9x9 blitz player” says something meaningful and understandable about your playing ability relative to others.

That is why percentiles are a candidate solution, targeting those who are trying to derive some comparative meaning from the table.
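To make “best” and “median” concrete, here is a minimal sketch of a within-pool percentile, assuming all we have is a list of the other players’ ratings in that pool. The function name and the ratings are made up for illustration; this is not how OGS computes anything today.

```python
from typing import Sequence


def percentile_in_pool(rating: float, others: Sequence[float]) -> float:
    """Share (in %) of the *other* players in a pool rated below `rating`.

    Ties count as half, so 100 means "best in this pool" and 50 means "median".
    """
    if not others:
        raise ValueError("pool has no other players")
    below = sum(1 for r in others if r < rating)
    ties = sum(1 for r in others if r == rating)
    return 100.0 * (below + 0.5 * ties) / len(others)


# Hypothetical pools: the same rating reads as "best" in one and "median" in the other.
blitz_9x9 = [1000, 1150, 1200, 1300]
overall = [1100, 1300, 1500, 1700]
print(percentile_in_pool(1400, blitz_9x9))  # 100.0
print(percentile_in_pool(1400, overall))    # 50.0
```

The same rating lands at the top of one invented pool and in the middle of the other; the percentile is what carries that pool context with it.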

Okay, this is where I’m getting frustrated.
We’re conflating two things: comparing an individual’s ranks across their different pools, and comparing one person’s rank in a pool with another person’s rank in the same pool.
I’m trying to make it clear that these are two separate use cases, but they keep getting mixed up.

Case A:
Segue is a user on OGS.
As a user on OGS, Segue wants to see whether they are better when playing Live 9x9 or Live 19x19.

Case B:
Jackie and Brovien are users on OGS.
As a user on OGS, Jackie wants to see whether they are better than Brovien in the category of correspondence 19x19.

I don’t see how percentiles are better in either of these cases. Maybe someone could make a different example case?

I think these are valid observations that one could make. However, the question of “why do I perform better in the 9x9 blitz pool?” is still ambiguous.

Is it because “I am stronger at 9x9 blitz than other game types” or is it “the 9x9 blitz pool is mostly made up of weaker players” (which could be a possibility if a lot of novices are looking for quick games, but stronger players disdain that game type).


What I want to know is:
“Is my performance in 13x13 what one would expect for a player of my overall rank, or am I worse at 13x13?”

Since the average strength of players in each pool is different, I would expect something like “9x9 65%, 13x13 55%, 19x19 45%” even if I’m performing equally well on each board size.
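The question above could be sketched like this, under assumptions: for every player in the 13x13 pool we know both their overall rating and their 13x13 rating, and “players of my overall rank” is approximated by a hypothetical rating window. Comparing only against those peers, 50 would mean “exactly what one would expect for my overall rank”, regardless of how strong the 13x13 pool is as a whole. Names and numbers are invented for illustration.

```python
from typing import Sequence, Tuple


def percentile_among_peers(
    my_overall: float,
    my_pool_rating: float,
    players: Sequence[Tuple[float, float]],
    window: float = 100.0,
) -> float:
    """Percentile of my pool rating among players with a similar overall rating.

    `players` holds (overall_rating, pool_rating) pairs for the *other* players
    in the pool; `window` is a hypothetical tolerance on the overall rating.
    """
    peers = [pool for overall, pool in players if abs(overall - my_overall) <= window]
    if not peers:
        raise ValueError("no players with a similar overall rating in this pool")
    below = sum(1 for r in peers if r < my_pool_rating)
    ties = sum(1 for r in peers if r == my_pool_rating)
    return 100.0 * (below + 0.5 * ties) / len(peers)


# Invented 13x13 data: (overall rating, 13x13 rating).
pool_13x13 = [(1500, 1400), (1520, 1450), (1480, 1550), (1510, 1600), (1900, 1300)]
print(percentile_among_peers(1500, 1500, pool_13x13, window=50))  # 50.0
```

A real version would need a sensible window and enough peers at each rank; with only a handful of peers the number would be mostly noise.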

To make my point clear, this is how many players of each overall rank are in each pool (thanks S_Alexander):


What if most of the beginners are only playing blitz 9x9, while being too intimidated to even try live 19x19 games?

If you found someone who was a median player overall but happened to be one of the best blitz 9x9 players on the site, I would think the above hypothesis was more likely to have been a factor in that observation.


I agree that this is a potential complication in interpreting comparisons of percentiles across pools.

I was just trying to show why comparing percentiles (which are a transferable measurement) is more legitimate than comparing ratings from pool to pool.

I’m not seeing this. WHY is it more legitimate?

You could say the same in Glicko or Kyu/Dan: “I am a 12kyu overall, but I am a 9Dan 9x9 blitz player.” The only thing lost in translation is the position in the pecking order of the whole OGS community. This is more information; it is not necessarily better or more legitimate information. Frankly, I’m not crazy about having a top-down ladder for the whole site at all.

If we want to add this functionality to the table, fine, but it is not information exclusive to the percentile suggestion. We could have the median (or average, or mean, or mode, you statistics nerd :wink:) rank displayed for each group. I’m not saying that’s a great idea, but it’s something we could also do.

Because “best” and “median” are terms that have a universal meaning.

Whereas “1500 points” does not.

Granted, just because I am “best” in my club doesn’t mean I’m “best” in the world: best is relative to the pool it refers to, but at least the meaning is clear.

If I tell you “I am best in 9x9 blitz but median in overall” we can have a conversation about what that means. It might mean that I studied hard at 9x9 and think fast, or it might mean that all the beginners are in 9x9 blitz. But for sure it means that I’m the best at 9x9 Blitz, but not the best overall. At least we understand the meaning and the comparison that is being made.

If I tell you “I am 1500+/-10 in 9x9 blitz and 1500+/-10 overall” the only thing it means is that I am pretty certainly 13k Overall. It tells you nothing at all about 9x9 blitz.

GaJ

Kyu/dan ranks tell us absolutely nothing. They can only tell us something if we have a particular opponent to compare with. The only reason we’re comfortable with ranks is our experience with the system: we sort of know how hard each rank is. But when it comes to categories it’s harder, because we hardly ever see categorical ranks except in that table, and when we do see them there, most of us have no idea what the numbers could mean. In the case of percentiles, at least we get info about our position in this specific pool. We still can’t really compare ourselves between pools, but it’s better than rank.

And we need to remember that percentiles usually come with a whole graph. And graphs are very cool and pretty. I like looking at graphs.


Check lichess out.


https://lichess.org/stat/rating/distribution/classical


I finally had enough time to participate in the discussion again, so allow me to pick up where I left off.

First, thank you all for keeping the discussion going. I was afraid it would lose momentum in the meantime, and I’m really glad to see that I was wrong.

Before getting into specifics, let me address what I think about this topic in general:

@Farraway’s suggestion is actually a pretty good one. It does, in fact, solve the problem at hand in an alternative way. It also has some problems, though. But if Farraway’s alternative is implemented and its existing problems are solved, then I would very much welcome it instead of the present table. Of course, I still personally think that integrating the table with the graph would be the more interesting option, but both have their pros and cons.


Now getting into the specifics,

I would be extremely against it. When you show players the expected outcome of a match, you are setting yourself up for trouble. Reasons:

  • If a match-up isn’t exactly 50/50, players will always blame the matchmaking system for creating such “unfair matches” (see the forums of any multiplayer game that displays matchmaking information, for instance).

  • Players would generally avoid matches where they don’t have a clear expectation of winning, which would lead to a situation where everyone always wants to “just win” and never lose, and that would be bad for the community in general.
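For reference, the “expected outcome” being debated is something a Glicko-style system already computes internally when it updates ratings. Below is a minimal sketch of the standard Glicko-2 expected score, assuming ratings and deviations on the usual Glicko-2 display scale (1500 centre, scale factor 173.7178). It ignores handicap and komi, and whether OGS would display exactly this number is an assumption.

```python
import math

GLICKO2_SCALE = 173.7178  # converts display ratings to the internal Glicko-2 scale


def g(phi: float) -> float:
    """Glicko-2 weight that shrinks the rating gap when the opponent's RD is large."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected_score(rating: float, opp_rating: float, opp_rd: float) -> float:
    """Expected score (roughly the win probability) against one opponent."""
    mu = (rating - 1500.0) / GLICKO2_SCALE
    mu_opp = (opp_rating - 1500.0) / GLICKO2_SCALE
    phi_opp = opp_rd / GLICKO2_SCALE
    return 1.0 / (1.0 + math.exp(-g(phi_opp) * (mu - mu_opp)))


print(expected_score(1500, 1500, 60))            # 0.5
print(round(expected_score(1700, 1500, 60), 2))  # 0.76
```

The sketch only shows that the number is easy to produce; whether to put it in front of players is the UX question argued above.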

(For context, here Farraway was talking about how kyu/dan can’t be used as an accurate measurement between different rating pools.)

Yes, you are actually right. For me this is, without a doubt, the highlight of your proposal versus a kyu/dan one. Now, we also have to consider that kyu/dan was used before in that same situation (and please note that I am not objecting to what you said). It leaves me very curious to understand how the previous implementation approached this type of situation, and what the devs’ point of view on this was before changing to Glicko.

This is a problem with the percentile proposal as it stands. It must make very clear that it is using site-wide information, not only information from the overall rank with some deviation. Or, if it is doing the opposite, it should make that very clear as well. It may even turn out to be easily fixable, but it shows how percentiles could lead users to wrong interpretations if not implemented correctly.

Flovo’s interpretation of how a percentile-based display of subratings should work is also a much more useful one (even if it would be way more complex to implement, and maybe not even possible). From what I understand, it would be something like this:


(Explaining the image for the Overwatch-illiterate: this is an unofficial website that uses data from the official Overwatch API to analyze a specific player’s performance [this isn’t my data, by the way], using a deep network to see where the player should improve. The points on the curve refer to aspects related both to special attacks and to gameplay in general.)

and then each point would refer to a different board size/time sub-rating.

That is my main concern with @Farraway’s proposal. I also agree that this situation is very likely to occur and would mislead users. Creating a warning that explains this situation to an end user, even inside an infobox on OGS, would not be very simple either, since its wording may raise more questions in the user’s head than having no warning at all, which is undesirable from both a UX and a UI design point of view.

Not necessarily. See Master Overwatch’s approach, for instance. It also uses percentiles to display numerous stats about a player’s performance with a specific character, without using a single graph. Besides, we already have a line graph in the player profile; adding another graph to display a different type of information may add even more confusion.


Also, a reminder for anyone just joining in, or who missed the link: we are also using this kialo link for the discussion. Everyone is welcome to join in, and if anyone has a problem using it, you can post here and/or in the kialo chat (also available at the link above) so that we can help you. Of course, just posting in this thread is also much appreciated.


Using the current matchmaking system to get an even 19x19 game can lead to a disappointing match, as I pointed out in an earlier post. I can be matched with an opponent with the same overall rank as me, but they may not be as good at 19x19 as they are at 9x9. This is especially true at the beginner level. In one case I was matched with an opponent who had never played 19x19 before.

Does everyone really want to “just win”? I often like to play games with an opponent who is slightly better than me, where I am more likely to lose than win.


In that specific case, I’m pretty sure there isn’t a single matchmaking system that could take this situation into consideration. As you yourself said, your opponent hadn’t played any 19x19 games, so there would not be sufficient data anyway.

You just ignored what I was referring to. I said (as even your quote shows) that if you show the expected outcome to a player, you are encouraging the player to only engage in matches where their win is expected by a good margin.

How is that different from the current system? All players more or less know whether they’re expected to win or not. We all know that if you take on a player 5 ranks weaker, you’re expected to win; if you play someone 3 ranks weaker, you still have a good chance to win, etc. Yet a lot of people like and do play even games, or games against stronger opponents. On KGS you can even get marked for only playing stronger opponents (while no one cares if you whoop some volunteering weaklings).

Edit: KGS mark explanation for OGS-only people: KGS: The Tilde

Knowing more or less is very different from explicitly showing the expected outcome of a match.

EDIT: Whoops, I misunderstood your KGS example; removing it.

Here’s an idea. The problem at the moment is that we can’t compare different pools of players. How about making a big table comparing Glicko ratings for each category (like https://senseis.xmp.net/?RankWorldwideComparison)? Then we would know that, for example, 2000 at 19x19 is on average equal to 1500 at 9x9, and both are equal to 2000 (~3k) overall. So we slap a 3k label on all of them. Now, when some players have 1500 in 9x9, we show 3k, meaning these players play 9x9 the way we would expect a 3k overall player to. Their overall rank may be 7k, in which case they’re playing better than expected at 9x9.

Does that make sense for anyone?
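The table idea from the post above could be prototyped roughly like this, assuming that for players active in both we have a 9x9 rating and an overall rating. It buckets players by 9x9 rating and averages their overall ratings per bucket, giving a lookup from a 9x9 rating to the overall rating it corresponds to on average; the rank label would then come from whatever rating-to-rank conversion the site already uses. Function names, bucket width, and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple


def build_conversion_table(
    pairs: Iterable[Tuple[float, float]], bucket_width: int = 100
) -> Dict[int, float]:
    """Map each pool-rating bucket to the mean overall rating of players in it.

    `pairs` holds (pool_rating, overall_rating) for players active in both.
    """
    buckets = defaultdict(list)
    for pool_rating, overall_rating in pairs:
        buckets[int(pool_rating // bucket_width)].append(overall_rating)
    return {b * bucket_width: mean(vals) for b, vals in sorted(buckets.items())}


def equivalent_overall(
    pool_rating: float, table: Dict[int, float], bucket_width: int = 100
) -> Optional[float]:
    """Average overall rating of players whose pool rating falls in the same bucket."""
    return table.get(int(pool_rating // bucket_width) * bucket_width)


# Invented (9x9 rating, overall rating) pairs.
pairs_9x9 = [(1510, 1950), (1540, 2050), (1620, 2100), (1450, 1800)]
table = build_conversion_table(pairs_9x9)
print(table)                            # {1400: 1800, 1500: 2000, 1600: 2100}
print(equivalent_overall(1500, table))  # 2000
```

With real data each bucket would need many players, and the mapping would probably want smoothing between neighbouring buckets; a bucket with no data simply returns None here.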

Correct. The point I was making was that 19x19 and 9x9 ranks may differ, especially at the beginner level. In these cases using the 19x19 rank would lead to a better match than using the overall rank.

Um, side note: your quote regarding Ptro didn’t show up as a quote; you used [quote="Ptro’], mixing up the " and ’ symbols.