Rating anomaly? Is this a bug?

Kosh · May 5, 2018, 10:26am

My latest opponent has only ever played 19x19 but his overall ratings are not the same as his 19x19 ratings.

is this a bug?

For reference player is: https://online-go.com/player/433448/

Eugene · May 5, 2018, 10:43am

Isn’t it the case that the overall rating calculation is done using your opponents’ “overall” rating, whereas the 19x rating is done using your opponents’ 19x rating?

So this tells us that doctorx’s opponents had different overall ratings to their 19x ratings, which is pretty normal.

Only if doctorx and his opponents had all played only 19x would these numbers be the same…

AIUI.

GaJ

smurph · May 5, 2018, 2:01pm

Um… ?

Eugene · May 6, 2018, 12:48am

I think you misunderstood the question.

In Kosh’s case he has played both 19x and 9x games, so it’s obvious that his overall rating is different to his 19x and 9x ratings.

The person he pointed to has only played 19x games, so the question is “why is it that your overall rating is different to your 19x rating if you only play 19x games?”

The answer is “because your overall rating is calculated using your opponents’ overall ratings, and their overall rating is not the same as their 19x rating”.
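To make that concrete, here is a minimal sketch. It uses the plain Elo update as a stand-in, since nothing in this thread confirms which formula the site actually uses, and the opponent ratings, starting rating and K-factor are all made-up numbers for illustration. The same three results are fed into two separate pools: one that looks at each opponent's overall rating, and one that looks at each opponent's 19x19 rating.

```python
def elo_expected(r_self, r_opp):
    """Expected score of a player rated r_self against one rated r_opp."""
    return 1 / (1 + 10 ** ((r_opp - r_self) / 400))


def elo_update(r_self, r_opp, score, k=32):
    """Plain Elo update; k=32 is an assumed K-factor, not the site's setting."""
    return r_self + k * (score - elo_expected(r_self, r_opp))


# Hypothetical opponents: (their overall rating, their 19x19 rating, our score).
# Every game here is a 19x19 game, so both pools see the same results.
games = [
    (1650, 1500, 1),  # win
    (1600, 1450, 0),  # loss
    (1700, 1550, 1),  # win
]

overall = 1500.0  # assumed starting value for the overall pool
r19 = 1500.0      # assumed starting value for the 19x19 pool

for opp_overall, opp_19, score in games:
    # Same result, but each pool uses a different number for the opponent.
    overall = elo_update(overall, opp_overall, score)
    r19 = elo_update(r19, opp_19, score)

print(f"overall pool: {overall:.1f}")
print(f"19x19 pool:   {r19:.1f}")
```

Because the made-up opponents are rated higher overall than at 19x19, every result is worth more in the overall pool, so the two numbers drift apart even though the player only ever played 19x19.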

smurph · May 6, 2018, 12:50am

Thanks for pointing that out, I was indeed unable to extract that information from the post.

Eugene · May 6, 2018, 12:51am

smurph · May 6, 2018, 12:58am

Well now you misunderstand my post; my point was that even his overall 19x19 rating was very different from his specific 19x19 ratings, so I didn’t see why he would be surprised that the overall/overall rating also didn’t fit the bill. But I can also see why you wouldn’t have been able to extract that information from my post.

Kosh · May 6, 2018, 4:34am

I guess the point I am trying to make is that the Ratings Chart isn’t comparing apples with apples. If these are all Elo rankings that use the same base value and settings, shouldn’t they be comparable, i.e. on the same scale?

@smurph is right to point out that my own chart shows something similar. That’s where my suspicions began but as @GreenAsJade points out, I have played both 9x9 & 19x19. Very little 19x19 in fact (12 ranked). I thought doctorx’s chart was better at demonstrating what I was seeing as it eliminates some of the variables involved.

Here’s a very different kind of example:

This player plays LOTS of games so the +/- is lower for all rankings and yet Overall ranking is 199 higher than the 9x9 ranking, 475 higher than the 13x13 ranking and 219 higher than the 19x19 ranking. The overall ranking is massively higher than the three components that it should be made up from.

So to return to my metaphor; which of these numbers are apples, which are oranges and are any of them lemons? For those who don’t like metaphors; what can be legitimately compared with what?

My suspicion that something is amiss remains but I lack the skill and knowledge to prove it or even articulate it.

PS @smurph: I never said I was ‘surprised’. I don’t see how you extracted that from my post.

smurph · May 6, 2018, 4:47am

Well, when you experience something you did not expect, that’s a surprise. You expected similar numbers, but found “very different numbers”, so you must have been surprised.

That, and the green and red all over the picture.

Kosh · May 6, 2018, 5:07am

I play Go on a regular basis so I frequently experience the unexpected. Especially against dan players. In fact, I would only be surprised if I did not experience the unexpected.

BHydden · May 6, 2018, 5:17am

My understanding is that the different categories are not designed to be compared with each other, as they are all on different “systems” if you will, much like comparing your tygem rank to your kgs rank.

What you CAN compare is your “insert any specific rank here” with any other player’s “insert the same specific rank here” to see how each of you has performed in that specific play style.
(hope that made sense? basically trying to say you can’t compare your own ranks with each other but you can compare any one rank with the same category of another player)

Eugene · May 6, 2018, 8:35am

BHydden nailed it.

The 19x rating is from a “different pool” than the Overall rating.

As we see here: you can’t compare them.

smurph · May 11, 2018, 4:10am

For the fun of it, I compared all rating points before and after my last correspondence 9x9 (which I won):

Before:

Screenshot_20180511_054426

After:

Screenshot_20180511_060055

You can see:
a) increased overall/overall rating (+4)
b) increased overall/correspondence rating (+137) and deviation (+85)
c) increased 9x9/overall rating (+41) and deviation (+2)
d) decreased 9x9/correspondence rating (-42) and increased deviation (+52)

I have to agree with Kosh that it’s pretty weird to see an increase in overall correspondence rating (my only correspondence games have been 9x9) and a drop in correspondence 9x9 rating.

Kosh · May 11, 2018, 7:31am

As you say, the right-hand column is particularly odd.

While the overall and 9x9 figures are compared to different pools, they should still move in the same direction, if not by the same amount.

Also odd that the uncertainty on your overall/correspondence went up dramatically even though you defeated a lower ranked opponent.

BHydden · May 11, 2018, 8:35am

A lot of factors go into how each number changes.

The rating and confidence of yourself, your opponent, everyone either of you has played in the last 15 games or month, everyone they’ve played, etc.

It is all very fluid and it’s important to know that until a full period (15 games or a month) resolves, everything is just an estimate.

Lys · May 11, 2018, 12:44pm

This is a recurring answer.
Sometimes things happen that look very strange (as an example: I assume that a win against a weaker opponent or a loss against a stronger one should reduce the uncertainty, or confidence, or deviation, or whatever it’s called).
The fact that the algorithm is complicated is not very reassuring.
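For what it’s worth, here is a minimal sketch of how a deviation behaves in the textbook Glicko-1 system. This is only an assumption for illustration: nothing in this thread confirms that the site uses Glicko-1, and the ratings, deviations and the constant `c` below are invented. In this textbook version, processing a game can only shrink the deviation, and the only step that grows it is the time-based inflation applied between rating periods.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd):
    """Glicko-1 weighting: discounts a result by the opponent's uncertainty."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r, r_opp, rd_opp):
    """Expected score against an opponent with rating r_opp, deviation rd_opp."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def glicko1_update(r, rd, r_opp, rd_opp, score):
    """Textbook Glicko-1 update for a rating period containing a single game."""
    e = expected(r, r_opp, rd_opp)
    d2 = 1 / (Q**2 * g(rd_opp) ** 2 * e * (1 - e))
    denom = 1 / rd**2 + 1 / d2
    new_r = r + (Q / denom) * g(rd_opp) * (score - e)
    new_rd = math.sqrt(1 / denom)  # always smaller than rd
    return new_r, new_rd


def inflate_rd(rd, periods_idle, c=30.0, rd_max=350.0):
    """Deviation growth over idle rating periods; c=30 is an assumed constant."""
    return min(math.sqrt(rd**2 + c**2 * periods_idle), rd_max)


# Hypothetical numbers: a 1500 +/- 120 player beats a 1400 +/- 80 opponent.
new_r, new_rd = glicko1_update(1500, 120, 1400, 80, score=1)
print(f"after the win: rating {new_r:.1f}, deviation {new_rd:.1f}")

# The same player sits out four rating periods instead.
print(f"after idling:  deviation {inflate_rd(120, 4):.1f}")
```

So under these assumptions, a deviation that goes up right after a game would have to come from something other than the game update itself, such as an inflation step or a different algorithm, which is exactly the kind of detail we can’t see from the outside.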

Many users are complaining or doubtful about ratings, but we can obviously see only a small part of the ingredients: our games, our rank, the changes after a game…
I believe that our devs could share more information and get a big reward from it, at least in terms of happier users. Many of us would be very happy to participate as “testers” and this could be win-win: if everything is OK, we’ll be reassured, and if we find out that something is wrong, they can fix it.

We do this every day about everything else on OGS, but rating is a big black box.