Rank Instability on OGS

Well, my rank has gone from 9k to (very briefly) 5k, then 6k, then back to 8k … one reason was that a few opponents resigned too early and others lost on time … TBH I’m more worried when my rank goes up “too fast” (for my liking) than when it goes down.

But what I want to say is this: for me, rank on OGS is not “unstable” but rather “fluid”, while, IIRC, on some other servers there seemed to be some “rank inertia”, and more in one direction than in the other.

I don’t mind the rank fluidity here on OGS, as I accept that it also reflects the “form of the day”, which can be bad today, and better tomorrow, and bad again the day after.

3 Likes

I get your point and I mostly agree.
@smurph also said that a quick and responsive rating system can be an advantage, rather than being frozen at a never-changing rank.

That’s fine. I think that too.

But we are still talking philosophy while someone is asking: “are we sure that the rating system isn’t broken?”
Gaining rank after a loss seems broken.
Increasing deviation after a win seems broken.

I like having a fluid rank: I feel high when I gain 3 kyu in a month and I laugh at it when I lose them two weeks later.
What I don’t like, because I’m stubbornly Aristotelian, is broken math.
And uncertainty.

2 Likes

There’s a big difference between “seems broken math” and “is broken math”, though.

In simpler ranking systems it might be true that your rank cannot increase if you lose a game. But Glicko is based on many more factors: it recalculates your new rank from your rank before the game, the ranks of all your opponents during the rating period, and the deviations of those ranks. So if some of the opponents you played against earlier changed rank, your own rank should technically change even without you playing any new games. The recalculation only happens after you finish a game, though, so you don’t see this until you have finished a new game (I believe).
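
To make that concrete, here is a minimal sketch using the open-source glicko2 package that gets linked further down in this thread (https://github.com/sublee/glicko2); the numbers are the worked example from Glickman’s paper, not OGS values:

# One rating period with three games: all results and all opponents'
# deviations enter a single update, so there is no fixed number of
# rating points per win or loss.
from glicko2 import Glicko2, WIN, LOSS

env = Glicko2(tau=0.5)              # tau is Glicko-2's system constant
me = env.create_rating(1500, 200)   # rating and deviation at period start
opp_a = env.create_rating(1400, 30)
opp_b = env.create_rating(1550, 100)
opp_c = env.create_rating(1700, 300)

new_me = env.rate(me, [(WIN, opp_a), (LOSS, opp_b), (LOSS, opp_c)])
print(new_me.mu, new_me.phi)        # roughly 1464 and 152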

Another thing is that the same win does not give you the same rating increase at different times. One factor in Glicko is that the rating deviation increases if you don’t play enough games, which affects how you’re rated. So if a 5k starts a rating period with a win against a 1d, he will gain less rating if he has played regularly than if he hadn’t played in the last half year (in which case his deviation would be much larger).
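
For reference, the idle case even has a closed form in the Glicko-2 paper: the rating stays put and only the deviation grows, phi' = sqrt(phi^2 + sigma^2) per rating period without games, on the paper’s internal scale. A minimal sketch:

import math

GLICKO2_SCALE = 173.7178   # conversion between display and internal scale

# Per the Glicko-2 paper: a player with no games in a period keeps the
# same rating, but the deviation inflates every idle period.
def inflate_deviation(rd, sigma=0.06, idle_periods=1):
    phi = rd / GLICKO2_SCALE
    for _ in range(idle_periods):
        phi = math.sqrt(phi ** 2 + sigma ** 2)
    return phi * GLICKO2_SCALE

print(inflate_deviation(60.0, idle_periods=10))   # ~68.5: rustier players swing harder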

Finally, your rank is not really the number listed on your profile, but rather a 95% certainty range between two ranks, as indicated by the X ± Y (so your rank is between X−2Y and X+2Y with 95% certainty). In a normal distribution the median, the mean and the mode coincide, so changing your deviation does not move the midpoint. However, in other distributions it can be perfectly consistent for your rank (the midpoint of the interval) to shift when your deviation shifts. (But I’m not sure what distribution Glicko assumes. If it’s normal, this is irrelevant.)
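
In numbers, a tiny sketch of reading that display (the factor of two is just the usual two-standard-deviation convention for roughly 95%):

# Turn a profile's "X ± Y" into the ~95% band described above.
def rating_band(x, y):
    return x - 2 * y, x + 2 * y

print(rating_band(1500, 75))   # (1350, 1650)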

The question is not whether the system works as intended: it does (probably). The question should be whether this system is too volatile or not (and whether that’s necessarily a bad thing).

2 Likes

I think you’re still missing lysenew’s point.

The point is that we can’t tell the difference between “seems broken math” and “is (or is not) actually broken math”, because the site doesn’t provide us with enough actual data about how our rating was calculated.

I think that’s what’s actually being questioned. There is no point in debating whether this system is too volatile when we can’t be sure it isn’t simply broken … especially while there are indications that maybe it is, and not enough data to prove that it isn’t.

3 Likes

Unfortunately it’s kind of a cop-out answer, i.e. “don’t question the correctness of the OGS god”.

For example, in general chat the rank of some players sometimes jumps between two values (with or without refreshing the page, very weird). For a long time many people wrote it off as a caching problem, or as the system catching up with a recent rank change. However, a couple of days ago I saw the jumping rank of a player whose last ranked game was a week ago. That looks a lot more like a bug and not like “the system just works like that”.

Coding mistakes are possible, and excusing them with “a lot of factors” doesn’t help. The problem is that we don’t have many ways to check rank changes. Even just a field telling you how much your rank changed after playing a game would help. Maybe write a script for that?
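
A minimal sketch of such a script (the rating-history endpoint is the one cited later in this thread, but the row format, ordering, and column index here are assumptions to check against the real output):

import urllib.request

PLAYER_ID = 449941    # the account analysed later in this thread
RATING_COLUMN = 1     # placeholder: index of the post-game rating field
URL = "https://online-go.com/termination-api/player/%d/rating-history" % PLAYER_ID

# Assumes one whitespace-separated row per finished game, oldest first.
rows = urllib.request.urlopen(URL).read().decode().strip().splitlines()
ratings = [float(row.split()[RATING_COLUMN]) for row in rows]
for before, after in zip(ratings, ratings[1:]):
    print("%8.1f -> %8.1f   delta %+7.1f" % (before, after, after - before))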

3 Likes

Agreed. I would only be able to state things with certainty if I knew how the actual algorithm works. I can only base my claims on the Glicko paper, which does allow for the points I made, but of course doesn’t give me any insight into how the site works.

However, there is not that much leeway in implementing a Glicko-2 rating system. The only values that ought to be changed are the size of the rating periods and the volatility constant; the rest is basically pseudo-code for the algorithm itself. Unless the “OGS god” has a degree in numerical methods and thought the algorithm could be improved, I doubt that the algorithm itself has been changed. (But this is once again a complete guess, based on the idea that developers aren’t evil goblins.)
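
In code terms the tunable surface is tiny; a sketch with the glicko2 package linked further down (the 15-game period length is just an example, matching what is used later in this thread):

from glicko2 import Glicko2

# Practically the only knobs: the length of a rating period, and tau.
RATING_PERIOD_GAMES = 15   # e.g. close a rating period every 15 games
env = Glicko2(tau=0.5)     # the volatility (system) constant of the paper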

This is very difficult, if not impossible, with Glicko, as it is not known beforehand in which rating period you will finish the game, or what the ratings / deviations of the two players will be at the end of the game; it also depends on the other games played by the other opponents of both players in the same rating period.

1 Like

I wrote and meant exactly “how much your rank changed”, not “how much your rank would change”. After the game is finished it’s as simple as comparing two numbers. But it would let people who play tons of games see all the weird things Glicko does, plus maybe spot some common irregularities.

OGS is not very transparent. Try finding out what counts as live and what as blitz. Not trivial at all.

1 Like

Actually even that still depends on all those factors, so it does not really give any insight into whether the system works as it is supposed to.

I don’t think correspondence, live, and blitz are weighted differently, are they? When I talked about blitz I was referring to how it’s both easier to overcome a strength difference and quicker to finish a large number of games, resulting in a significant compound effect.

1 Like

That’s all I can do, too… guessing… but it isn’t very scientific, is it?

I really appreciate everybody’s effort to guess and justify, but my logical mind would prefer solid data.
As an example: when are batches of games recalculated? On the first of the month? When it’s full moon? On a random basis? Knowing this could help to check changes.

@GreenAsJade, could you help? You are able to modify the site code, so you should be able to identify the function (I used to call them “subroutines” when I was young) that does this job. Is it possible? Could we look at the code? Perhaps the answers to my doubts are in plain sight and I just don’t know where to look…

1 Like

Well, the thing is that only the “client” side is open source. So I can read and contribute to the way the site looks and to the interface (buttons, where they are, etc.), but not to the underlying “fundamental way it works” (aka the server-side code).

How rating works is part of the “fundamental way it works”.

GaJ

3 Likes

Hi @AdamR

There could be an issue with the way OGS calculates the deviation of players’ ratings, which results in high fluctuations of players’ ranks.

I found this because I wanted to understand Glicko-2 in depth and to get a better insight into how OGS implements it.

In the plot below you can see my rating as shown by OGS in the rating history (upper black line) and the rating as calculated by me (upper green line). The lines at the bottom of the graph are the corresponding deviations.

The red lines are a calculation with my implementation, but forcing the deviation to be the same as the OGS one.

As you can see, the green line is much more stable than the OGS rating, while the red line follows OGS; at the same time, the deviation in the OGS data is much higher (about 2 times, I didn’t calculate the exact number).

I hope this helps to pin down the possible issue with the unstable ranks. Sorry for the necro.


Implementation details:

I implemented Glicko-2 as described in the Glicko-2 paper http://www.glicko.net/glicko/glicko2.pdf. I tested my implementation against this one https://github.com/sublee/glicko2 and found no errors.
All the OGS data I used (player ratings, opponent ratings, game results) comes from https://online-go.com/termination-api/player/449941/rating-history.

If you need more details, just ask. I can also send you my Python scripts.

13 Likes

^ Fascinating! Thank you for your efforts! :slight_smile:

1 Like

Can you illustrate this in (pseudo-)code for the two formulae? Because “but forcing the deviation to be the same as OGS ones” does not explain a great deal.

Sorry for that.

I’m iterating over the game history from earliest to latest, finalizing a rating period every 15 games; i is the index of this iteration.
OGS_deviation is the deviation as listed in https://online-go.com/termination-api/player/449941/rating-history.
The sliced arrays ([a:b]) include the elements from index a up to and including b.

For the correct calculation (only line 4 differs)

begin_rating_period = i - i mod 15
// obtain the last finalized values
last_finalized_rating = my_calculated_ratings[begin_rating_period - 1]
last_finalized_deviation = my_calculated_deviation[begin_rating_period - 1]
last_finalized_sigma = my_calculated_sigma[begin_rating_period - 1]
// calculate the new rating from all games of the period up to game i
my_calculated_ratings[i], my_calculated_deviation[i], my_calculated_sigma[i] =
     Glicko.rate(last_finalized_rating, last_finalized_deviation, last_finalized_sigma,
                 OGS_opponents_rating[begin_rating_period:i], OGS_opponents_deviation[begin_rating_period:i])

For the “forced” deviation (only line 4 differs)

begin_rating_period = i - i mod 15
// obtain the last finalized values
last_finalized_rating = my_calculated_ratings[begin_rating_period - 1]
last_finalized_deviation = OGS_deviation[begin_rating_period]
last_finalized_sigma = my_calculated_sigma[begin_rating_period - 1]
// calculate the new rating from all games of the period up to game i
my_calculated_ratings[i], my_calculated_deviation[i], my_calculated_sigma[i] =
     Glicko.rate(last_finalized_rating, last_finalized_deviation, last_finalized_sigma,
                 OGS_opponents_rating[begin_rating_period:i], OGS_opponents_deviation[begin_rating_period:i])

The plot shows the calculated values after each step.

3 Likes

You beat me to it; I was planning to do exactly the same after I’m back from holiday in a few days.

1 Like

For super dumb people: how can one recalculate ratings/deviations at all, if you need to know the ratings of the last several opponents (~15 players, I think?) at the time the game ends? rating-history seems to give only the numbers of the last opponent.

1 Like

That’s a rather good question.

I used the opponent ratings from the previous games (the rows below). That was the easiest way, and the results are good enough to start with.

Following your suggestion, I modified my script to use the “current” opponent ratings by looking at the rating history of each opponent.
The new plot is indistinguishable from the old one, so I’ll leave it as it is.


It is even worse: the rating listed is the rating after the game ended, i.e. it is calculated including the result of the listed game.
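
Concretely, that means an off-by-one when reading the history (made-up numbers):

# Post-game ratings, oldest first; each row already includes that game's
# result, so the rating a player carried into game i is row i - 1.
ratings = [1480.0, 1495.2, 1489.7]
i = 2
print("into game", i, "->", ratings[i - 1], "| after ->", ratings[i])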

2 Likes

Hello :slight_smile:

Very interesting, and my deepest thanks for all the work put into this.
Unfortunately this is way above my understanding of the system, so I will be unable to discuss it, let alone change anything :smiley: but I will relay this to our devs.

They will probably be unhappy about tinkering with the system, as any deep changes are surely dangerous and (as far as we are concerned) the system works “well enough”. But if there really is some underlying issue, I am sure they will eventually try to fix it, and maybe contact you for more details :slight_smile:
Thanks again.

5 Likes

Thanks for your reply.

I can understand them. Changes to the rating code would probably change the rating distribution, but…

“Well enough” is the right choice of words, I guess :wink: I wouldn’t use “well”, but “well enough” is quite fitting. It’s right on the borderline of ill-behaved, but still on the side of well. My rank is “stable” between 12k and 18k; on the ladders I have to check whether the person I want to challenge is at the lower or the upper end of their own 5k-wide interval.

Therefore I would appreciate it if the ranks were more stable (behaved well, so to speak), but I can understand if the devs decide that changing/fixing this would not be worth the time. I can work around the instability/fluidity of the ranks; it just gets annoying sometimes.

Thank you for your good work.

2 Likes