I understand why people are annoyed. Even if (quite possibly) the higher rank fluctuation is a more accurate model of what actually happens (a lot of fluctuation, that’s what), it is annoying. It’s annoying that I can’t say “I’m x dan” when that rank can drop 2 steps due to a single sdk botter who manages to snag a game - or from playing games when I’m tired and/or tilted.
My experience from Dota 2 is fairly similar - many people lose hundreds of elo points “just to make up for that one loss that should have been a win”. Luckily on OGS it only takes those few wins to make up for the huge deficits that come from equally few losses.
Other rating systems have their own problems: ranking up on IGS takes forever (about 15-20 straight wins depending on your rank), with similar numbers for the Tygem/wBaduk/Fox cluster; on KGS your rank will slowly crystallize into a never-changing number, though you can also increase your (uncrystallized) rank by losing an even game to a much stronger player.
We’re all complaining about rating points we lost, but in the end I think it’s an advantage that we have a rating system that allows us to rank up quickly as soon as we stop being as terrible at the game as we have been until now, sit down, and actually work for it.
It could obviously be my fault, not necessarily a bug.
Also @mekriff could be right: it could just be a coincidence that the rating system was recalculating the batch of 15 games precisely while I was losing that single game. It could happen.
But because of the already mentioned lack of information about these calculations, we are all just making suppositions. I still really don’t understand why our beloved devs are so shy about this topic.
Let me quote myself:
Please, forgive me, but when I hear something like “I think it has to do with…” or “A lot of factors go into how each number changes”, it sounds to me like this:
I must say, my rank has been much more unstable since the implementation of the new rating scale - wildly swinging between 5k and 10k. I am now around 8k, which is probably about right, but I don’t expect it to stay there for long. I think getting points for beating lower ranked players is a bit weird, as is losing points for losing against stronger players, but I realise this is due to uncertainty and instability in the other players’ ranks. I get the feeling this system is in some kind of wild oscillation at the moment, and doesn’t have any damping features on it. Perhaps it is just because I am used to playing on other servers where rank doesn’t move so quickly. There are definitely also problems with combining scores across all board sizes and all time limits. I wonder if making the number of games over which the rank is defined larger would help to stabilize things (like 30 games rather than 15)?
Well, my rank has gone from 9k to (very shortly) 5k, then 6k, then back to 8k … one reason was that a few opponents resigned too early and others lost on time … TBH I’m more worried when my rank goes up “too fast” (for my feeling) than when it goes down.
But what I want to say is this: For me, rank on OGS is not “unstable” but rather “fluid”, while, IIRC, on some other server there seemed to be some “rank inertia”, and more in one direction than in the other.
I don’t mind the rank fluidity here on OGS, as I accept that it also reflects the “form of the day”, which can be bad today, and better tomorrow, and bad again the day after.
I get your point and I mostly agree. @smurph also said that a quick and sensitive rating system can be an advantage, compared to being frozen at a never-changing rank.
That’s fine. I think that too.
But we are still talking philosophy while someone is asking: “are we sure that the rating system isn’t broken?”.
Gaining rank after a loss seems broken.
Increasing deviation after a win seems broken.
I like having a fluid rank: I feel high when I gain 3 kyus in a month and I laugh at it when I lose them two weeks later.
What I don’t like, because I’m stubbornly Aristotelian, is broken math.
There’s a big difference between “seems like broken math” and “is broken math”, though.
In simpler ranking systems it might be true that if you lose a game your rank cannot increase. But Glicko is based on many more factors. It recalculates your new rank based on your rank before the game, the ranks of all your opponents during the rating period, and the deviations of those ranks. So if some of the opponents you played against earlier changed rank, your own rank technically should change even without playing any new games. The recalculation happens only after you finish a game, so you don’t see this until you have finished a new game (I believe).
Another thing is that the same win does not give you the same rank increase now as it would at another time. One factor of Glicko is that rating deviation increases if you don’t play enough games, which affects how you’re rated. So if a 5k starts a rating period with a win against a 1d, he will gain less rank if he has been playing regularly (and so has a small deviation) than if he hadn’t played in the last half year.
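That deviation growth is easy to sketch. In Glicko-1 terms the deviation is inflated once per idle rating period, capped at the starting value (Glicko-2 does the equivalent inside the update step). A minimal sketch - the constants `c` and `cap` here are illustrative, not OGS’s actual values:

```python
import math

def inflate_rd(rd: float, idle_periods: int, c: float = 34.6, cap: float = 350.0) -> float:
    """Glicko-1 style deviation growth for a player who sits out
    `idle_periods` rating periods. Constants are illustrative only."""
    return min(math.sqrt(rd**2 + c**2 * idle_periods), cap)

# A regular player (RD 60) vs. the same player after a long break:
print(inflate_rd(60, 1))   # small bump after one idle period
print(inflate_rd(60, 15))  # much larger uncertainty after 15 idle periods
```

The larger deviation then makes the next result move the rating further, which is exactly the “same win, different gain” effect described above.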
Finally, your rank is not really the single number listed on your profile, but rather a 95% certainty range between two ranks, as indicated by X ± Y (so your rank is between X-2Y and X+2Y with 95% certainty). In a normal distribution the median, the mean and the mode coincide, so changing your deviation does not affect your point estimate. However, in other distributions it can be perfectly consistent for your rank (the midpoint of the interval) to shift when your deviation shifts. (But I’m not sure what distribution Glicko assumes. If it’s normal, this is irrelevant.)
The question is not if the system works as is intended: it does (probably). The question should be whether this system is too volatile or not (and if that’s necessarily a bad thing).
The point is that we can’t tell the difference between “seems broken math” and “is or is not actually broken math” because the site doesn’t provide us enough actual data about how our rating was calculated.
I think that’s what’s actually being questioned. There is no point in debating whether this system is too volatile when we can’t be sure it isn’t simply broken … especially while there are indications that maybe it is, and not enough data to prove that it isn’t.
Unfortunately it’s kind of a cop-out answer, that is “don’t question correctness of ogs god”.
For example, in general chat the rank of some players sometimes jumps between two values (with or without refreshing the page, very weird). For a long time many people wrote it off as a caching problem or as the system catching up with a recent rank change. However, a couple of days ago I saw the rank of a player jumping even though their last ranked game was a week ago. That looks a lot more like a bug now, and not “the system works like that”.
Coding mistakes are possible, and excusing them with “a lot of factors” doesn’t help. The problem is that we don’t have many ways to check rank changes. Even just a field telling you how much your rank changed after playing a game would help. Maybe write a script for that?
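Such a script seems feasible against the public OGS REST API. Caveat: the endpoint URL and the JSON layout (`ratings` → `overall`) below are from memory and may be wrong or outdated, so treat them as assumptions; the diff helper itself is trivial:

```python
import json
import urllib.request

# Assumed endpoint shape -- verify against the live API before relying on it.
API = "https://online-go.com/api/v1/players/{}"

def fetch_overall_rating(player_id: int) -> tuple[float, float]:
    """Fetch (rating, deviation) for a player from the OGS REST API.
    The JSON field names here are an assumption, not a documented contract."""
    with urllib.request.urlopen(API.format(player_id)) as resp:
        data = json.load(resp)
    overall = data["ratings"]["overall"]
    return overall["rating"], overall["deviation"]

def describe_change(before: tuple[float, float], after: tuple[float, float]) -> str:
    """Pure helper: format the rating/deviation delta between two snapshots."""
    dr = after[0] - before[0]
    dd = after[1] - before[1]
    return f"rating {dr:+.1f}, deviation {dd:+.1f}"

# Usage (not run here): snapshot before and after a ranked game finishes, e.g.
#   before = fetch_overall_rating(12345)
#   ... play a ranked game ...
#   after = fetch_overall_rating(12345)
#   print(describe_change(before, after))
```

Run on every finished game, this would at least produce the "how much did my rank change" log that the site itself doesn’t show.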
Agreed, I would only be able to state things with certainty if I knew how the actual algorithm works. I can only base my claims on the Glicko paper, which does allow for the points I made, but of course gives me no insight into how the site itself works.
However, there is not much leeway in implementing a Glicko-2 rating system. The only values that ought to be changed are the size of the rating periods and the volatility constant. The rest is basically pseudo-code for the algorithm itself. Unless the “OGS god” has a degree in numerical methods and thought the algorithm could be improved, I doubt that the algorithm itself has been changed. (But this is once again a complete guess, based on the idea that developers aren’t evil goblins.)
This is very difficult, if not impossible, with Glicko, as it is not known beforehand in which period you will finish the game, what the ranks / deviations of the two players will be at the end of the game, and it also depends on the other games played by the opponents of both players in the same rating period.
I wrote and meant exactly “how much your rank changed”, not “how much your rank would change”. After the game is finished it’s as simple as comparing two numbers. But it would allow people who play tons of games to see all the weird things Glicko does, and maybe spot common irregularities.
OGS is not very transparent. Try finding out what counts as live and what as blitz. Not trivial at all.
I don’t think corr, live, and blitz are weighted differently, are they? When I talked about blitz I was referring to how it’s both easier to surmount a strength difference and quicker to finish a large amount of games, resulting in a significant compound effect.
That’s all I can do, too… guessing… but it isn’t very scientific, is it?
I really appreciate everybody’s effort to guess and justify, but my logic mind would prefer solid data.
As an example: when are batches of games recalculated? The first of the month? When it’s full moon? On a random basis? Knowing this would help in checking the changes.
@GreenAsJade, could you help? You are able to modify the site code, so you should be able to identify the function (I used to call them “subroutines” when I was young) that does this job. Is it possible? Could we look at the code? Perhaps the answers to my doubts are in plain sight and I just don’t know where to look…
Well, the thing is that only the “client” side is open source. So I can read and contribute to the way the site looks, and the interface (buttons, where they are etc) but not to the underlying “fundamental way it works” (aka server side code).
There could be an issue with the way OGS calculates the deviation of players’ ratings, which would result in high fluctuations of players’ ranks.
I found this because I wanted to understand Glicko-2 in depth and get better insight into how OGS implements it.
In the plot below, you can see my rating as shown by OGS in the rating history (upper black line)
and the ratings as calculated by me (upper green line).
The lines on the bottom of the graph are their deviations.
The red lines are a calculation with my implementation, but forcing the deviation to be the same as the OGS values.
As you can see, my green line is much more stable than the OGS rating, while the red line tracks OGS closely; the deviation in the OGS data is much higher (about 2×, I didn’t calculate the exact number).
I hope this helps to pin down the possible issue behind the unstable ranks. Sorry for the necro.
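For what it’s worth, a doubled deviation alone would explain a much wilder rating line, since the player’s own (inflated) deviation scales the per-game change more than linearly. A self-contained sketch (plain Glicko-2 step with the volatility iteration skipped; the numbers are illustrative, not taken from OGS):

```python
import math

SCALE = 173.7178  # Glicko-2 scale factor from Glickman's paper

def g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi**2 / math.pi**2)

def one_game_change(rating, rd, opp_rating, opp_rd, score, sigma=0.06):
    """Rating change from a single game under a simplified Glicko-2 step
    (volatility held fixed). Shows how the player's own deviation
    scales the size of the swing."""
    mu = (rating - 1500.0) / SCALE
    mu_j = (opp_rating - 1500.0) / SCALE
    phi, phi_j = rd / SCALE, opp_rd / SCALE
    e = 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))
    v = 1.0 / (g(phi_j)**2 * e * (1.0 - e))
    phi_star2 = phi**2 + sigma**2                 # pre-period inflation
    phi_new2 = 1.0 / (1.0 / phi_star2 + 1.0 / v)  # post-game variance
    return SCALE * phi_new2 * g(phi_j) * (score - e)

# Same even game, same win -- only the player's own deviation differs:
swing_small_rd = one_game_change(1500, 60, 1500, 60, 1.0)
swing_big_rd = one_game_change(1500, 120, 1500, 60, 1.0)
```

Here doubling the player’s deviation makes the single-game swing several times larger, which is consistent with the green (low-deviation) line being calm and the red (OGS-deviation) line jumping around.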