I barely ever get to play and my deviation is still 64… Seems like a reasonable theory
I made a graph with stable players, and multiple histograms through time, normalized. As you can see, as time goes on, the peak goes down but the region with players at 21k gets fatter. Probably more new players joined in 2020.
So I continued my graph building for earlier months. It looks pretty good, a little too good, if you ask me. I wonder if there’s some kind of mistake somewhere. Look how perfectly noob region is swelling up over the years. Could it actually happen naturally?
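For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of what "normalized" histograms could look like. The snapshot data and the function name are made up for illustration (this is not the script used for the graphs above); the idea is only that each month's counts are divided by that month's total, so months with different player counts can be overlaid.

```python
from collections import Counter


def normalized_histogram(ranks):
    """Return {rank: fraction of players} for one snapshot of ranks.

    `ranks` is a list of integer kyu ranks (hypothetical input format).
    Fractions sum to 1, so snapshots of different sizes are comparable.
    """
    total = len(ranks)
    if total == 0:
        return {}
    counts = Counter(ranks)
    return {rank: counts[rank] / total for rank in sorted(counts)}


# Hypothetical snapshots: kyu ranks of active players in two months.
snapshots = {
    "2019-01": [21, 21, 15, 15, 15, 10, 5, 5],
    "2020-01": [21, 21, 21, 21, 21, 15, 15, 15, 10, 10, 5, 5],
}

for month, ranks in snapshots.items():
    print(month, normalized_histogram(ranks))
```

With normalization, a growing share at 21k shows up as a fatter bar there even though the absolute number of mid-rank players has not dropped.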
Additionally I’ll make the top post a wiki so editability doesn’t expire.
A suggestion: could you make this into an animated GIF, so we can see the hump sloshing over time?
there was a discussion in another topic, your diagram may be so perfect because there is something strange with OGS itself
Ah yes, ratings were re-calculated retroactively in January, i guess it makes historical comparison pointless
Yup totally. in 2017 ogs switched from an elo-based rating system to glicko2, which caused the loss of all previous rating data :<
Ranks from that era are still saved on game chats, if you have the time you can look thru old chats and plot the ranks from those ^^
This shouldn’t be the case. It does appear there could have been some kind of error, but part of the reason ratings changes take so long is because
What this actually looks like, is anoek having the server re-run every single ranked game through the new algorithm so that it’s as if the current rating system had been used for all time.
Oh am i mistaken? Does the old elo-based rating data still exist somewhere? I would love to see the old ranks from that time ^^
I think we are confused on what is meant by “data”
I think anoek keeps the old rating system alive for a short time, just in case the new one goes tits up and he needs to roll back…
what I meant by keeping the “data” is that every single ranked game gets re-analyzed, so while you can’t compare the new rank to the old rank through history, the new historical ranks should still operate appropriately, as it’s as if we have had it the whole time…
for some reason, the historical ranks in this case do seem to be off
I still don’t think I agree that recalculating every game makes any sense.
I don’t see why one couldn’t figure out a mapping and then just apply that mapping to the current ratings.
It can’t be much less chaotic than it already is. No-one really knows how to compare x-kyu before to y-kyu now anyway.
Not to mention that it makes it look like players who were new to go and the server started off at 10kyu and went up from there, when they were actually 20-25kyu for quite a while.
There’s probably nothing that can be done about it now, but I don’t think it was a good idea. (And it might need to be recalculated again if they want to fix blitz games)
The old rankings were wrong so every game made every ranking slightly more wrong. Mapping only works on the global scale. It might be true that most ranks went up by 3 (or whatever it was) but that doesn’t mean you can just put every rank up by three and call it a day.
By running it from the start, assuming your new code is correct, you remove the wrong data points rather than trying to mask over them.
Ahh i see what you mean, then yeah xD
I was talking about the pre-2017 rating system update. What i meant by ‘missing’ is exactly what you said, the data we have now is re-analysed and based on results from long after the games took place.
A good example of what i mean by data gone missing and the “problem*” it causes here is T:3183 R:1 (ennuiaboo vs KoBa): you can see from the chat that both of us were 14k, and from the level of our moves that we were indeed both far into ddk-land based on our skills ^___^
Now when adapting to the current rating system our ranks are shown as 1d and 7k in that case, which clearly is just false and doesn’t make any sense xD
If you take a closer look at that user’s rating graph and game history you’ll see how weird it is.
By looking at the graph and game history, it now seems like they were always (they played for 2 months in 2014 and a few times in 15-16) a high sdk / low-dan player - apparently even reaching 2d at some point "friendly" "game"
But according to the game chats, the user never reached higher than 10k. Then after coming back in 2018 they were visibly confused about their new 1d rank and lost some games before quitting ogs again kalhartt vs. ennuiaboo
When looking at these old games, there is no data on what our ratings were back then, how much difference there was between the players’ elos, or who was the higher rated player when the game took place. Ranks have been saved in chat logs, but that is the only place where you can see how the players were ranked when the game took place.
*i think the only problem this causes is the inaccuracy of any statistics about players’ ranks from the pre-2017 era.
Be careful about those. When I was looking through those, they were a bit off too. I think something is up with them too.
I think looking at historical change is still interesting.
I like the idea of recalculating the ranks from the start, cute idea. But throwing away old ranks was a mistake. So sad.
Some older ranks can be seen on GoKibitz, for an obviously very small number of players.
For instance, I can see my OGS rank from October 2018 on by referencing those SGFs.
As I understand it, ranks in chat aren’t overall ranks either; they’re category ranks, which are famously lower than the overall rank.
The old Elo remnants exist somewhat under the codeword “egf”. For Koba’s game T:3183 R:1 (ennuiaboo vs KoBa) we get 742 → 742 / 100 = 7 (rounded down) → 7 + 9 = 16 → 30 - 16 = 14k!
Or the game that was played right after this screenshot A short travel through OGS history - #28 by S_Alexander was taken: correspondence egf=1685.58 (matches the screenshot) → 1685 / 100 = 16 → 16 + 9 = 25 → 30 - 25 = 5k!
Again these are category ranks.
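The arithmetic in those two examples can be written out as a small sketch. The function name is made up here, and the dan branch is an assumption that just continues the same pattern past 1k; only the kyu cases (742 → 14k, 1685.58 → 5k) come from the posts above. This is not taken from the OGS code.

```python
import math


def egf_to_rank_label(egf: float) -> str:
    """Convert an old "egf" value to a rank label, following the steps above.

    Steps: divide by 100 and round down, add 9, subtract the result from 30.
    The dan branch is an assumed extension of the same pattern.
    """
    steps = math.floor(egf / 100) + 9
    kyu = 30 - steps
    if kyu >= 1:
        return f"{kyu}k"
    # Assumption: 0 on this scale means 1d, -1 means 2d, and so on.
    return f"{1 - kyu}d"


print(egf_to_rank_label(742))      # 14k
print(egf_to_rank_label(1685.58))  # 5k
```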
The noobs are swelling because we are sampling more noobs from the nearly infinite universe of potential noobs (see my post on normal versus power laws above). More people are playing Go. It’s a good thing.
So they might be more or less accurate for those of us playing in more or less one category (eg mostly correspondence 19x19 or mostly live 9x9 etc).
I know they won’t be identical but one can imagine them being close ish with a small number of games in other categories just throwing them off a bit.
Ok, so “working with OGS statistics” has turned into “learning Python”, which, although a worthy endeavor, has not let me demonstrate any concrete results (yet). Trolling through OGS code and cogitating on rank/rating issues has led me to some conclusions, and some ideas. I would appreciate some feedback.
1. There is NOT an infinitely bad Go player. AlphaGo Zero learned from scratch, starting from random moves. This excludes the pathological case of players trying to lose (an exclusion that may be wrong, as players may try to manipulate their rating. Hmmm, both players are trying to lose…?)
1a) OGS currently maps rating->rank such that there is a minimum possible rank
1b) Rating points at very low ranks don’t mean much, they are “play money”
1c) Two tailed distributions are exploded - Elo and Glicko are clearly wrong. Note that USCF has noted problems at both ends of the rating scale over the years. Glicko’s range feature is desirable, and should be addressed.
2. Given there is not an infinitely bad Go player, there is a limit, there is a worst Go player. This limit lets us introduce the possibility of a very simple power law distribution. This theoretical distribution has been proven to be common in “complex” situations, situations involving many actors employing many strategies. Go, anyone?
2a) A contest between two players can be represented as two samplings from any distribution. Bayes Theorem, and the observation that Black wins->White loses and vice versa, gives us a flexible (unassailable?) probability model.
3. It is natural to map the minimum OGS rank to the worst possible Go player. Given that fix, one can map expected results to actual results, and estimate the real probability distribution (based on rank, not rating).
4. It is convenient to have “play money” at the lower ranks (see 1b). An eager noob should not run out of points to bet. The current mapping between rank and rating seems to make this nearly certain.
5. So rankings are significant (with a theoretical minimum), and ratings are “play money” (at least at the lower end). I propose rating changes be based on “fair” bets, based on rank (not rank differences, the relationships are not linear). What is the most a higher rated player would be willing to risk to an unranked/unknown player? 100? Maybe too small. 1000? Surely too big.
6. New player… Theoretical minimum rank, arbitrary new rating… dunno.
I suspect that a proper modelling of a ranking based power law will fix perceived problems at high skill levels. I also suspect that adopting the Bayes model and recognizing that there are many bad Go players we’ve never met will fix many perceived problems at lower skill levels.
WRT 5. Could players choose the size of their bets, with insight from their ratings?
One could suggest how many points one is willing to lose… Odds, based on ranking, create a fair ratio. Opponents’ risk choices determine actual rating changes.
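To make the "fair ratio" idea concrete, here is a rough sketch under one assumption: that some model (not specified in this thread) gives a probability `p_strong_wins` that the higher ranked player wins. A bet is "fair" when neither side expects to gain points on average. The function and parameter names are hypothetical.

```python
def fair_strong_stake(p_strong_wins: float, weak_stake: float) -> float:
    """Points the stronger player must risk for a zero-expectation bet.

    The stronger player gains `weak_stake` with probability p and loses
    their own stake with probability 1 - p, so fairness requires
    p * weak_stake == (1 - p) * strong_stake.
    """
    if not 0.0 < p_strong_wins < 1.0:
        raise ValueError("p_strong_wins must be strictly between 0 and 1")
    return p_strong_wins * weak_stake / (1.0 - p_strong_wins)


# Example: if the stronger player is assumed to win 75% of the time and
# the weaker player chooses to risk 100 points, the stronger player
# has to risk three times as much.
print(fair_strong_stake(0.75, 100.0))
```

In this sketch the weaker player picks the stake and the odds fix the other side; letting both players propose a stake and taking the smaller implied bet would be another option.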