Unofficial OGS rank histogram 2021

This shouldn’t be the case. It does appear there could have been some kind of error, but part of the reason ratings changes take so long is because

What this actually looks like is anoek having the server re-run every single ranked game through the new algorithm, so that it’s as if the current rating system had been used for all time.

Oh, am I mistaken? Does the old Elo-based rating data still exist somewhere? I would love to see the old ranks from that time ^^

I think we are confused about what is meant by “data”.

I think anoek keeps the old rating system alive for a short time, just in case the new one goes tits up and he needs to roll back…

What I meant by keeping the “data” is that every single ranked game gets re-analyzed, so while you can’t compare the new rank to the old rank through history, the new historical ranks should still behave appropriately, as if we had had the new system the whole time…

For some reason, though, the historical ranks in this case do seem to be off :man_shrugging:

I still don’t think I agree that recalculating every game makes any sense.

I don’t see why one couldn’t figure out a mapping and then just apply that mapping to the current ratings.

It can’t be much less chaotic than it already is. No-one really knows how to compare x-kyu before to y-kyu now anyway.

Not to mention that it makes it look like players who were new to go and the server started off at 10kyu and went up from there, when they were actually 20-25kyu for quite a while.

There’s probably nothing that can be done about it now, but I don’t think it was a good idea. (And it might need to be recalculated again if they want to fix blitz games)

The old rankings were wrong so every game made every ranking slightly more wrong. Mapping only works on the global scale. It might be true that most ranks went up by 3 (or whatever it was) but that doesn’t mean you can just put every rank up by three and call it a day.

By running it from the start, assuming your new code is correct, you remove the wrong data points rather than trying to mask over them.

Ahh i see what you mean, then yeah xD

I was talking about the pre-2017 rating system update. What I meant by ‘missing’ is exactly what you said: the data we have now is re-analysed and based on results from long after the games took place.

A good example of what I mean by data gone missing and the “problem*” it causes is T:3183 R:1 (ennuiaboo vs KoBa): you can see from the chat that both of us were 14k, and from the level of our moves that we were indeed both far into ddk-land based on our skills ^___^
Now, when adapted to the current rating system, our ranks are shown as 1d and 7k in that game, which is clearly just false and doesn’t make any sense xD

If you take a closer look at that user’s rating graph and game history, you’ll see how weird it is.

Looking at the graph and game history, it now seems like they were always a high-sdk / low-dan player (they played for 2 months in 2014 and a few times in 2015-16), apparently even reaching 2d at some point: "friendly" "game"
But according to the game chats, the user never reached higher than 10k. Then, after coming back in 2018, they were visibly confused about their new 1d rank and lost some games before quitting OGS again: kalhartt vs. ennuiaboo

When looking at these old games, there is no data on what our ratings were back then, how much difference there was between the players’ Elos, or who was the higher-rated player when the game took place. Ranks have been saved in chat logs, but that is the only place where you can see how the players were ranked when the game took place.

*I think the only problem this causes is the inaccuracy of any kind of statistics about players’ ranks from the pre-2017 era.

Be careful about those. When I was looking through those, they were a bit off too. I think something is up with them too.


I think looking at historical change is still interesting.


I like the idea of recalculating the ranks from the start, cute idea. But throwing away old ranks was a mistake. So sad.

Some older ranks can be seen on GoKibitz, for an obviously very small number of players.

For instance, I can see my OGS rank from October 2018 on by referencing those SGFs.

As I understand it, ranks in chat aren’t overall ranks either; they’re category ranks, which are famously lower than the overall rank.

The old Elo remnants exist somewhat under codeword “egf”. For Koba’s game T:3183 R:1 (ennuiaboo vs KoBa) we get 742 → 742 / 100 = 7 → 7 + 9 = 16 → 30 - 16 = 14k!

Or the game that was played right after this screenshot (A short travel through OGS history - #28 by DVbS78rkR7NVe) was taken: correspondence egf=1685.58 (matches the screenshot) → 1685 / 100 = 16 → 16 + 9 = 25 → 30 - 25 = 5k!

Again these are category ranks.
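
The arithmetic in the two examples above can be written as a tiny function. This is only a sketch of the steps shown in this post (integer-divide by 100, add 9, subtract from 30); the function name is made up for illustration, and whether OGS itself computed ranks exactly this way is not confirmed here.

```python
def egf_to_category_kyu(egf: float) -> int:
    """Hypothetical helper: follow the steps from the post above.

    egf // 100, then + 9, then subtract the result from 30 to get a kyu rank.
    """
    hundreds = int(egf // 100)      # 742 -> 7, 1685.58 -> 16
    return 30 - (hundreds + 9)      # 7 + 9 = 16 -> 30 - 16 = 14 (kyu)


print(egf_to_category_kyu(742))      # 14
print(egf_to_category_kyu(1685.58))  # 5
```

As noted, the result is a category rank, so it should not be read as the player's overall rank.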

The noobs are swelling because we are sampling more noobs from the nearly infinite universe of potential noobs (see my post on normal versus power laws above). More people are playing Go. It’s a good thing.

So they might be more or less accurate for those of us playing in more or less one category (eg mostly correspondence 19x19 or most live 9x9 etc).

I know they won’t be identical but one can imagine them being close ish with a small number of games in other categories just throwing them off a bit.

Ok, so “working with OGS statistics” has turned into “learning Python”, which, although a worthy endeavor, has not let me demonstrate any concrete results (yet). Trolling through OGS code and cogitating on rank/rating issues has led me to some conclusions, and some ideas. I would appreciate some feedback.

  1. There is NOT an infinitely bad Go player. AlphaGo Zero learned from scratch, starting with random moves. This excludes the pathological case of players trying to lose (an exclusion which may be wrong, as players may try to manipulate their rating. Hmmm, both players are trying to lose…?)
    1a) OGS currently maps rating->rank such that there is a minimum possible rank
    1b) Rating points at very low ranks don’t mean much, they are “play money”
    1c) Two-tailed distributions are exploded - Elo and Glicko are clearly wrong. Note that USCF has noted problems at both ends of the rating scale over the years. Glicko’s range feature is desirable, and should be addressed.
  2. Given there is not an infinitely bad Go player, there is a limit, there is a worst Go player. This limit lets us introduce the possibility of a very simple power law distribution. This theoretical distribution has been proven to be common in “complex” situations, situations involving many actors employing many strategies. Go, anyone?
    2a) A contest between two players can be represented as two samplings from any distribution. Bayes Theorem, and the observation that Black wins->White loses and vice versa, gives us a flexible (unassailable?) probability model.
  3. It is natural to map the minimum OGS rank to the worst possible Go player. Given that fix, one can map expected results to actual results, and estimate the real probability distribution (based on rank, not rating).
  4. It is convenient to have “play money” at the lower ranks (see 1b). An eager noob should not run out of points to bet. The current mapping between rank and rating seems to make this nearly certain.
  5. So rankings are significant (with a theoretical minimum), and ratings are “play money” (at least at the lower end). I propose rating changes be based on “fair” bets, based on rank (not rank differences, the relationships are not linear). What is the most a higher rated player would be willing to risk to an unranked/unknown player? 100? Maybe too small. 1000? Surely too big.
  6. New player… Theoretical minimum rank, arbitrary new rating… dunno.
    I suspect that a proper modelling of a ranking based power law will fix perceived problems at high skill levels. I also suspect that adopting the Bayes model and recognizing that there are many bad Go players we’ve never met will fix many perceived problems at lower skill levels.

WRT 5. Could players choose the size of their bets, with insight from their ratings?

One could suggest how many points one is willing to lose… Odds, based on ranking, creates fair ratio. Opponents’ risk choices determine actual rating changes.

I wouldn’t mind fixing ranks to a few stable bots.

Just had a thought that using percentiles, I can set myself a more achievable goal than ‘1 dan’ . . . I’m currently around 6K (65th percentile) and I’d be more than happy to reach the 70th and then the 80th percentile. So my new goal is now 2.9K!
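
For anyone wanting to set a percentile goal like this from their own data, here is a minimal sketch. Everything in it is an assumption for illustration: the function names are hypothetical, and `ratings` is an evenly spaced synthetic list, not real OGS data. With a real sorted list of ratings the same two lookups would apply.

```python
import math
from bisect import bisect_left


def percentile_of(value, sorted_values):
    """Percent of the population with a value strictly below `value`."""
    return 100.0 * bisect_left(sorted_values, value) / len(sorted_values)


def value_at_percentile(p, sorted_values):
    """Nearest-rank method: smallest value with at least p% of the data at or below it."""
    index = max(math.ceil(p * len(sorted_values) / 100) - 1, 0)
    return sorted_values[index]


# Synthetic stand-in population: 100 evenly spaced ratings (NOT real OGS data).
ratings = list(range(1000, 2000, 10))

print(percentile_of(1650, ratings))      # 65.0
print(value_at_percentile(80, ratings))  # 1790
```

The target rating for a given percentile depends entirely on the population you feed in, so a goal derived this way shifts as the player base changes.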

Samples of the mean rank would be normally distributed, but the underlying distribution generally isn’t even if there are a large number of samples.
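
A quick simulation can illustrate the point. This sketch uses an exponential distribution purely as a stand-in for a skewed population (an assumption, not a claim about how OGS ranks are distributed) and compares the skewness of the raw draws with the skewness of the means of many small samples.

```python
import random
import statistics


def skewness(xs):
    """Population skewness: mean of cubed z-scores (about 0 for a symmetric shape)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Skewed "underlying" population: stays skewed no matter how many draws we take.
raw = [rng.expovariate(1.0) for _ in range(5000)]

# Means of many samples of size 50: these cluster into a near-symmetric bell shape.
means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(50)])
    for _ in range(2000)
]

print("skewness of raw draws:   ", round(skewness(raw), 2))
print("skewness of sample means:", round(skewness(means), 2))
```

The raw draws keep a large positive skew, while the sample means come out much closer to symmetric, which is the distinction being made above.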

Update… my critique of two tailed distributions is correct, but irrelevant, because Go is not well modelled as a random process. My naive application of Bayes predicted a 10k beating a 9d a third of the time. I continue to tinker.

I do not really understand what you are trying to do, but wouldn’t it be better to start from something more realistic and more easily modeled so that you can get some initial parameters that are more reliable?

I mean, who knows what the real odds of a 10k beating a 9d are ( some variation of 0.0…0x% ) and how they differ from the odds of a 10k beating an 8d (also some variation of 0.0…0x% ).
Try, for example, a 10k beating a 9k one time out of three or 2 times out of 5, and start extrapolating from things that are closer to reality?
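
One way such an extrapolation could go is sketched below. The big assumption (mine, not something established in this thread) is that the odds of an upset multiply by the same factor for every rank of difference, in the style of a Bradley-Terry/Elo model; real results may well not follow this. The function names are hypothetical.

```python
def rank_gap(weaker_kyu: int, stronger_dan: int) -> int:
    """Number of rank steps from an n-kyu player up to an m-dan player (1k -> 1d is one step)."""
    return weaker_kyu + stronger_dan - 1


def upset_probability(gap: int, p_one_rank: float) -> float:
    """P(weaker player wins) across `gap` ranks.

    ASSUMPTION: the odds of an upset shrink by a constant factor per rank.
    `p_one_rank` is the chance of beating a player exactly one rank stronger.
    """
    odds_one = p_one_rank / (1.0 - p_one_rank)
    odds = odds_one ** gap
    return odds / (1.0 + odds)


gap = rank_gap(10, 9)  # 10k vs 9d
print(gap)
print(upset_probability(gap, 1 / 3))  # starting from "one time out of three"
print(upset_probability(gap, 2 / 5))  # starting from "2 times out of 5"
```

Under this assumption both starting points give a tiny but nonzero chance for the 10k, and the result is very sensitive to the one-rank figure, so the one-rank data would need to be solid before trusting anything at a large gap.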

There is good data on 1-2k differences. I am basing my work on that. “Trying to accomplish”… Well just killing time being geeky. Surely a Go player can appreciate that :wink:
