2020 Rating and rank tweaks and analysis

Correspondence is irrelevant, we basically have no competition… Maybe only Dragon?

When discussing “which server do you play on” it is usually implied “live 19x19 games” as this is the arena in which we are competing with IGS, fox, tygem, etc.

4 Likes

We’re also constantly asked to do reviews and teach and stuff. It’s not that I’m against it, but getting multiple PMs a day and challenges and stuff when a review alone takes 1-2 hours to do is exhausting.

This is kind of why I wanted that “teacher role” a while back, so that the dans that want to play and chat, but not teach, don’t get bothered.

EDIT: It might also help get people that want to use OGS as a teaching/streaming platform the income they want, further incentivising twitch streamers and the like to use OGS…

14 Likes

This makes sense, thanks. I guess we just can’t avoid discussing the statistical merit.

Would it be possible to compare two options and see which one is better for calculating the 19x19 handicap? One is the 19x19 rating (blitz, live and corres combined), which avoids the “single-size-only population” issue that makes the sliding window adjustment inaccurate (and could use a different, better-suited calibration). The other is the new combined rating (19x19, 13x13, 9x9, blitz, live and corres all combined), which has that issue. (Let’s set 13x13 and 9x9 handicap aside for the purpose of this discussion.)

My point here is that the OGS player population that plays one size only may well be much larger than the population that mostly plays non-19x19 sizes during the recent period in question, so the ill-effect may be stronger either way.

2 Likes

I think the EGF stats are correct, and the OGS ones are wrong. White is expected to win more in traditional handicap games because they are on average 0.5 stones less than what would create a 50/50 game. EGF, AGA, and KGS rating systems all do it this way. Here is the reference for EGF about that; note the “H-0.5” part:

https://www.europeangodatabase.eu/EGD/EGF_rating_system.php#System
The system also allows to include handicap games assuming that the rating difference D is reduced by 100(H-0.5), where H is the number of given handicaps.
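
A minimal sketch of the quoted adjustment, in case it helps to see the arithmetic. The function name is made up for illustration, and treating H = 0 as "no reduction" is my assumption, since the quoted sentence only covers handicap games.

```python
def effective_rating_difference(rating_diff: float, handicap: int) -> float:
    """Reduce the rating difference D by 100 * (H - 0.5) for H handicap stones.

    Assumption: an even game (H = 0) leaves D unchanged; the quoted EGF
    sentence only describes the handicap case.
    """
    if handicap <= 0:
        return rating_diff
    return rating_diff - 100.0 * (handicap - 0.5)


# Three ranks apart (D = 300) with a 3-stone handicap:
# 300 - 100 * 2.5 leaves White 50 points ahead, i.e. half a stone.
print(effective_rating_difference(300, 3))
```

So under this rule a "matching" handicap never brings the difference to zero; White keeps a 50-point edge.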

2 Likes

It’s not inaccurate at all, it’s quite good in fact, it’s very close to the overall, within the margin of error pretty easily I would say. That data is posted up above in the original post if you want to look at it. However, the Overall also covers that case very well, and it’s slightly better than the 9x9 and 13x13, so the numbers seem to indicate that there’s no disadvantage to using the overall.

4 Likes

A bold, but perhaps uncorroborated, statement :wink:

This is true in both systems, however ideally that win rate for black in handicap games should be consistent for all handicaps greater than 0. If not, then your rank spacing is not perfect, by the definition of what a rank means in Go.

9 Likes

I wish ratings/ranks were kept separate for tournaments and friendly games. I recently got demoted from 6d to 2d because I lost most of my games in a tournament against DDK players (which I fully admit was my fault: I went on vacation without giving notice to the tournament admins, and I totally forgot about the tournament rules since I was busy working).

I also wish the game history had a way to separate tournament, ladder and friendly games, like a tab for each category.

1 Like

This is just my opinion, but someone who treats their rank seriously should also treat ALL their games seriously.
Or else, if they allow some games to be less “serious”, they should also allow their rank to reflect that.

The choice between ranked and unranked games seems enough for me.

I really hope we get any kind of categorizing our games soon, because now it’s impossible to navigate.

12 Likes

Thank you very much for the change, now my bro is 1 rank lower than me again! :heart:

13 Likes

Are these statistics for handicap games? What do the percentages mean?

2 Likes

This is more of a question, I don’t have a suggestion (call the press), but what if dans rose from within OGS? What if OGS made a name for itself as a server where opponents are good and competitive and where you get a true dan rank by playing? Or where players are improving solidly, so someone you beat might beat you next time?

If I have understood correctly, it’s not that OGS doesn’t have strong players, it’s that it doesn’t have dan players (the tag, not the value).

Sometimes a coveted group is not one of existing A-listers, but of aspiring A-listers.

This is just a random thought, probably OT, and I will not argue with anyone who corrects me. :slight_smile:

3 Likes

I explained the format up above in a previous post, but it’s handicap 0, 1, 2, etc., with the format: Actual black win rate % : Expected black win rate % [n = number of games]

3 Likes

“I think the EGF stats are correct, and the OGS ones are wrong. White is expected to win more in traditional handicap games because they are on average 0.5 stones less than what would create a 50/50 game.”

I think you might not be quite understanding the statistics. Either that, or I’m wrong about how they are collected. I am under the assumption that the collected data is not on games of H-0.5; they are games where the handicap matches the rank difference (and therefore at H-0). On the basis that a handicap stone is equivalent to a single rank at all ranks, one would therefore expect a 50/50 win rate. That assumption probably only holds true where the EGR/Glicko rating is not a linear X points per rank across all playing strengths, as there seems to be a general belief among stronger players that 3 stones at dan ranks is more significant than 3 stones at DDK ranks. FWIW, ranks in the EGF are basically 100 EGR points = 1 rank, which means I suspect the handicap calculations are likely to be off at least at one end of the scale, and possibly both ends.

Maybe this does create a new and interesting question. If the argument is that high strength EGF players only play handicap games at a slight advantage for White (handicap is always half a stone less than “true handicap”), why is this? Is this a philosophical belief that having the stronger player only have a 50% chance of winning is in some way unfair? Does the average player on OGS or elsewhere expect that an automatically created handicap game should favour the White player slightly? It’s the first I’ve come across this, but maybe this is an institutional bias we have?

From my limited discussions with, and observations of interviews by, players from an Eastern Go background, handicap stones are primarily used in teaching situations and not in competitive games anyway, on the basis that they make the games more instructive. Some exceptions to this are things like the jubango games, or Korean games-for-money where I’ve heard of handicap stones being treated sort of like a Contract Bridge bid, but those are somewhat special circumstances in both cases.

Obviously whether OGS decides that handicap-0.5 is a more appropriate setting is separate from the issue of “correctly calculating appropriate handicaps by rating difference”, but it does make it more complex to calculate expected win rates.

Interestingly (and I don’t know of anyone who has done similar work with Go), a statistician named Jeff Sonas did a lot of very compelling work over time demonstrating that, in Chess, the default expected win rates according to ELO calculations were actually very suspect. This was particularly true at the higher ratings, where he had the most data and did most of the work; he found that expected win rates were much closer to linear than to the ELO distribution. www.chessmetrics.com still has quite a lot of his work, but I can’t find a few of the detailed distribution articles he published there.

3 Likes

My teachers are all dans in real life, and some of them are also in my local go club.

1 Like

Some (but not all) EGF tournaments that use handicaps (and not all of them even do that) use MMS-1 (McMahon score difference minus 1) to determine the handicap, meaning that you often see 5k vs 7k with just the reduced komi of 0.5 and no extra stones. Or you see a 5k with 0 wins playing an even game against an 8k with 2 wins in round 3 of a tournament.
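
Here is a rough sketch of that MMS-1 rule, just to make the two examples concrete. Everything in it is a simplification on my part: the function names are hypothetical, a k-kyu player is given a starting McMahon score of -k plus wins, and the cap of 9 stones is an assumed default rather than any specific tournament's setting.

```python
def mcmahon_score(kyu_rank: int, wins: int) -> int:
    """Simplified McMahon score: a k-kyu player starts at -k and adds wins.

    Assumption: real tournaments may use bars and other offsets.
    """
    return -kyu_rank + wins


def reduced_handicap(mms_stronger: int, mms_weaker: int,
                     reduction: int = 1, cap: int = 9) -> int:
    """Handicap = McMahon score difference minus `reduction`, kept within 0..cap."""
    diff = mms_stronger - mms_weaker
    return max(0, min(cap, diff - reduction))


def describe(handicap: int) -> str:
    """Traditional reading: handicap 1 means Black plays first with 0.5 komi."""
    if handicap == 0:
        return "even game"
    if handicap == 1:
        return "no extra stones, komi 0.5"
    return f"{handicap} handicap stones"


# 5k vs 7k, no wins yet: difference 2, handicap 1.
print(describe(reduced_handicap(mcmahon_score(5, 0), mcmahon_score(7, 0))))
# 5k with 0 wins vs 8k with 2 wins: difference 1, handicap 0.
print(describe(reduced_handicap(mcmahon_score(5, 0), mcmahon_score(8, 2))))
```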

To make it even more confusing, some tournaments have a limit on where handicaps are applied. For example, dans play even games while kyus use handicaps based on McMahon points.

Also, sometimes a tournament doesn’t have enough DDKs to give the very lowest ranked players enough opponents within 9 stones of them. So you get 9-handicap games between players that are a lot further apart than that (and this happens on OGS too: tournaments with random/slide/slaughter pairing create ranked games between players over 9 stones apart).

The point is, be very careful when analysing results from handicapped tournaments.

6 Likes

I just got upranked after a loss. Personally, I don’t mind losing rank after a win (it’s just Glicko working as intended), but it would be more consistent if it went the other way too.

2 Likes

You can correct for half a stone and the EGF does this.

For example, say you’re a 7d EGF playing against a 6d EGF on even. The difference between those players is 100 GoR, which is about 200 Elo in that range, which means the 7d is expected to win 75% of the games.

When the 6d gets black without komi, it only compensates for half a rank. So the difference is still 50 GoR, which is about 100 Elo in that range, which means the 7d is still expected to win 65% of the games.

Another example: a 19k plays against a 20k. On even, the difference between these players is 100 GoR, which is about 35 Elo in that range, which means the 19k is expected to win 55% of the games.

When the 20k gets black without komi, it compensates for only half a rank. So the difference is still 50 GoR, which is about 17.5 Elo in that range, which means the 19k is still expected to win 52.5% of the games.

Same again but now for 2d vs 1d (100 GoR is about 100 Elo in that range):
even game (100 Elo) => 65%
no komi (50 Elo) => 57%
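
For anyone who wants to check these figures, here is a small sketch. It assumes the standard logistic Elo expectation (base 10, scale 400), which may not be exactly the curve the percentages above were read from, and the Elo-per-GoR factors (2.0, 0.35, 1.0) are simply the approximate conversions quoted in this post. The function names are illustrative.

```python
def elo_win_probability(elo_diff: float) -> float:
    """Standard logistic Elo expectation for the player rated elo_diff higher."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def win_probability_from_gor(gor_diff: float, elo_per_gor: float) -> float:
    """Convert a GoR gap to Elo with a local scale factor, then to a win rate."""
    return elo_win_probability(gor_diff * elo_per_gor)


# (label, GoR gap, assumed Elo-per-GoR factor taken from the figures above)
cases = [
    ("7d vs 6d, even", 100, 2.0),
    ("7d vs 6d, no komi", 50, 2.0),
    ("19k vs 20k, even", 100, 0.35),
    ("19k vs 20k, no komi", 50, 0.35),
    ("2d vs 1d, even", 100, 1.0),
    ("2d vs 1d, no komi", 50, 1.0),
]

for label, gor_diff, factor in cases:
    p = win_probability_from_gor(gor_diff, factor)
    print(f"{label}: {p:.1%}")
```

With this formula the results land within about a percentage point of the rounded numbers above (roughly 76% and 64% for the 7d vs 6d cases rather than 75% and 65%).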

1 Like

“You can correct for half a stone and the EGF does this.”

There are many factors that give cause to question the accuracy of any correction. Firstly, it depends on the formula, hence the second half of my previous post. This is then complicated by the fact that the correction per hundred points in rating difference is static in ELO (it’s more complicated with Glicko2), and that rank difference is the same as rating difference across the rating distribution. 100 points equates to 1 rank more or less across the spectrum on the EGD (at least 20k to 7d), whereas on OGS it doesn’t equate to 1 rank across the rating distribution. It also rests on the basis that half a handicap stone is the equivalent of 50 GoR regardless of the strength of players. Even based on the premise that the lack of linearity for ranks on OGS is designed to make “1 rank = 1 stone”, there is still a very legitimate question on whether “if 1 stone = 100 GoR at EGF 2300 and 1 stone = 60 rating points at OGS 1500 accurately handicaps a 1 rank difference, does 4 stones = 400 GoR at EGF 2300 and 4 stones = 240 rating points at OGS 1500 still hold true for players with 4 ranks difference?”

Essentially of course you’re correct, particularly if you’re fairly strictly following ELO as a gold standard, it’s actually very easy to model corrections, and this is exactly why I used ELO for OGS at the beginning. It’s a rather crude instrument as far as predictive algorithms go though, and if we’re discussing the “correct” value of rating points per handicap stone at different ranks, and the accuracy of the predictiveness of both differing strength players in an even game and the impact of handicap stones on the outcome at different points in the rating spectrum, accurate correction becomes anything but straightforward.

Just to make things more complicated, there’s also now a very subjective side discussion on whether a “correctly handicapped” game should aim for 50/50, or whether it should favour the white side. There are a lot of questions here that can only be answered by “what OGS want to do with their system” rather than by an underlying objective truth.

Just so it doesn’t sound like I’m only being a troublemaker and poking holes in everything, I would like to say that the effort the OGS team is putting into addressing this is considerably more than I have ever seen anywhere else. I suspect that whatever solution is chosen will be more than good enough regardless. I was happy for many years playing on servers that just had the far simpler systems of “win = X points” (IGS, now Pandanet, no idea if they’ve retained the system) or “X wins over Y games = promotion” (one of the Korean servers, can’t remember which), and they didn’t seem to suffer as a result of it.

6 Likes

Well, the difference between ranks is 100 GoR (by definition of the EGF rating system). But 100 GoR is not the same as 100 Elo. See this diagram:


I was using the blue line for my previous calculation. The green line is what the EGF uses, but it’s not really aligned very well with the actual data (blue line).

For the EGF the basic assumption is that handicap defines rank differences (after correcting for that half stone advantage for white). The EGF rating anchor is 7d where pro level play is assumed to start.

I suppose that a simpler system will work fine for a while, but the EGF system has been running for decades without being adapted, and the overall inflation/deflation seems to be fairly small. I can imagine that a go server faces extra challenges in maintaining ratings, but I think the EGF system has proven its quality for offline tournament games (which doesn’t mean it’s perfect).

2 Likes

The main reason I think OGS should make it so that handicap games favor white by 0.5 stones on average is so their ranks have a better chance to be similar/comparable to others. EGF, AGA, KGS, and IGS all use this assumption. So if OGS does not use that same assumption, the definition of ranks is different, and will probably lead to differences.

There is a good theory for why it’s done this way. The difference between H1 and H2 is 1 full stone. Same for H2 to H3 etc. It’s as if black plays a move, white passes, then black plays again. But the difference between H0 and H1 is not 1 full stone, it’s 0.5 stones. In a H0 game, Black pays 6.5 komi. To make a full stone difference, it would be like Black passes, and White moves first instead. In that case though, White is moving first, but still receives 6.5 komi. According to this logic, a true 1 full stone handicap game would be where Black moves first and receives 6.5 komi (-6.5 komi). Since the traditional system makes a H1 game 0.5 komi, that is only half the points compensation of a theoretical 1 stone difference.
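
To put numbers on that argument, here is a tiny sketch. It takes the post's premise at face value: a full stone is worth the swing from paying 6.5 komi to receiving 6.5 komi, i.e. 13 points. That equivalence is the theory being described, not an established fact, and the names are made up for illustration.

```python
EVEN_KOMI = 6.5  # komi Black pays in an even (H0) game
H1_KOMI = 0.5    # komi Black pays in a traditional H1 game


def komi_shift_in_stones(komi: float, even_komi: float = EVEN_KOMI) -> float:
    """Express a komi change relative to an even game as a fraction of a stone.

    Assumption (from the argument above): one full stone equals the swing
    from +even_komi to -even_komi, i.e. 2 * even_komi points.
    """
    full_stone_points = 2 * even_komi
    return (even_komi - komi) / full_stone_points


print(komi_shift_in_stones(H1_KOMI))     # traditional H1: 6 of 13 points
print(komi_shift_in_stones(-EVEN_KOMI))  # theoretical full stone: 13 of 13 points
```

Under that premise the traditional H1 game is worth a little under half a stone (6/13), which is where the "about 0.5 stones" figure comes from.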

It’s true in real life we don’t know if these theories match reality. But IMO OGS should not create their own new theory, because that will make their system different from most other systems out there now.

3 Likes