2020 Rating and rank tweaks and analysis

Hello again. Handicap is behaving unusually, isn’t it?

I got two stones again against spicyspyigo even though I swear it said “Auto” handicap.

I tried to reproduce it on beta. It’s not consistent but look:

9k vs 1k with only 4 stones (sorry for confusing names). Deviations are less than 160 so no humble ranking should be involved.

Am I right or am I insane?

4 Likes

Hi Anouk,

Are glicko2 ratings equivalent to Elo (100 points difference means about 65% winning probability in an even game)?

If that is the case, I’d like to bring your attention to this page Elo Win Probability Calculator at the paragraph Elo per stone in Go.
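For reference, here is a minimal sketch of the standard Elo expected-score formula that the question above relies on. The function name is just illustrative, and this is plain Elo: Glicko-2 additionally scales the rating difference by the players' deviations, which this sketch ignores.

```python
def elo_expected_score(rating_diff: float) -> float:
    """Expected score of the stronger player under the standard Elo
    logistic curve (400-point scale). Plain Elo only: no Glicko-2
    deviation term is applied here."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


if __name__ == "__main__":
    # Print the expected score for a few rating gaps.
    for diff in (0, 100, 200, 400):
        print(f"{diff:>4} points: {elo_expected_score(diff):.1%}")
```

Under this formula a 100-point gap comes out at roughly 64%, so "about 65%" is in the right area.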

I hear you there, but I don’t think handicap should be the end of that discussion. After all, as far as I am aware it’s also being used for matchmaking and is the only rating that gets converted to a rank (as previously it was said that the conversion of those is “misleading”)

So yes these things are tracked, but the player basically only sees this combined rating/rank that isn’t always a great descriptor at higher ranks, where each board starts requiring its own board-specific knowledge and skill.

Never mind that it’s the rank you see everywhere if you keep ranks on, which might be the real complaint these people have.

Transforming the OGS conversion formula to an Elo per rank graph, I get something like the red curve in this diagram, while the actual game results from EGF database would correspond to the purple curve.

So the OGS conversion uses a much lower Elo per rank in the strong kyu/dan region than observed in real-life tournament games. The consequence of that may be a very strong inflation of those higher ranks. This might explain how a 4d EGF can reach 9d on OGS: Uberdude

I’m curious where they got the AGA data, as the AGA system (judging from their paper on it here: https://www.usgo.org/sites/default/files/pdf/AGARatings-Math.pdf) doesn’t seem to operate in a manner remotely similar to Elo, and I’m not sure what a “proper” conversion would look like…

I’d really like it if that site showed how many data points they had, and where they took them from, because the model Anoek designed is based on the data found in OGS’s actual play for handicaps.

Handicap is reduced when at least 1 player has a deviation of 125 or higher. The reduction depends on the deviation.

1 Like

I don’t know where that site got the data from.
I don’t know much about the AGA system, but I analysed the EGF database (about 1 million games from 1996 until now) and my findings are exactly like the graphs the site gives for the EGF (including the discrepancy between the expected Elo per rank and the actual Elo per rank).

I have to admit that those EGF games are mostly even games played in tournaments (only about 10% of the games in the EGF database are handicap games). But I assume most declared EGF ranks come from handicap games played in European go clubs. I also analyzed the EGF handicap games in 2017 and the handicap results were about right (close to the expected winrate, given the handicap and the rank difference).

We used to use the Elo system that the EGF uses, but basically it moves too slowly for online use, hence the switch to glicko2. The Elo per go stone computations they do are pretty decent, but I think we have a slightly better fit based on the comparison of black win rates at different handicaps.

2 Likes

I think you’re right for the kyu range. The current EGF expected Elo per rank is much too high in the kyu range.

But for (high) dans, I think the EGF expected Elo per rank is better. But it can be improved (I’m involved in an EGF commission to evaluate the EGF system)

2 Likes

There aren’t that many high dans so I do think that you get into situations where they get inflated (basically if they are good enough to not really lose against the 1-4d player pool they usually have access to, then they keep going up and up in rank). What’s needed to really put either system to the test is a lot of players of their strength and for them to play more handicap games against lower dans.

FTR here’s our dan data:

   Entries are actual : expected win % [number of games], by handicap.

   Handicap   1d                       6d
   0          49.8 : 49.9 [n=66725]    49.6 : 50.7 [n=2732]
   1          41.6 : 49.0 [n=3681]     53.2 : 50.0 [n=220]
   2          46.8 : 49.2 [n=2023]
   3          44.6 : 49.1 [n=1069]
   4          46.7 : 49.4 [n=735]
   5          56.1 : 50.7 [n=342]
   6          51.5 : 50.9 [n=97]
   7          64.0 : 52.1 [n=25]
   8          63.6 : 50.6 [n=22]
   9          --

So it looks like it still does pretty well up to the mid dans, but yeah I can’t claim that every 9d on the system is truly 3 stones stronger than the 6d’s. There’s not enough of them to really settle that out at this time.
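To put a rough number on "not enough of them", here is a small sketch that checks how far each actual win rate sits from the expected one, measured in standard errors under a simple binomial model. The figures are copied from the 1d row above; the handicap labels assume the columns run from handicap 0 upward, and the function name is only illustrative.

```python
import math


def z_score(actual_pct: float, expected_pct: float, n: int) -> float:
    """Distance of the observed win rate from the expected one, in
    standard errors, treating each game as an independent coin flip
    with the expected win probability (a simplifying assumption)."""
    p = expected_pct / 100.0
    standard_error = math.sqrt(p * (1.0 - p) / n)
    return (actual_pct / 100.0 - p) / standard_error


# (assumed handicap, actual %, expected %, games) from the 1d row.
rows = [
    (0, 49.8, 49.9, 66725),
    (4, 46.7, 49.4, 735),
    (7, 64.0, 52.1, 25),
]

if __name__ == "__main__":
    for handicap, actual, expected, n in rows:
        z = z_score(actual, expected, n)
        print(f"handicap {handicap}: n={n:>6}, z={z:+.2f}")
```

With only a few dozen games, even a gap of ten percentage points stays within about two standard errors, which is why the small samples at the high handicaps can't settle much either way.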

I don’t think the case of Uberdude should be taken as a case study; he was a very strong correspondence player and I suspect he spent a good deal of time really analyzing his games, something that you don’t have the luxury of doing in person. Also, there’s no standard definition of how strong a 1d is… we’ve tried in the past to align with the EGF a bit, but I don’t know how close we are these days, it’s been several years and a rating system since we did that :slight_smile: It’s on my mind to do another big poll and gather up data so we can come up with an inter-ranking system mapping.

5 Likes

I’m fairly confident that EGF 8d can give EGF 4d a handicap of 4 stones (from my personal experience 13 years ago, when I was rated 4d+ EGF and was only able to win 1 in 5 games on 4 stones against Cho SeokBin, rated 8d EGF at the time).

1 Like

I’m really sorry to butt in, but I’m not sure one single personal experience from 10 years ago is the strongest argument to upend a system.
I feel we are derailing to personal preferences at this point, I see similar posts starting to sprout.

5 Likes

From the EGF’s database on Winning Statistics | E.G.D. - European Go Database when a 4 stone handicap is given between 4 stone difference Dans, the winning percent for black is only 39%, so it’s off a bit. It’s not that bad mind you, EGF is doing pretty well especially for what data they have to work with, but there’s room for improvement I feel. What this tells me is that the EGF ranking system is too compacted in the dan ranks, I think y’all need to spread them out a bit.

For comparison, our dan 4-handicap game win rate is 46.7% with 49.4% expected, so I don’t think it’s accurate to say the EGF’s system is better in the dan ranks at this point.

That being said, are our dans the same as your dans? Who knows! They’re probably kinda close, but we’re probably going to rank a 5d EGF higher than 5d here, because we spread them out a little more, which I think is the right thing to do, according to the data that I have crunched and seen. The question I have in my mind is how do our 1d’s compare to your 1d’s? If those don’t align or aren’t at least pretty close, that’s going to skew things a little if dans are a problem rank band.

9 Likes

I don’t really mind. I have no horse in this race. But I think OGS might attract more strong players if the rating system is more realistic for them.

2 Likes

Fair enough. You have the data. Just want to take this opportunity to say I love the site and the game. Not played go before and started by playing here about 7 weeks ago. I like the fact that you’re doing the analysis to make the ratings work as well as possible. I particularly appreciate the fact that the site is not ad supported, giving you no incentive to suck data [other than game data] on all your users and whore it off to the highest bidder.
-Oikake

6 Likes

I don’t really know. I have only anecdotal data about OGS. I only analysed EGF data in detail.

1 Like

Yeah it was rhetorical, before too long I’m going to spam everyone with a poll to help figure that out. I was basically just pointing out that it’s a bit hard to actually compare the performance of our systems without having this understanding.

7 Likes

Do we have a collection of reasons why specific stronger players have expressed a lack of desire to play on OGS?

Speaking entirely for myself, I would have little problem playing on a site where my rank settled down as a 12 kyu player, as long as the player base still had plenty of similar strength players to play against. Is this a case of strong players being put off by the actual rank system, or just not having confidence in how strong the players are on the site?

If I was going to sell OGS to stronger players who specifically wanted strong opposition, being able to point out specific players and their rank somewhere the invited person is already familiar with should surely be sufficient, rather than relaying specific OGS ranks to them (and bear in mind that Tygem etc will permanently offer a plethora of very strong opponents and makes for extreme competition purely on playing strength terms, so I can actually understand an EGF 7 dan’s reluctance to join).

Regarding your anecdote, I was actually playing a bit more actively in real life when Cho Seokbin, Oh Chimin and a couple of other super strong ex-Korean inseis joined the European scene, and I don’t think their EGF ratings were particularly reliable. Their results were mostly “win against everyone but each other”, with a couple of people like Fan Hui scoring against them. If they could reliably and consistently give four stones to an EGF 4 dan, I would posit that their “real” rank should have been higher than it was when settled, but it’s very subjective at this point.

Returning to my original question, if we want to find out the most accurate reasons why specific players may choose not to play on OGS, I think that asking specific questions and relaying specific answers would be the place to start.

7 Likes

From my experience attempting to advocate for OGS, almost universally the answer is “OGS is great for kyus but it’s too hard for dans to get a game”

which essentially boils down to “dans don’t play here because dans don’t play here”

16 Likes

Thanks. Is it true though? I know the title tournaments have a collection of very strong players in, as I got the chance to play some of them. My experience and thoughts are rather strongly skewed towards correspondence play. Certainly if I was mid to upper dan and playing regular real time games, I’d probably play on Tygem for lots of top level competition.

Other than the argument of “the only way to make it easy for dans to get a game is for more dans to join”, I’m not sure what the answer is here :stuck_out_tongue:

4 Likes