Recognising and fully implementing ranks beyond 25 kyu

Haze_with_a_Z · May 17, 2020, 6:01pm

2 ranks can be a large gap.

Sports_for_Life · May 17, 2020, 6:03pm

not really

李建澔2 · May 24, 2020, 9:04am

Good idea, but some 25ks won’t know about it, so it would be better to invite them automatically so they can see it and are more likely to join. I also suggest making it a simultaneous McMahon so more 25ks could join.

Haze_with_a_Z · May 24, 2020, 2:13pm

It could also be an elimination tournament. The same number of people can join one of those as a simultaneous McMahon, and then the tournament could easily be live, correspondence, or both. If it is decided to be a live tournament, it can be hard to play multiple games at once, especially that many if you are not used to it. I think double elimination would be best, because it is a format used for some live ASTs, and (in my opinion) it has worked well for us. Swiss is weird because I feel like you never know when it will end. Single elimination works sometimes, but it is a bit frustrating to lose once and be eliminated. With double elimination, if a loss was just a fluke (a result that doesn’t usually happen), you aren’t left with false data saying some players just aren’t good, or an unwarranted rank difference, because of one bad game.

I will point out that some 25ks might normally play 19x19, while others play 9x9 and they might not be as good at the other sizes. You could solve this problem by having a few tournaments for them with different sized boards.

Sports_for_Life · May 24, 2020, 8:24pm

There should be all ranks

Handibot · May 26, 2020, 8:41am

I do think that human ranks and bot ranks should be separated from each other. For example, the latest version of Leela at 3200 visits would be about 15-16d (unless it gets tripped up by the flying knife joseki), and KataGo 40b at 60k visits is about 19d. Even the top pros (who would arguably rate around 11-12d) would probably lose to Leela at 1600 visits. Thus, it seems impossible to put human players and bots on one scale.

shinuito · May 26, 2020, 8:01pm

This doesn’t seem to be the right thread for that discussion, since this one is supposed to be about ranks lower than 25 kyu. (However, in principle it could be a separate rank like the pro ranks, though that could depend on whether you want bots to play ranked games or not. You also have to consider that there are bots playing at all rank ranges, not just 9d+ bots.)

Also the discussion more or less stalled out in this thread.

Gia · June 9, 2020, 1:37pm

Did we implement ranks below 25 kyu? I momentarily saw an opponent as 28k (their card at the top right in our game), and then it showed 25k again.
Should I party, or is it a feature?

Haze_with_a_Z · June 9, 2020, 3:27pm

I couldn’t confirm it, although I was thinking that bit could be a bug. I have seen it show that before, sometimes even before this thread existed. I would guess it is a bug, although they certainly could have implemented it too.

Sports_for_Life · June 9, 2020, 7:07pm

I agree with @Gia and @Haze_with_a_Z; I see that sometimes too.
@anoek Is it a bug? Just curious.

shinuito · June 9, 2020, 9:02pm

I’m gonna guess it probably wasn’t implemented. Going to the very last (or nearly last) page of live or correspondence games at https://online-go.com/observe-games seems to show 25 kyu players. I think some of their rating graphs make it look like (some of) their rating(s) are a good bit below the 25 kyu bar.

shinuito · June 24, 2020, 6:47pm

Crossposting this here, since it has an update on ranks beyond 25 kyu in the form of data, and suggests discussing maybe adding ranks from 25 to 30 kyu without setting an automatic handicap in this range.

topazg · June 25, 2020, 12:57am

My comments here may be redundant after the latest rank tweak post, but unfortunately, whenever I read a thread about ranks and ratings, a small red button gets pressed somewhere in my brain. I’m not sure I have the willpower not to dive in. So, starting with an apology for the upcoming wall of text, here goes …

(and with a caveat that I stopped caring about my rank a very long time ago. I have no idea what I am, and I’m not bothered by it - I’ve even turned off any form of rank display on this site, which I think is a particularly awesome OGS feature)

TL;DR – If people want to use their rank to monitor their progress, have no limit. For a 9d getting stronger, hitting 10d should feel like a logical demonstration of their progression. There is no good reason to hard-limit rank at an upper or lower bound. The complications arise when trying to work out sensible handicaps, but from what I’ve seen OGS already has a non-linear relationship between rating differences and handicaps anyway, so just extend the same system you already have - if it’s slightly off, that’s fine, it can be tweaked over time. As handicap actually appears to be worked out from different rating-point differences at different levels, it isn’t affected by rank anyway - rank is just a visual representation of the handicap equivalence. – END

Below are the myriad (far too) long musings that led me to the summary above. Don’t bother reading them unless you have a nice drink to hand and a bundle of spare time. I suspect they will be incredibly boring to a lot of people.

All handicap stones are equal, but some are more equal than others

Firstly, the question is hard to answer without recognising what the purpose of the system is, and if you asked a large sample of OGS members, you’d probably get a very wide range of answers. If a standard-ish Glicko formula is used to update ratings based on results, and you make the argument that “ranks only serve to balance games between players of different skills by applying appropriate levels of handicap”, then not every 100 rating points represents the same optimal handicap. I do broadly agree that the stronger you are, the more significant a single handicap stone is. This is fairly logical: the stronger the player, the better they understand how to maximise the value of their stones. So a 100 rating point difference at 2000 is not going to be equivalent to a 100 rating point difference at 700.
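To make the “not every 100 points is the same stone” point concrete, here is a minimal Python sketch that maps ratings to a continuous rank with a logarithmic curve, under which one stone costs more rating points the stronger you are. The curve shape and the constants `A` and `C` are illustrative assumptions, not the formula OGS or any federation actually uses.

```python
import math

# Illustrative constants only (assumed, not OGS's real values).
A = 500.0  # rating that maps to rank 0
C = 25.0   # controls how many rating points one rank is worth


def rating_to_rank(rating: float) -> float:
    """Continuous rank on a log curve; only defined for ratings above 0."""
    return C * math.log(rating / A)


def stones_for_gap(low: float, high: float) -> float:
    """Handicap-stone equivalent of the gap between two ratings."""
    return rating_to_rank(high) - rating_to_rank(low)


def points_per_stone(rating: float) -> float:
    """Local slope of the curve: rating points one rank is worth near `rating`."""
    return rating / C


if __name__ == "__main__":
    for base in (700, 2000):
        print(f"100 points above {base}: {stones_for_gap(base, base + 100):.2f} stones; "
              f"one stone near {base} is about {points_per_stone(base):.0f} points")
```

Under these assumed constants, 100 points is worth a little over three stones near 700 but only a little over one near 2000. A log curve also has no value at or below a rating of 0, so a system that lets ratings go negative would need a different shape at the bottom end.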

This is only the tip of the iceberg, though. It’s also very clearly true, albeit anecdotally (but I’m sure some kind soul has access to the data to support it if necessary), that the shorter the time control, the more the weaker player struggles with a handicap that would be fine at longer time controls. Playing at 20 seconds per move rather than in a correspondence game massively changes the likelihood of each result in a high-handicap game.

Heuristically working out exactly what an appropriate handicap is at different ratings and time controls is something a site like this can do with a very large sample of games, and it can steadily update it (if games between people with a “correct” handicap of 3 stones end with White winning 80% of the time, that implies the handicap is not large enough under the current formula, etc.). However, it’s amazing how small your samples become when you have to restrict them to “well, 3 stones at 8-10k” and “a time control between X and Y”: suddenly you end up with a sample of 20 games over 10 years for that very specific subset of circumstances, and it becomes harder and harder to call your data reliable.
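A minimal sketch of the bucketing described above: group handicap games by stones, rank band and time control, report White’s win rate per bucket, and refuse to judge buckets that are too small. The `Game` fields, the 3-kyu bands, the 20-seconds-per-move blitz cut-off, `min_games` and `tolerance` are all hypothetical choices for illustration, not how OGS actually analyses its games.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Game:
    handicap: int          # stones Black received
    black_kyu: int         # Black's kyu rank when the game was played
    seconds_per_move: int  # crude stand-in for the time control
    white_won: bool


def bucket_key(game: Game) -> tuple:
    # 3-kyu-wide bands (9-11k share a band) and a blitz/slower split at
    # 20 s per move; both cut-offs are arbitrary assumptions.
    band = game.black_kyu // 3
    speed = "blitz" if game.seconds_per_move <= 20 else "slower"
    return (game.handicap, band, speed)


def audit(games, min_games=50, tolerance=0.1):
    """Return {bucket: (games, White win rate, verdict)}."""
    tally = defaultdict(lambda: [0, 0])  # bucket -> [White wins, games]
    for game in games:
        t = tally[bucket_key(game)]
        t[0] += int(game.white_won)
        t[1] += 1
    report = {}
    for key, (wins, total) in tally.items():
        rate = wins / total
        if total < min_games:
            verdict = "too few games"
        elif rate > 0.5 + tolerance:
            verdict = "handicap too small"
        elif rate < 0.5 - tolerance:
            verdict = "handicap too large"
        else:
            verdict = "ok"
        report[key] = (total, round(rate, 2), verdict)
    return report


if __name__ == "__main__":
    games = [Game(3, 9, 30, i < 64) for i in range(80)]  # 64 of 80 won by White
    games += [Game(3, 9, 10, True) for _ in range(20)]   # same stones, but blitz
    for bucket, row in audit(games).items():
        print(bucket, row)
```

In the made-up data, the slower 3-stone bucket is flagged as “handicap too small”, while the blitz bucket, with only 20 games, is reported as too small a sample to judge, which is exactly the shrinking-sample problem.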

So I’m not completely convinced that ranks can ever directly give you a handicap that guarantees a nice close game. Handicaps are a great feature, and I’m very happy using rank as a general rule of thumb to start with (although it’s surprisingly hard when I know neither my rank nor my opponent’s!), but they will never be much better than a “decent approximation”.

Points and distributions

Actually, mapping a rating system onto ranks is fundamentally an immensely complicated problem. Firstly, any Elo/Glicko system creates a bell-shaped distribution of its player base. The more people you have, the more the top and bottom stretch (and when this site started with only a few players, it was impossible to have sensibly distributed ranks for some time). If you decide to have a hard point of 2100 = 1 dan and 2000 = 1 kyu, with one rank per 100 points going up and down from these values, then even if you start with people declaring specific ranks from a specific source, time and a growing population create a large amount of drift. Increase your player base 100-fold and have every player play 10,000 games against every other player, and your 30th percentile will be a lot lower ranked and your 80th percentile a lot higher ranked than at the beginning. Is this a problem? Not in itself; it depends on why you want ranks in the first place. I strongly suspect that it will mean a handicap stone slowly becomes the equivalent of more and more rating points as the playerbase’s ratings stretch, but you can easily offset that by making the ranks stretch to match, as long as you have anchors somewhere.
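A toy Python simulation can show the stretching effect. It assumes plain Elo (not Glicko) with a fixed K-factor of 20, and hidden “true” strengths drawn from a normal distribution with an assumed spread of 300 points; everyone starts at 1500, and the ratings fan out as games accumulate. It only illustrates the direction of the effect, not OGS’s actual numbers.

```python
import random
from statistics import pstdev


def expected(r_a: float, r_b: float) -> float:
    """Standard Elo expected score for A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rating_spread(n_players=100, n_games=20000, k=20.0, seed=1):
    """Std. dev. of ratings after n_games, with everyone starting at 1500."""
    rng = random.Random(seed)
    # Hidden "true" strengths decide results; the ratings have to discover them.
    true = [rng.gauss(1500, 300) for _ in range(n_players)]
    rating = [1500.0] * n_players
    for _ in range(n_games):
        a, b = rng.sample(range(n_players), 2)
        a_won = rng.random() < expected(true[a], true[b])
        delta = k * ((1.0 if a_won else 0.0) - expected(rating[a], rating[b]))
        rating[a] += delta  # zero-sum update: A gains what B loses
        rating[b] -= delta
    return pstdev(rating)


if __name__ == "__main__":
    for games in (200, 2000, 20000):
        print(games, round(rating_spread(n_games=games)))
```

The spread starts at zero and keeps widening until the ratings roughly match the hidden strength distribution, so a fixed “100 points per rank” scale sees its percentiles move outward over time even though nobody’s strength changed.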

KGS is (or at least was) well known for having a set of players, unknown to anyone other than wms, who were used to “anchor” the system; they had very stable ranks and played a lot of games. This means that over time, regardless of how they were doing, their rank would always stay the same unless they stopped being an anchor, because they were used to adjust the distribution of the whole system. Sounds weird, right? But it’s actually necessary if you want to feel confident in your ability to tie the rank distribution of your site/server to an external list that people might want to compare themselves to (be it the AGA, the EGF or whatever).
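One simple way to use anchors, sketched below in Python, is to shift the whole rating list after each rating period so that the anchor players’ average rating lands back on a fixed target. This is an assumption for illustration only; the post doesn’t describe KGS’s actual procedure, and the player names and values are hypothetical.

```python
from typing import Dict


def pin_to_anchors(ratings: Dict[str, float],
                   anchor_targets: Dict[str, float]) -> Dict[str, float]:
    """Shift all ratings by one offset so the anchors' mean hits their target mean.

    A uniform shift keeps every rating difference (and so every handicap)
    intact; it only moves the scale back into line with the anchors.
    """
    names = list(anchor_targets)
    target_mean = sum(anchor_targets[n] for n in names) / len(names)
    current_mean = sum(ratings[n] for n in names) / len(names)
    offset = target_mean - current_mean
    return {name: r + offset for name, r in ratings.items()}


if __name__ == "__main__":
    # The pool has drifted down by about 50 points relative to the anchors.
    drifted = {"anchor_1": 1450.0, "anchor_2": 1550.0, "club_player": 1300.0}
    print(pin_to_anchors(drifted, {"anchor_1": 1500.0, "anchor_2": 1600.0}))
```

Pinning the mean rather than each anchor individually is a design choice here: individual anchors can still fluctuate a little, while the scale as a whole stays tied to them.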

The EGF’s rating formula literally injects points into people’s ratings every time they play a rated game. This is because the overall increase in strength of the playerbase from people getting better was found to be greater than the loss of rating points caused by strong players becoming inactive. Injecting points was the only way to prevent rating deflation (someone joins as 18 kyu, rapidly improves over the next year to 6 kyu, and pulls 12 ranks’ worth of rating points out of the system, assuming their opponents didn’t all get worse overall during that time). In their defence, the EGF does allow people to “reset” their ranks if they have become a lot stronger, to mitigate this, but depending on the national federation this can require quite a bit of evidence and persuasion. The BGA (British Go Association) was somewhat strict about this, but there was a very amusing period when the French Federation of Go basically flat out refused resets (I assume that’s no longer the case), which led to great hilarity when French 1 kyus would come to the UK and batter our 3 dans. France had experienced a great boom in Go popularity, and suddenly had a very large population of rapidly improving players whose ranks weren’t increasing to match their playing strength, with the expected results.
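The deflation mechanism can be shown with a toy Python model. In plain Elo every game is zero-sum, so a rapidly improving player’s gains come straight out of everyone else’s ratings. The sketch adds a flat per-game `bonus` as a stand-in for a point injection; the EGF’s real formula is different, and the starting ratings, K-factor and bonus size are arbitrary assumptions. “Rapid improvement” is modelled crudely as the improver winning every game.

```python
def expected(r_a: float, r_b: float) -> float:
    """Standard Elo expected score for A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def play(ratings, a, b, a_won, k=20.0, bonus=0.0):
    """Elo update in place; `bonus` is a hypothetical flat per-game injection."""
    delta = k * ((1.0 if a_won else 0.0) - expected(ratings[a], ratings[b]))
    ratings[a] += delta + bonus
    ratings[b] += -delta + bonus


def after_improver(bonus=0.0, rounds=30, pool=10):
    """A fast improver starts at 1000 and beats everyone in a pool of 1500s."""
    ratings = {"improver": 1000.0}
    ratings.update({f"p{i}": 1500.0 for i in range(pool)})
    for _ in range(rounds):
        for i in range(pool):
            play(ratings, "improver", f"p{i}", a_won=True, bonus=bonus)
    others = [r for name, r in ratings.items() if name != "improver"]
    return sum(others) / len(others), sum(ratings.values())


if __name__ == "__main__":
    for bonus in (0.0, 1.0):
        pool_mean, total = after_improver(bonus)
        print(f"bonus={bonus}: rest of pool averages {pool_mean:.1f}, total {total:.1f}")
```

Without a bonus the total number of points never changes, so the rest of the pool ends up below 1500 even though none of them got weaker. With the bonus, the total grows by twice the bonus per game and the pool loses less, which is the effect the injections are meant to have.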

Trying to have “100% confidence” in a specific rank that you can tell someone in a Go club in the US, the UK, France, Israel, Russia, Korea or Japan, and have them all know how strong you are from your stated grade, is, IMO, an unachievable task. I don’t see any value in striving to achieve this.

Why ranks anyway then?

I’m currently playing in an online local club, and giving my two regular opponents there 6 and 9 stones. It was originally 5 stones for the first player, but we have a rule that after 3 results in the same direction in a row, the handicap moves. It doesn’t bother me what rank either of us is: if he keeps winning, it gets smaller; if he keeps losing, it gets bigger. Ideally we end up in a situation where the games are regularly very close. To me, this is the purpose of handicaps, rather than assuming we are creating 50/50 games with random people who have never played before.
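The club rule above is simple enough to write down as a small state machine. This Python sketch assumes details the post doesn’t state: the handicap moves by one stone per change, it can’t go below 0, and the streak starts again from scratch after every change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClubHandicap:
    """Handicap ladder: after `streak_needed` same-direction results, move it."""
    stones: int
    streak_needed: int = 3
    last_result: Optional[bool] = None  # True = the weaker player won
    streak: int = 0

    def record(self, weaker_won: bool) -> int:
        if weaker_won == self.last_result:
            self.streak += 1
        else:
            self.last_result, self.streak = weaker_won, 1
        if self.streak == self.streak_needed:
            # Weaker player keeps winning -> fewer stones; keeps losing -> more.
            self.stones = max(0, self.stones + (-1 if weaker_won else 1))
            self.last_result, self.streak = None, 0  # start a fresh streak
        return self.stones


if __name__ == "__main__":
    ladder = ClubHandicap(stones=5)
    for result in (False, False, False, True, False):
        print(ladder.record(result))
```

Three losses in a row take the first opponent from 5 to 6 stones, and a mixed run of results afterwards leaves the handicap where it is.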

As I understand it, ranks are all over the place in the Korean amateur circuit (although I am 100% taking all of this from third-party anecdotes, as I’ve never been there and rely on people who relate their experiences publicly). Most Go salons apparently run from 1 gup to 18 gup (basically equivalent to kyu). A 1 gup could be anywhere from EGF 6 dan to EGF 1 kyu, as these are largely internally maintained grades between salon members, and really a salon tracking system rather than anything that is expected to transfer elsewhere (the quickest interesting discussion I can find is here: https://senseis.xmp.net/?RankGupKoreanExplained, but it’s far from complete).

However, the one running feature that all ranking systems seem to have in common (and having helped regularly for a while in a Children’s Chess and Go club, where I was giving some players 13 stones on 13x13 boards, I would agree that this applies just as much to players of that strength as to any other) is that people like to use them to track their progress and improvement, even just within the narrow confines of their regular opposition. If people would like to see their displayed rank improve when they have a string of good results, it makes no sense to me to have a -300 rated player (and yes, Elo and Glicko will happily support ratings below 0) with the same rank as a 900 rated player.

I seem to be perhaps one of the few people who truly have no interest in their rank. I suppose that having held 2 dan for a while on KGS, yet never getting above 1 kyu on IGS, I’ve realised that a rank is really just a layer of abstraction over a hidden value used to track your results.

If people want to play regularly in any given place (online or offline), and would like to see positive results reflected in an improvement of their rank / grade (or any other display of a converted rating number), then it makes sense to have a rank system that covers every single player in your playerbase, from the best to the worst, with the ability to go beyond both extremes if players show performances that would push the boundaries in either direction.
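A minimal Python sketch of the difference between a capped and an uncapped rank display. It borrows the illustrative scale from the “Points and distributions” section (2100 = 1 dan, 2000 = 1 kyu, 100 points per rank), which is not OGS’s actual mapping, and the optional 25k floor stands in for a site-wide cap.

```python
import math
from typing import Optional

# Illustrative linear scale from the "Points and distributions" example:
# 2100 = 1 dan, 2000 = 1 kyu, one rank per 100 points (not OGS's real mapping).
ONE_DAN = 2100
POINTS_PER_RANK = 100


def rank_label(rating: float, weakest_kyu: Optional[int] = None) -> str:
    """Turn a rating into a kyu/dan label; optionally clamp at a kyu floor."""
    steps = math.floor((rating - ONE_DAN) / POINTS_PER_RANK)  # 0 -> 1d, -1 -> 1k
    if steps >= 0:
        return f"{steps + 1}d"  # no upper cap: a strong enough 9d becomes 10d
    kyu = -steps
    if weakest_kyu is not None:
        kyu = min(kyu, weakest_kyu)
    return f"{kyu}k"


if __name__ == "__main__":
    for r in (3000, 2000, 900, -300, -1000, -1400):
        print(r, rank_label(r), rank_label(r, weakest_kyu=25))
```

With the floor, players rated -1000 and -1400 both show as 25k, so a 400-point improvement is invisible; without it they show as 31k and 35k, and a 3000 rating shows as 10d.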

If “fully implementing” implies a requirement for a fully accurate handicapping system, play it by ear. Seed some values for “appropriate rating point difference per handicap stone”, and adjust them if either colour is winning a disproportionate number of games. But don’t ignore the other reasons to display a rank, beyond just creating competitive handicap games.
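The “seed it and adjust it” approach might look something like this Python sketch. It keeps a single global points-per-stone value and nudges it in proportion to how far White’s win rate in handicap games is from 50%; the update rule, the `step` size and the observed win rates are hypothetical, and a real system would probably keep one value per rating band, given how a stone’s worth varies with strength.

```python
def suggested_handicap(rating_gap: float, points_per_stone: float) -> int:
    """Stones for a given rating gap under the current setting."""
    return max(0, round(rating_gap / points_per_stone))


def retune(points_per_stone: float, white_win_rate: float, step: float = 0.2) -> float:
    """Nudge the setting so handicap games drift toward 50% results.

    If White wins too often, handicaps are too small, so each stone should
    cover fewer rating points (and vice versa).
    """
    return points_per_stone * (1.0 - step * (white_win_rate - 0.5))


if __name__ == "__main__":
    pps = 100.0  # seeded guess
    for observed in (0.62, 0.57, 0.52):  # hypothetical White win rates per review period
        pps = retune(pps, observed)
        print(round(pps, 1), suggested_handicap(450, pps))
```

A small proportional step keeps any one noisy review period from swinging the handicaps too far, at the cost of converging slowly.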

gennan · July 1, 2020, 7:04am

This spreading may be countered by adding some correction points for each game. That’s not part of the Elo or Glicko system, but from analysis one could determine how large this correction should be to counter it.

In a way, declared ranks can be seen as an anchor of the EGF system. Individually, people will fluctuate and there will be variation in strength. But when you take the average rating of a large group with the same declared rank, the overall shape of those averages across all ranks may serve as a group of anchors, or at least as a means of monitoring spreading and drift of the overall system.
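The monitoring idea can be sketched in a few lines of Python: group players by declared rank, average their ratings, and compare each average with the rating that rank is nominally supposed to sit at. The `(declared_rank, rating)` input format and the nominal values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def drift_by_declared_rank(players, nominal):
    """Average rating minus nominal rating, per declared rank.

    players: iterable of (declared_rank, rating) pairs
    nominal: declared_rank -> the rating that rank is supposed to sit at
    """
    groups = defaultdict(list)
    for declared, rating in players:
        groups[declared].append(rating)
    return {rank: mean(ratings) - nominal[rank]
            for rank, ratings in sorted(groups.items()) if rank in nominal}


if __name__ == "__main__":
    players = [("10k", 1200), ("10k", 1240), ("5k", 1700)]
    print(drift_by_declared_rank(players, {"10k": 1200, "5k": 1700, "1k": 2100}))
```

Any single group can be off through ordinary variation; the signal to watch is many groups moving in the same direction over time, which points to drift or spreading of the whole system rather than of individuals.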

These little point injections in the EGF system may be seen as a way to prevent downward spreading (see my first remark about adding correction points). But when these injections are increased, they even help compensate for the average improvement of players, which would otherwise cause an overall rating deflation or downward rating drift. Determining the improvement rate at different ratings is not easy. It is reasonable to assume that a 30k improves much more quickly than an 8d, but for more exact details you need to analyse a lot of data.
As I’ve been analysing the EGF data for some time, I have a fairly good impression of how it behaves, but OGS is probably quite different (if only because OGS has a history of 25,000,000 games and the EGF only 1,000,000; on the other hand, the EGF data is better annotated with declared ranks and has a longer history of 24 years).

Indeed the upward reset mechanism is quite important to mitigate deflation of the EGF system (perhaps even more important than these little point injections).
In my simulations of EGF system modifications, I get pretty strong deflation without this mechanism. But its effectiveness depends a lot on local policies. Things can still go awry if resets are not applied in the “right” amount. When upward resets are shunned too much, you get anomalies like the one you describe. But when upward resets are applied too much, you get inflation.
If possible, I think it would be better if the system got it “right” automatically, removing this dependency on somewhat arbitrary local trends and policies. I suppose the Glicko-2 system is better suited for this. I have the impression that it has some kind of notion of variable rating “velocity” for individual players.

kingkaio · July 1, 2020, 4:45pm

Yesterday, when I was playing a game, I saw 26k next to their username instead of 25k. Unfortunately I couldn’t get a screenshot, because when I reloaded the page it was back to 25k.

Edit: It just happened again right before it fully loaded in this game

Lys · July 1, 2020, 10:07pm

Don’t trust things happening in the twilight zone of a page not fully loaded.

KAOSkonfused · July 2, 2020, 8:42am

…but now, @Gia seems to be 24k. Congratulations!

Gia · July 2, 2020, 8:47am

Eh, just an unfortunate time out opponent. I’m still me.

Haze_with_a_Z · July 2, 2020, 5:08pm

You are you, one rank better.

Paul_Smith · August 10, 2020, 4:21pm

I can see a lot of discussion on this topic but I can’t easily see what the conclusion was if any.

I have found it frustrating for some time that children who play in our club, and have ratings that may go down to 35-kyu, find that in various places outside the club, ratings weaker than 20 or 25 kyu do not exist. I feel it gives the message that they are so weak that their grade is not worth considering (whereas, as I think some in this thread have commented, to a beginner a real 25-kyu is very strong!).

Now the EGF are fixing their system by changing the floor from 20-kyu to 30-kyu, I think it would be great if ratings below 25-kyu were also recognised/implemented here.