Recognising and fully implementing ranks beyond 25 kyu

2 ranks can be a large gap.

not really

Good idea, but some 25ks won’t know about it, so it would be better to invite them automatically; that way they can see it and are more likely to join. I also suggest making it a simultaneous McMahon so more 25ks could join.

3 Likes

It could also be an elimination tournament. The same number of people can join those as a simultaneous McMahon, and then you could easily have the tournament be live, correspondence, or both. If it is decided to be a live tournament, it can be slightly hard to play multiple games at once, especially that many if you are not used to it. I think that double elimination would be the best because it is a style used for some live ASTs, and (in my opinion) it has worked well for us. Swiss is weird because I feel like you never know when it will end. Single elimination works sometimes, but it is a bit frustrating to lose once and be eliminated. With double elimination, if a loss was just a fluke, a result that doesn’t usually happen, you aren’t left with false data that says some of them just aren’t good, or with an unneeded rank difference, because they played one bad game.

I will point out that some 25ks might normally play 19x19, while others play 9x9 and they might not be as good at the other sizes. You could solve this problem by having a few tournaments for them with different sized boards.

3 Likes

There should be all ranks

3 Likes

I do think that human ranks and bot ranks should be separated from each other. For example, the latest version of Leela v- 3200 would be about 15-16D without deliberating it with the flying knife joseki, and Katago 40b 60k visit is about 19D. Even the top pros (who would arguably reach an Elo of 11-12 D) would probably lose to a leela v 1600. Thus, it seems impossible to connect human players with bots.

4 Likes

This doesn’t seem to be the right thread for that discussion, since this is supposed to be about ranks lower than 25 kyu. (However, in principle it could be a separate rank like the pro ranks, but that could depend on whether you want bots to play ranked games or not. You also have to consider that there are bots playing at all rank ranges, not just 9d+ bots.)

Also the discussion more or less stalled out in this thread.

3 Likes

Did we implement ranks below 25 kyu? I see an opponent momentarily as 28k (their card at top right in our game) and then it shows 25k again.
Should I party, or is it a feature?

6 Likes

I couldn’t confirm, although I was thinking that bit could be a bug. I have had it show me that before, and some of those times were before this thread. I would guess it is a bug, although they certainly could’ve implemented it too.

5 Likes

I agree with @Gia and @Haze_with_a_Z; I see that sometimes.
@anoek Is it a bug? Just curious.

2 Likes

I’m gonna go with “it probably wasn’t implemented”. Looking at the games page https://online-go.com/observe-games and going to the very last or near last page, of live or correspondence, seems to show 25kyu players. I think some of their ratings graphs make it look like (some of) their rating(s) are a good bit below the 25kyu bar.

5 Likes

Crossposting this here

since it has an update to do with ranks beyond 25kyu in the form of data and suggests discussion about maybe adding ranks from 25kyu-30kyu but not setting an automatic handicap in this range.

5 Likes

My comments here may be redundant after the latest rank tweak post, but unfortunately I read a thread about ranks, ratings, and a small red button gets pressed somewhere in my brain. I’m not sure if I have the willpower not to dive in. So starting with an apology for the upcoming wall of text, here goes …

(and with a caveat that I stopped caring about my rank a very long time ago. I have no idea what I am, and I’m not bothered by it - I’ve even turned off any form of rank display on this site, which I think is a particularly awesome OGS feature)

TL;DR – If people want to use their rank to monitor their progress, have no limit. A 9d getting stronger should feel like hitting 10d is a logical demonstration of their progression. There is no good reason to hard limit an upper or lower bound to rank. The complications arise when trying to work out sensible handicaps, but OGS already has a non-linear relationship between rating differences and handicaps anyway from what I’ve seen, so just extend the same system you already have - if it’s slightly off, that’s fine, it can be tweaked over time. As it appears handicap is actually worked out by different rating point differences at different levels, it isn’t impacted by rank anyway - rank is just a visual representation of the handicap equivalence. – END

Below is the myriad of (far too) long musings that led me to my summary above. Don’t bother reading it unless you have a nice drink to hand and a bundle of spare time. It will be incredibly boring to a lot of people I suspect.

  1. All handicap stones are equal, but some are more equal than others

Firstly, the question is hard to answer without recognising what the purpose is to the system, and if you asked a large sample of OGS members, you’d probably get a very wide range of different answers. If a standard-ish Glicko formula for updating ratings based on results is used, and you make the argument that “ranks only serve to balance games between players of different skills by applying appropriate levels of handicap”, then not every 100 rating points represents the same optimal handicap. I do broadly agree that the stronger you are, the more significant a single stone of handicap is. This is relatively logical: the stronger the player, the more understanding that player has to maximise the value of their stone. So a 100 rating point difference at 2000 is not going to be the equivalent to a 100 rating point difference at 700.

This is only the tip of the iceberg though. It’s also very clearly true, albeit anecdotally (but I’m sure some kind soul has access to the data to support it if necessary), that the shorter the time controls, the more the weaker player struggles with a handicap that would be fine at longer controls. Playing 20 secs per move compared to a correspondence game massively changes the likelihood of the two results in a high-handicap game.

Heuristically working out exactly what an appropriate handicap is at different ratings and time controls is something that a site like this can do with a very large sample of games (and it can steadily update it - games played between people with a “correct” handicap of 3 stones where White wins 80% of the time imply that the handicap is not large enough on the current formula, etc.) … however, it’s actually amazing how small your samples become when you have to restrict it to “well, 3 stones at 8-10k”, and “provided a time control between X and Y”, and suddenly you end up with a sample of 20 games over 10 years for that very specific subset of circumstances, and it becomes harder and harder to call your data reliable.
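To make the shrinking-sample problem concrete, here is a minimal Python sketch of the kind of bucketing described above. Everything in it is an assumption for illustration: the game record keys (`handicap`, `rank_band`, `speed`, `white_won`) and the `MIN_GAMES` threshold are made up, and this is not how OGS actually stores or analyses its games.

```python
from collections import defaultdict

MIN_GAMES = 100  # assumed threshold; below this a bucket is treated as too small to trust


def white_win_rates(games):
    """Group handicap games into buckets and report White's win rate per bucket.

    Each game is a dict with hypothetical keys:
    'handicap' (stones), 'rank_band' (e.g. '8-10k'), 'speed' (e.g. 'blitz'),
    and 'white_won' (bool).
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [white wins, games played]
    for game in games:
        bucket = (game["handicap"], game["rank_band"], game["speed"])
        if game["white_won"]:
            totals[bucket][0] += 1
        totals[bucket][1] += 1

    report = {}
    for bucket, (wins, played) in totals.items():
        report[bucket] = {
            "white_win_rate": wins / played,
            "games": played,
            "reliable": played >= MIN_GAMES,  # flags buckets with too few games
        }
    return report
```

Every extra condition added to the bucket key splits the data further, which is exactly why a site with millions of games can still end up with only a handful in any one bucket.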

So I’m not completely convinced that ranks can ever directly give you a handicap that leads to a guaranteed nice close game. Handicaps are a great feature, and I’m very happy using rank as a general rule of thumb to start with (although it’s surprisingly hard when I know neither my rank nor my opponent’s! :smiley: ), but they will never be much better than a “decent approximation”.

  2. Points and distributions

Actually, managing a rating system into ranks is fundamentally an immensely complicated problem. Firstly, any Elo / Glicko system creates a bell shaped distribution of its player base. The more people you have, the more the top and bottom stretch (and when starting this site with only a very few players, it was impossible to have sensibly distributed ranks for some time). If you decide to have a hard point of 2100 = 1 dan, 2000 = 1 kyu, with a rank per 100 points going up and down from these values, then even if you start with people declaring specific ranks from a specific source, time and increasing populations create a large amount of drift. Increase your player base by 100-fold and have every player play 10,000 games against every other player, and the 30th percentile will be a lot lower ranked and your 80th percentile will be a lot higher ranked than at the beginning. Is this a problem? Not in itself, it depends why you want ranks in the first place. I strongly suspect that it will mean a handicap stone slowly becomes the equivalent of more and more rating points as the playerbase’s rating stretches, but you can easily offset that by making the ranks stretch to match - as long as you have anchors somewhere.
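As a rough illustration of the hypothetical “2100 = 1 dan, 2000 = 1 kyu, one rank per 100 points” scheme, here is a small Python sketch. It is only that linear example, not the real OGS conversion (which, as far as I can tell, is non-linear), and the `weakest_kyu` parameter is something I have invented to show what a display floor does.

```python
import math

ONE_DAN_RATING = 2100   # anchor from the example: 2100 is the bottom of 1 dan
POINTS_PER_RANK = 100   # one rank per 100 rating points, as in the example


def rank_label(rating, weakest_kyu=None):
    """Convert a rating to a rank label with no upper bound.

    If weakest_kyu is given (hypothetical display floor, e.g. 25), every
    rating below that rank is shown as the floor rank instead.
    """
    steps = math.floor((rating - ONE_DAN_RATING) / POINTS_PER_RANK)
    if steps >= 0:
        return f"{steps + 1}d"
    kyu = -steps
    if weakest_kyu is not None:
        kyu = min(kyu, weakest_kyu)  # clamp the display to the floor rank
    return f"{kyu}k"
```

With `weakest_kyu=25`, ratings of -500 and -700 both display as 25k; without the floor they display as 26k and 28k, so an improving beginner can actually see the change. Nothing in the arithmetic needs the floor, and nothing stops a 3000-rated player showing as 10d.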

KGS is well known (or at least was) for having a bunch of players, unknown to anyone other than wms, that were used to “anchor” the system, who had very stable ranks and played a lot of games. Which means that over time, regardless of how they were doing, unless they stopped being an anchor, their rank would always be the same, as they were used to adjust the distribution of the whole system. Sounds weird, right? But it’s actually necessary if you want to feel confident in your ability to tie the ranking distribution of your site/server with an external list that people might want to compare themselves to (be it the AGA or the EGF or whatever).

The EGF’s rating formula literally injects points into people’s ratings every time they play a rated game. This is because overall the increase in strength in the playerbase from people getting better was found to be greater than the loss of rating points caused by strong players becoming inactive. Injecting points was the only way to prevent rating deflation (someone joins as 18 kyu, rapidly improves over the next year to 6 kyu, and pulls 12 ranks of rating points out of the system, assuming the opponents didn’t overall all get worse over that time). In their defence, the EGF do allow people to “reset” their ranks if they have become a lot stronger to mitigate this, but depending on the national federation this can require quite a bit of evidence and persuasion to allow it. The BGA (British Go Association) was somewhat strict about this, but there was a very amusing period of time where the French Federation of Go basically flat out refused it for a while (I assume it’s no longer the case), which led to great hilarity when French 1 kyus would come to the UK and batter our 3 dans around. They had experienced a great boom in Go popularity, and suddenly had a very large population of rapidly improving players who weren’t having their ranks increase to match their increase in playing strength, and it had the expected results.

Trying to put “100% confidence” that you have a specific rank that you can tell someone in a Go club in the US, the UK, France, Israel, Russia, Korea and Japan and have them all know how strong you are by your proclaimed grade is, IMO, an unachievable task. I don’t see any value in striving to achieve this.

  3. Why ranks anyway then?

I’m currently playing in an online local club, and giving my two regular opponents there 6 and 9 stones. It was 5 stones for the first player originally, but we have a rule that 3 results in the same direction in a row, and the handicap moves. It doesn’t bother me what rank either of us are: if he keeps winning, it gets smaller; if he keeps losing, it gets bigger. Ideally we end up in a situation where the games are regularly very close. To me, this is the purpose of handicaps, rather than assuming we are creating 50/50 games with random people who have never played before.
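For anyone who wants to try the same club rule, it can be sketched in a few lines of Python. The function name, the `'B'`/`'W'` encoding, and the choice to start the streak count afresh after each handicap change are my own assumptions here; our club never wrote the rule down this precisely.

```python
def update_handicap(handicap, results, streak_length=3):
    """Apply the 'three results in a row' rule to a sequence of games.

    results is a list of winners, oldest first: 'B' for the player taking
    the handicap, 'W' for the player giving it.
    """
    streak_winner = None
    streak = 0
    for winner in results:
        if winner == streak_winner:
            streak += 1
        else:
            streak_winner, streak = winner, 1
        if streak == streak_length:
            if winner == "B":
                handicap = max(0, handicap - 1)  # Black keeps winning: fewer stones
            else:
                handicap += 1                    # Black keeps losing: more stones
            streak_winner, streak = None, 0      # assumed: counting restarts after a change
    return handicap
```

Starting from 5 stones, `update_handicap(5, ["W", "W", "W"])` gives 6, which is how the first opponent above ended up on 6 stones. No ratings or ranks are involved anywhere.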

Ranks are all over the place as I understand it in the Korean amateur circuit (although I am 100% taking all of this from 3rd party anecdotes, as I’ve never been there and rely on people who relate their experiences publicly). Most Go salons apparently run from 1 gup to 18 gup (basically translates to kyu). A 1 gup could be anywhere from EGF 6 dan to EGF 1 kyu, as they are largely internally maintained grades between salon members, and are really a salon tracking system rather than anything that expects to transfer elsewhere (quickest interesting discussion link I can find is here: https://senseis.xmp.net/?RankGupKoreanExplained, but it’s far from complete)

However, the one running feature that all ranking systems seem to have in common (and having helped regularly for a while in a Children’s Chess and Go club where I was giving some players 13 stones on 13x13 boards, I would agree that this applies just as much to that strength of player as it does at any other), is that people like to use them to track their progress and improvement, even just within the narrow confines of their regular opposition. If people would like to improve their visually displayed rank when they have a string of good results, it makes no sense to me to have a -300 rated player (and yes, Elo and Glicko will happily support ratings going below 0) with the same rank as a 900 rated player.

I seem to be perhaps one of the few people who truly has no interest in their rank. I suppose having held 2 dan for a while on KGS yet never getting above 1 kyu on IGS, I’ve realised that all the rank really is is a layer of abstraction over a hidden value used to track your results.

If people want to play regularly in any given place (online or offline), and would like to see positive results reflected in an improvement of their rank / grade (or any other display of a converted rating number), then it makes sense to have a rank system that covers every single player on your playerbase from the best to the worst, with the ability to go beyond both extremes if players show performances that would push the boundaries in either direction.

If “fully implementing” implies a requirement of a fully accurate handicapping system, play it by ear. Seed some values for “appropriate rating points difference per stone handicap”, and adjust it if it’s leaning towards either colour winning a disproportionate number of games. But don’t ignore the other reasons to have a rank display other than just to create competitive handicap games.
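The “seed it and adjust it” idea could look something like the Python sketch below. The step size, tolerance, minimum sample, and the function itself are all assumptions I have picked for illustration; I have no idea what values would actually suit OGS, and this is not how the site currently works out handicaps.

```python
def adjust_points_per_stone(points_per_stone, white_wins, games,
                            step=5.0, tolerance=0.05, min_games=100):
    """Nudge the rating-points-per-handicap-stone value for one rating band.

    handicap stones are assumed to be (rating difference / points_per_stone),
    so a smaller value means more stones for the same rating gap.
    """
    if games < min_games:
        return points_per_stone  # too few games to justify a change

    white_rate = white_wins / games
    if white_rate > 0.5 + tolerance:
        # White wins too often: handicaps are too small, so make stones cheaper.
        return max(1.0, points_per_stone - step)
    if white_rate < 0.5 - tolerance:
        # Black wins too often: handicaps are too large, so make stones dearer.
        return points_per_stone + step
    return points_per_stone  # close enough to even; leave it alone
```

So a band seeded at 100 points per stone where White wins 80 of 100 handicap games would drop to 95, and you repeat that over time per rating band rather than trying to get the numbers right up front.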

12 Likes

This spreading may be countered by adding some correction points for each game. That’s not part of the Elo or Glicko system, but from analysis one could determine how large this correction should be to counter it.

In a way, declared ranks can be seen as an anchor of the EGF system. Individually, people will fluctuate and there will be variation in strength. But when you take the average rating of a large group with the same declared ranks, the overall shape of that for all ranks may serve as a group of anchors. Or it may at least be a means of monitoring spreading and drifting of the overall system.

These little point injections in the EGF system may be seen as a way to prevent downward spreading. See my first remark about adding correction points. But when these injections are increased, it even helps to compensate for the average improvement of players which would otherwise cause an overall rating deflation or downward rating drift. Determining the improvement rate for different ratings is not easy. It is reasonable to assume that a 30k improves much more quickly than an 8d. But for more exact details, you need to analyse a lot of data.
As I’ve been analysing the EGF data for some time, I have gotten a fairly good impression of how it behaves, but for OGS it’s probably quite different (if only because OGS has a history of 25,000,000 games and the EGF only has 1,000,000 games. On the other hand, the EGF data is better annotated with declared ranks and it has a longer history of 24 years).

Indeed the upward reset mechanism is quite important to mitigate deflation of the EGF system (perhaps even more important than these little point injections).
In my simulations for EGF system modifications, I get a pretty strong deflation without this mechanism. But its effectiveness depends a lot on local policies. Things can still go awry if resets are not applied in the “right” amount. When upward resets are shunned too much you get anomalies like you describe. But when upward resets are applied too much, you get inflation.
If possible, I think it would be better if the system makes it “right” automatically to remove this dependency on somewhat arbitrary local trends and policies. I suppose the Glicko2 system is better suited for this. I have the impression that it has some kind of notion of variable rating “velocity” for individual players.

5 Likes

Yesterday, when I was playing a game, I saw a 26k next to their username instead of 25k. Unfortunately I couldn’t get a screenshot because when I reloaded the page it was back to 25k.

Edit: It just happened again right before it fully loaded in this game

1 Like

Don’t trust things happening in the twilight zone of a page not fully loaded.

6 Likes

…but now, @Gia seems to be 24k. Congratulations! :wink: :partying_face:

6 Likes

Eh, just an unfortunate time out opponent. I’m still me. :slightly_smiling_face:

2 Likes

You are you one rank better.

8 Likes

I can see a lot of discussion on this topic but I can’t easily see what the conclusion was, if any.

I have found it frustrating for some time that children who play in our club, and have a rating which may go down to 35-kyu, find that in various places outside the club ratings weaker than 20 or 25 kyu do not exist. I feel it gives the message that they are so weak that their grade is not worth considering (whereas, as I think some in this thread have commented, to a beginner a real 25-kyu is very strong!).

Now the EGF are fixing their system by changing the floor from 20-kyu to 30-kyu, I think it would be great if ratings below 25-kyu were also recognised/implemented here.

14 Likes