The other day I watched a 10k kid play a 9-handicap 9x9 game against a rengo team of three beginners (chess players who knew the rules but not much more). They decided to put all their handicap stones in the center, kind of like this:
Ratings were updated incorrectly after wins/losses on small boards with handicap.
The first bug meant that handicap=auto gave people mismatched games. I think that’s the bug you’re referring to.
The second bug meant that ranked handicap games (with either manual handicap or handicap=auto) changed ratings really weirdly, especially if they were playing a mix of different board sizes and a mix of different size handicaps (or a mix of handicap vs. even).
This would have disproportionately affected players that are mostly playing small boards.
My experience, from running an in-person Go club in the 2010s and more recently teaching a bunch of friends how to play Go, is that while weaker players have more randomness in their performance, there is a clear and steady progression through handicaps and board sizes.
I usually start with 4 stones on a 5x5, until they understand capturing (usually 1 game, sometimes 2-3).
Then a few games with a lot of stones on 7x7, until they have some confidence in scoring and some experience with eyes (even if they don’t understand them).
Then 8 or 9 stones on 9x9, and adjust from there as they start to figure out basic tactics and how eyes work.
Along the way, I adjust board size or number of stones based on whether one of us has won two games in a row, maintaining a 50% win rate.
Players progress fairly steadily down to 3-4 stones on a 9x9 (around 25-20k?), usually without any backtracking. Meaning that it’s rare they lose two games in a row, and extremely rare that they lose a third (after a stone has been added back). The exceptions are if they take a month off, or if they have unsuccessful experiments with handicap stone placement.
I.e., my experience is that their playing strength is quite effectively distinguished by handicap + board size, and thus that ranks below 25k do have “meaning”.
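The adjustment rule described above (change the handicap once one side has won two games in a row) can be sketched in a few lines. This is only an illustration: the exact ladder steps and the names `LADDER` and `replay` are made up for the example, and resetting the streak after each adjustment is an assumption.

```python
# Hypothetical ladder, easiest first: (board size, handicap stones).
# The exact steps are illustrative, not a fixed curriculum.
LADDER = [(5, 4), (7, 6), (7, 4), (9, 9), (9, 8), (9, 7),
          (9, 6), (9, 5), (9, 4), (9, 3)]


def replay(results, step=0):
    """Replay a list of winners ("student" or "teacher") and return the
    ladder index reached. Two wins in a row by the same side moves the
    student one step harder (student wins) or easier (teacher wins)."""
    streak_owner, streak = None, 0
    for winner in results:
        streak = streak + 1 if winner == streak_owner else 1
        streak_owner = winner
        if streak == 2:
            step += 1 if winner == "student" else -1
            step = max(0, min(len(LADDER) - 1, step))
            # Assumption: the streak starts over after every adjustment.
            streak_owner, streak = None, 0
    return step


if __name__ == "__main__":
    games = ["student", "student", "teacher", "student", "student"]
    board, stones = LADDER[replay(games)]
    print(f"next game: {stones} stones on {board}x{board}")
```

Alternating results leave the step unchanged, which is what keeps the win rate near 50%.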
Sure, it should do its best, but handicaps give the rating system a lot more information. Before the second bug fix, it was using that extra information incorrectly on small boards and actively skewing ratings.
TBH, it strikes me that with the bugs fixed, we could usefully drop the displayed rank floor a long way, and just watch to find out where the actual noise level is… extensive analysis is not needed; instead, collect data.
It doesn’t matter whether a kyu range or just a single kyu number is written near those buttons, but something that is clear to Go players should be written there.
I ran a few tests on the beta site, but not many. On the other hand, I played with the rating calculator. Assuming a sandbagger registers as a new player (rating 652±250, rank 25.0k±7.5, volatility 0.06) and, each time, plays against a player with the same rating but with deviation 70 and volatility 0.06, and assuming the sandbagger wins every match, then his rank becomes successively 21.2k, 18.7k, 16.8k, 15.4k, 14.2k, 13.2k, 12.4k, 11.7k, 11.0k,…
The sandbagger needs 14 successive wins to become SDK. After 20 successive wins he is still only 7k.
So I would suggest starting with a much higher deviation. For instance, with initial deviation 500, and with the same algorithm, the sandbagger becomes successively 17.9k, 14.5k, 12.3k,… It takes 6 successive wins to become SDK.
I don’t know what the optimal initial deviation is. Perhaps 500 is a bit too much, since a genuine beginner who wins his first match by “accident” would become 17.9k, which may be too strong. Anyway, I think the initial deviation should be closer to 500 than to 250.
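For anyone who wants to try other starting deviations, here is a rough sketch of that experiment using the standard Glicko-2 single-game update. Several things here are assumptions rather than the site's actual code: the rating-to-rank mapping in `rank_kyu`, holding volatility fixed at 0.06 instead of re-estimating it, and the function names themselves. The output follows the same trend as the figures quoted above but may differ from them by a tenth of a rank or so.

```python
import math

SCALE = 173.7178  # Glicko-2 factor between rating points and the internal scale


def rank_kyu(rating):
    # Assumed rating-to-rank mapping (kyu as a positive number);
    # it is not stated in this thread.
    return 30.0 - math.log(rating / 525.0) * 23.15


def win_update(rating, deviation, opp_rating, opp_deviation, volatility=0.06):
    """Glicko-2 update after a single win.

    Simplification: volatility is held constant rather than re-estimated.
    """
    mu = (rating - 1500.0) / SCALE
    phi = deviation / SCALE
    mu_j = (opp_rating - 1500.0) / SCALE
    phi_j = opp_deviation / SCALE

    g = 1.0 / math.sqrt(1.0 + 3.0 * phi_j ** 2 / math.pi ** 2)
    expected = 1.0 / (1.0 + math.exp(-g * (mu - mu_j)))
    variance = 1.0 / (g ** 2 * expected * (1.0 - expected))

    phi_star = math.sqrt(phi ** 2 + volatility ** 2)
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / variance)
    mu_new = mu + phi_new ** 2 * g * (1.0 - expected)  # score is 1 for a win
    return mu_new * SCALE + 1500.0, phi_new * SCALE


def winning_streak(initial_deviation, games, rating=652.0):
    """Ranks after each win against an equally rated opponent (deviation 70)."""
    deviation = initial_deviation
    ranks = []
    for _ in range(games):
        rating, deviation = win_update(rating, deviation, rating, 70.0)
        ranks.append(round(rank_kyu(rating), 1))
    return ranks


if __name__ == "__main__":
    for start in (250.0, 350.0, 500.0):
        print(start, winning_streak(start, 10))
```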
I think more precisely, random play is a good practical definition of a soft floor, and you can just use negative numbers below that
Completely agree, and as has been covered before, even if we had no data that there are differences in skill below 25k, the fact that we have glicko-2 ratings which correspond to lower than that should be enough to prove it
If the new entry level is around 25kyu now for new players, most likely we will see players being pushed down below 25kyu anyway, and maybe getting stuck there if there’s a certain amount of volatility in the strength of new players (which I certainly expect there to be). Probably the new entry point will put more onus on trying to differentiate players that aren’t getting above 25kyu; surely we don’t want to demotivate them by not showing any progress.
This already happens, especially for players who drop significantly below 25k, and then start steadily improving, but don’t see their rank go up for quite a while because it’s only showing 25k
Sure but we also might get some deflation where if you’re hovering around 24.8kyu you start to drop and stay below 25kyu permanently depending on who you regularly match up with.
A 10k should have no problem winning a 9-handicap 9x9 game against a random player too. Even a 20k should have no problem doing so, so long as they focus on just filling liberties of the opponent’s stones to capture them. The team of beginners was probably much better than random still.
A truly random player can’t finish a game: it would fill its own eyes and self-destruct. It makes sense to talk about a random player that doesn’t play inside its own eyes, under Chinese rules.
They are indeed shockingly bad even when they don’t play in their own eyespace. It’s usually lucky if they survive with any groups, assuming you’re actively trying to capture the stones. Probably if you focus on one stone/group at a time it can get some living area, or if you handicap yourself in other ways.
Losing should be difficult unless you keep self ataring, running the opponent out of legal moves and filling eyes so they have to capture you etc.
I still wonder if there aren’t tiers of skill even at the random bot level, though you’d have to set up some ground rules about passing, resigning, etc.
Whilst we can all speculate and philosophise about the pros and cons of including ranks and the wording until the cows come home, there is no substitute for real data. So I think OGS should run with the buttons as on beta now for a month, and then with ranks for a month, and then maybe ranges etc and collect stats on click rates and average number of games for new people’s ratings to reduce to some small deviation and then decide what to use long term based on what actually works best. (Or you can trial several options simultaneously with a random choice assigned to each user).
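For the simultaneous option, one common way to do the random assignment is to hash the user id, so each new user always sees the same version of the buttons. A minimal sketch, where the variant names, the experiment label and `assign_variant` are all hypothetical and not anything that exists in the site's code:

```python
import hashlib

# Hypothetical variant names for the starting-rank buttons.
VARIANTS = ("labels_only", "labels_with_rank", "labels_with_range")


def assign_variant(user_id, experiment="starting-rank-buttons"):
    """Deterministically map a user id to one of the variants.

    Hashing (rather than calling random each time) means the same user
    gets the same variant on every visit, with no extra state to store.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return VARIANTS[int.from_bytes(digest[:8], "big") % len(VARIANTS)]


if __name__ == "__main__":
    for uid in (1, 2, 3, 1):
        print(uid, assign_variant(uid))
```

Click rates and games-to-settle can then be compared per variant over the same time period, which avoids comparing one month against another.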
By the time you add a range, the amount of text in the “explainer” is on par with the text of the selection that you are trying to get people to choose, and it just looks “too busy”. This is even after I dropped the font-size of the explainer.
(note, I edited the picture to make sure it had the correct ranges as specified by dexonsmith, the original image had a typo)
Since “Intermediate” will start at 12k and “Advanced” will start at 2k, we would like 5k players to select “advanced”, so it’s incorrect to write a range like “Intermediate (16k-12k)” and “Advanced (2k-9d)”. I’d rather write something like
(I’m not sure what we’d do with mobile, but I guess if we agree clickable-explainer is the way to go, we’ll find something)
Actually, I guess the reason for having overlapping ranges is because we don’t mind if someone who turns out to be 2k places themselves “intermediate” or “advanced” … they can choose based on their judgement of whether they’re strong-intermediate or weak-advanced.
( … but if a consensus in favour of it were to form, I’d be delighted to get it out of Beta, where it has been languishing waiting for closure on this point )