I came across such a person today too. They’re quite good at other games, but just being paired with players who seemed far too strong was off-putting enough for them to put Go on the back burner.
That surprised me a little more than it should have, given that this topic is already under discussion here and has come up several times before.
We Go players know Go is a great game, and it’s worth enduring the shitness that is the OGS beginner experience to later be able to play beginners and start having fun. But those beginners don’t know that, and even if someone tells them “It’s worth it, just lose ~5 games to get a rank” (a popular Reddit post), they’ve already had their motivation dented, perhaps too much.
If it were just that, I wouldn’t be as motivated to keep exploring until we get a fix. But on top of that, there’s the problem of existing players’ experience: getting matched with beginners, abandoned games, the beginner rank being driven up, etc. It definitely needs a fix…
If there are worries that having new accounts self-declare new/beginner/intermediate/advanced may cause drift, there are ways to investigate such potential issues and also test potential fixes.
In particular, it helps a lot to run simulations with historical data.
Take some time period of historical OGS game results (say the last year or two, maybe longer if that’s doable/desirable), initialise all newly created accounts in that period by picking the entry point closest to where they were after playing some 15 ranked games (instead of the currently used 6k entry point). Then run all the historical results through the simulation and see if the simulated rating distribution drifts away from the actual rating distribution.
You can run several simulations, like one where everybody picks the “perfect” entry point, one where many pick an entry point randomly, one with many sandbaggers (many picking their entry point too low), etcetera. Then you’ll know what would be the effect of those things and you may learn at which point the system starts to break.
If a simulated ratings evolution drifts significantly from the actual ratings evolution, you can fiddle with various things to fix that, such as adding or subtracting 0.1 rating points per player per time period (a day, a week, a month), depending on how much drift needs to be compensated for.
Or you may change the initial ratings associated with each entry point, and rerun simulations to see if it helps.
With simulations like these, you can experiment with various rating system tweaks and predict their impact.
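To make the replay idea above concrete, here is a minimal sketch: initialise each account at a chosen entry rating, replay historical results in order through a simple Elo-style update, and measure how far the simulated pool drifts. Everything here is an illustrative assumption (the K-factor, the `DRIFT_CORRECTION` knob, the toy history); OGS actually uses Glicko-2, not plain Elo.

```python
from collections import defaultdict

K = 32                  # assumed update strength, not the real OGS value
DRIFT_CORRECTION = 0.0  # e.g. set to -0.1 to bleed off inflation per game

def expected(ra, rb):
    """Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def replay(games, entry_rating):
    """games: list of (winner_id, loser_id) in chronological order."""
    ratings = defaultdict(lambda: entry_rating)
    for winner, loser in games:
        ew = expected(ratings[winner], ratings[loser])
        el = 1.0 - ew
        ratings[winner] += K * (1.0 - ew) + DRIFT_CORRECTION
        ratings[loser] += K * (0.0 - el) + DRIFT_CORRECTION
    return ratings

# Toy history standing in for the real game database.
history = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]
sim = replay(history, entry_rating=1500.0)
drift = sum(sim.values()) / len(sim) - 1500.0  # mean drift of the pool
print(sorted(sim.items()), drift)
```

With `DRIFT_CORRECTION = 0` the plain Elo update is zero-sum, so the pool’s mean never drifts; drift in a real system comes from players entering and leaving at the wrong level, which is exactly what replaying real entry points would expose.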
I used simulations like these all the time when I was involved in the preparation of the 2021 EGF rating system update. Every simulation ran through the full history of game results in the EGD (1 million game results over a 25 year period) and took some 5 minutes to run on my laptop.
I think it’s a good way to ensure that some combination of changes won’t break things (assuming the composition/demographics/behaviour of the player base won’t change greatly in the future).
These simulations helped greatly to overcome concerns that lowering the EGF rating floor might cause overall rating drift. And thus it led to the acceptance of lowering the EGF rating floor from 20k to 30k (which had been proposed to the EGF several times before, but was always rejected due to such concerns).
[Aldo Podavini, who was the EGF rating system manager at the time, gave me a tip to first have my simulator reproduce the actual EGF ratings (as a base point) and start from there. That was a very good tip, as it uncovered some bugs in my initial version.]
So I’d recommend doing many simulations before implementing changes to the production rating system. In fact, I wouldn’t be surprised if anoek has already used such simulations in the past when preparing/testing changes to the OGS rating system.
This reminded me of GoQuest (Go Quest (9x9) - Play Free Online Game Of Go), which might be a useful contrasting case. GoQuest starts everyone at the bottom in terms of rank, but the real rating is based on the Elo system and differs from the displayed rank until you reach a certain level. You just have to work your way up, but it’s fun doing so because you get regular promotions and nice certificates every time you go up a level. Perhaps more importantly, the interface is simple and there are lots of people who play live, so it’s quick and easy to get a game. I remember that OGS was a little intimidating when I was a beginner, but not as much as Tygem, Panda or KGS. For beginners I would definitely recommend GoQuest, because the emphasis is on playing quick live games and having fun.
Would the proposed changes make it easier or harder for beginners to get live games on OGS? Would waiting times go up or go down?
Go has a tradition where 9 dan is the maximum possible human rank, and a two-rank difference corresponds to a two-stone handicap.
The GoQuest system only confuses everyone. If someone played only on GoQuest, they would choose the wrong rank on another server or at a real Go club.
I never care about the rank on GoQuest; only the rating matters to me there.
Very few 19x19 games are played there, but a good number of strong players (rated over 2400) play regularly on 13x13.
Starting from the bottom, you may feel like you’re sandbagging, although the pairing will quickly let you play at your level even if your displayed rank doesn’t catch up.
There are beginners around and weak bots.
The simplicity has its own defects, like the autoscoring and the absence of moderators, which can become problematic with the half-finished games of beginners.
I understand the problem. But just stating it and giving the same explanation I’ve read on the forums for years doesn’t solve the problem.
When I say fix the math, I mean come up with a solution that makes for a better experience without breaking the math.
Here is an idea… To avoid rank drift, have new players’ games affect only the new player’s rating, and don’t let games between two new players affect ratings at all.
For example, let someone enter as 20k or 5d, but when they play a match, have the math calculate their new rating without changing the rating of their opponent.
Eventually, when the rating reaches an appropriate confidence interval, let rated games count as they normally do.
I’m not sure if that’d work, but coming up with a creative solution that doesn’t break the rating system is needed.
Starting at a wrong rank like on GoQuest is fine since 9x9 games are quickly finished. Same for Lichess where the starting Elo is 1500 for everyone, a chess game is quicker than a 19x19 go game. On OGS it’s even worse for people who only play correspondence games, their rank takes months to adjust.
Also, aside from changing the ranking formula, couldn’t we at least allow new players to have ranked games with established 25 kyus? I don’t think that’s a formula thing, just a matchmaking restriction.
I believe that playing strength/style is more complicated than what we can represent with just one number. Has anybody tried to make a multidimensional rating system yet?
Glicko-2 has at least three numbers for each player: rating, deviation and volatility. Roughly, these tell you the player’s estimated average strength, the confidence in that strength (a window of values around it that their true strength could fall in), and how often they have unexpected results. Some of the bots, for instance, have very high volatility.
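A small illustration of what those numbers mean in practice: the deviation turns the rating into a confidence window around the strength estimate. The values below are made up; the 1.96 factor is just the usual ~95% normal-distribution interval.

```python
def confidence_window(rating, deviation):
    """Approximate 95% interval for a player's true strength."""
    return (rating - 1.96 * deviation, rating + 1.96 * deviation)

# A brand-new player: high deviation, so a very wide window.
lo, hi = confidence_window(rating=1500.0, deviation=200.0)
print(round(lo), round(hi))

# An established player: low deviation, so a narrow window.
lo2, hi2 = confidence_window(rating=1500.0, deviation=60.0)
print(round(lo2), round(hi2))
```

The system trusts results from high-deviation players less, which is why new accounts can jump many ranks per game while established accounts barely move.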
I have no idea how Glicko works, but what Chess.com ends up doing is balancing via something like dichotomous jumps. Imagine you are really a 1200-Elo player.
You start at, say, 1500. If you win, you gain a multiplied amount of Elo (I’m not sure if this multiplier comes from the Glicko formula or something else) and are then paired with an 1800 or so. If you then lose, you don’t lose 15 points or so but more like 150 (again, multiplied), and you’re paired against a lower-rated player. Your Elo keeps jumping for several games, with roughly even pairings, until it is established and the multiplier decreases. Maybe that’s how OGS does it, I don’t know, but the starting point shouldn’t be an issue with this kind of system.
I think that’s more or less what OGS does. However losing 4-5 games in a row in go is more painful than 4-5 games in chess because games are longer, unless they play 9x9 instead of 19x19.
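The “big jumps that shrink” behaviour described above can be sketched as an Elo update whose K-factor starts large and decays as a player completes games, so early results move the rating a lot and later results only nudge it. The decay schedule and all parameter values here are invented for illustration, not taken from Chess.com or OGS.

```python
def k_factor(games_played, k_new=160.0, k_stable=16.0, provisional_games=10):
    """Linearly decay K from k_new to k_stable over the provisional period."""
    if games_played >= provisional_games:
        return k_stable
    frac = games_played / provisional_games
    return k_new + (k_stable - k_new) * frac

def update(rating, opponent, won, games_played):
    """One Elo update using the games-played-dependent K-factor."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    return rating + k_factor(games_played) * ((1.0 if won else 0.0) - expected)

# A true ~1200 player who starts at 1500: the first loss costs a lot,
# so the rating converges downward in just a few games.
r = update(1500.0, 1200.0, won=False, games_played=0)
print(round(r))
```

In Glicko-2 the deviation plays roughly this role automatically; the point of the sketch is just that with a large early multiplier, a wrong starting point is corrected within a handful of games rather than dozens.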
It seems to me like this discussion is going in circles. Letting beginners choose a lower starting rank is a better experience for beginners, which is what we want. There are concerns about how that affects the ranking system overall, but there are theoretically ways to compensate (e.g. subtracting a small amount of rating per game). This would require a lot of work to analyze the impact and work out what to do.
The only question left is whether that work would be worth it. I’d say that that is for the people actually implementing it to decide.
Okay, not trying to throw another feature suggestion into an already massive pool, but I think it would be good to guide raw beginners toward small boards anyway.
With the proposed “how strong are you?” prompt, it will be pretty simple to pick out the beginners
I think we should try to dig this analysis up if it’s going to be the bar to which we hold ourselves.
It’s fair to expect rigor in any change to the rating system, but it’s an impossible task to build an analysis that can refute “anoek ran the numbers and this is definitely a problem”.
If someone does run a proper analysis, the rebuttal could still be “anoek ran the numbers better, so there must be something wrong with your analysis”. And this could be valid! Perhaps the data is not of the right quality or size, but there is no way to know without some transparency into how this was approached before (assuming this analysis actually happened).
That is an interesting topic to investigate, but I think it would only complicate things more.
Already as it is, people are having issues with their ranks being shown with 2 extra dimensions (time settings and board size). Adding more dimensions (like playing-style dimensions) will quickly get very complicated in terms of UI. Also, I think players would be quite confused when told they are 4k when playing aggressive moyo blitz games on 19x19, but 8k when playing calm territorial correspondence games on 9x9.