My question is: given the Glicko2 ranking information (rating, deviation, volatility) of two players, what is the formula to calculate their expected win rates when they are playing each other?
I am curious to know what the win rate is between different ranks on OGS, as per the ranking system design. Or to put it differently, how much does the win rate increase when a player’s strength increases by one stone in the OGS rating system?
It’s technically separate from the actual rating code, so the win rate formula, for example, might be outdated compared to the backend code, as noted here.
Linked there, the glicko pdf at least gives the win rate formula for Glicko, which might need to be adjusted for the Glicko2 scale, but it doesn’t include the volatilities.
I don’t actually know what the proper winrate formula would be for two players, taking all variables into account.
I think you can find some examples and tables of winrates with and without handicap in older posts
You see, the expected outcome depends on the sum of the variances (squared deviations) of the two players.
For Glicko2 the rating scale changes, so the formula would be slightly different, but as a first guess (without volatilities) it could be the above.
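For what it’s worth, here is a minimal sketch of that prediction formula from the original glicko.pdf: the Elo logistic curve, with the rating difference dampened by a g() factor of the combined deviations. Volatility does not appear, and the Glicko2 version would first rescale ratings and deviations.

```python
# Sketch (not the OGS backend): expected score in original Glicko, using the
# combined deviations of both players, per Glickman's glicko.pdf.
import math

Q = math.log(10) / 400  # Glicko scale constant

def g(rd_combined):
    """Dampening factor: the larger the combined deviation, the closer E gets to 0.5."""
    return 1 / math.sqrt(1 + 3 * (Q ** 2) * (rd_combined ** 2) / math.pi ** 2)

def expected_win(r1, rd1, r2, rd2):
    """Probability that player 1 beats player 2 (draws ignored)."""
    rd_combined = math.sqrt(rd1 ** 2 + rd2 ** 2)
    return 1 / (1 + 10 ** (-g(rd_combined) * (r1 - r2) / 400))

# Example: a 100-point gap with moderate deviations
print(expected_win(1600, 80, 1500, 80))  # ~0.63, versus ~0.64 for plain Elo
```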
To answer that question, besides the rating calculations, you also need a conversion formula to convert between ratings and go ranks. That conversion formula basically determines the rating gaps between go ranks. It’s mostly independent of the rating system, as long as rating gaps in the rating system are compatible with the Elo scale.
That conversion formula is applied on top of the rating system. OGS has their own empirically determined conversion formula, and it is similar to the EGF empirically determined conversion formula (at least in the kyu range).
Also see:
From those conversion formulas (ignoring the g function and RD parameters of Glicko):
a 15k is expected to win 56% of their even games against a 16k on OGS and EGF
a 1d is expected to win 62% of their even games against a 1k on OGS and 64% on EGF
a 7d is expected to win 66% of their even games against a 6d on OGS and 76% on EGF
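To make those numbers concrete, here is a small sketch that reproduces the OGS figures from the plain Elo expectation, assuming the commonly cited OGS conversion rating = 525 · exp(rank / 23.15) with rank 0 = 30k (the EGF column uses its own conversion and is not reproduced here):

```python
# Sketch, assuming the commonly cited OGS conversion and the plain Elo
# expectation, ignoring g()/RD as in the post above.
import math

def ogs_rating(rank):
    """OGS rating for an integer rank (0 = 30k ... 29 = 1k, 30 = 1d, ...)."""
    return 525 * math.exp(rank / 23.15)

def elo_expectation(r1, r2):
    """Plain Elo expected score for player 1."""
    return 1 / (1 + 10 ** (-(r1 - r2) / 400))

for stronger, weaker, label in [(15, 14, "15k vs 16k"),
                                (30, 29, "1d vs 1k"),
                                (36, 35, "7d vs 6d")]:
    e = elo_expectation(ogs_rating(stronger), ogs_rating(weaker))
    print(f"{label}: {e:.0%}")  # roughly 56%, 61%, 65% (close to the OGS figures quoted above)
```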
Thanks, that’s the information I was looking for. This is enough to satisfy my curiosity about the strength increase between each rank.
There is a formula; I got it by asking ChatGPT, but in fact it is described in the original Glicko2 paper. It was somewhat impractical for me, though, because it requires a deviation and a volatility, and I wasn’t sure what values to pick since I was mostly interested in win rates between one rank and the next. I had tried different values but my results were doubtful (always very close to 50%). See the post below for the actual formula. For me, though, using the Elo formula will work just fine.
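For reference, here is a sketch of what that expectation looks like on the Glicko2 scale, using the 173.7178 conversion constant from glicko2.pdf. The paper’s E() only uses the opponent’s deviation; combining both deviations as in Glicko, shown here, is my own assumption. It also shows why large deviations push the prediction toward 50%.

```python
# Sketch of a Glicko2-scale expectation. Volatility only affects how the
# deviation evolves between rating periods, so it doesn't appear directly.
import math

GLICKO2_SCALE = 173.7178  # conversion constant from glicko2.pdf

def expected_win_glicko2(r1, rd1, r2, rd2):
    mu1, mu2 = (r1 - 1500) / GLICKO2_SCALE, (r2 - 1500) / GLICKO2_SCALE
    # Combining both players' deviations is an assumption (Glicko-1 style).
    phi = math.sqrt((rd1 / GLICKO2_SCALE) ** 2 + (rd2 / GLICKO2_SCALE) ** 2)
    g = 1 / math.sqrt(1 + 3 * phi ** 2 / math.pi ** 2)
    return 1 / (1 + math.exp(-g * (mu1 - mu2)))

# A one-rank gap (~80 rating points) with small vs. large deviations:
print(expected_win_glicko2(1918, 60, 1837, 60))    # ~0.61
print(expected_win_glicko2(1918, 350, 1837, 350))  # ~0.56, large RDs pull it toward 50%
```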
The max makes sure that if the rating drops below the floor of 100, you just take 100 as the new rating. The min makes sure that if the update goes over the cap of 6000, you take 6000 instead.
If you would update to 90, for example, you’d instead update to min(6000, max(100, 90)) = 100.
If you would update to anything in between, like 1400, it gets left alone: min(6000, max(100, 1400)) = 1400.
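In code the whole clamp is just this (100 and 6000 being the floor and cap mentioned above):

```python
def clamp_rating(new_rating, lo=100, hi=6000):
    # Keep the updated rating inside [lo, hi].
    return min(hi, max(lo, new_rating))

print(clamp_rating(90))    # 100  (floored)
print(clamp_rating(1400))  # 1400 (unchanged)
print(clamp_rating(7500))  # 6000 (capped)
```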
This is interesting. A system having hard caps can lead to some very interesting outcomes where only one side’s rating changes (although this isn’t practical, and I doubt there are accounts close to these boundaries; still, the loopholes are there).
I wonder: if there were no bottom, would Glicko2 eventually drive one side down to 0 if they kept losing?
I don’t think Glicko really cares about negative numbers. As far as I know, you could make all the starting ratings -1000 and it shouldn’t work any differently (unless there are convergence issues in the algorithm; the rating update is a little more complicated in Glicko2).
With Elo, you can have players at arbitrarily negative rating values.
I feel like if you had a system of gradually worse random-playing bots, for example, maybe you could assign meaning to very low, very negative ranks.
I think even players completely new to the game would never be as bad as truly random bots that genuinely don’t understand the rules.
Arimaa did this. They played a bunch of weak bots against each other until they were fairly accurately ranked relative to each other, and then defined 0 as the rating of the bot which plays a random legal move every turn. A complete beginner in Arimaa is around 1000 Elo on that scale.
A lot of systems like Elo are translationally invariant, so you could make them start at 0 or -1500 and it shouldn’t make much difference. Basically, this holds whenever the update depends only on the rating difference and not on the absolute ratings of the players themselves.
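A quick toy check of that invariance with plain Elo (the K-factor and ratings here are arbitrary):

```python
# Shifting both players' ratings by the same constant leaves the Elo expected
# score, and therefore the update, unchanged.
def elo_expectation(r_a, r_b):
    return 1 / (1 + 10 ** (-(r_a - r_b) / 400))

def elo_update(r_a, r_b, score_a, k=32):
    return r_a + k * (score_a - elo_expectation(r_a, r_b))

shift = -1500
delta_original = elo_update(1600, 1500, 1) - 1600
delta_shifted = elo_update(1600 + shift, 1500 + shift, 1) - (1600 + shift)
print(delta_original, delta_shifted)  # identical: only the difference matters
```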
However, making a move uniformly at random might not even be the worst someone could play.
For example, you can make random distributions where a Go bot picks a random move on the first or second line with high probability when one is legal, and only picks higher-line moves at random when needed. Such a bot, heuristically at least, would make less territory on average than a bot that took the centre or other areas.
I would expect certain random distributions could beat each other consistently enough to have rating gaps between them.
Of course, in Go you kind of need bots that don’t play suicide moves and that can at least pass with non-zero probability. So probably one should only compare bots that don’t kill their own groups by filling eyespace, and use area scoring rules.
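As a toy illustration (my own sketch, not an actual bot), a policy like that could simply weight empty intersections by their line number; legality checks, eye-filling avoidance and passing are left to whatever engine it would plug into:

```python
# Sample empty intersections with weights that strongly favour the first and
# second line, as a crude "low-line" random policy.
import random

def line_of(x, y, size=19):
    """Line number of an intersection: 1 = edge line, 2 = second line, etc."""
    return min(x, y, size - 1 - x, size - 1 - y) + 1

def pick_low_line_move(empty_points, size=19):
    weights = [10.0 if line_of(x, y, size) <= 2 else 1.0 for x, y in empty_points]
    return random.choices(empty_points, weights=weights, k=1)[0]

# Example: choose among all points of an empty 9x9 board
board_size = 9
empty = [(x, y) for x in range(board_size) for y in range(board_size)]
print(pick_low_line_move(empty, board_size))
```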
Yes, simply multiplying the evaluation function of a strong bot by -1 would make a far worse player than random play, but since random play is already far weaker than complete beginners, it makes a good “practical worst play” baseline, as its strength doesn’t shift whenever bots improve.