My question is: given the Glicko2 ranking information (rating, deviation, volatility) of two players, what is the formula to calculate their expected win rates when they are playing each other?
I am curious to know what the win rate is between different ranks on OGS, as per the ranking system design. Or to put it differently, how much does the win rate increase when a player’s strength increases by one stone in the OGS rating system?
It’s technically separate from the actual rating code, so the win rate formula, for example, might be outdated compared to the backend code, as noted here.
Linked there, the glicko pdf at least gives the win rate formula for Glicko, which might need to be adjusted for the Glicko2 scale, but it doesn’t include the volatilities.
I don’t actually know what the proper winrate formula would be for two players, taking all variables into account.
I think you can find some examples and tables of winrates with and without handicap in older posts
You see, the expected outcome depends on the sum of the variances (squared deviations) of the two players.
For Glicko2 the rating scale changes, so the formula would be slightly different, but as a first guess (without volatilities) it could be the above.
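For what it’s worth, here is a minimal sketch of that prediction formula from the original glicko.pdf: the Elo logistic curve, with the rating difference dampened by a g() factor of the combined deviations. Volatility does not appear, and the Glicko2 version would first rescale ratings and deviations.

```python
# Sketch (not the OGS backend): expected score in original Glicko, using the
# combined deviations of both players, per Glickman's glicko.pdf.
import math

Q = math.log(10) / 400  # Glicko scale constant

def g(rd_combined):
    """Dampening factor: the larger the combined deviation, the closer E gets to 0.5."""
    return 1 / math.sqrt(1 + 3 * (Q ** 2) * (rd_combined ** 2) / math.pi ** 2)

def expected_win(r1, rd1, r2, rd2):
    """Probability that player 1 beats player 2 (draws ignored)."""
    rd_combined = math.sqrt(rd1 ** 2 + rd2 ** 2)
    return 1 / (1 + 10 ** (-g(rd_combined) * (r1 - r2) / 400))

# Example: a 100-point gap with moderate deviations
print(expected_win(1600, 80, 1500, 80))  # ~0.63, versus ~0.64 for plain Elo
```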
To answer that question, besides the rating calculations, you also need a conversion formula to convert between ratings and go ranks. That conversion formula basically determines the rating gaps between go ranks. It’s mostly independent of the rating system, as long as rating gaps in the rating system are compatible with the Elo scale.
That conversion formula is applied on top of the rating system. OGS has their own empirically determined conversion formula, and it is similar to the EGF empirically determined conversion formula (at least in the kyu range).
Also see:
From those conversion formulas (ignoring the g function and RD parameters of Glicko):
a 15k is expected to win 56% of their even games against a 16k on OGS and EGF
a 1d is expected to win 62% of their even games against a 1k on OGS and 64% on EGF
a 7d is expected to win 66% of their even games against a 6d on OGS and 76% on EGF
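To make those numbers concrete, here is a small sketch that reproduces the OGS figures from the plain Elo expectation, assuming the commonly cited OGS conversion rating = 525 · exp(rank / 23.15) with rank 0 = 30k (the EGF column uses its own conversion and is not reproduced here):

```python
# Sketch, assuming the commonly cited OGS conversion and the plain Elo
# expectation, ignoring g()/RD as in the post above.
import math

def ogs_rating(rank):
    """OGS rating for an integer rank (0 = 30k ... 29 = 1k, 30 = 1d, ...)."""
    return 525 * math.exp(rank / 23.15)

def elo_expectation(r1, r2):
    """Plain Elo expected score for player 1."""
    return 1 / (1 + 10 ** (-(r1 - r2) / 400))

for stronger, weaker, label in [(15, 14, "15k vs 16k"),
                                (30, 29, "1d vs 1k"),
                                (36, 35, "7d vs 6d")]:
    e = elo_expectation(ogs_rating(stronger), ogs_rating(weaker))
    print(f"{label}: {e:.0%}")  # roughly 56%, 61%, 65% (close to the OGS figures quoted above)
```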
Thanks, that’s the information I was looking for. This is enough to satisfy my curiosity about the strength increase between each rank.
There is a formula; I got it by asking ChatGPT, but in fact it is described in the original Glicko2 paper. It was somewhat impractical for me, though, because it requires a deviation and a volatility, and I wasn’t sure what values to pick since I was mostly interested in win rates between one rank and the next. I had tried different values but my results were doubtful (always very close to 50%). See the post below for the actual formula. For me, though, using the Elo formula will work just fine.
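For reference, here is a sketch of what that expectation looks like on the Glicko2 scale, using the 173.7178 conversion constant from glicko2.pdf. The paper’s E() only uses the opponent’s deviation; combining both deviations as in Glicko, shown here, is my own assumption. It also shows why large deviations push the prediction toward 50%.

```python
# Sketch of a Glicko2-scale expectation. Volatility only affects how the
# deviation evolves between rating periods, so it doesn't appear directly.
import math

GLICKO2_SCALE = 173.7178  # conversion constant from glicko2.pdf

def expected_win_glicko2(r1, rd1, r2, rd2):
    mu1, mu2 = (r1 - 1500) / GLICKO2_SCALE, (r2 - 1500) / GLICKO2_SCALE
    # Combining both players' deviations is an assumption (Glicko-1 style).
    phi = math.sqrt((rd1 / GLICKO2_SCALE) ** 2 + (rd2 / GLICKO2_SCALE) ** 2)
    g = 1 / math.sqrt(1 + 3 * phi ** 2 / math.pi ** 2)
    return 1 / (1 + math.exp(-g * (mu1 - mu2)))

# A one-rank gap (~80 rating points) with small vs. large deviations:
print(expected_win_glicko2(1918, 60, 1837, 60))    # ~0.61
print(expected_win_glicko2(1918, 350, 1837, 350))  # ~0.56, large RDs pull it toward 50%
```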
The max makes sure that if the rating drops below the floor of 100, you just take 100 as the new rating. The min makes sure that if the update goes over the cap of 6000, you take 6000 instead.
If you would update to 90, for example, you’d instead update to min(6000, max(100, 90)) = 100.
If you would update to anything in between, like 1400, it gets left alone: min(6000, max(100, 1400)) = 1400.
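In code the whole clamp is just this (100 and 6000 being the floor and cap mentioned above):

```python
def clamp_rating(new_rating, lo=100, hi=6000):
    # Keep the updated rating inside [lo, hi].
    return min(hi, max(lo, new_rating))

print(clamp_rating(90))    # 100  (floored)
print(clamp_rating(1400))  # 1400 (unchanged)
print(clamp_rating(7500))  # 6000 (capped)
```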
This is interesting. A system having hard caps can lead to some very interesting outcomes where only one side’s rating changes (although this isn’t practical, and I doubt there are accounts close to these boundaries; still, the loopholes are there).
I wonder: if there were no bottom, would Glicko2 eventually drive one side down to 0 if they kept losing?
I don’t think Glicko really cares about negative numbers. As far as I know, you could make all the starting ratings -1000 and it shouldn’t work any differently (unless there are convergence issues in the algorithm; the rating update is a little more complicated in Glicko2).
With Elo, you can have players at arbitrarily negative rating values.
I feel like if you had a system of gradually worse random-playing bots, for example, maybe you could assign meaning to very low, very negative ranks.
I think even players completely new to the game would never be as bad as truly random bots that genuinely don’t understand the rules.
Arimaa did this. They played a bunch of weak bots against each other until they were fairly accurately ranked relative to each other, and then defined 0 as the rating of the bot which plays a random legal move every turn. A complete beginner in Arimaa is around 1000 Elo on that scale.
A lot of systems like Elo are translationally invariant, so you could make them start at 0 or -1500 and it shouldn’t make much difference. Basically, this holds whenever the update depends only on the rating difference and not on the absolute ratings of the players themselves.
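A quick toy check of that invariance with plain Elo (the K-factor and ratings here are arbitrary):

```python
# Shifting both players' ratings by the same constant leaves the Elo expected
# score, and therefore the update, unchanged.
def elo_expectation(r_a, r_b):
    return 1 / (1 + 10 ** (-(r_a - r_b) / 400))

def elo_update(r_a, r_b, score_a, k=32):
    return r_a + k * (score_a - elo_expectation(r_a, r_b))

shift = -1500
delta_original = elo_update(1600, 1500, 1) - 1600
delta_shifted = elo_update(1600 + shift, 1500 + shift, 1) - (1600 + shift)
print(delta_original, delta_shifted)  # identical: only the difference matters
```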
However, making a move uniformly at random might not even be the worst someone could play.
For example, you can make random distributions where a Go bot picks a random move on the first or second line with high probability when one is legal, and only picks higher-line moves at random when needed. Such a bot, heuristically at least, would make less territory on average than a bot that took the centre or other areas.
I would expect certain random distributions could beat each other consistently enough to have rating gaps between them.
Of course, in Go you kind of need bots that don’t play suicide moves and that can at least pass with non-zero probability. So probably one should only compare bots that don’t kill their own groups by filling eyespace, and use area scoring rules.
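As a toy illustration (my own sketch, not an actual bot), a policy like that could simply weight empty intersections by their line number; legality checks, eye-filling avoidance and passing are left to whatever engine it would plug into:

```python
# Sample empty intersections with weights that strongly favour the first and
# second line, as a crude "low-line" random policy.
import random

def line_of(x, y, size=19):
    """Line number of an intersection: 1 = edge line, 2 = second line, etc."""
    return min(x, y, size - 1 - x, size - 1 - y) + 1

def pick_low_line_move(empty_points, size=19):
    weights = [10.0 if line_of(x, y, size) <= 2 else 1.0 for x, y in empty_points]
    return random.choices(empty_points, weights=weights, k=1)[0]

# Example: choose among all points of an empty 9x9 board
board_size = 9
empty = [(x, y) for x in range(board_size) for y in range(board_size)]
print(pick_low_line_move(empty, board_size))
```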
Yes, simply multiplying the evaluation function of a strong bot by -1 would make a far worse player than random play, but since random play is already far weaker than complete beginners, it makes a good “practical worst play” baseline, as its strength doesn’t shift whenever bots improve.