Bot's rank fluctuations

shinuito · September 5, 2025, 3:11pm

So I have the goratings python code from the repository I linked.

I think it’s more or less working, though maybe there are some typos to fix (in my tests).

I want to sketch a few caveats and assumptions and then some test cases to see if things are behaving as expected.

Basically, I don’t think I’m using the right formulae for winning probabilities when I simulate the results of games. I feel like that’s kind of important for getting more realistic results. Hopefully it’s not too far off, but I guess it could be another source of uncontrolled randomness that one might want to eliminate if possible.

Caveats:

I’m using a formula from the Glicko (not Glicko-2) paper for the expected outcome, not the one in the repository. See for example Basic rank maths questions - #26 by shinuito and Expected winrates based on Glicko2 ratings - #4 by shinuito for the discussion as to why we’d expect the expected outcome to depend on both players’ ratings and both players’ deviations.
I didn’t remember to convert those formulas from Glicko to Glicko-2. The Glicko-2 paper notes that the scales differ and explains how to do the conversion, so there might be some extra constants needed.

The rating scale for Glicko-2 is different from that of the original Glicko system. However, it is easy to go back and forth between the two scales.

With that, I used the expected outcome to simulate the results, and I think I should’ve been using the probability formula. I only really noticed that detail after making the graphs below. The formulae look similar, but they differ in that there’s an additional sqrt involved in combining the standard deviations of the two players.
I feel like I need to read the more technical paper to properly understand why these two things are different in the first place, unlike with Elo. But I should probably go back and make the above changes.
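For reference, here is a minimal sketch of the two pieces mentioned in the caveats: the expected-outcome formula as I understand it from the Glicko paper (it combines both players’ deviations under a sqrt), and the scale conversion given in the Glicko-2 paper. The function names are my own and are not taken from the goratings repository, and this is not the "probability formula" I should have been using.

```python
import math

Q = math.log(10) / 400      # the constant q from the Glicko paper
GLICKO2_SCALE = 173.7178    # scale factor quoted in the Glicko-2 paper


def g(rd):
    """Attenuation factor g(RD) from the Glicko paper."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def expected_outcome(r_i, rd_i, r_j, rd_j):
    """Expected score of player i vs player j, on the Glicko scale.

    Uses both deviations, combined as sqrt(RD_i^2 + RD_j^2).
    """
    combined_rd = math.sqrt(rd_i**2 + rd_j**2)
    return 1.0 / (1.0 + 10 ** (-g(combined_rd) * (r_i - r_j) / 400.0))


def to_glicko2(r, rd):
    """Convert a Glicko rating and deviation to the Glicko-2 scale."""
    return (r - 1500.0) / GLICKO2_SCALE, rd / GLICKO2_SCALE


def from_glicko2(mu, phi):
    """Convert a Glicko-2 rating and deviation back to the Glicko scale."""
    return GLICKO2_SCALE * mu + 1500.0, GLICKO2_SCALE * phi


# Two equally rated stable players: expected outcome is exactly 0.5.
print(expected_outcome(1550, 60, 1550, 60))
```

If the simulation works on the Glicko-2 scale internally, the ratings and deviations would need to go through `from_glicko2` before being passed to `expected_outcome` as written here.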

Assumptions:

The player whose rating we track plays at a 5kyu level or a 15kyu level each game, regardless of their current rating. What that means is that the chance of winning only depends on the level they’re supposed to be playing at, not on their current rating.
Bearing in mind the caveats about the winning probabilities being a bit off in the simulation, hopefully it still works well enough to capture some of the details that we see.
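As a rough illustration of that assumption, the result of each game can be drawn from the win probability at the player’s true level, so the current rating never enters the draw. The name `simulate_game` and the fixed seed are just for this sketch; it is not the code from the repository.

```python
import random


def simulate_game(p_win_true, rng):
    """Return 1.0 for a win or 0.0 for a loss, drawn with probability p_win_true."""
    return 1.0 if rng.random() < p_win_true else 0.0


rng = random.Random(0)  # fixed seed so a run can be repeated
p_win_true = 0.5        # e.g. a true 5kyu playing a 5kyu opponent
results = [simulate_game(p_win_true, rng) for _ in range(300)]
print(sum(results) / len(results))  # empirical winrate over 300 games
```

The rating system then only sees the sequence of wins and losses, which is what lets the tracked rating drift away from the level the player is actually playing at.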

Tests

Test case 1 - stable 5kyu (1550) only plays other stable 5kyus

Our player is 1550 with deviation 60 and volatility 0.06 (I would say stable). The values update after every game.

Opponent is always 1550, 60, 0.06.

The probability of winning here, or the expected outcome (between 1=win and 0=loss), depends only on the original rank, not the current rank, since we’re assuming the player should truly be at that rank (with those initial parameters).

I think this is more or less what I see after a few runs. Sometimes the deviation goes up a bit, once or a few times in 300 games. I think that’s more or less when the player is on a winning or losing streak, and then it settles as the player returns to their stable rating (1550), or thereabouts. The volatility doesn’t really change a whole lot, even with losing or winning streaks. The expected outcome basically follows the player’s rating, because they always play an opponent with a fixed rating of 1550, so the winrate is more or less tied to the rating difference.

Test case 2 - stable 5kyu vs random stable opponents

Our player is 1550 with deviation 60 and volatility 0.06 (I would say stable). The values update after every game.

Opponent is a random rating 1550+N(0,500), 60, 0.06.
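A sketch of how such an opponent could be drawn, assuming the 500 in N(0,500) is a standard deviation rather than a variance. `sample_opponent` is a made-up helper name for this example.

```python
import random


def sample_opponent(rng, center=1550.0, spread=500.0):
    """Draw a stable opponent as (rating, deviation, volatility).

    Assumption: `spread` is the standard deviation of the normal offset.
    """
    rating = rng.gauss(center, spread)
    return rating, 60.0, 0.06


rng = random.Random(1)  # fixed seed so a run can be repeated
for _ in range(5):
    print(sample_opponent(rng))
```

With a spread this wide, some of the opponents land many hundreds of points away from 1550, so it may be worth clipping or checking the range if very low ratings cause trouble elsewhere.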

The player plays a lot of randomly rated opponents, so there’s always a chance for upsets. The rating fluctuates a bit, and the deviation is a bit higher than normal, but the volatility doesn’t really increase.

I’ve shown the expected outcome for the current rating in red, and the theoretical one in blue (theoretical because we’re assuming the player really is 1550). Sometimes they should have a higher chance of winning than would be predicted by their current rating (when they’re underrated), other times it should be lower (when they’re overrated).

Anyway it’s not too crazy.

Test case 3 - Player is 50-50 a 15kyu and a 5kyu vs only stable 5kyus

Our player starts at 1550 with deviation 60 and volatility 0.06. They flip a coin as to whether they will play as stable 5kyu or a stable 15kyu (1050,60,0.06), and this is used to calculate the expected chance of winning. Their values update after every game.

Opponent is always a stable 5kyu (1550, 60, 0.06)

It’s partly what I expected: the rating deflates and the deviation goes up, but not by as much as I would’ve thought. The volatility is not really what I expected either.

You can see the theoretical expected outcome flip between a 50% chance of winning (playing as a 5kyu against a 5kyu) and basically a 0% chance of winning (a 15kyu vs a 5kyu).
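The coin flip and the resulting theoretical expected outcome can be sketched like this. It assumes the Glicko expected-outcome formula mentioned in the caveats (both deviations combined under a sqrt, Glicko scale), and the names `LEVELS`, `OPPONENT` and `theoretical_outcome` are only for this example.

```python
import math
import random

Q = math.log(10) / 400  # the constant q from the Glicko paper


def expected_outcome(r_i, rd_i, r_j, rd_j):
    """Glicko expected score of player i vs player j (assumed formula)."""
    rd = math.sqrt(rd_i**2 + rd_j**2)
    g = 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)
    return 1.0 / (1.0 + 10 ** (-g * (r_i - r_j) / 400.0))


# The two levels the player can play at, as (rating, deviation).
LEVELS = {"5kyu": (1550.0, 60.0), "15kyu": (1050.0, 60.0)}
OPPONENT = (1550.0, 60.0)  # always a stable 5kyu


def theoretical_outcome(rng):
    """Flip a coin for the playing level and return (level, expected outcome)."""
    level = rng.choice(sorted(LEVELS))
    r, rd = LEVELS[level]
    return level, expected_outcome(r, rd, *OPPONENT)


rng = random.Random(0)  # fixed seed so a run can be repeated
for _ in range(5):
    print(theoretical_outcome(rng))
```

Under this formula the 15kyu case comes out small but not exactly zero, so the occasional win while playing as a 15kyu is possible in the simulation.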

Another example of the same

Seems fairly similar.

And yeah, it definitely seems to be trending downward toward 9-10kyu.

Test case 4 - Player is 50-50 a 15kyu and a 5kyu vs random stable opponents

Our player starts at 1550 with deviation 60 and volatility 0.06. They flip a coin as to whether they will play as stable 5kyu or a stable 15kyu (1050,60,0.06), and this is used to calculate the expected chance of winning. Their values update after every game.

Opponent is a random stable player (1550+N(0,500), 60, 0.06).

This is a bit more like what I was saying about random opponents: there’s more chance of bigger upsets given the gap between the current rank and the playing skill. The deviation is heading toward 80, and the volatility is actually increasing now.

Another example

I did a couple of runs with more games, but there’s no sudden spike to 0.15 volatility; rather, it’s just a fairly slow increase in volatility over thousands of games. Something like

I did a couple more starting at 1550 but playing as 5kyu and 25kyu, and as 15kyu and 25kyu, and shifted the opponents to be centered more around the 25kyu level. But again, it was more like a slow increase in volatility and deviation, with the deviation maybe capping out at 80 or so.

So it might be a very long-term effect, where the bots have just kept increasing in volatility over time. (I say long-term on ordinary human timescales, but the bots probably play thousands of games a day.)