The functions get_handicap_rank_difference and get_handicap_adjustment look fine, but since I don't understand where they're used to update the Glicko rating, I can't figure out what's wrong.
I guess something like this might be an example.
We don't see the exact backend for the OGS ratings, but maybe that's a sample of how it might be used?
No idea where the problem comes from. The "one game at a time" file looks correct: it says that in a game with n handicap stones, Black's rating is updated as if he had played an even game against a player who is weaker than White by about (n-0.5) stones, and White's rating is updated as if he had played an even game against a player who is stronger than Black by about (n-0.5) stones.
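If it helps, here's a minimal sketch of what that description amounts to, assuming the usual OGS rating↔rank conversion (rank = ln(rating / 525) × 23.15); the function names are mine, not the backend's:

```python
import math

def rating_to_rank(rating):
    # assumed OGS conversion: rank = ln(rating / 525) * 23.15
    return math.log(rating / 525.0) * 23.15

def rank_to_rating(rank):
    return 525.0 * math.exp(rank / 23.15)

def effective_opponent_rating(opponent_rating, handicap, i_am_black):
    """Shift the opponent by ~(handicap - 0.5) ranks before a normal
    even-game update: Black treats White as weaker, White treats Black
    as stronger."""
    if handicap == 0:
        return opponent_rating
    shift = handicap - 0.5
    rank = rating_to_rank(opponent_rating)
    rank += -shift if i_am_black else shift
    return rank_to_rating(rank)
```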
Not sure it helps, but I think some of the python code has been changed since the rating calculator conversion.
The win probability wasn't shown anyway, but I remember that being one thing that changed.
It also looks like the handicap adjustment differs between the Python code and the JavaScript one:
python:
doing some funny thing based on komi? Something like trying to unify 19x19 with the smaller boards.
The JS code doesn't seem to do anything like that.
The Python looks correct: it says that a stone is worth 6 ranks on 9x9 and 3 ranks on 13x13, so the handicap adjustment for a game with 0 komi and H handicap stones is equal to
- 1×(H-0.5) on 19x19
- 6×(H-0.5) on 9x9
- 3×(H-0.5) on 13x13.
On the other hand, the JS just says that the handicap adjustment is equal to H, if I understand correctly.
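If I'm reading them right, the two versions amount to something like this (my paraphrase, not the repos' actual code):

```python
def handicap_rank_adjustment_python_style(handicap, board_size):
    """~(H - 0.5) ranks, with a stone worth 3 ranks on 13x13 and 6 on 9x9."""
    if handicap == 0:
        return 0.0
    stone_value = {19: 1.0, 13: 3.0, 9: 6.0}[board_size]
    return stone_value * (handicap - 0.5)

def handicap_rank_adjustment_js_style(handicap):
    """The simpler JS version: just H ranks."""
    return float(handicap)
```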
Yeah, I think back when the rating calculator was made, that code was simpler (older version below).
There were some parameters to allow for the 0.5 shift, but they were set to false at the top of the file.
The game results seemed to match what the calculator was showing.
So I suppose we would be using the new calculation, where it's ~(H-0.5) as you said, if it was committed a year ago, but I'm not sure whether that change was retroactive.
I guess we can check against recent handicap game results to see whether the rating differences are off by some amount. Here are some rated games I played a few months ago, around April/May.
| ended (unix time) | game_id | played_black (1 = I was Black) | handicap | rating | deviation | volatility | opponent_id | opponent_rating | opponent_deviation | outcome (1 = win) | extra | annulled | result |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1746195727 | 74428177 | 0 | 9 | 1811.38 | 63.42 | 0.059992 | 62763 | 1071.42 | 61.73 | 1 | null | 0 | Resignation |
| 1744982098 | 73721797 | 0 | 9 | 1806.42 | 63.28 | 0.059994 | 62763 | 1089.91 | 61.97 | 0 | null | 0 | 15.5 points |
| 1744646613 | 73570091 | 1 | 0 | 1823.89 | 63.14 | 0.059989 | 994421 | 1591.51 | 61.15 | 1 | null | 0 | Resignation |
| 1743508764 | 73883847 | 0 | 0 | 1819.01 | 62.98 | 0.059990 | 1743755 | 1871.41 | 61.27 | 0 | null | 0 | Resignation |
| 1741640412 | 73173233 | 1 | 0 | 1828.87 | 63.13 | 0.059991 | 568838 | 1654.70 | 61.78 | 1 | null | 0 | Resignation |
It looked like it worked OK for the non-handicap games. Occasionally it's off in the second decimal place, by about 1e-2, but I'm not sure if that's rounding or something else. I also tried looking up the opponents' volatility in the following examples.
Here's an example going from the third row to the second row, i.e. the 9-stone handicap game where I lost as White.
I enter with the values from row three (I believe): rating 1823.89, deviation 63.14, against the opponent in row two at rating 1089.91, deviation 61.97; after a loss this should update to rating 1806.42 and deviation 63.28, which is what row two shows.
But the calculator says 1806.43 and deviation 63.27.
From row two to row one, I enter at rating 1806.42 and deviation 63.28, playing against 1071.42 with deviation 61.73; White wins, so it should go to rating 1811.38 and deviation 63.42.
Instead the calculator gives 1811.38 and deviation 63.43.
So it could be that the 0.5-stone shift changes the rating and deviation by less than 1e-2 and is only affecting the rounding. Or it could be that the 0.5-rank shift hasn't been implemented yet, even though it's been in the codebase on GitHub for a while.
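For reference, here's the single-game Glicko-2 update I'm checking against, following Glickman's paper; tau is a guess since I don't know what the server uses, and the handicap handling (shifting the opponent's rank, as in the sketch further up) is my assumption:

```python
import math

SCALE = 173.7178   # Glicko-2 internal scale factor
TAU = 0.5          # system constant; a guess, I don't know what OGS uses

def glicko2_update(r, rd, vol, r_j, rd_j, score):
    """One-game Glicko-2 update (Glickman's paper, steps 2-8).
    score is 1.0 for a win and 0.0 for a loss, from the rated player's side."""
    mu, phi = (r - 1500.0) / SCALE, rd / SCALE
    mu_j, phi_j = (r_j - 1500.0) / SCALE, rd_j / SCALE
    g = 1.0 / math.sqrt(1.0 + 3.0 * phi_j ** 2 / math.pi ** 2)
    e = 1.0 / (1.0 + math.exp(-g * (mu - mu_j)))
    v = 1.0 / (g ** 2 * e * (1.0 - e))
    delta = v * g * (score - e)

    # Step 5: new volatility via the Illinois-style iteration
    a = math.log(vol ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex)
                / (2.0 * (phi ** 2 + v + ex) ** 2)) - (x - a) / TAU ** 2

    big_a = a
    if delta ** 2 > phi ** 2 + v:
        big_b = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * TAU) < 0:
            k += 1
        big_b = a - k * TAU
    f_a, f_b = f(big_a), f(big_b)
    while abs(big_b - big_a) > 1e-6:
        c = big_a + (big_a - big_b) * f_a / (f_b - f_a)
        f_c = f(c)
        if f_c * f_b < 0:
            big_a, f_a = big_b, f_b
        else:
            f_a /= 2.0
        big_b, f_b = c, f_c
    vol_new = math.exp(big_a / 2.0)

    # Steps 6-8: new deviation and rating
    phi_star = math.sqrt(phi ** 2 + vol_new ** 2)
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * g * (score - e)
    return 1500.0 + SCALE * mu_new, SCALE * phi_new, vol_new

# Row 3 -> row 2: I lose as White in the H9 game. Treating the opponent as
# the full 9 ranks stronger seems to land close to the calculator's
# 1806.43 / 63.27, if my arithmetic is right:
opp = 1089.91 * math.exp(9.0 / 23.15)  # assumed rank shift, see sketch above
print(glicko2_update(1823.89, 63.14, 0.059989, opp, 61.97, 0.0))
```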
I’m a bit tired today, so feel free to point out if I’m saying something silly.
A 0.5-rank shift would affect the rating by about 1 rating point (you can try it with the calculator: add or subtract 0.5 rank from your opponent's rank). So your tests show that:
- The rating calculator corresponds to what is implemented on the server;
- The 0.5 rank shift hasn’t been implemented. The server considers that winning as White with H1 against a 5.0k is equivalent to winning an even game against a 4.0k.
- Ignore differences of about 1e-2; these are rounding errors.
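For what it's worth, a rough numeric check of the ~1-point claim, continuing from the glicko2_update sketch above and the same assumed rank conversion: for the row 2 → row 1 game, treating the H9 opponent as 9 vs 8.5 ranks stronger changes the result by a bit under a rating point.

```python
def shift_by_ranks(rating, ranks):
    # with rank = ln(rating / 525) * 23.15, a shift of `ranks` ranks
    # multiplies the rating by exp(ranks / 23.15)
    return rating * math.exp(ranks / 23.15)

# Row 2 -> row 1: I win as White in the H9 game
full_h = glicko2_update(1806.42, 63.28, 0.059994,
                        shift_by_ranks(1071.42, 9.0), 61.73, 1.0)
half_shifted = glicko2_update(1806.42, 63.28, 0.059994,
                              shift_by_ranks(1071.42, 8.5), 61.73, 1.0)
print(full_h[0], half_shifted[0])  # ~1811.4 vs ~1810.7 by my math
```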
Cool, thanks for looking over it.
I just grabbed another game that finished a few minutes ago and screenshotted the predictions while the game was ongoing vs. after it finished, just to have a more recent example.
This is a 2-stone handicap game, and White won.
So yeah, I guess we're waiting on the server to implement changes to match the GitHub repo, and then I (or whoever) can update the frontend to match as well.
I meant to come back to this as well. I did previously find a mention of win probability and expected outcome in Glicko.
It looks functionally the same, but it's evaluated with different parameters: the combined variance (or standard deviation) of the two players rather than just one player's.
Now, how much that changes in moving from Glicko to Glicko-2, I'm not sure. The rating scale is a little different, I think, so that might modify things slightly by a scale factor. The main question is whether the volatility parameter makes a difference to the win probability, i.e. whether it's incorporated into the prediction, or whether it's only relevant in the update steps.
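For reference, this is roughly the distinction I mean, in Glicko-1 terms (from Glickman's papers; treat the exact forms as my reading, not OGS's code): the expected score used inside the update depends only on the opponent's deviation, while the head-to-head win probability combines both deviations.

```python
import math

Q = math.log(10) / 400.0  # Glicko-1 scale constant

def g(rd):
    return 1.0 / math.sqrt(1.0 + 3.0 * (Q * rd) ** 2 / math.pi ** 2)

def expected_score_for_update(r1, r2, rd2):
    # inside the rating update: only the opponent's deviation appears
    return 1.0 / (1.0 + 10.0 ** (-g(rd2) * (r1 - r2) / 400.0))

def predicted_win_probability(r1, rd1, r2, rd2):
    # predicting a game between two current players: deviations combined
    combined = math.sqrt(rd1 ** 2 + rd2 ** 2)
    return 1.0 / (1.0 + 10.0 ** (-g(combined) * (r1 - r2) / 400.0))
```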
I meant to make time to learn more about it and try to understand how it was derived, but I haven't gotten around to it yet.