The functions get_handicap_rank_difference and get_handicap_adjustment look fine, but since I don't understand where they're used to update the Glicko rating, I can't figure out what's wrong.
I guess something like this might be an example.
We don't see the exact backend for the OGS ratings, but maybe that's a sample of how it might be used?
No idea where the problem comes from. The "one game at a time" file looks correct: it says that in a game with n handicap stones, Black's rating is updated as if he had played an even game against a player who is weaker than White by about (n-0.5) stones, and White's rating is updated as if he had played an even game against a player who is stronger than Black by about (n-0.5) stones.
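If it helps, here's a minimal sketch of what that description amounts to, assuming the usual OGS rating↔rank conversion (rank = ln(rating / 525) × 23.15); the function names are mine, not the backend's:

```python
import math

def rating_to_rank(rating):
    # assumed OGS conversion: rank = ln(rating / 525) * 23.15
    return math.log(rating / 525.0) * 23.15

def rank_to_rating(rank):
    return 525.0 * math.exp(rank / 23.15)

def effective_opponent_rating(opponent_rating, handicap, i_am_black):
    """Shift the opponent by ~(handicap - 0.5) ranks before a normal
    even-game update: Black treats White as weaker, White treats Black
    as stronger."""
    if handicap == 0:
        return opponent_rating
    shift = handicap - 0.5
    rank = rating_to_rank(opponent_rating)
    rank += -shift if i_am_black else shift
    return rank_to_rating(rank)
```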
Not sure it helps, but I think some of the python code has been changed since the rating calculator conversion.
The win probability wasn't shown anyway, but I remember that being one thing that changed.
It also looks like the handicap adjustment differs between the Python code and the JavaScript one:
python:
doing some funny thing based on komi? Something like trying to unify 19x19 with the smaller boards.
The JS code doesn't seem to do anything like that.
The Python looks correct: it says that a stone is worth 6 ranks on 9x9 and 3 ranks on 13x13, so the handicap adjustment for a game with 0 komi and H handicap stones is equal to
- 1×(H-0.5) on 19x19
- 6×(H-0.5) on 9x9
- 3×(H-0.5) on 13x13.
On the other hand, the JS just says that the handicap adjustment is equal to H, if I understand correctly.
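If I'm reading them right, the two versions amount to something like this (my paraphrase, not the repos' actual code):

```python
def handicap_rank_adjustment_python_style(handicap, board_size):
    """~(H - 0.5) ranks, with a stone worth 3 ranks on 13x13 and 6 on 9x9."""
    if handicap == 0:
        return 0.0
    stone_value = {19: 1.0, 13: 3.0, 9: 6.0}[board_size]
    return stone_value * (handicap - 0.5)

def handicap_rank_adjustment_js_style(handicap):
    """The simpler JS version: just H ranks."""
    return float(handicap)
```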
Yeah, I think back when the rating calculator was made, that code was simpler (older version below).
There were some parameters to allow for the 0.5 shift, but they were set to false at the top of the file.
The game results seemed to match what the calculator was showing.
So I suppose we would be using the new calculation, where it's ~(H-0.5) as you said, if it was committed a year ago, but I'm not sure whether that change was retroactive.
I guess we can check against recent handicap game results to see whether the rating differences are off by some amount. Here are some rated games I played a few months ago, around April/May.
| ended (unix time) | game_id | played_black (1 = I was Black) | handicap | rating | deviation | volatility | opponent_id | opponent_rating | opponent_deviation | outcome (1 = win) | extra | annulled | result |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1746195727 | 74428177 | 0 | 9 | 1811.38 | 63.42 | 0.059992 | 62763 | 1071.42 | 61.73 | 1 | null | 0 | Resignation |
| 1744982098 | 73721797 | 0 | 9 | 1806.42 | 63.28 | 0.059994 | 62763 | 1089.91 | 61.97 | 0 | null | 0 | 15.5 points |
| 1744646613 | 73570091 | 1 | 0 | 1823.89 | 63.14 | 0.059989 | 994421 | 1591.51 | 61.15 | 1 | null | 0 | Resignation |
| 1743508764 | 73883847 | 0 | 0 | 1819.01 | 62.98 | 0.059990 | 1743755 | 1871.41 | 61.27 | 0 | null | 0 | Resignation |
| 1741640412 | 73173233 | 1 | 0 | 1828.87 | 63.13 | 0.059991 | 568838 | 1654.70 | 61.78 | 1 | null | 0 | Resignation |
It looked like it worked OK for the non-handicap games. Occasionally it's off in the second decimal place, by about 1e-2, but I'm not sure if that's rounding or something else. I also tried looking up the opponents' volatility in the following examples.
Here's an example going from the third row to the second row, i.e. the 9-stone handicap game where I lost as White.
I enter with the values from row three (I believe): rating 1823.89, deviation 63.14, against the opponent in row two at rating 1089.91, deviation 61.97; after a loss this should update to rating 1806.42 and deviation 63.28, which is what row two shows.
But the calculator says 1806.43 and deviation 63.27.
From row two to row one, I enter at rating 1806.42 and deviation 63.28, playing against 1071.42 with deviation 61.73; White wins, so it should go to rating 1811.38 and deviation 63.42.
Instead the calculator gives 1811.38 and deviation 63.43.
So it could be that the 0.5-stone shift changes the rating and deviation by less than 1e-2 and is only affecting the rounding. Or it could be that the 0.5-rank shift hasn't been implemented yet, even though it's been in the codebase on GitHub for a while.
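For reference, here's the single-game Glicko-2 update I'm checking against, following Glickman's paper; tau is a guess since I don't know what the server uses, and the handicap handling (shifting the opponent's rank, as in the sketch further up) is my assumption:

```python
import math

SCALE = 173.7178   # Glicko-2 internal scale factor
TAU = 0.5          # system constant; a guess, I don't know what OGS uses

def glicko2_update(r, rd, vol, r_j, rd_j, score):
    """One-game Glicko-2 update (Glickman's paper, steps 2-8).
    score is 1.0 for a win and 0.0 for a loss, from the rated player's side."""
    mu, phi = (r - 1500.0) / SCALE, rd / SCALE
    mu_j, phi_j = (r_j - 1500.0) / SCALE, rd_j / SCALE
    g = 1.0 / math.sqrt(1.0 + 3.0 * phi_j ** 2 / math.pi ** 2)
    e = 1.0 / (1.0 + math.exp(-g * (mu - mu_j)))
    v = 1.0 / (g ** 2 * e * (1.0 - e))
    delta = v * g * (score - e)

    # Step 5: new volatility via the Illinois-style iteration
    a = math.log(vol ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex)
                / (2.0 * (phi ** 2 + v + ex) ** 2)) - (x - a) / TAU ** 2

    big_a = a
    if delta ** 2 > phi ** 2 + v:
        big_b = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * TAU) < 0:
            k += 1
        big_b = a - k * TAU
    f_a, f_b = f(big_a), f(big_b)
    while abs(big_b - big_a) > 1e-6:
        c = big_a + (big_a - big_b) * f_a / (f_b - f_a)
        f_c = f(c)
        if f_c * f_b < 0:
            big_a, f_a = big_b, f_b
        else:
            f_a /= 2.0
        big_b, f_b = c, f_c
    vol_new = math.exp(big_a / 2.0)

    # Steps 6-8: new deviation and rating
    phi_star = math.sqrt(phi ** 2 + vol_new ** 2)
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    mu_new = mu + phi_new ** 2 * g * (score - e)
    return 1500.0 + SCALE * mu_new, SCALE * phi_new, vol_new

# Row 3 -> row 2: I lose as White in the H9 game. Treating the opponent as
# the full 9 ranks stronger seems to land close to the calculator's
# 1806.43 / 63.27, if my arithmetic is right:
opp = 1089.91 * math.exp(9.0 / 23.15)  # assumed rank shift, see sketch above
print(glicko2_update(1823.89, 63.14, 0.059989, opp, 61.97, 0.0))
```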
I’m a bit tired today, so feel free to point out if I’m saying something silly.
A 0.5-rank shift would affect the rating by about 1 rating point (you can try it with the calculator: add or subtract 0.5 rank from your opponent's rank). So your tests show that:
- The rating calculator corresponds to what is implemented on the server;
- The 0.5 rank shift hasn’t been implemented. The server considers that winning as White with H1 against a 5.0k is equivalent to winning an even game against a 4.0k.
- Ignore differences of about 1e-2; these are rounding errors.
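For what it's worth, a rough numeric check of the ~1-point claim, continuing from the glicko2_update sketch above and the same assumed rank conversion: for the row 2 → row 1 game, treating the H9 opponent as 9 vs 8.5 ranks stronger changes the result by a bit under a rating point.

```python
def shift_by_ranks(rating, ranks):
    # with rank = ln(rating / 525) * 23.15, a shift of `ranks` ranks
    # multiplies the rating by exp(ranks / 23.15)
    return rating * math.exp(ranks / 23.15)

# Row 2 -> row 1: I win as White in the H9 game
full_h = glicko2_update(1806.42, 63.28, 0.059994,
                        shift_by_ranks(1071.42, 9.0), 61.73, 1.0)
half_shifted = glicko2_update(1806.42, 63.28, 0.059994,
                              shift_by_ranks(1071.42, 8.5), 61.73, 1.0)
print(full_h[0], half_shifted[0])  # ~1811.4 vs ~1810.7 by my math
```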
Cool, thanks for looking over it.
I just grabbed another game that finished a few minutes ago and screenshotted the predictions while the game was ongoing vs. after it finished, just to have a more recent example.
This is a 2-stone handicap game, and White won.
So yeah, I guess we're waiting on the server to implement changes to match the GitHub repo, and then I (or whoever) can update the frontend to match as well.
I meant to come back to this as well. I did previously find a mention of win probability and expected outcome in Glicko.
It looks functionally the same, but it's evaluated with different parameters: the combined variance (or standard deviation) of the two players rather than just one player's.
Now, how much that changes in moving from Glicko to Glicko-2, I'm not sure. The rating scale is a little different, I think, so that might modify things slightly by a scale factor. The main question is whether the volatility parameter makes a difference to the win probability, i.e. whether it's incorporated into the prediction, or whether it's only relevant in the update steps.
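For reference, this is roughly the distinction I mean, in Glicko-1 terms (from Glickman's papers; treat the exact forms as my reading, not OGS's code): the expected score used inside the update depends only on the opponent's deviation, while the head-to-head win probability combines both deviations.

```python
import math

Q = math.log(10) / 400.0  # Glicko-1 scale constant

def g(rd):
    return 1.0 / math.sqrt(1.0 + 3.0 * (Q * rd) ** 2 / math.pi ** 2)

def expected_score_for_update(r1, r2, rd2):
    # inside the rating update: only the opponent's deviation appears
    return 1.0 / (1.0 + 10.0 ** (-g(rd2) * (r1 - r2) / 400.0))

def predicted_win_probability(r1, rd1, r2, rd2):
    # predicting a game between two current players: deviations combined
    combined = math.sqrt(rd1 ** 2 + rd2 ** 2)
    return 1.0 / (1.0 + 10.0 ** (-g(combined) * (r1 - r2) / 400.0))
```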
I meant to make time to learn more about it and try to understand how it was derived, but I haven't gotten around to it yet.