Yet another ratings thread

By bringing the ratings closer but leaving a rank gap of 0.46. This was mentioned further up the thread as the gap that a 1 handicap game adjusts the ranks to.

Thinking about it I may have misinterpreted that and the rank gap should actually be 0.08. This makes the ELO adjustments of 11.492 closer to 12.5. It doesn’t make much difference overall however. The result of a win turning to a loss is way more significant than a few more rating points per win.

Edit: Those numbers are nonsense. They should be 10k→9.77k and 9k→9.23k respectively (The rest of the calculations are basically correct for that however).

I also updated them for the rank adjustments of 10k→9.54k and 9k→9.46k (leaving a rank gap of 0.08k at 1 handicap). The rate at which White can allow wins to turn into losses for the lack of komi is then 0.077 (so the just under 4% turns into just under 8%, close to 1 in 13 games).

Starting out with players 1.38k ranks apart results in a 1 handicap game with a 0.46k rank-adjusted gap again, however. Basing the calculations on the lowest possible rank gap at which a stone handicap could be given is a conservative choice.

Hm, here’s my understanding:
From white’s point of view (when calculating white’s rating change) black’s rank changes from 10k to 9.5k (komi is just half a stone).
Then you calculate what glicko2 rating that 9.5k corresponds to and use that to calculate white’s new rating.

Same for black (white is treated as a 9.5k, black stays at 10k).

That’s why I don’t understand your numbers, they don’t seem to fit this concept of mine.

I didn’t read the rating adjustment rules precisely enough, but if there is a rank gap of about 0.46 then the numbers will be about what I estimated. It doesn’t make any difference whether it’s modelled as 10k vs 9.54k or 9.77k vs 9.23k. A rank gap of 0.54 is close to this as well.

Though it’s obviously slightly annoying that I posted a bunch of calculation mistakes already.

Are you saying the game is rated as if a 9.54k was playing against a 9.46k (in an even game)?

I’m not sure on the exact way to be honest, can maybe update the calculations and repost if you want to give me an exact description of how ranks are adjusted.

But yes, rating 1 handicap as if it’s an even game with the ranks brought from 1k apart to 0.08k apart seems one way to do it. That results in the most conservative value for how often White can afford to have the handicap rules reverse the result (1 in 13) while the handicap is still fairly compensated. If the ranks are instead brought to 0.46k or 0.5k apart, the rate goes down.

Of course, players with any rank difference can play a 1 handicap game with either colour.

I appreciate you taking the time to do this. Yes there are some issues with the math. You are both half right.

JonKo’s explanation is correct in the point of view aspect, the rank adjustment is done as he describes, not split over both players.

However Komi doesn’t bring 9k to 9.5k, it brings it to a specific rank, decided by the formula I posted earlier. It differs under Japanese and Chinese rules, area counting and territory counting, etc. It’s not blanket +0.5k

SouthernGoPlayer uses that formula correctly after editing. But splitting the adjustment over both players does not give the same outcome as the correct method, precisely because the rank-rating transfer is on a log scale! :slight_smile: ~The impact is tiny however.~

Edit: I now read you brought the rank delta from 1k to 0.08k, that means you applied the adjustment twice. That’s definitely wrong.
IF you apply the effective rating to both players at once, it should be half (i.e. ~0.25 from both players, leaving a gap of ~0.5, precise numbers depending on the ruleset). That will still cause a small <1% deviation from doing the rating correctly due to the log scale, but it is close enough to draw conclusions from.

I’ll probably not have time to look deeper at it tomorrow but just posting to say I’ll come back to it.

Yes, I was just thinking that @SouthernGoPlayer has put so much effort into this thread they should be granted a 0.5 rank boost or thereabouts. If this is not possible, then I bestow +3 luck for their next 5 games.

I’ve given handicap and rating some thought back in 2020 and took a look at that code again today. Here’s what seems mostly correct to me:

2020 code

Old analysis/utils/RatingMath.py

def get_handicap_adjustment(rating: float, handicap: int) -> float:
    return rank_to_rating(rating_to_rank(rating) + (handicap - 0.5 if handicap > 0 else 0)) - rating

Old analysis/analyze_glicko_one_game_at_a_time.py

updated_black = glicko2_update(
            black,
            [
                (
                    white.copy(-get_handicap_adjustment(white.rating, game.handicap)),
                    game.winner_id == game.black_id,
                )
            ],
        )

updated_white = glicko2_update(
            white,
            [
                (
                    black.copy(get_handicap_adjustment(black.rating, game.handicap)),
                    game.winner_id == game.white_id,
                )
            ],
        )

Old goratings/math/glicko2.py

    def copy(self, rating_adjustment: float = 0.0, rd_adjustment: float = 0.0) -> "Glicko2Entry":
        ret = Glicko2Entry(self.rating + rating_adjustment, self.deviation + rd_adjustment, self.volatility)
        return ret

def glicko2_update(player: Glicko2Entry, matches: List[Tuple[Glicko2Entry, int]]) -> Glicko2Entry:
    pass  # just for the signature, player should be opponent

Now that is code that was used for testing and I just realized it has changed, but I’ll have to look at the new code more thoroughly tomorrow.

Sounds good. I certainly simplified the calculations quite a bit by doing things like linearly interpolating the rank to rating conversions.

Ultimately this doesn’t make much difference however, the point just hangs on the rating delta values, how much White gains for a win in 1 handicap, how much less White gains for a win in even and how much White loses for a loss in 1 handicap. The difference between how much White needs to win to make up for an even win turning into a 1 handicap loss can then be calculated. It’s also minimized when handicap calculations bring the ratings/rankings closer together.

Ah, yes, I’ve read the updated code today and I agree, it’s just approximately 0.5 ranks, not exactly 0.5. (I’m not yet convinced the /12 is correct for all rulesets though, but that’s not that important.)

get_handicap_rank_difference

Link to get_handicap_rank_difference

def get_handicap_rank_difference(handicap: int, size: int, komi: float, rules: str) -> float:
    # Number of extra moves black makes before white responds.
    num_extra_moves = handicap - 1 if handicap > 1 else 0

    if rules == "japanese" or rules == "korean":
        # Territory scoring.
        area_bonus = 0
        handicap_scoring_bonus = 0
    else:
        # Bonus for the area value of a stone in area scoring.
        area_bonus = 1

        # Chinese and AGA rules add a handicap bonus for white in addition
        # to the komi.
        if rules == "chinese":
            handicap_scoring_bonus = 1 * handicap
        elif rules == "aga":
            handicap_scoring_bonus = 1 * num_extra_moves
        else:
            handicap_scoring_bonus = 0

    # Full points added to white's score, including any handicap scoring.
    full_komi = komi + handicap_scoring_bonus

    # Current best estimate for perfect komi.
    #
    # Sources:
    # - <https://en.wikipedia.org/wiki/Komi_(Go)#Perfect_Komi>
    # - <https://senseis.xmp.net/?Komi#toc8>
    perfect_komi_territory = 6
    perfect_komi = perfect_komi_territory + area_bonus

    # Komi compensates white for black getting an extra half move.  The
    # territorial value of a free stone is twice that.
    stone_value_territory = (perfect_komi_territory) * 2
    stone_value = stone_value_territory + area_bonus

    # The point value of black's advantage (or disadvantage) at the start
    # of the game.  This value is normalized to have the same meaning
    # whether using area or territory rules, using the logic that the AGA
    # ruleset uses to make territory counting equivalent to area counting.
    black_head_start = perfect_komi - full_komi + stone_value * num_extra_moves

    # Convert the head start from "points" to "ranks", defining 1 rank as
    # the territorial value of a free move on a 19x19 board.  For small
    # boards, the head start needs to be scaled up to a 19x19 board.
    if size == 9:
        return black_head_start * 6 / stone_value_territory
    if size == 13:
        return black_head_start * 3 / stone_value_territory
    return black_head_start / stone_value_territory

get_handicap_adjustment

Link to get_handicap_adjustment

def get_handicap_adjustment(player: str, rating: float, handicap: int, size: int, komi: float, rules: str) -> float:
    rank_difference = get_handicap_rank_difference(handicap, size, komi, rules)

    # Apply the +/- for white/black in the "rank" domain where it's symmetric.
    # Note that the "rating" domain is log-scale, where +/- is asymmetric.
    assert player == "white" or player == "black"
    if player == "black":
        effective_rank = rating_to_rank(rating) + rank_difference
    else:
        effective_rank = rating_to_rank(rating) - rank_difference

    return rank_to_rating(effective_rank) - rating

process_game

Link to process_game

def process_game(self, game: GameRecord) -> Glicko2Analytics:
    if game.black_manual_rank_update is not None:
        self._storage.set(game.black_id, Glicko2Entry(rank_to_rating(game.black_manual_rank_update)))

    if game.white_manual_rank_update is not None:
        self._storage.set(game.white_id, Glicko2Entry(rank_to_rating(game.white_manual_rank_update)))

    if should_skip_game(game, self._storage):
        return Glicko2Analytics(skipped=True, game=game)

    black = self._storage.get(game.black_id)
    white = self._storage.get(game.white_id)

    updated_black = glicko2_update(
        black,
        [
            (
                white.copy(get_handicap_adjustment("white", white.rating, game.handicap,
                                                    komi=game.komi, size=game.size,
                                                    rules=game.rules,
                        )),
                game.winner_id == game.black_id,
            )
        ],
        timestamp=game.ended,
    )

    updated_white = glicko2_update(
        white,
        [
            (
                black.copy(get_handicap_adjustment("black", black.rating, game.handicap,
                                                   komi=game.komi, size=game.size,
                                                   rules=game.rules,
                        )),
                game.winner_id == game.white_id,
            )
        ],
        timestamp=game.ended,
    )

    self._storage.set(game.black_id, updated_black)
    self._storage.set(game.white_id, updated_white)
    #self._storage.add_rating_history(game.black_id, game.ended, updated_black)
    #self._storage.add_rating_history(game.white_id, game.ended, updated_white)

    return Glicko2Analytics(
        skipped=False,
        game=game,
        expected_win_rate=black.expected_win_probability(
            white, get_handicap_adjustment("black", black.rating, game.handicap,
                                           komi=game.komi, size=game.size,
                                           rules=game.rules,
                ), ignore_g=True
        ),
        black_rating=black.rating,
        white_rating=white.rating,
        black_deviation=black.deviation,
        white_deviation=white.deviation,
        black_rank=rating_to_rank(black.rating),
        white_rank=rating_to_rank(white.rating),
        black_updated_rating=updated_black.rating,
        white_updated_rating=updated_white.rating,
    )

Ok here is my response. I write as I go.

In the context of handicaps this is what we’re going to research, so I won’t comment on this right now :slight_smile:

These calculations are off.

  • R_rating = 525 *e^(r_rank_index/23.15)
  • r_rank_index ranges from 0 at 30kyu, to 38 for 9dan. For a 10k this is index 20, 9k is index 21.

Ratings are then:

  • 10k player: 525 *e^(20/23.15) = 1246 (not 1612)
  • 9k player: 525 *e^(21/23.15) = 1301 (not 1664)
  • The rating gap is 55 (not 52)

This does end up being close enough that the rest of the post is okay.
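For anyone who wants to reproduce these numbers, here is a minimal sketch of the conversion. The constants 525 and 23.15 come from the formula above; the function names are only illustrative and not taken from the actual rating code.

```python
import math

# Constants from the formula quoted above; helper names are hypothetical.
A = 525.0
C = 23.15


def rank_index_to_rating(rank_index: float) -> float:
    """Rating for a rank index (0 = 30k, 20 = 10k, 21 = 9k, 38 = 9d)."""
    return A * math.exp(rank_index / C)


def rating_to_rank_index(rating: float) -> float:
    """Inverse of rank_index_to_rating."""
    return C * math.log(rating / A)


def kyu_to_rank_index(kyu: float) -> float:
    """Kyu ranks only: 30k -> 0, 10k -> 20, 9k -> 21."""
    return 30.0 - kyu


r10k = rank_index_to_rating(kyu_to_rank_index(10))
r9k = rank_index_to_rating(kyu_to_rank_index(9))
print(round(r10k), round(r9k), round(r9k - r10k))
```

Because the relation is exponential, the rating gap between adjacent ranks grows with rank, which is why a fixed gap such as 52 or 55 only holds around 9k-10k.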

Assuming the gap of 52, these calculations are correct.

Formula

ELO win probability P_player = 1 / ( 1 + 10^((Rating_opponent - Rating_player)/400))

  • P_white = 1 / ( 1 + 10^((1612 - 1664)/400)) = 0.57428 = 57.428%
  • P_black = 1 / ( 1 + 10^((1664 - 1612)/400)) = 0.42572 = 42.572%

Point change is also correct:

  • Black win: 25 * (1 - 0.42572) = +14.357
  • Black loss: 25 * 0.42572 = -10.643

Just for completeness, the corrected ELO win probabilities for gap 55 are:

  • white 9k = 57.850%
  • black 10k = 42.150%

And point change:

  • Black win: 14.462
  • Black loss: -10.538
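The win probabilities and point changes above can be checked with a short sketch. It assumes a plain ELO update with a fixed K of 25 (the factor implied by the point changes in this post), which is a simplification and not what Glicko-2 actually does; the function names are illustrative.

```python
K = 25.0  # K-factor implied by the point changes quoted above (assumption)


def elo_expected(rating_player: float, rating_opponent: float) -> float:
    """ELO win probability of the player against the opponent."""
    return 1.0 / (1.0 + 10.0 ** ((rating_opponent - rating_player) / 400.0))


def black_point_changes(rating_gap: float) -> tuple[float, float, float]:
    """Black's win probability and point change when White is rating_gap higher."""
    p_black = elo_expected(0.0, rating_gap)
    return p_black, K * (1.0 - p_black), -K * p_black


for gap in (52.0, 55.0):
    p, win, loss = black_point_changes(gap)
    print(f"gap {gap:.0f}: P_black={p:.5f}, win {win:+.3f}, loss {loss:+.3f}")
```

The win and loss changes always add up to K in absolute value, which is a quick sanity check on any pair of numbers quoted in this thread.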

This part has been already discussed in the replies so I’ll just summarise and repeat.

The effective rank adjustment is approximately 0.5; the exact adjustment depends on the ruleset (see JonKo’s post of the formula code). Perhaps for the general case it’s best to use 0.5; for Japanese 19x19 handicap 1 (komi 0.5 and no extra stones) it is 0.46.

The updating works as follows:

  • Black update: Black 10k vs White treated as 9.46k
  • White update: White 9k vs Black treated as 9.54k

This is different from giving both players half of the rank difference, because the logarithmic relation between rank and rating yields a different number of points changed.
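To make the difference concrete, here is a rough sketch that compares the one-sided adjustment with the split adjustment. It assumes the 525 / 23.15 rank-rating formula from this thread and the Japanese 19x19 values from the posted code (perfect komi 6, stone value 12); the variable names are illustrative.

```python
import math

A, C = 525.0, 23.15  # rank/rating constants quoted in this thread


def rank_to_rating(rank_index: float) -> float:
    return A * math.exp(rank_index / C)


# Japanese rules, 19x19, handicap 1, komi 0.5: no extra stones, no area bonus.
perfect_komi = 6.0
stone_value_territory = 12.0
rank_difference = (perfect_komi - 0.5) / stone_value_territory  # about 0.46

black = rank_to_rating(20.0)  # 10k
white = rank_to_rating(21.0)  # 9k

# One-sided adjustment: only the opponent is moved.
gap_for_black_update = rank_to_rating(21.0 - rank_difference) - black  # White as ~9.46k
gap_for_white_update = white - rank_to_rating(20.0 + rank_difference)  # Black as ~9.54k

# Split adjustment: each player is moved by half the rank difference.
half = rank_difference / 2.0
gap_split = rank_to_rating(21.0 - half) - rank_to_rating(20.0 + half)

print(f"rank difference: {rank_difference:.4f}")
print(f"rating gaps: {gap_for_black_update:.2f}, {gap_for_white_update:.2f}, {gap_split:.2f}")
```

All three rating gaps come out close to 30 points but none are equal: the split version lands between the two one-sided ones, so it is a reasonable approximation rather than the same calculation.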

I agree with the idea that the number of games that flip is related to the rate at which handicap 1 changes the outcome.

However, the calculations are off, even when using your values based on ELO and a rating gap of 52.

Even game:

  • B,B,W,W,W,B,W,B,W,W,B,W,B,B,W = 7 black wins, 8 white.
  • Point change is: 7 * 14.357 - 8 * 10.643 = 15.355 black gains / white loses (not 7.927)

Handicap 1 game:

  • B,B,W,B,W,B,W,B,W,W,B,B,B,B,W. = 9 black wins, 6 white
  • Point change is: 9 * 13.508 - 6 * 11.492 = 52.620 black gains / white loses (not 23.589)

Second, and a bit nitpicky: if you put the distribution mean at 6.5, you are implying Black has a 50% win chance in an even game. We know (or assume) that in this system Black has a win probability of 42% under ELO, so the mean should be lower to correspond with that. For illustration purposes it’s fine.

The wrong value is used for “black wins handicap 1 game”. You calculated this earlier as being 13.508, so “10.643 + 11.492 = 22.135 points” should be “10.643 + 13.508 = 24.151 points”.

This is the swing when a White-wins-even-game becomes a Black-wins-handicap-game.

If you include the lost 0.849 the full swing is 24.151+0.849=25

The final ratio then becomes 0.849 / 25 = 3.4%

The premise seems generally correct to me:

  • Handicap 1 (0.5 komi) will flip results. This is exactly what handicap is supposed to do.
  • This can be expressed as a break even rate of flipped games / total games.

The expression of this break-even ratio of 3.4% is just a mathematical representation of the calculated win probabilities and rating changes.

I am genuinely puzzled: what’s wrong about this? Where’s the error?

When I asked you to post the handicap calculations, it was after you claimed that Glicko2 can’t properly handle handicap. Because I didn’t understand that, I asked you for the calculations and to point out which part specifically is wrong.

This post does contain calculations of win probability and rating changes, but unless I missed it, you don’t actually claim that something’s wrong. In the post you land on a break-even flip rate. If you are implying “the break-even ratio of about 3.4% is wrong”, then my question is: wrong based on what?

I think I found all those and updated my calculations to incorporate exact rank conversion. I was relying on a table for that, but it appears the table was out of date, as its ratings were higher for the same rank. I also updated the compensation formula, which has a small effect: with fair komi the rating formula expects a small advantage for White (about 0.0417 in rank), which equates to an expected White win rate of 53.5% in even games between players of equal rank.

The more significant mistake I found is that in handicap games White also benefits from rating points due to losses being reduced (as well as from rating points due to wins being increased), so this should go into the break-even rate of flipped games / total games. The correct break-even rate is:

((White win rating delta at 1 handicap) - (White win rating delta at even) + (White loss rating delta at 1 handicap - White loss rating delta at even)) / ((White win rating delta at even) - (White loss rating delta at 1 handicap))

My point is that unless this rate roughly matches the rate at which games actually get flipped, the handicap rating is penalizing the rank of the player giving the handicap. Basically, over a series of such games (which at 1 handicap can involve exactly the same moves being played) White’s rating is adjusted lower than for an even match and Black’s is adjusted higher than for an even match. This almost certainly happens at a high enough rank, as the typical decisive scoring margin shrinks higher up the ranks. It’s also relevant that an exactly 9k player can be giving 1 handicap to players in the whole rank range from 10k down to 11k. So even if you targeted a particular decisive scoring margin at a rank to match game data, it’s higher at 9k vs 10k than at 9k vs 11k. But my point is that there must inherently be some amount of unfair adjustment going on within this system.

My updated summary numbers are,

| Ranks | Break-even rate |
| --- | --- |
| 10k vs 9k | 7.99% |
| 11k vs 9k | 7.18% |
| 8k vs 7k | 8.72% |
| 9k vs 7k | 7.73% |
| 6k vs 5k | 9.51% |
| 7k vs 5k | 8.31% |
| 4k vs 3k | 10.36% |
| 5k vs 3k | 8.91% |
| 2k vs 1k | 11.29% |
| 3k vs 1k | 9.52% |
| 1d vs 2d | 12.29% |
| 1k vs 2d | 10.12% |
| 3d vs 4d | 13.38% |
| 2d vs 4d | 10.71% |
| 5d vs 6d | 14.54% |
| 4d vs 6d | 11.27% |
| 7d vs 8d | 15.78% |
| 6d vs 8d | 11.78% |

To me these rates (about 1 in 12 at the top of the list to 1 in 6 at the bottom) seem low for games to be reversed by handicap in practice.

I’ve done some statistics on black/white win rates for pros in the past, and they are not even across players: pros have vastly different win/loss ratios as white or black, just due to their styles and strengths/weaknesses. The same can be said for handicaps, as historical data shows that some players play better as white than others.

So even if on average this would be the ratio, we should see in the trend where some players don’t get affected much or even benefit from this, but some suffer more than this, right?

The ratio being talked about is the rate of wins by White with a score less than Komi, so that the result would be reversed with 1 handicap. I think even a majority of wins at pro level are by less than Komi.

No, now you double count. A game is either white wins or white loses, not both at the same time.

Edit: I don’t know if you also meant that it is a mistake that white “benefits” from losing less points when losing against a handicap. That is exactly what ELO and glicko are supposed to do. If white plays black with handicap, then the game is more difficult than against black without handicap. So white should lose less points on losing the game.

Good, we agree on the actual question. That is what several people here have asked from the beginning: are you claiming that the model’s predicted flip rate does not match reality?

Scored games are a subset of all games played. The scored outcomes are biased towards games that are so tight that players play them out to the end. If you look up the game history of 3d-6d players you’ll see most of their games don’t reach scoring at all.

Margin of victory is also not an indication of how close a game was. Once a good player is in the lead, they’ll play safer and keep that lead, playing the game out till the end. If they play a handicap game, they wouldn’t play those (now losing) lines and would instead use their strength to win with the required number of points.

Using game margins to predict wins and losses under different handicaps is therefore quite risky, inaccurate and only indicative at the amateur level where people don’t count (accurately).

Even if Handicap 1 covers a range from 10.0k to 10.9k, the rating calculation still uses the actual ratings, so expected winrate is not identical across that range.

Without double counting, the rates are actually lower. But how high should the rates be?

I don’t agree that is double counting. White benefits in two ways (rating adjustment wise) from the handicap adjustments, both getting more for their wins and losing less for their losses.

When only counting one side those rates become 1 in 24 at 10k which is definitely too low.

I still think the actual 1 in 12 to 1 in 14 rates (for a 10k to 11k vs 9k game) are likely too low too though there is a lot less handicap play as ranks increase anyway.

Do games that ended with scoring matter, or the possible score at the end? The majority of pro games end in resignation, and from what I’ve heard from pros they usually only resign when they think the difference is around 10 points, so there is no point in trying. By that logic, the majority of pro games (roughly 85% of them) are resigned with a margin larger than komi.

And in the records of games that went to scoring, the margins mostly fall between 3.5 and 5.5 (those within 0.5 to 2.5 are fewer). Hence we can be fairly certain that most resigned games, had they gone to scoring, would have ended with a margin beyond komi (which makes sense, since most resigned games involve intense fighting or a big group dying).

Also, it is not just about pro games; they are just easy to access and run statistics on. Players of all strengths show differences in their black/white win rates due to their style. So naturally they fall into different brackets while being rated with the same formula.

White benefits per game. It does so indeed at wins, and at losses. This is the same number.

Fill in the numbers for win rates and you’ll see that you just added the same number twice. This number is the value that white gets per game no matter if it wins or loses.

  • White win rating delta 1 - white win rating delta even

Is exactly the same as

  • White loss rating delta 1 - white loss rating delta even.
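A quick numerical check of that equality, as a sketch that assumes a plain ELO update with K = 25 and the win probabilities already quoted in this thread (42.572% for Black in the even game, 45.968% at handicap 1). Glicko-2 adds deviation terms, so treat this as an illustration of the principle only.

```python
K = 25.0  # assumed fixed K-factor, as in the earlier point-change numbers


def white_deltas(p_white: float) -> tuple[float, float]:
    """White's rating change for a win and for a loss, given White's win probability."""
    return K * (1.0 - p_white), -K * p_white


win_even, loss_even = white_deltas(1.0 - 0.42572)
win_h1, loss_h1 = white_deltas(1.0 - 0.45968)

extra_per_win = win_h1 - win_even     # White gains more for a handicap win
extra_per_loss = loss_h1 - loss_even  # White loses less for a handicap loss
print(f"{extra_per_win:.3f} {extra_per_loss:.3f}")
```

Both differences are K times the shift in White's win probability, so adding them together counts the same per-game benefit twice.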

I think this can be simplified much further.

The “break-even flip rate” is just the difference between the predicted Black winrates in the two game conditions.

Using your own numbers:

  • P_black_even = 42.572%

  • P_black_H1 = 45.968%

So:

  • break_even_flip_rate = 45.968% - 42.572%
    = 3.396%

That’s all it is.

The longer rating-point calculation just re-derives the same thing, because Elo/Glicko updates are already built around expected winrates.

So doubling it is wrong because then it doesn’t match this much simpler calculation.
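As a sketch of that shortcut: the 52-point gap is the one used earlier in the thread, while the 28.08-point handicap gap is an assumption (0.54 of the 52-point gap, i.e. a linear interpolation) chosen because it reproduces the 45.968% figure. The function name is illustrative.

```python
def elo_expected(rating_player: float, rating_opponent: float) -> float:
    """ELO win probability of the player against the opponent."""
    return 1.0 / (1.0 + 10.0 ** ((rating_opponent - rating_player) / 400.0))


EVEN_GAP = 52.0        # rating gap used earlier in the thread for 10k vs 9k
H1_GAP = 0.54 * 52.0   # assumed effective gap at handicap 1 (linear interpolation)

p_black_even = elo_expected(0.0, EVEN_GAP)
p_black_h1 = elo_expected(0.0, H1_GAP)
break_even_flip_rate = p_black_h1 - p_black_even

print(f"{p_black_even:.5f} {p_black_h1:.5f} {break_even_flip_rate:.5f}")
```

The same value falls out of the rating-point route (0.849 / 25), since the extra points per game are just K times this probability difference.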

It’s not about the games that go to counting; that’s just a way to estimate what the margin of victory is across all games. At least the higher-level pros will resign when losing by a couple of points, so it’s unclear from game play what the typical margin of victory is. If it needed to be estimated, you could probably apply an AI to the final position if it wasn’t counted.

The change in win rates for 1 handicap affects the rating changes for both wins and losses though. Noticing that it’s closely related to the win rate is quite handy (though doubling the difference in win rates doesn’t give the exact same value; it comes out slightly less).