Handicap and the ranking system

Jon_Ko · April 3, 2024, 11:46am

I think the whole rank system (not rating system) can only be sensible if there are enough ranked handicap games played on the server.

GreenAsJade · April 3, 2024, 8:43pm

I’ve wondered about that myself. It’s clear that the definition of “rank” in the ranking system is in terms of “handicap stones” - one rank difference is supposed to equal one handicap stone - so how can that possibly be true if not enough handicap games are played?

I suspect that the compensating factor is calibration against other ranking systems.

But as you say, this is not related to whether the rating system is working to produce a measure of skill that results in even games…

gennan · April 3, 2024, 9:26pm

I suppose the question could also be rephrased as how many handicap games are needed to confirm the shape of the Elo per rank curve with sufficient confidence over a sufficient range.

The EGD only has some 100,000 handicap games in it, some 10% of all the games. That wasn’t really enough to determine the shape of the curve. But the EGF uses declared ranks, so even game results already gave enough data to determine the shape of the curve and the handicap game data was found to be at least consistent with that curve.

Even though relatively few OGS games are played with (larger) handicaps, I assume there are still many more than 100,000 handicap games in OGS’s historical data. If there are like a million, I expect it to be enough.

The numbers below are from a period of like 2 months 3 years ago, where some 17% of all games were with handicap, although mostly with small handicap:

 handicap | count  |     %
----------+--------+----------
        0 | 432969 |    82.63
        1 |  68731 |    13.12
        2 |  13528 |     2.58
        3 |   2990 |     0.57
        4 |   1969 |     0.38
        5 |   1250 |     0.25
        6 |    917 |     0.18
        7 |    605 |     0.12
        8 |    392 |     0.07
        9 |    659 |     0.13

[2021 Rating and rank adjustments - #382 by anoek]
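As a quick check on the "some 17%" figure, the shares can be recomputed from the counts in the table. This is only a sketch over the numbers quoted above; the variable names are made up for illustration.

```python
# Counts copied from the table above (handicap stones -> number of games).
counts = {0: 432969, 1: 68731, 2: 13528, 3: 2990, 4: 1969,
          5: 1250, 6: 917, 7: 605, 8: 392, 9: 659}

total = sum(counts.values())
handicap_games = total - counts[0]                      # any handicap >= 1
large_handicap_games = sum(n for h, n in counts.items() if h >= 2)

handicap_share = 100 * handicap_games / total           # percent of all games
large_share = 100 * large_handicap_games / total        # percent with 2+ stones

print(f"{handicap_games} handicap games out of {total} ({handicap_share:.1f}%)")
print(f"{large_handicap_games} games with 2+ stones ({large_share:.1f}%)")
```

So that two-month sample alone holds roughly 91,000 handicap games, most of them at handicap 1.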

And 4 years ago, anoek posted OGS handicap game statistics for handicap up to 4 stones on different board sizes, where handicap games also seemed to make up some 20-30% of all games on OGS.
See the tables in item 2 of 2020 Rating and rank tweaks and analysis which I copied below:

9x9 ranks predicting 9x9 games

            Handicap 0               Handicap 1             Handicap 2             Handicap 3

    30k  47.8 : 49.6 [n=2579]    33.3 : 49.8 [n=39]     28.0 : 48.5 [n=25]      40.7 : 48.9 [n=27]  
    25k  48.4 : 49.7 [n=20436]   44.7 : 48.7 [n=4417]   54.7 : 49.6 [n=311]     65.4 : 50.1 [n=104] 
    20k  47.7 : 49.6 [n=60398]   45.0 : 48.9 [n=32404]  51.5 : 49.5 [n=555]     63.6 : 49.6 [n=176] 
    15k  47.4 : 49.4 [n=207911]  44.7 : 48.8 [n=70964]  54.0 : 49.3 [n=2123]    63.6 : 49.7 [n=558] 
    10k  48.9 : 49.4 [n=333649]  45.6 : 49.1 [n=44339]  55.5 : 49.6 [n=1802]    71.4 : 50.3 [n=245] 
    5k   49.3 : 50.2 [n=83326]   44.2 : 49.9 [n=2287]   61.4 : 50.0 [n=451]     75.0 : 51.4 [n=96]  
    1d   52.8 : 50.9 [n=4699]    58.3 : 50.1 [n=24]     100.0 : 52.2 [n=3]      --  
    6d   46.7 : 54.1 [n=15]      --                     --                      --  
    ALL 48.4 : 49.5 [n=713013]   45.0 : 48.9 [n=154474]  54.8 : 49.5 [n=5270]    65.8 : 50.0 [n=1206]    

13x13 ranks predicting 13x13 games
    30k 44.6 : 49.6 [n=121]     47.5 : 49.9 [n=219]     100.0 : 48.3 [n=1]      50.0 : 47.1 [n=2]   
    25k 48.0 : 49.5 [n=2773]    45.5 : 49.7 [n=1756]    52.9 : 49.5 [n=104]     32.6 : 48.9 [n=43]  
    20k 44.0 : 49.3 [n=14530]   45.4 : 49.8 [n=11125]   44.4 : 49.4 [n=941]     42.7 : 49.4 [n=293] 
    15k 45.8 : 49.3 [n=60647]   46.5 : 50.0 [n=16939]   49.7 : 48.8 [n=2864]    39.7 : 49.1 [n=648] 
    10k 47.3 : 49.4 [n=85200]   47.8 : 50.3 [n=12933]   59.7 : 49.4 [n=2635]    48.0 : 49.9 [n=306] 
    5k  49.3 : 50.3 [n=17200]   45.4 : 50.0 [n=866]     58.3 : 49.8 [n=357]     57.6 : 49.9 [n=33]  
    1d  46.2 : 51.0 [n=249]     --                      --                      --  
    6d  66.7 : 60.7 [n=3]       --                      --                      --  
    ALL 46.7 : 49.4 [n=180723]  46.5 : 50.0 [n=43838]   53.3 : 49.2 [n=6902]    42.5 : 49.3 [n=1325]    


19x19 ranks predicting 19x19 games
    30k 48.4 : 49.6 [n=1504]    42.9 : 49.1 [n=212]     39.2 : 49.4 [n=227]     34.3 : 49.7 [n=67]  
    25k 45.9 : 49.5 [n=18333]   40.9 : 49.6 [n=4517]    40.0 : 49.7 [n=4083]    41.2 : 49.8 [n=857] 
    20k 47.3 : 49.5 [n=98224]   42.4 : 49.2 [n=20335]   41.3 : 49.2 [n=14927]   42.2 : 49.3 [n=4369]    
    15k 49.0 : 49.4 [n=235420]  43.5 : 48.7 [n=40980]   43.3 : 48.6 [n=26238]   45.5 : 48.9 [n=10026]   
    10k 49.6 : 49.3 [n=344796]  45.0 : 48.6 [n=45871]   46.3 : 48.5 [n=23862]   47.3 : 48.9 [n=9851]    
    5k  49.7 : 49.5 [n=245545]  44.8 : 48.7 [n=17596]   44.7 : 48.4 [n=8277]    41.8 : 48.4 [n=3881]    
    1d  49.2 : 49.8 [n=33502]   42.6 : 49.5 [n=2086]    47.5 : 50.0 [n=1100]    47.5 : 50.0 [n=589] 
    6d  48.9 : 50.6 [n=1585]    54.1 : 49.4 [n=181]     70.0 : 50.3 [n=50]      76.9 : 53.3 [n=26]  
    ALL 49.2 : 49.4 [n=978909]  43.9 : 48.8 [n=131778]  43.9 : 48.8 [n=78764]   45.0 : 48.9 [n=29666]   



Combined rating system


Predicting 9
    30k 49.2 : 49.8 [n=17573]   47.8 : 48.6 [n=2060]    49.2 : 49.9 [n=240]     68.0 : 49.9 [n=125] 
    25k 48.6 : 49.7 [n=43982]   45.5 : 48.7 [n=22370]   56.2 : 49.8 [n=299]     63.0 : 50.5 [n=100] 
    20k 48.1 : 49.5 [n=117350]  45.1 : 48.4 [n=56551]   53.7 : 49.3 [n=779]     70.5 : 49.6 [n=325] 
    15k 49.0 : 49.5 [n=262529]  45.4 : 48.1 [n=53642]   57.7 : 49.5 [n=1413]    72.6 : 49.8 [n=332] 
    10k 50.6 : 49.7 [n=206142]  47.9 : 48.6 [n=14727]   57.1 : 49.7 [n=1120]    71.7 : 50.8 [n=237] 
    5k  49.9 : 50.2 [n=49161]   43.4 : 49.0 [n=648]     60.5 : 50.6 [n=124]     75.0 : 49.2 [n=12]  
    1d  54.9 : 50.6 [n=4802]    52.1 : 47.7 [n=73]      100.0 : 51.5 [n=9]      87.5 : 53.4 [n=8]   
    6d  59.6 : 51.8 [n=171]     --                      --                      --  
    ALL 49.4 : 49.6 [n=701710]  45.6 : 48.4 [n=150071]  56.3 : 49.6 [n=3984]    70.6 : 50.1 [n=1139]    

Predicting 13
    30k 49.3 : 49.6 [n=2094]    49.5 : 49.8 [n=2393]    65.6 : 49.0 [n=32]      48.6 : 49.9 [n=37]  
    25k 47.5 : 49.5 [n=8051]    46.9 : 49.5 [n=8310]    41.6 : 48.8 [n=308]     33.3 : 49.1 [n=33]  
    20k 46.5 : 49.2 [n=24573]   46.3 : 49.6 [n=20826]   49.8 : 48.6 [n=1222]    46.2 : 49.6 [n=279] 
    15k 47.6 : 49.2 [n=78278]   47.3 : 49.5 [n=24001]   50.5 : 48.2 [n=2349]    41.1 : 49.5 [n=445] 
    10k 49.0 : 49.4 [n=86805]   47.9 : 50.0 [n=13165]   57.8 : 48.3 [n=1500]    48.4 : 50.0 [n=182] 
    5k  50.5 : 50.2 [n=19668]   45.6 : 49.9 [n=1041]    56.2 : 48.9 [n=169]     46.7 : 47.2 [n=15]  
    1d  54.5 : 50.7 [n=985]     60.7 : 51.2 [n=28]      100.0 : 45.8 [n=4]      --  
    6d  40.5 : 51.8 [n=42]      --                      --                      --  
    ALL 48.3 : 49.4 [n=220496]  47.1 : 49.6 [n=69764]   52.1 : 48.4 [n=5584]    44.0 : 49.6 [n=991] 

Predicting 19
    30k 50.5 : 49.6 [n=2299]    44.3 : 48.9 [n=221]     36.9 : 48.5 [n=222]     31.7 : 49.5 [n=63]  
    25k 48.0 : 49.5 [n=10385]   42.3 : 48.9 [n=3479]    40.6 : 48.9 [n=3215]    38.3 : 49.2 [n=664] 
    20k 47.7 : 49.4 [n=61028]   41.1 : 48.7 [n=18301]   39.3 : 48.9 [n=15631]   39.9 : 49.0 [n=3557]    
    15k 48.7 : 49.3 [n=224916]  42.1 : 48.4 [n=48650]   41.1 : 48.6 [n=34843]   41.8 : 48.9 [n=12427]   
    10k 49.2 : 49.3 [n=410840]  43.1 : 48.3 [n=68206]   43.2 : 48.5 [n=40606]   44.0 : 48.9 [n=15422]   
    5k  49.6 : 49.4 [n=381050]  44.3 : 48.4 [n=32244]   44.5 : 48.3 [n=15511]   43.1 : 48.4 [n=6854]    
    1d  49.4 : 49.9 [n=60938]   41.3 : 49.0 [n=3580]    46.5 : 49.2 [n=2010]    44.3 : 49.0 [n=1061]    
    6d  49.0 : 50.6 [n=2519]    53.2 : 50.0 [n=220]     66.1 : 50.4 [n=59]      84.8 : 52.3 [n=33]  
    ALL 49.2 : 49.4 [n=1153975] 42.8 : 48.4 [n=174901]  42.2 : 48.6 [n=112097]  42.7 : 48.8 [n=40081]   



Combined prediction of all games
    30k 49.3 : 49.8 [n=21966]   48.5 : 49.2 [n=4674]    44.7 : 49.2 [n=494]     54.7 : 49.8 [n=225] 
    25k 48.3 : 49.6 [n=62418]   45.5 : 48.9 [n=34159]   41.9 : 48.9 [n=3822]    41.2 : 49.4 [n=797] 
    20k 47.8 : 49.4 [n=202951]  44.6 : 48.7 [n=95678]   40.7 : 48.9 [n=17632]   42.7 : 49.1 [n=4161]    
    15k 48.7 : 49.4 [n=565723]  44.5 : 48.5 [n=126293]  42.2 : 48.6 [n=38605]   42.6 : 48.9 [n=13204]   
    10k 49.6 : 49.4 [n=703787]  44.5 : 48.6 [n=96098]   44.1 : 48.5 [n=43226]   44.5 : 48.9 [n=15841]   
    5k  49.7 : 49.5 [n=449879]  44.3 : 48.5 [n=33933]   44.7 : 48.3 [n=15804]   43.1 : 48.4 [n=6881]    
    1d  49.8 : 49.9 [n=66725]   41.6 : 49.0 [n=3681]    46.8 : 49.2 [n=2023]    44.6 : 49.1 [n=1069]    
    6d  49.6 : 50.7 [n=2732]    53.2 : 50.0 [n=220]     66.1 : 50.4 [n=59]      84.8 : 52.3 [n=33]  
    ALL 49.2 : 49.5 [n=2076181] 44.6 : 48.6 [n=394736]  43.1 : 48.6 [n=121665]  43.5 : 48.9 [n=42211]

benjito · April 3, 2024, 11:55pm

There’s also the question of whether the traditional idea of rank is even self-consistent.

Like if Alice gives four stones to Bob, and Bob gives four stones to Charlie… does Alice necessarily give eight stones to Charlie?

gennan · April 4, 2024, 12:04am

From what I have seen, the answer to that question is usually “yes” (with some margin of error and statistical noise).

I suppose it will break down when trying to compensate a very large rank gap by a very large handicap, but as long as the gaps and handicaps are not too large, one accounts for the half-rank bias of traditional handicap and ranks are assigned properly, calculated handicaps work pretty well to even out winning chances.

When you have a group of players (like a go club) that only play handicap games with each other, and adjust mutual handicap depending on game results, they will eventually settle on a fairly stable and consistent matrix of mutual handicaps. With that matrix, you can rank them relative to each other.

Something like that was how ranking worked before the advent of the internet, and I think many go clubs still work that way. Even some go servers may still work something like that.
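To make the "matrix of mutual handicaps" idea concrete, here is a minimal sketch of turning such a matrix into relative ranks. It assumes every pair has a settled handicap, that entry `[i][j]` is the effective rank gap (positive when player i gives stones to player j), and it ignores the half-rank bias mentioned above. The function name and the example numbers are hypothetical. For a complete matrix, the least-squares fit of rank differences is just the row mean.

```python
def relative_ranks(handicaps):
    """Least-squares ranks (mean zero) from a complete antisymmetric
    matrix of settled handicaps; handicaps[i][j] > 0 means i gives j stones."""
    n = len(handicaps)
    return [sum(row) / n for row in handicaps]

# Alice gives Bob 4, Bob gives Charlie 4, Alice gives Charlie 8: fully consistent.
consistent = [[0, 4, 8],
              [-4, 0, 4],
              [-8, -4, 0]]
print(relative_ranks(consistent))

# Same club, but Alice only gives Charlie 7: the fit spreads the mismatch.
noisy = [[0, 4, 7],
         [-4, 0, 4],
         [-7, -4, 0]]
ranks = relative_ranks(noisy)
print(ranks, "fitted Alice-Charlie gap:", ranks[0] - ranks[2])
```

In the second case the fitted gap between Alice and Charlie comes out between 7 and 8, which is one way to read "yes, with some margin of error".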

Uberdude · April 4, 2024, 12:23am

One reason I’ve long advocated for handicaps on rather than off by default on OGS.

hoctaph · April 4, 2024, 1:04am

Bear in mind that even without handicap, one can ask whether rating is self-consistent:
There could be 3 players with a rock-paper-scissors relationship.

(A consistently beats B, B consistently beats C, C consistently beats A)

Even if there is such a thing, we still try to put the players
on a linear scale, with as much accuracy as that allows.
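A small sketch of what "putting them on a linear scale" does to a rock-paper-scissors trio. It swaps in a plain Bradley-Terry fit (the logistic model behind Elo-style ratings), not OGS's Glicko-2, and the win counts are invented for illustration. The fit assumes every player has at least one win.

```python
import math

def bradley_terry(wins, iterations=200):
    """Fit Bradley-Terry strengths with the standard MM update.
    wins[i][j] is the number of games player i won against player j."""
    n = len(wins)
    p = [1.0] * n
    for _ in range(iterations):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins / denom)
        scale = sum(new_p) / n            # keep the average strength at 1
        p = [x / scale for x in new_p]
    return p

def to_elo(p):
    """Convert strengths to an Elo-like scale centred on zero."""
    mean_log = sum(math.log10(x) for x in p) / len(p)
    return [400 * (math.log10(x) - mean_log) for x in p]

# A beats B 9-1, B beats C 9-1, C beats A 9-1: a cycle.
cycle = [[0, 9, 1],
         [1, 0, 9],
         [9, 1, 0]]
# A beats B and C 9-1, B beats C 9-1: a transitive ordering.
ladder = [[0, 9, 9],
          [1, 0, 9],
          [1, 1, 0]]

print(to_elo(bradley_terry(cycle)))   # all three come out equal
print(to_elo(bradley_terry(ladder)))  # A above B above C
```

For the cycle the one-dimensional model can only say "equally strong", so it predicts 50% for every pairing even though each actual pairing is lopsided.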

Jon_Ko · April 4, 2024, 1:21am

How can OGS match players of different strength so that both black and white have a decent chance of winning?

I think this would be an interesting question to explore.

Sadaharu · April 4, 2024, 6:22am

I prefer reverse komi to handicap stones. I think they make more sense and still allow for a “normal” game to be played.

gennan · April 4, 2024, 7:49am

I think such situations are quite rare, but if they are completely balanced and they play with nobody else, I think they should have the same rating. What would be wrong with that?

When aiming for an expected winrate range (like 25-75%), you can search for players with ratings within some Elo distance. That won’t be too hard.
With Glicko2 you may need a slightly more complicated search with more parameters. I don’t know what those calculations should look like.
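For the plain Elo case, the search could look roughly like the sketch below. It assumes the usual logistic Elo curve with a 400-point scale and ignores rating deviation entirely, so it is not the Glicko-2 version. The function names and the rating pool are hypothetical.

```python
import math

def expected_score(elo_diff):
    """Expected winrate for a player rated elo_diff points above the opponent."""
    return 1 / (1 + 10 ** (-elo_diff / 400))

def max_elo_gap(min_winrate):
    """Largest rating gap at which the weaker player still wins min_winrate."""
    return 400 * math.log10((1 - min_winrate) / min_winrate)

def candidates(my_rating, pool, low=0.25, high=0.75):
    """Opponent ratings against which my expected winrate lies in [low, high]."""
    return [r for r in pool if low <= expected_score(my_rating - r) <= high]

print(round(max_elo_gap(0.25), 1))                          # about 190.8
print(candidates(1500, [1200, 1400, 1500, 1650, 1800]))     # [1400, 1500, 1650]
```

Under that curve a 25-75% window corresponds to opponents within about 191 Elo points on either side.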

Reverse komi handicap has been discussed before. I think reverse komi games would just fit in the handicap system when you use 2 times perfect komi per full handicap stone.
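The "2 times perfect komi per full handicap stone" rule can be written down as a one-liner. This is a sketch only: the value 7 for perfect komi is an assumption (it depends on the ruleset and is not settled), the function name is made up, and the half-point tiebreak is left out.

```python
def komi_for_gap(stones, perfect_komi=7.0):
    """Komi white receives when black's advantage should be worth `stones`
    full handicap stones. A negative result means reverse komi (points to black).
    perfect_komi=7.0 is an assumed value, not an established one."""
    return perfect_komi * (1 - 2 * stones)

for stones in (0, 0.5, 1, 2):
    print(stones, komi_for_gap(stones))
# 0 stones   ->   7.0  (even game)
# 0.5 stones ->   0.0  (black moves first, no komi)
# 1 stone    ->  -7.0  (reverse komi of 7)
# 2 stones   -> -21.0
```

The half-stone case is the familiar "no komi" game, which is where the half-rank bias of traditional handicap comes from.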

Jon_Ko · April 4, 2024, 9:41am

That’s one answer to my question, but it imposes a limit on the difference in player strength, which raises the question of whether such a limit is necessary.

richyfourtytwo · April 4, 2024, 12:59pm

Without handicap that should be pretty rare, I agree.

In our local club we had our own ranking system between 4 regulars, which worked fine. It was somewhat stable with some fluctuations and when we played small tournaments everyone had a chance to win. When one player left it collapsed though. It was still stable, but too much so, because we always had that A>B>C>A situation. The reason was that we were 2 players about 5 ranks apart and the 3rd about 10 ranks lower. I was the middle one, but I was ‘better’ at trying unusual (read: nonsensical) stuff against the large handicap (on 9x9 or 13x13). So I always won with white and always lost with black against the stronger player, who in turn, with his reasonable moves, was helpless against the even larger handicap. In hindsight it is possibly even more surprising that it worked well with 4 players, where number 4 was still about one stone weaker.

(Edit: many embarrassing typos fixed. Enjoy the remaining ones.)

gennan · April 4, 2024, 5:53pm

I thought you were asking about finding opponents for an even game, where the expected winrate is not too far from 50%. So I proposed to look for opponents that are within some range of player strength. But now it seems that this is not what you want.
To me this seems like a paradox, so I suppose I misunderstood your first question.

Jon_Ko · April 4, 2024, 8:34pm

I’m not requiring even games (in the sense of no handicap). And I just wanted to give an example question for what this thread could focus on, because I think my initial post (which originally belonged to a different thread) doesn’t do a good job at eliciting a fruitful discussion.

gennan · April 4, 2024, 8:46pm

Currently OGS allows players to find an opponent within some rank range.
I suppose this could be modified to find an opponent within some expected winrate range, where handicap can be allowed as an option.
So when you select a 30-70% range, the system might come up with a 3 stone game against an opponent who is 5 ranks stronger.
Or you select a 80-90% range (because you like to win) and the system might come up with a game where you get 2 stones handicap against an opponent who is 2 ranks weaker.

Do you mean something like that?
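A rough sketch of how such a winrate-range search with handicap might work. Everything here is an assumption for illustration: a constant Elo-per-rank value (the thread above suggests the real curve is not constant), traditional handicap of n stones counted as n - 0.5 ranks, and made-up function names. The handicaps it returns depend entirely on the assumed `elo_per_rank` and need not match the examples in this post.

```python
def effective_stones(handicap):
    """Traditional handicap of n stones treated as worth n - 0.5 ranks."""
    return 0.0 if handicap == 0 else handicap - 0.5

def win_probability(rank_gap, handicap, elo_per_rank):
    """My expected winrate when the opponent is rank_gap ranks stronger
    and I receive `handicap` stones. elo_per_rank is an assumed constant."""
    residual = rank_gap - effective_stones(handicap)
    return 1 / (1 + 10 ** (residual * elo_per_rank / 400))

def handicaps_in_range(rank_gap, low, high, elo_per_rank, max_handicap=9):
    """All handicaps 0..max_handicap that put my winrate inside [low, high]."""
    return [h for h in range(max_handicap + 1)
            if low <= win_probability(rank_gap, h, elo_per_rank) <= high]

# Opponent 5 ranks stronger, 30-70% window, assuming 100 Elo per rank.
print(handicaps_in_range(5, 0.30, 0.70, elo_per_rank=100))   # [5, 6]
```

A matchmaker could then offer any opponent for whom this list is non-empty, and a narrow window around 50% reduces to picking the single closest handicap.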

Jon_Ko · April 4, 2024, 9:06pm

That’s an interesting idea, give the users the option to decide themselves what they think is “a decent chance of winning”.

Another option would be to say “only trying to get as close to a 50-50 chance of winning is decent”. And then – as an example – we’d maybe need a komi of 3.5 (if white is just slightly stronger than black).

slowthought · April 23, 2024, 9:47pm

I see no problem with negotiating an appropriate handicap for a game, but there must be reasonable limits. Surely, @gennan, you were being sarcastic?

Then again, I don’t have a standard of “appropriate” that is easily coded.

gennan · April 24, 2024, 5:27pm

Not really, but I suppose that if you select an 80-90% range you’ll probably have to wait a while for an opponent who accepts a game with them having a 10-20% win expectation.

Uberdude · April 24, 2024, 5:33pm

OGS starting a subs and doms matchmaking service?

Jon_Ko · April 24, 2024, 6:04pm

So far it’s just @gennan’s interpretation of what I said. Neither he nor I am advocating it; let’s keep that in mind.