I think the whole rank system (not rating system) can only be sensible if there are enough ranked handicap games played on the server.

I've wondered about that myself. It's clear that the definition of "rank" in the ranking system is in terms of "handicap stones" - one rank difference is supposed to equal one handicap stone - so how can that possibly be true if not enough handicap games are played?

I suspect that the compensating factor is calibration against other ranking systems.

But as you say, this is not related to whether the rating system is working to produce a measure of skill that results in even games...

I suppose the question could also be rephrased as how many handicap games are needed to confirm the shape of the Elo per rank curve with sufficient confidence over a sufficient range.

The EGD only has some 100,000 handicap games in it, some 10% of all the games. That wasn't really enough to determine the shape of the curve. But the EGF uses declared ranks, so even game results already gave enough data to determine the shape of the curve and the handicap game data was found to be at least consistent with that curve.

Even though relatively few OGS games are played with (larger) handicaps, I assume there are still many more than 100,000 handicap games in OGS's historical data. If there are like a million, I expect it to be enough.
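To get a rough feel for what "enough" means, one can look at the margin of error of an observed win rate as a function of the number of games. The sketch below uses the Wilson score interval for a binomial proportion. Treating games as independent coin flips is an assumption, and `wilson_interval` is just an illustrative helper name, not something taken from OGS or the EGD.

```python
import math

def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a win rate (z = 1.96)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half

# How tightly is a 50% win rate pinned down at various sample sizes?
for n in (100, 1_000, 100_000, 1_000_000):
    lo, hi = wilson_interval(n // 2, n)
    print(f"n={n:>9}: {lo:.3%} .. {hi:.3%}")
```

Under these assumptions a bucket of 100 games leaves roughly 10 percentage points of uncertainty either way, and 1,000 games about 3, so what matters is the number of games per rank and handicap bucket rather than the grand total.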

The numbers below are from a period of about 2 months, 3 years ago, in which some 17% of all games were played with handicap, although mostly with small handicap:

```
 handicap | count  |   %
----------+--------+-------
        0 | 432969 | 82.63
        1 |  68731 | 13.12
        2 |  13528 |  2.58
        3 |   2990 |  0.57
        4 |   1969 |  0.38
        5 |   1250 |  0.25
        6 |    917 |  0.18
        7 |    605 |  0.12
        8 |    392 |  0.07
        9 |    659 |  0.13
```

[2021 Rating and rank adjustments - #382 by anoek]

And 4 years ago, anoek posted OGS handicap game statistics for handicap up to 4 stones on different board sizes, where handicap games also seemed to make up some 20-30% of all games on OGS.

See the tables in item 2 of 2020 Rating and rank tweaks and analysis which I copied below:

```
9x9 ranks predicting 9x9 games
Handicap 0 Handicap 1 Handicap 2 Handicap 3
30k 47.8 : 49.6 [n=2579] 33.3 : 49.8 [n=39] 28.0 : 48.5 [n=25] 40.7 : 48.9 [n=27]
25k 48.4 : 49.7 [n=20436] 44.7 : 48.7 [n=4417] 54.7 : 49.6 [n=311] 65.4 : 50.1 [n=104]
20k 47.7 : 49.6 [n=60398] 45.0 : 48.9 [n=32404] 51.5 : 49.5 [n=555] 63.6 : 49.6 [n=176]
15k 47.4 : 49.4 [n=207911] 44.7 : 48.8 [n=70964] 54.0 : 49.3 [n=2123] 63.6 : 49.7 [n=558]
10k 48.9 : 49.4 [n=333649] 45.6 : 49.1 [n=44339] 55.5 : 49.6 [n=1802] 71.4 : 50.3 [n=245]
5k 49.3 : 50.2 [n=83326] 44.2 : 49.9 [n=2287] 61.4 : 50.0 [n=451] 75.0 : 51.4 [n=96]
1d 52.8 : 50.9 [n=4699] 58.3 : 50.1 [n=24] 100.0 : 52.2 [n=3] --
6d 46.7 : 54.1 [n=15] -- -- --
ALL 48.4 : 49.5 [n=713013] 45.0 : 48.9 [n=154474] 54.8 : 49.5 [n=5270] 65.8 : 50.0 [n=1206]
13x13 ranks predicting 13x13 games
30k 44.6 : 49.6 [n=121] 47.5 : 49.9 [n=219] 100.0 : 48.3 [n=1] 50.0 : 47.1 [n=2]
25k 48.0 : 49.5 [n=2773] 45.5 : 49.7 [n=1756] 52.9 : 49.5 [n=104] 32.6 : 48.9 [n=43]
20k 44.0 : 49.3 [n=14530] 45.4 : 49.8 [n=11125] 44.4 : 49.4 [n=941] 42.7 : 49.4 [n=293]
15k 45.8 : 49.3 [n=60647] 46.5 : 50.0 [n=16939] 49.7 : 48.8 [n=2864] 39.7 : 49.1 [n=648]
10k 47.3 : 49.4 [n=85200] 47.8 : 50.3 [n=12933] 59.7 : 49.4 [n=2635] 48.0 : 49.9 [n=306]
5k 49.3 : 50.3 [n=17200] 45.4 : 50.0 [n=866] 58.3 : 49.8 [n=357] 57.6 : 49.9 [n=33]
1d 46.2 : 51.0 [n=249] -- -- --
6d 66.7 : 60.7 [n=3] -- -- --
ALL 46.7 : 49.4 [n=180723] 46.5 : 50.0 [n=43838] 53.3 : 49.2 [n=6902] 42.5 : 49.3 [n=1325]
19x19 ranks predicting 19x19 games
30k 48.4 : 49.6 [n=1504] 42.9 : 49.1 [n=212] 39.2 : 49.4 [n=227] 34.3 : 49.7 [n=67]
25k 45.9 : 49.5 [n=18333] 40.9 : 49.6 [n=4517] 40.0 : 49.7 [n=4083] 41.2 : 49.8 [n=857]
20k 47.3 : 49.5 [n=98224] 42.4 : 49.2 [n=20335] 41.3 : 49.2 [n=14927] 42.2 : 49.3 [n=4369]
15k 49.0 : 49.4 [n=235420] 43.5 : 48.7 [n=40980] 43.3 : 48.6 [n=26238] 45.5 : 48.9 [n=10026]
10k 49.6 : 49.3 [n=344796] 45.0 : 48.6 [n=45871] 46.3 : 48.5 [n=23862] 47.3 : 48.9 [n=9851]
5k 49.7 : 49.5 [n=245545] 44.8 : 48.7 [n=17596] 44.7 : 48.4 [n=8277] 41.8 : 48.4 [n=3881]
1d 49.2 : 49.8 [n=33502] 42.6 : 49.5 [n=2086] 47.5 : 50.0 [n=1100] 47.5 : 50.0 [n=589]
6d 48.9 : 50.6 [n=1585] 54.1 : 49.4 [n=181] 70.0 : 50.3 [n=50] 76.9 : 53.3 [n=26]
ALL 49.2 : 49.4 [n=978909] 43.9 : 48.8 [n=131778] 43.9 : 48.8 [n=78764] 45.0 : 48.9 [n=29666]
Combined rating system
Predicting 9
30k 49.2 : 49.8 [n=17573] 47.8 : 48.6 [n=2060] 49.2 : 49.9 [n=240] 68.0 : 49.9 [n=125]
25k 48.6 : 49.7 [n=43982] 45.5 : 48.7 [n=22370] 56.2 : 49.8 [n=299] 63.0 : 50.5 [n=100]
20k 48.1 : 49.5 [n=117350] 45.1 : 48.4 [n=56551] 53.7 : 49.3 [n=779] 70.5 : 49.6 [n=325]
15k 49.0 : 49.5 [n=262529] 45.4 : 48.1 [n=53642] 57.7 : 49.5 [n=1413] 72.6 : 49.8 [n=332]
10k 50.6 : 49.7 [n=206142] 47.9 : 48.6 [n=14727] 57.1 : 49.7 [n=1120] 71.7 : 50.8 [n=237]
5k 49.9 : 50.2 [n=49161] 43.4 : 49.0 [n=648] 60.5 : 50.6 [n=124] 75.0 : 49.2 [n=12]
1d 54.9 : 50.6 [n=4802] 52.1 : 47.7 [n=73] 100.0 : 51.5 [n=9] 87.5 : 53.4 [n=8]
6d 59.6 : 51.8 [n=171] -- -- --
ALL 49.4 : 49.6 [n=701710] 45.6 : 48.4 [n=150071] 56.3 : 49.6 [n=3984] 70.6 : 50.1 [n=1139]
Predicting 13
30k 49.3 : 49.6 [n=2094] 49.5 : 49.8 [n=2393] 65.6 : 49.0 [n=32] 48.6 : 49.9 [n=37]
25k 47.5 : 49.5 [n=8051] 46.9 : 49.5 [n=8310] 41.6 : 48.8 [n=308] 33.3 : 49.1 [n=33]
20k 46.5 : 49.2 [n=24573] 46.3 : 49.6 [n=20826] 49.8 : 48.6 [n=1222] 46.2 : 49.6 [n=279]
15k 47.6 : 49.2 [n=78278] 47.3 : 49.5 [n=24001] 50.5 : 48.2 [n=2349] 41.1 : 49.5 [n=445]
10k 49.0 : 49.4 [n=86805] 47.9 : 50.0 [n=13165] 57.8 : 48.3 [n=1500] 48.4 : 50.0 [n=182]
5k 50.5 : 50.2 [n=19668] 45.6 : 49.9 [n=1041] 56.2 : 48.9 [n=169] 46.7 : 47.2 [n=15]
1d 54.5 : 50.7 [n=985] 60.7 : 51.2 [n=28] 100.0 : 45.8 [n=4] --
6d 40.5 : 51.8 [n=42] -- -- --
ALL 48.3 : 49.4 [n=220496] 47.1 : 49.6 [n=69764] 52.1 : 48.4 [n=5584] 44.0 : 49.6 [n=991]
Predicting 19
30k 50.5 : 49.6 [n=2299] 44.3 : 48.9 [n=221] 36.9 : 48.5 [n=222] 31.7 : 49.5 [n=63]
25k 48.0 : 49.5 [n=10385] 42.3 : 48.9 [n=3479] 40.6 : 48.9 [n=3215] 38.3 : 49.2 [n=664]
20k 47.7 : 49.4 [n=61028] 41.1 : 48.7 [n=18301] 39.3 : 48.9 [n=15631] 39.9 : 49.0 [n=3557]
15k 48.7 : 49.3 [n=224916] 42.1 : 48.4 [n=48650] 41.1 : 48.6 [n=34843] 41.8 : 48.9 [n=12427]
10k 49.2 : 49.3 [n=410840] 43.1 : 48.3 [n=68206] 43.2 : 48.5 [n=40606] 44.0 : 48.9 [n=15422]
5k 49.6 : 49.4 [n=381050] 44.3 : 48.4 [n=32244] 44.5 : 48.3 [n=15511] 43.1 : 48.4 [n=6854]
1d 49.4 : 49.9 [n=60938] 41.3 : 49.0 [n=3580] 46.5 : 49.2 [n=2010] 44.3 : 49.0 [n=1061]
6d 49.0 : 50.6 [n=2519] 53.2 : 50.0 [n=220] 66.1 : 50.4 [n=59] 84.8 : 52.3 [n=33]
ALL 49.2 : 49.4 [n=1153975] 42.8 : 48.4 [n=174901] 42.2 : 48.6 [n=112097] 42.7 : 48.8 [n=40081]
Combined prediction of all games
30k 49.3 : 49.8 [n=21966] 48.5 : 49.2 [n=4674] 44.7 : 49.2 [n=494] 54.7 : 49.8 [n=225]
25k 48.3 : 49.6 [n=62418] 45.5 : 48.9 [n=34159] 41.9 : 48.9 [n=3822] 41.2 : 49.4 [n=797]
20k 47.8 : 49.4 [n=202951] 44.6 : 48.7 [n=95678] 40.7 : 48.9 [n=17632] 42.7 : 49.1 [n=4161]
15k 48.7 : 49.4 [n=565723] 44.5 : 48.5 [n=126293] 42.2 : 48.6 [n=38605] 42.6 : 48.9 [n=13204]
10k 49.6 : 49.4 [n=703787] 44.5 : 48.6 [n=96098] 44.1 : 48.5 [n=43226] 44.5 : 48.9 [n=15841]
5k 49.7 : 49.5 [n=449879] 44.3 : 48.5 [n=33933] 44.7 : 48.3 [n=15804] 43.1 : 48.4 [n=6881]
1d 49.8 : 49.9 [n=66725] 41.6 : 49.0 [n=3681] 46.8 : 49.2 [n=2023] 44.6 : 49.1 [n=1069]
6d 49.6 : 50.7 [n=2732] 53.2 : 50.0 [n=220] 66.1 : 50.4 [n=59] 84.8 : 52.3 [n=33]
ALL 49.2 : 49.5 [n=2076181] 44.6 : 48.6 [n=394736] 43.1 : 48.6 [n=121665] 43.5 : 48.9 [n=42211]
```

There's also the question of whether the traditional idea of rank is even self-consistent.

Like if Alice gives four stones to Bob, and Bob gives four stones to Charlie... does Alice necessarily give eight stones to Charlie?

From what I have seen, the answer to that question is usually "yes" (with some margin of error and statistical noise).

I suppose it will break down when trying to compensate for a very large rank gap with a very large handicap. But as long as the gaps and handicaps are not too large, one accounts for the half-rank bias of traditional handicap, and ranks are assigned properly, calculated handicaps work pretty well to even out winning chances.

When you have a group of players (like a go club) that only play handicap games with each other, and adjust mutual handicap depending on game results, they will eventually settle on a fairly stable and consistent matrix of mutual handicaps. With that matrix, you can rank them relative to each other.

Something like that was how ranking worked before the advent of the internet, and I think many go clubs still do it that way. Even some go servers may still work something like that.

That's one reason I've long advocated for handicaps being on rather than off by default on OGS.

Bear in mind that even without handicap, one can ask whether rating is self-consistent:

There could be 3 players with a rock-paper-scissors relationship.

(A consistently beats B, B consistently beats C, C consistently beats A)


Even if there is such a thing, we still try to put the players on a linear scale, with as much accuracy as that allows.

How can OGS match players of different strength so that both black and white have a decent chance of winning?

I think this would be an interesting question to explore.

I prefer reverse komi to handicap stones. I think they make more sense and still allow for a "normal" game to be played.

I think such situations are quite rare, but if they are completely balanced and they play with nobody else, I think they should have the same rating. What would be wrong with that?
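As a toy illustration of that intuition, here is a minimal sketch that feeds a perfectly cyclic set of results (A beats B, B beats C, C beats A, over and over) into a plain Elo update. The K-factor of 32 and the starting rating of 1500 are arbitrary assumptions, and this is ordinary Elo, not the Glicko-2 variant OGS uses.

```python
def expected(r_a: float, r_b: float) -> float:
    """Elo expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def play_cycle(ratings: dict[str, float], k: float = 32.0) -> None:
    """Apply one round of the cycle: A beats B, B beats C, C beats A."""
    for winner, loser in (("A", "B"), ("B", "C"), ("C", "A")):
        gain = k * (1.0 - expected(ratings[winner], ratings[loser]))
        ratings[winner] += gain
        ratings[loser] -= gain

ratings = {"A": 1500.0, "B": 1500.0, "C": 1500.0}
for _ in range(1000):
    play_cycle(ratings)

spread = max(ratings.values()) - min(ratings.values())
print(ratings, spread)
```

Each player wins one game and loses one per round, so nobody can drift away: whoever gets ahead gains less for the next win and loses more for the next loss. The three ratings stay close to each other, which is about the best a one-dimensional scale can do with such a group.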

When aiming for an expected winrate range (like 25-75%), you can search for players with ratings within some Elo distance. That won't be too hard.
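For plain Elo, the winrate window translates directly into a rating window by inverting the usual expected-score formula. A minimal sketch, where the function names and the example pool are made up for illustration:

```python
import math

def elo_gap_for_winrate(p: float) -> float:
    """Rating difference (mine minus theirs) at which my expected score is p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))

def candidates(my_rating: float, pool: dict[str, float],
               lo: float = 0.25, hi: float = 0.75) -> list[str]:
    """Names of opponents against whom my expected score lies in [lo, hi]."""
    min_diff = elo_gap_for_winrate(lo)  # negative: opponent is stronger
    max_diff = elo_gap_for_winrate(hi)  # positive: opponent is weaker
    return [name for name, rating in pool.items()
            if min_diff <= my_rating - rating <= max_diff]

print(round(elo_gap_for_winrate(0.75), 1))  # about 190.8
print(candidates(1500.0, {"x": 1600.0, "y": 1800.0, "z": 1350.0}))
```

So a 25-75% window is the same as looking for opponents within roughly 191 Elo points on either side.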

With Glicko2 you may need a slightly more complicated search with more parameters. I don't know what those calculations should look like.
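One possible shape for that calculation, as far as I understand Glickman's description of the original Glicko system, is to shrink the rating difference by a factor that depends on both players' rating deviations (RD). The sketch below works on the familiar 1500-based scale and ignores the volatility parameter that Glicko-2 adds, so treat it as an approximation and not as what OGS actually computes.

```python
import math

Q = math.log(10) / 400.0  # Glicko scaling constant

def g(rd: float) -> float:
    """Attenuation factor: equals 1 for a certain rating, smaller as RD grows."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / math.pi ** 2)

def expected_score(r_a: float, rd_a: float, r_b: float, rd_b: float) -> float:
    """Expected score of player a against player b, using both RDs."""
    combined_rd = math.hypot(rd_a, rd_b)  # sqrt(rd_a^2 + rd_b^2)
    return 1.0 / (1.0 + 10 ** (-g(combined_rd) * (r_a - r_b) / 400.0))

print(expected_score(1700.0, 0.0, 1500.0, 0.0))      # plain Elo case
print(expected_score(1700.0, 350.0, 1500.0, 350.0))  # very uncertain ratings
```

Uncertain ratings pull the prediction toward 50%, so under this model the same winrate window admits a wider rating gap when the RDs are large.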

Reverse komi handicap has been discussed before. I think reverse komi games would just fit in the handicap system when you use 2 times perfect komi per full handicap stone.
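The "2 times komi per full stone" rule can be written down as a small conversion. This is only a sketch: it assumes a fair komi of 7 points (the real value depends on the ruleset), that one rank equals one full handicap stone, and that everything scales linearly. The function names are made up for illustration.

```python
FAIR_KOMI = 7.0  # assumption: "perfect" komi in points; depends on the rules

def black_advantage_in_stones(handicap_stones: int, komi: float,
                              fair_komi: float = FAIR_KOMI) -> float:
    """Black's head start, measured in full handicap stones.

    handicap_stones: 0 or 1 means black just moves first; n >= 2 means n stones.
    komi: points given to white (negative means reverse komi for black).
    """
    extra_moves = max(handicap_stones, 1) - 1
    return extra_moves + (fair_komi - komi) / (2.0 * fair_komi)

def komi_for_rank_gap(rank_gap: float, fair_komi: float = FAIR_KOMI) -> float:
    """Komi (no stones placed) that would offset a gap of rank_gap ranks."""
    return fair_komi * (1.0 - 2.0 * rank_gap)

print(black_advantage_in_stones(2, 0.0))  # two stones, no komi
print(komi_for_rank_gap(1.0))             # one full rank
print(komi_for_rank_gap(0.25))            # a quarter of a rank
```

Under these assumptions two traditional handicap stones without komi are worth 1.5 ranks (the half-rank bias), a one-rank gap corresponds to 7 points of reverse komi, and a quarter-rank gap to a komi of 3.5.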

That's one answer to my question, but it imposes a limit on the difference in player strength, which raises the question of whether such a limit is necessary.

Without handicap that should be pretty rare, I agree.

In our local club we had our own ranking system between 4 regulars, which worked fine. It was somewhat stable with some fluctuations, and when we played small tournaments everyone had a chance to win. When one player left, it collapsed though. It was still stable, but too much so, because we always had that A>B>C>A situation. The reason was that two of us were about 5 ranks apart and the third was about 10 ranks lower. I was the middle one, but I was "better" at trying unusual (read: nonsensical) stuff against the large handicap (on 9x9 or 13x13). So I always won with white and always lost with black against the stronger player, who in turn, with his reasonable moves, was helpless against the even larger handicap. In hindsight it is possibly even more surprising that it worked well with 4 players, where number 4 was still about one stone weaker.

(Edit: many embarrassing typos fixed. Enjoy the remaining ones.)

> so that both black and white have a decent chance of winning?

> but it imposes a limit on the difference in player strength

I thought you were asking about finding opponents for an even game, where the expected winrate is not too far from 50%. So I proposed to look for opponents that are within some range of player strength. But now it seems that this is not what you want.

To me this seems like a paradox, so I suppose I misunderstood your first question.

I'm not requiring even games (in the sense of no handicap). And I just wanted to give an example question for what this thread could focus on, because I think my initial post (which originally belonged to a different thread) doesn't do a good job at eliciting a fruitful discussion.

Currently OGS allows players to find an opponent within some rank range.

I suppose this could be modified to find an opponent within some expected winrate range, where handicap can be allowed as an option.

So when you select a 30-70% range, the system might come up with a 3 stone game against an opponent who is 5 ranks stronger.

Or you select a 80-90% range (because you like to win) and the system might come up with a game where you get 2 stones handicap against an opponent who is 2 ranks weaker.

Do you mean something like that?
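A minimal sketch of what such a search could look like: for a given rank gap, list the handicaps that put my expected winrate inside the chosen window. The value of 70 Elo per rank is a made-up placeholder (the real Elo-per-rank curve varies with rank), each handicap stone is assumed to offset exactly one rank (ignoring the half-rank bias), and `handicaps_in_range` is a hypothetical name.

```python
def winrate(effective_gap_ranks: float, elo_per_rank: float = 70.0) -> float:
    """My expected winrate when the opponent is effectively this many ranks stronger."""
    return 1.0 / (1.0 + 10 ** (effective_gap_ranks * elo_per_rank / 400.0))

def handicaps_in_range(rank_gap: float, lo: float, hi: float,
                       elo_per_rank: float = 70.0,
                       max_handicap: int = 9) -> list[int]:
    """Handicap stones I could receive so that my winrate lands in [lo, hi].

    rank_gap > 0 means the opponent is stronger, rank_gap < 0 means weaker.
    """
    return [h for h in range(max_handicap + 1)
            if lo <= winrate(rank_gap - h, elo_per_rank) <= hi]

# Opponent 5 ranks stronger, 30-70% window
print(handicaps_in_range(5, 0.30, 0.70))
# Opponent 2 ranks weaker, 80-90% window
print(handicaps_in_range(-2, 0.80, 0.90))
```

With that placeholder value, both examples above land inside their windows: 3 stones against someone 5 ranks stronger, and 2 stones received from someone 2 ranks weaker.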

That's an interesting idea: give the users the option to decide for themselves what they think is "a decent chance of winning".

Another option would be to say "only getting as close as possible to a 50-50 chance of winning is decent". And then, as an example, we'd maybe need a komi of 3.5 (if white is just slightly stronger than black).

> Or you select an 80-90% range (because you like to win) and the system might come up with a game where you get 2 stones handicap against an opponent who is 2 ranks weaker.
>
> Do you mean something like that?

I see no problem with negotiating an appropriate handicap for a game, but there must be reasonable limits. Surely, @gennan, you were being sarcastic?

Then again, I don't have a standard of "appropriate" that is easily coded.

> Surely, @gennan, you were being sarcastic?

Not really, but I suppose that if you select an 80-90% range you'll probably have to wait a while for an opponent who accepts a game with them having a 10-20% win expectation.

OGS starting a subs and doms matchmaking service?

So far it's just @gennan's interpretation of what I said. Neither he nor I are advocating it; let's keep that in mind.