I hosted a Swiss 9x9 correspondence handicap tournament to celebrate the new ranking system at the beginning of 2021, and we finished it up a few days ago. While 9x9 handicap games can often seem ridiculous, and the handicap formula we use doesn’t make much sense, I wanted to know if we could still have a fun tournament and maybe even learn something from the results. In this post I’ll try to show that the answer to both questions is yes.
First, some basic statistics:
Players: 45
Rounds: 11
Games: 165
Range of ranks seen (measured at the start of the match): 24k to 6d
Range of rank differences in matches: 0.02 to 24.4
Players who made it to the end: 23
I was really happy to see so many players of very different ranks keep up with this tournament for such a long time! Thanks and congratulations to all of you!
handicap | games played | Black wins |
---|---|---|
8** | 1 | 1 (100%) |
7 | 2 | 2 (100%) |
6 | 2 | 2 (100%) |
5 | 1 | 1 (100%) |
4 | 5 | 4 (80%) |
3 | 15 | 10 (67%) |
2 | 25 | 13 (52%) |
1* | 83 | 32 (39%) |
even | 31 | 16 (52%) |
total | 165 | 80 (49%) |
* A handicap of 1 just means black goes first as usual, but with reduced komi.
** How did we get handicaps of 7-8 without rank differences over 25? I strongly suspect there was a server bug in the handicap calculations that got fixed before round 2.
Overall the results show Black winning about half of the games, which is what we would want from fair handicaps. But beyond 2 or 3 stones, where there were very few games played, it seems almost hopeless for White. The current formula gives a stone for every 3 or 4 ranks, which is apparently too much for very large rank differences. On the low end, this effect is balanced by a quirk with 1-handicap games: their reduced komi of 3.5, down just two points from the default of 5.5, is probably not enough compensation for the rank difference, which ranged from 1.01 to 4.99. So from this table it seems like it might be better to reduce the komi further at handicap 1 and also give fewer stones for a given rank difference.
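Since several rows of the table rest on only a handful of games, it can help to put rough error bars on the win rates. Here is a minimal sketch using a Wilson score interval. This is not the log-probability analysis mentioned in this post, just a standard approximation I am assuming is good enough for a first look, and the function name `wilson_interval` is made up for illustration.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate (z=1.96)."""
    p = wins / games
    denom = 1 + z**2 / games
    centre = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return centre - half, centre + half

# (games played, Black wins) copied from the three largest rows of the table.
results = {"1": (83, 32), "2": (25, 13), "even": (31, 16)}
for label, (games, wins) in results.items():
    low, high = wilson_interval(wins, games)
    print(f"handicap {label}: {wins}/{games}, roughly {low:.0%} to {high:.0%}")
```

The same function can be pointed at the smaller rows to see how little a 1-out-of-1 or 2-out-of-2 result really tells us.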
Analysis
I did a bunch of detailed analysis with log probabilities and such on this set of games, but you can get most of it from observing that the handicap-2 games seemed to be approximately balanced, at least on average. The average rank difference of these games, which also happen to have a normal komi of 0.5, was about 6.75. Remember that the first handicap stone really just stands for reduced komi. If we assume that ideal komi is 7, we have the following equation:
1 stone + 6.5 komi = 6.75 ranks.
Since 1 full stone is worth twice the ideal komi,
1 + 6.5/14 stones = 1.46 stones = 6.75 ranks.
And we conclude
1 stone = 4.6 ranks.
That is, overcoming a handicap stone in 9x9 is about 4.6 times more difficult than it is in 19x19, with difficulty measured in ranks.
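For anyone who wants to replay the arithmetic, here is a small sketch of the same calculation. The helper name `effective_stones` is mine, and it bakes in the two assumptions from above: ideal komi is 7, and a full stone is worth twice the ideal komi.

```python
# Assumptions from the post: ideal 9x9 komi is 7 points, and one full
# handicap stone is worth twice the ideal komi.
IDEAL_KOMI = 7.0
POINTS_PER_STONE = 2 * IDEAL_KOMI

def effective_stones(handicap, komi):
    """Black's advantage in stones relative to an even game at ideal komi.

    The first handicap stone only means Black moves first, so it adds
    nothing by itself; the komi shortfall counts as a fraction of a stone.
    """
    extra_stones = max(handicap - 1, 0)
    return extra_stones + (IDEAL_KOMI - komi) / POINTS_PER_STONE

# Handicap-2 games (komi 0.5) looked balanced at about 6.75 ranks apart.
balanced_stones = effective_stones(handicap=2, komi=0.5)
ranks_per_stone = 6.75 / balanced_stones
print(f"{balanced_stones:.2f} stones = 6.75 ranks")
print(f"1 stone = {ranks_per_stone:.1f} ranks")
```

Changing `IDEAL_KOMI` or the 6.75 figure shows how sensitive the 4.6 estimate is to those inputs.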
Fancy graph from the log probabilities that was way more work
* For this particular graph I excluded games decided by timeout and also all players who timed out before round 9.
Technical notes about getting ranks
I needed the ranks at the beginning of the games to do the analysis here, but ranks can change over the course of a long correspondence game, so I pulled these ranks from the player rating history API. For a few games by players who play extremely high numbers of games, I failed to download the rating before it disappeared from their history and had to make do with the rating recorded permanently at the end of the game instead.
Application of results
So, how can we use this equation to make handicap 9x9 games more fair? Let’s consider for example two players separated by 13 ranks, who would play at 4 stones and negative 5.5 komi under the current system. That’s an effective difference of 3 + (5.5 + 7)/14 = 3.9 stones, for players who are probably just 13/4.6 = 2.8 stones apart in 9x9 skill, giving more than a full stone of advantage to Black. If instead they played at 3 stones and 0.5 komi, the advantage would be about 0.4 stones to White.
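The 13-rank example can be checked with a few lines of code. This is only a sketch under the same assumptions as before (ideal komi 7, a stone worth 14 points, 4.6 ranks per stone), and `black_advantage` is a hypothetical helper name, not anything from the server.

```python
IDEAL_KOMI = 7.0        # assumed ideal 9x9 komi
RANKS_PER_STONE = 4.6   # estimate derived from the handicap-2 games

def effective_stones(handicap, komi):
    """Stones of advantage given to Black by a handicap/komi setting."""
    return max(handicap - 1, 0) + (IDEAL_KOMI - komi) / (2 * IDEAL_KOMI)

def black_advantage(rank_diff, handicap, komi):
    """Stones given minus stones deserved; positive favours Black."""
    return effective_stones(handicap, komi) - rank_diff / RANKS_PER_STONE

current = black_advantage(13, handicap=4, komi=-5.5)
revised = black_advantage(13, handicap=3, komi=0.5)
print(f"current system: {current:+.1f} stones")
print(f"3 stones, komi 0.5: {revised:+.1f} stones")
```

A positive number means Black got more help than the rank difference justifies, and a negative number means White did.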
I don’t want anyone to take my exact numbers too seriously yet, but if 4.6 turns out to be about right, we could build a more optimized and normal-looking handicap table like this:
Possible handicap table
rank difference (rounded down) | handicap | komi |
---|---|---|
0 | 0 | 5.5 |
1 | 1 | 0.5 |
2 | 1 | 0.5 |
3 | 1 | 0.5 |
4 | 2 | 0.5 |
5 | 2 | 0.5 |
6 | 2 | 0.5 |
7 | 2 | 0.5 |
8 | 2 | 0.5 |
9 | 3 | 0.5 |
10 | 3 | 0.5 |
11 | 3 | 0.5 |
12 | 3 | 0.5 |
13 | 3 | 0.5 |
14 | 4 | 0.5 |
15 | 4 | 0.5 |
16 | 4 | 0.5 |
17 | 4 | 0.5 |
18 | 5 | 0.5 |
19 | 5 | 0.5 |
20 | 5 | 0.5 |
21 | 5 | 0.5 |
22 | 5 | 0.5 |
23 | 6 | 0.5 |
24 | 6 | 0.5 |
25 | 6 | 0.5 |
26 | 6 | 0.5 |
27 | 7 | 0.5 |
28 | 7 | 0.5 |
29 | 7 | 0.5 |
30 | 7 | 0.5 |
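One way to generate a table like this is to pick, for each rank difference, whichever setting leaves the smallest imbalance. The sketch below does that under some assumptions of mine: each rounded-down rank difference is represented by its midpoint (so "4" means 4.5), the only settings allowed are an even game at komi 5.5 or 1 to 7 stones at komi 0.5, and 1 stone = 4.6 ranks. With those choices it reproduces the rows above.

```python
IDEAL_KOMI = 7.0        # assumed ideal 9x9 komi
RANKS_PER_STONE = 4.6   # estimate from the handicap-2 games

# Allowed settings as (handicap, komi): even game, then 1 to 7 stones.
CANDIDATES = [(0, 5.5)] + [(h, 0.5) for h in range(1, 8)]

def effective_stones(handicap, komi):
    """Stones of advantage given to Black by a handicap/komi setting."""
    return max(handicap - 1, 0) + (IDEAL_KOMI - komi) / (2 * IDEAL_KOMI)

def best_setting(rank_diff_floor):
    """Setting whose advantage is closest to what the rank gap deserves."""
    typical_diff = rank_diff_floor + 0.5  # midpoint of the rounded-down bin
    deserved = typical_diff / RANKS_PER_STONE
    return min(CANDIDATES, key=lambda s: abs(effective_stones(*s) - deserved))

table = {d: best_setting(d) for d in range(31)}
for d, (handicap, komi) in table.items():
    print(f"{d} | {handicap} | {komi} |")
```

Swapping in a different `RANKS_PER_STONE` or a different list of candidate settings is an easy way to explore alternative tables.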
And here’s what the balance of the games would look like under the new system:
Looking at it this way, our current system, while it could be better, is actually okay for a lot of possible games. So I think you can already have a lot of fun with auto-handicap 9x9, but you might want to check this graph if you get a game that seems strangely unbalanced.
What do you think, does this make any sense? Does 4.6 ranks per stone “feel” right? Is anyone interested in trying a 9x9 tournament under a revised handicap table like this? Or in trying to do some similar analysis across more 9x9 games on OGS, or with 13x13 or other sizes?