Analysis of a 9x9 handicap tournament

Feijoa · November 6, 2022, 4:21am

I hosted a Swiss 9x9 correspondence handicap tournament to celebrate the new ranking system in the beginning of 2021, and we finished it up a few days ago. While 9x9 handicap games can often seem ridiculous, and the handicap formula we use doesn’t make much sense, I wanted to know if we could still have a fun tournament and maybe even learn something from the results. In this post I’ll try to show that the answer to both questions is yes.

First, some basic statistics:

Players: 45
Rounds: 11
Games: 165
Range of ranks seen (measured at the start of the match): 24k to 6d
Range of rank differences in matches: 0.02 to 24.4
Players who made it to the end: 23

I was really happy to see so many players of very different ranks keep up with this tournament for such a long time! Thanks and congratulations to all of you!

handicap	games played	Black wins
8**	1	1 (100%)
7	2	2 (100%)
6	2	2 (100%)
5	1	1 (100%)
4	5	4 (80%)
3	15	10 (67%)
2	25	13 (52%)
1*	83	32 (39%)
even	31	16 (52%)
total	165	81 (49%)

* A handicap of 1 just means black goes first as usual, but with reduced komi.

** How did we get handicaps of 7-8 without rank differences over 25? I strongly suspect there was a server bug in the handicap calculations that got fixed before round 2.

Overall the results show Black winning about half of the games, which is what we would want for fair handicaps. But beyond 2 or 3 stones, where there were very few games played, it seems like it is almost hopeless for White. The current formula gives a stone for every 3 or 4 ranks, which is apparently too much for very large rank differences. On the low end, this effect is balanced by a quirk with 1-handicap games - their reduced komi of 3.5, down just two points from the default of 5.5, is probably not enough compensation for the rank difference, which ranged from 1.01 to 4.99. So from this table it seems like it might be better to reduce the komi at handicap 1 and also give fewer stones for a given rank difference.
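
To get a feel for how little the small samples can tell us, here is a minimal Python sketch that puts an approximate 95% Wilson score interval around each row of the table. The interval method and the `wilson_interval` helper are illustrative choices, not part of the original analysis.

```python
import math

# (handicap label, games played, Black wins), copied from the table above
RESULTS = [
    ("8", 1, 1), ("7", 2, 2), ("6", 2, 2), ("5", 1, 1), ("4", 5, 4),
    ("3", 15, 10), ("2", 25, 13), ("1", 83, 32), ("even", 31, 16),
]

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a win proportion."""
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return max(0.0, center - half), min(1.0, center + half)

for label, games, wins in RESULTS:
    low, high = wilson_interval(wins, games)
    print(f"handicap {label:>4}: {wins}/{games} Black wins, "
          f"roughly {low:.0%} to {high:.0%}")
```

With only one or two games in a row, the interval covers most of the range from 0% to 100%, so the 100% rows on their own say very little.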

Analysis

I did a bunch of detailed analysis with log probabilities and such on this set of games, but you can get most of it from observing that the handicap-2 games seemed to be approximately balanced, at least on average. The average rank difference of these games, which also happen to have a normal komi of 0.5, was about 6.75. Remember that the first handicap stone really just stands for reduced komi. If we assume that ideal komi is 7, we have the following equation:

1 stone + 6.5 komi = 6.75 ranks.

Since 1 full stone is worth twice the ideal komi,

1 + 6.5/14 stones = 1.46 stones = 6.75 ranks.

And we conclude

1 stone = 4.6 ranks.

That is, overcoming a handicap stone in 9x9 is about 4.6 times more difficult than it is in 19x19, with difficulty measured in ranks.
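
The arithmetic above can be replayed in a few lines of Python. This is only a sketch of the same back-of-the-envelope estimate; the variable names are made up for illustration, and the ideal komi of 7 is the assumption stated in the post.

```python
IDEAL_KOMI = 7.0        # assumed fair komi on 9x9, as in the post
AVG_RANK_DIFF = 6.75    # average rank difference in the handicap-2 games
KOMI_PLAYED = 0.5       # komi used in those games

# One full stone is taken to be worth twice the ideal komi.
points_per_stone = 2 * IDEAL_KOMI

# Handicap 2 = one real extra stone, plus the komi Black no longer pays.
effective_stones = 1 + (IDEAL_KOMI - KOMI_PLAYED) / points_per_stone

ranks_per_stone = AVG_RANK_DIFF / effective_stones
print(f"{effective_stones:.2f} stones = {AVG_RANK_DIFF} ranks")
print(f"1 stone = {ranks_per_stone:.1f} ranks")
```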

Fancy graph from the log probabilities that was way more work

* For this particular graph I excluded games decided by timeout and also all players who timed out before round 9.

Technical notes about getting ranks

I needed the ranks at the beginning of the games to do the analysis here, but ranks can change over the course of a long correspondence game, so I pulled these ranks from the player rating history API. For a few games by players who play extremely high numbers of games, I failed to download the rating before it disappeared from their history and had to make do with the rating recorded permanently at the end of the game instead.

Application of results

So, how can we use this equation to make handicap 9x9 games more fair? Let’s consider for example two players separated by 13 ranks, who would play at 4 stones and negative 5.5 komi under the current system. That’s an effective difference of 3 + (5.5 + 7)/14 = 3.9 stones, for players who are probably just 13/4.6 = 2.8 stones apart in 9x9 skill, giving more than a full stone of advantage to Black. If instead they played at 3 stones and 0.5 komi, the advantage would be about 0.4 stones to White.
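
The same comparison can be written as a small Python sketch. The helper names `effective_stones` and `black_advantage` are hypothetical, and both the ideal komi of 7 and the 4.6 ranks per stone are the estimates from this post rather than established values.

```python
IDEAL_KOMI = 7.0       # assumed fair komi on 9x9
RANKS_PER_STONE = 4.6  # estimate from this tournament

def effective_stones(handicap, komi):
    """Black's head start in stones, counting komi at 14 points per stone."""
    placed = max(handicap - 1, 0)  # the first "stone" is only Black moving first
    return placed + (IDEAL_KOMI - komi) / (2 * IDEAL_KOMI)

def black_advantage(rank_diff, handicap, komi):
    """Positive: Black is favoured; negative: White is favoured."""
    return effective_stones(handicap, komi) - rank_diff / RANKS_PER_STONE

print(f"{black_advantage(13, 4, -5.5):+.1f}")  # current: 4 stones, komi -5.5
print(f"{black_advantage(13, 3, 0.5):+.1f}")   # alternative: 3 stones, komi 0.5
```

The first line comes out a little over one stone in Black's favour and the second about 0.4 stones in White's favour, matching the numbers in the paragraph above.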

I don’t want anyone to take my exact numbers too seriously yet, but if 4.6 turns out to be about right, here’s a more optimized and normal-looking handicap table we could build:

Possible handicap table

rank difference (rounded down)	handicap	komi
0	0	5.5
1	1	0.5
2	1	0.5
3	1	0.5
4	2	0.5
5	2	0.5
6	2	0.5
7	2	0.5
8	2	0.5
9	3	0.5
10	3	0.5
11	3	0.5
12	3	0.5
13	3	0.5
14	4	0.5
15	4	0.5
16	4	0.5
17	4	0.5
18	5	0.5
19	5	0.5
20	5	0.5
21	5	0.5
22	5	0.5
23	6	0.5
24	6	0.5
25	6	0.5
26	6	0.5
27	7	0.5
28	7	0.5
29	7	0.5
30	7	0.5
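
One way a table like this could be generated is sketched below in Python. The rule used here is an assumption made for illustration, not necessarily how the table was actually built: each rank-difference bucket is represented by its midpoint, the only settings considered are an even game at komi 5.5 or a handicap at komi 0.5, and the setting whose effective value in stones is closest to the midpoint wins. The function names are hypothetical.

```python
IDEAL_KOMI = 7.0       # assumed fair komi on 9x9
RANKS_PER_STONE = 4.6  # estimate from this tournament

def effective_stones(handicap, komi):
    """Black's head start in stones, counting komi at 14 points per stone."""
    placed = max(handicap - 1, 0)
    return placed + (IDEAL_KOMI - komi) / (2 * IDEAL_KOMI)

# Candidate settings: an even game at komi 5.5, otherwise komi 0.5.
OPTIONS = [(0, 5.5)] + [(h, 0.5) for h in range(1, 10)]

def pick_handicap(rank_diff_floor):
    """Choose the option closest to the middle of the rank-difference bucket."""
    target = (rank_diff_floor + 0.5) / RANKS_PER_STONE
    return min(OPTIONS, key=lambda opt: abs(effective_stones(*opt) - target))

for diff in range(31):
    handicap, komi = pick_handicap(diff)
    print(f"{diff}\t{handicap}\t{komi}")
```

Under these assumptions the output agrees with the table above row by row, though the cutoff at 27 ranks is very close to a tie between 6 and 7 stones.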

And here’s what the balance of the games would look like under the new system:

Looking at it this way, our current system, while it could be better, is actually okay for a lot of possible games. So I think you can already have a lot of fun with auto-handicap 9x9, but you might want to check this graph if you get a game that seems strangely unbalanced.

What do you think, does this make any sense? Does 4.6 ranks per stone “feel” right? Is anyone interested in trying a 9x9 tournament under a revised handicap table like this? Or in trying to do some similar analysis across more 9x9 games on OGS, or with 13x13 or other sizes?

benjito · November 6, 2022, 5:21am

Very interesting analysis!

Reading the OGS version of the Old Japanese handicap system is very strange. The komi seems almost nonsensical - is it really true that komi is not adjusted independent of handicap stones??

I feel gradually cycling komi would go a long way in decreasing the peaks that are in your graphs.

Given the 100% winrate above 4 stones, it definitely seems clear the current system should have a higher ratio of rank to stones. Would be very interesting to see what percentages end up like with the new system.

hoctaph · November 6, 2022, 6:31am

I believe this site’s AI review estimates that the ideal 9x9 komi is 6.
What estimate for 1 stone do you get if you use that instead of 7?

You mentioned the difference for handicap 1, and it’s far more substantial there, but does your calculation also account for the advantage from handicap 2 and handicap 3 actually being 1.5 stones minus 0.5 points and 2.5 stones minus 0.5 points respectively, rather than 2 and 3 stones, and similarly for larger handicaps?

gennan · November 6, 2022, 11:03am

AGA uses 4 ranks per stone, BGA uses 7 ranks per stone, I use 6 ranks per stone in my youth club.
I feel that 4.6 ranks per stone is better than 4 ranks per stone, but perhaps the ratio needs to be higher still. (also see 2021 Rating and rank adjustments - #528 by gennan)

It is related to the question of which rank you assign to a raw novice. I (3d EGF) can typically give 8 stones handicap on 9x9 to a raw novice 8 year old, to whom I assign a rank of ~42k, i.e. a gap of about 44 ranks.

With 4.6 ranks per stone, 8 handicap would correspond to a gap of ~33 ranks, so a typical raw novice 8 year old would then have a rank of ~30k. I think that somewhat overestimates their level, but it’s not so easy to verify this by measuring actual rank gaps with handicap games on 19x19.

Feijoa · November 6, 2022, 4:48pm

Thanks for reminding me about the other systems. It’s nice that 4.6 at least falls within the range of accepted possibilities. I wonder if ranks on OGS might be artificially compressed by the way we start beginners at 6k, de-emphasize handicap games, and don’t have much support for ranks below 25k. There are also unique challenges to online and correspondence play, such as having a bad day or a bad network connection, that might be independent of board size. And the better correspondence players might choose to dedicate more than their usual amount of attention to a handicap 9x9 game since it’s an unusual setting.

That’s true and baked kind of deeply into the OGS code, so it might be hard to change right away.

I think some great first steps would be allowing ranked games with custom komi, and then allowing tournaments (and ladders?? ) to specify handicapping formulas other than the built-in auto.

It’s not very dependent on the ideal komi. The formula is just 6.75/(1+(k-0.5)/(2k)), so you can try plugging in your own numbers. That would come into play more when trying to construct an optimal handicap table.
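
For anyone who wants to plug in their own numbers, here is a small Python sketch of that formula. The function name and the list of komi values tried are just for illustration.

```python
def ranks_per_stone(k, avg_rank_diff=6.75, komi=0.5):
    """Ranks per stone if the ideal 9x9 komi is k points."""
    return avg_rank_diff / (1 + (k - komi) / (2 * k))

for k in (5.5, 6, 6.5, 7, 7.5):
    print(f"ideal komi {k}: {ranks_per_stone(k):.2f} ranks per stone")
```

With an ideal komi of 6 this gives about 4.63 ranks per stone, against about 4.61 for a komi of 7.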

I think I’m correctly taking into account the meaning of 1 stone and the komi changes in the graphs later, but please let me know if you see something wrong!

Jhyn · November 6, 2022, 8:09pm

What is going on with the komi? It looks as if someone mixed and matched two different systems. Hopefully no one actually plays with those settings.

As an EGF 2d, I often teach adult beginners (~30k ? with a lot of variance). If I give 4 stones (no komi), I win a large majority of the time; a beginner usually needs between 3 and ~20 games to beat me with 4 stones. With 5 stones, I’d say it’s closer to 50%. I don’t think I’ve ever played with more than 5, but I’ve not played children. Obviously I’m not trying to win really hard so that’s an underestimate…

Since you asked, if I had to guess, I would have said 5.5 ranks per stone. I wouldn’t be surprised if it’s not linear at all (depending on the level and time control more so than on larger boards).

I’d play a 9x9 handicap tournament.

_KoBa · November 6, 2022, 8:34pm

I guess the handicap was counted when each round started, so some ppl at top ranked up and/or some near bottom ranked down at some point? And the big hc games prolly happened in latter rounds when there were fewer people with the same swiss scores?

It actually looks surprisingly even, seems pretty close to that theoretical “43% for black” winrate which was somewhat expected for the current system ^^ So few samples for 4-8 stone range that it’s hard to say anything about those, maybe another tournament is needed?

Feijoa · November 6, 2022, 9:58pm

Yes, ranks certainly changed a lot during the ~1.5 years that we were playing. But actually I think the biggest handicaps happened at the beginning, probably due to a server bug. I never got confirmation, but I suspect that after we had visibly switched over to the new ranking system, the old ranks were still lurking around in the database for a while and getting used for things like auto handicap games. By the second or third round everything seemed to have been straightened out.

So yes, a new tournament is definitely needed. Isn’t it always? But how can we set it up to generate a lot of really big handicaps? I was thinking of some kind of manually-organized ladder with a handicap system that gradually adjusts over time to keep it balanced.

benjito · November 6, 2022, 10:09pm

Can you make use of the OpenGotha integration? I don’t know much about it, but it seems like a lot of customizability is available through that. Perhaps @Leira can help us figure out what is possible.

Leira · November 6, 2022, 11:37pm

Using OpenGotha as is isn’t really very hard. Basically you just download the program and follow the instructions here OpenGotha Tournaments · online-go/online-go.com Wiki · GitHub

However, there are a few caveats. I’ve run into quite a few bugs and also, let’s say, design choices that seem arbitrary. It’s in beta, so you can’t really expect it to be super smooth.

On the other hand, if you’re willing to just dig into the xml and “abuse” the system for extra functionality, there are some limitations that you can ignore.

I’ll tell you what I know, if you want to give it a try.

david265 · November 7, 2022, 3:12pm

While I’m not smart enough to understand this entire discussion, I do hope that this good observation and analysis will be used to improve OGS, since I only play 9x9 and I would like to be able to play a wider range of ranks with a fair handicap than is possible at present.

Leira · November 7, 2022, 7:08pm

I have not read the full thread, but I disagree with the linearity of the premise, though it is certainly an improvement over those crazy handicaps you mentioned.

In such a small board as 9x9, each subsequent stone is probably worth more than the previous one, until you reach a cap. Such behavior would be more akin to a logistic function.

I have no proof for that statement though. Only intuition.

For example, I don’t think anyone should lose with 9 handi, even against KataGo, unless they play essentially random moves. Any subsequent stone adds pretty much nothing. But even 5 stones already take all the key points of the board. I don’t think you gain all that much from 5 to 9 either.

On the other hand, the difference between 2 and 3 stones seems huge. 2 stones are one tactical mistake away from being “even”, whereas 3 stones already cover 3 corners, pretty much destroying the conventional strategy of making 2 living groups.

I’d argue that from 2 to 3 there’s almost as much difference as from 5 to 9, if not more.

Feijoa · November 27, 2022, 6:24am

I realized I should at least ask the obvious question: can you set handicaps in OpenGotha?

Leira · November 28, 2022, 12:30am

You certainly can. Something to do with the “+” and “-” keys. Actually, for some reason handicaps are given by default, and if your players have different ratings (within the program) it might put handicap stones in your games without you noticing, so watch out for that.

Feijoa · November 28, 2022, 12:39am

That’s exciting, since if that really works it means we can create a tournament with whatever handicap formula we want. What about komi, is that adjustable too?
