Site suggestion : 9x9 handicaps above 1 seem to favour Black too much, consider adjusting the 9x9 handicapping system

Hello, recently after playing various auto-handicapped 9x9 teaching games and tournaments, I have been wondering whether the OGS 9x9 handicapping system is inaccurate.

I feel the 9x9 handicapping system favours Black strongly.

Another dan player, who mostly plays 9x9 as their specialty, has also mentioned that the 9x9 handicaps feel equivalent to 6 stones or more on a 19x19 when playing as White against Black's handicap stones.

(for example, something like 3d vs 4k with 2 stones seemed too much to them as well)

See the discussion in these posts :

Is there a better handicap system adjustment, for example one of the systems giving one handicap per 6 or 7 ranks in this post ?

This would also be relevant if the rank restrictions for handicap games (less than 9 ranks of difference) were removed, as in the recent proposition thread linked below, since the current handicap on 9x9s would very likely skew the rating of a player taking Black (if they are able to win against much more highly rated players in ranked games than they would on other board sizes).

(and potentially hinder beginners in getting a proper rank by inflating it too much – also relevant to that proposition made to help beginners, as beginners and DDK players may play 9x9 much more often by default, often taking Black as the weaker player)

And it would seem to be relevant to overall ranking accuracy also, as all tournament games, even with a large handicap on 9x9 above the usual handicap for >9 ranks of difference, are rated.

(Some groups I am a part of sometimes attempt to host 9x9 handicap tournaments. They often include beginner and DDK players ranging from 25k-10k as well as some stronger players, and from what I have seen when there is a more sizeable handicap of 2-3+ stones, it’s seemed to favour Black heavily.

After realising this, in one group, we’ve dropped all auto-handicap 9x9 tournaments. So, this makes it difficult to create interesting/evenly handicapped 9x9 tournaments for those who wish to.)

there is no need for handicap on small boards in ranked games

More here

there are problems because correlation between rank on 9x9 and 19x19 is not perfect
trying to add auto handicap on 9x9 only leads to additional problems

I have a proposal I’m planning to clean up later this month (I’m currently travelling) which I think will address this. The current small board handicaps, intended to match what Sensei’s Library calls “the old Japanese recommendation”, do favour Black. There’s also a long-standing bug which makes the reverse komi even bigger than the intent.

In December, I had a quick chat with @anoek and he agreed something needs to change; once I clean up the proposal I think we can make the right thing happen.

By “the right thing” I mean we can get closer to something fair… it may not be 100% perfect but it won’t be hard to be substantially better than the current state.

Cool! ^^

Another improvement could be to cap the 9x9 auto-handicap in tournaments at a smaller number of stones, as handicaps on 19x19 are capped to 9 stones even in tournament games with a larger difference in ranks.

However on 9x9, we had a tournament game with 6 stones (which should be equivalent to a handicap for around 24-42 ranks of difference, depending on the system used). :sweat_smile:
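As a rough check of that 24-42 figure, here is a minimal sketch. It assumes the range comes from systems that grant one 9x9 handicap stone per 4 to 7 ranks of difference; that is a reading of the numbers, not something any post states outright. The function name `implied_rank_gap` is made up for illustration.

```python
def implied_rank_gap(stones: int, ranks_per_stone: int) -> int:
    """Rank difference that a handicap of `stones` is meant to compensate,
    under a system granting one stone per `ranks_per_stone` ranks."""
    return stones * ranks_per_stone


# Assumed systems: one stone per 4, 6 or 7 ranks (hypothetical values).
for ranks_per_stone in (4, 6, 7):
    gap = implied_rank_gap(6, ranks_per_stone)
    print(f"6 stones at 1 stone per {ranks_per_stone} ranks -> {gap} ranks")
```

Under any of these assumed systems, a 6-stone 9x9 game corresponds to a rank gap far beyond the 9 ranks that the 19x19 cap represents.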

Yeah, we should cap the handicap for ranked games at an “effective rank difference” of 9, to match 19x19.

This makes me wonder, on a side note, whether the 13x13 auto handicaps for tournaments are also un-capped at the moment.

I seem to recall playing in a 13x13 tournament with auto-handicaps and large rank gaps in which this was the case as well.

Edit, here (see the games between myself and the other dan player, and the DDKs) : go go!

I guess Black with a 2-stone handicap on 9x9 has a much bigger advantage in a game between two dans than in a game between two kyus

This is the kind of feeling which I completely discard on a big board, but maybe you’re right on a small one. Limited choices and opportunities.

Do you mean limiting the handicap in 9x9 tournaments to just maybe 2 or 3 stones? Seems like that kind of thing would be great as an option but why stop people from setting up the tournaments to have big handicaps if they want to? @fuseki3, you won all the games that you played in that 13x13 tournament including one at 10 stones! (I didn’t even know that was possible and have updated the wiki page.)

If you are talking about limiting the options for ranked challenges, I would love to see it move in the opposite direction and allow any combination of komi and handicap. Whatever you think the ideal system is, some people are going to want to play games differently, and why not allow the results of their games to feed into the ranking system?

If you think there’s not much value in the result of a 10-stone 13x13 game, you could quietly reduce its effect, but I don’t see why you would want to assert pre-emptively that such a game has to be called “unranked” and excluded from all future calculations.

A 1k with two handicap stones can beat a 9d on 9x9 100% of the time

Hmm. I don’t get it. It should be fine to rate a game that is harder than it should be, even if you gain less than you could have, but what if the game is easier than it should be? How do you determine the level of difficulty when everything varies, balancing komi against handicap, if you don’t want to restrict the settings and so give people the opportunity to get easier games than they should?

Actually, I don’t have a strong opinion of what the limit should be for a ranked game, but I do think the limit should match between 19x19 and small boards.

I think the argument for having a limit at all is that if the handicap advantage is too big, we don’t trust that we understand its meaning and the game result is potentially more random than based on player strength. As a result, using those games to update ratings would be counterproductive. (Maybe there’d be room to call the game “ranked”, thereby allowing the configuration in tournaments, but then have the rating system actually ignore it…)

Currently the limit on large boards is a handicap of 9 stones; IMO, ranked games on small boards should cap out at the same effective rank difference.
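To make “the same effective rank difference” concrete, here is a minimal sketch of what such a cap could look like. The `ranks_per_stone` values are hypothetical placeholders (1 on 19x19 by definition of the ranks, with small-board values still to be decided), the helper name is made up, and komi is ignored for simplicity.

```python
import math


def max_ranked_stones(ranks_per_stone: float, rank_cap: int = 9) -> int:
    """Largest handicap whose effective rank difference stays within rank_cap."""
    return math.floor(rank_cap / ranks_per_stone)


# Hypothetical ranks-per-stone values, for illustration only.
for board, ranks_per_stone in (("19x19", 1), ("13x13", 2), ("9x9", 4), ("9x9", 6)):
    stones = max_ranked_stones(ranks_per_stone)
    print(f"{board} at {ranks_per_stone} ranks/stone: cap at {stones} stones")
```

With these placeholder values a 9x9 ranked game would cap out at 1 or 2 stones, which is why the choice of conversion matters so much on small boards.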

If there’s an independent decision that we allow ranked games with larger handicaps on 19x19 then I think increasing the cap on small boards as well would be consistent.

About the 13x13: I don’t have a strong feeling about the 13x13 handicaps and how they work, as I haven’t played much 13x13 handicap. It was more an observation based on keeping the handicap caps consistent, on the impression that each additional handicap stone becomes harder and harder to overcome, especially on smaller boards (with less room overall as each stone is added), and on the reasons @dexonsmith mentioned.

It’s also probably worth noting that in that tournament, for the purposes of being useful data, of the 5 games I did attempt to play, 1 was won by annul (the opponent resigned immediately), and 1 was won because my opponent’s life became too busy and so they resigned very early in the game whilst it was more or less even, as they wouldn’t be able to continue.

(and they had initially played the 1-1 point 2 times, to try to make it equivalent to having passed 2 times, so that the 2 stone game could be played as an even game, so I don’t have much sense of what the 2 stones handicap would have been like ^^)

Of the 3 remaining (0,5 komi with a 3d, 10 stones with a 24k, and 3 stones with a 5k) games which were played out, the handicap didn’t feel impossible, but it felt very easy/possible to lose in the 10 stone game with so many stones creating a fairly tight, big moyo across the whole board, and very possible in the 3 stone game also.

(especially in the 10 stone game, in which I should likely have lost by around 20-30 points before my opponent made a really huge blunder – not connecting a key group of stones in atari in the very, very late yose when borders were nearly closed :sweat_smile: )

(while waiting for a blunder for a while through the midgame of the 3 stone game, I hadn’t found a good way to decrease the handicap advantage – it seemed easily possible to die on a large scale somehow or other when creating the fights, with the 3 handicap stones, if Black played some key moves to work with them globally, but the game didn’t feel impossible, or necessarily unevenly handicapped either.

– eventually Black made a big blunder equivalent to passing in the midgame fighting when I think they could have easily killed a big group by simply taking a big global point in the remaining open area, then another move somewhat similar to passing, which allowed me to attack a big key group.

I am not sure how I feel about the outcome and handicap, given the 2 big pass-equivalent blunders, and I felt like I didn’t have enough experience as White with 13x13 handicap either way. ^^)

The 0,5 komi game was very close and difficult. ^^

In any case, it might not be the best example or a terribly big sample size if looking at 13x13 auto handicapping in general based on only 3 games. :sweat_smile:

As a side note, I didn’t know this either.

It was interesting to try. :smile:

I think different handicap and komi combinations could be fun, and even if unranked, perhaps it would be possible to add an option to tournaments like “Allow effective number of handicap stones for rank differences >9”, with a hover “?” text showing “games with an effective handicap of greater than 9 ranks difference will be unrated”.

Exactly. And custom games, challenges, etc. The back end has that feature already at least with non-standard board sizes. There are other times we might want to ignore the result, like when it’s an even game between 3d and 23k, or if the komi is extreme.

You might think that 3 or more stones on 9x9 should not affect rank now, but if later we decide to make the limit 4 or 5, or take those games into account with a smaller effect, we can revise the system to include those games. If we disallow calling them ranked, that blocks us from using the results forever.

But also, I think that cutting off the ranking effect so soon for 9x9 would be misguided. I am only 8k but can destroy a total beginner at 3 or 4 stones on 9x9. So I would start with 5, and over a few games if they start winning, move up to 4 or 3.

I remember my son moving up from 3 to 2 against me. It was a notable step, not at all “counterproductive” to try to measure.

I’m curious what your proposal will be!

Some thoughts: we know the value of an early move is about 12-14 points (2x fair komi) on all board sizes people normally play (9x9, 13x13, 19x19). So a reasonable choice for the “equivalent stone advantage” A of a given setup for Black should be:

A = S - (Komi - 6.5) / 13,

where S is the number of extra stones Black got beyond what they would have in an even game playing first (e.g. 3 in a “4 stone handicap” game).
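A minimal sketch of that formula in Python, using the 6.5 fair komi and 13 points per move from the expression above. The function and constant names are made up for illustration.

```python
FAIR_KOMI = 6.5    # komi assumed to make an even game fair
MOVE_VALUE = 13.0  # assumed value of an early move, about 2x fair komi


def stone_advantage(extra_stones: int, komi: float) -> float:
    """Equivalent stone advantage A = S - (Komi - 6.5) / 13 for Black."""
    return extra_stones - (komi - FAIR_KOMI) / MOVE_VALUE


print(stone_advantage(0, 6.5))   # even game: no advantage
print(stone_advantage(0, 0.0))   # no komi: worth half a stone
print(stone_advantage(0, -6.5))  # reverse komi of 6.5: worth a full stone
print(stone_advantage(3, 0.5))   # "4 stone handicap" with 0.5 komi
```

The first three cases come out to 0, 0.5 and 1 stones, and the last to a little under 3.5, so komi changes slot in between whole stones as expected.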

Let R, the “effective rank advantage”, be the number of ranks by which Black should be weaker for that handicap/komi combination A to give a roughly fair game. We know that in a 19x19 game, ranks are calibrated/defined with the intent that R = A, but in general for smaller board sizes we should have R = A * f(size) where f is some scaling factor.

Then it’s up to choosing the scaling factor f, which seems pretty empirical. I would guess something like 2 for a 13x13 and about 4 for a 9x9. (e.g. a 4-5 stone game on 13x13 seems about right for players 9 stones apart on 19x19, and maybe like 2-3 stones for 9x9).
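Here is a small sketch of R = A * f(size) with the guessed factors of 2 and 4. The table `SCALE` and both helper names are hypothetical, and the factors are the guesses from this post rather than measured values.

```python
# Guessed scaling factors f(size); 19x19 is 1 by definition of the ranks.
SCALE = {19: 1.0, 13: 2.0, 9: 4.0}


def effective_rank_advantage(stone_advantage: float, size: int) -> float:
    """R = A * f(size): ranks compensated by a stone advantage A."""
    return stone_advantage * SCALE[size]


def stone_advantage_for(rank_gap: float, size: int) -> float:
    """Inverse: stone advantage A needed to compensate a given rank gap."""
    return rank_gap / SCALE[size]


for size in (19, 13, 9):
    print(f"{size}x{size}: 9 ranks apart -> A = {stone_advantage_for(9, size)}")
```

For a 9-rank gap this gives A = 4.5 on 13x13 and A = 2.25 on 9x9, in line with the "4-5 stones" and "2-3 stones" estimates.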

If we naively assume that the strong player catches up at a certain rate in points per move on average, then the above scaling factors of 2 and 4 are also consistent with 13x13 games lasting about half as many moves and 9x9 games lasting about a quarter as many moves. But these factors can also be adjusted as needed.
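One way to sanity-check the "half" and "quarter" figures is to use board area as a stand-in for game length. That proxy is an assumption made here for illustration (actual move counts would need real game data), and `area_ratio` is a made-up helper.

```python
def area_ratio(size: int, reference: int = 19) -> float:
    """Ratio of reference-board area to size-board area, used here as a
    crude proxy for how many times longer a 19x19 game lasts."""
    return (reference * reference) / (size * size)


for size in (13, 9):
    print(f"{size}x{size}: 19x19 has {area_ratio(size):.2f}x the points")
```

The ratios come out to roughly 2.14 for 13x13 and 4.46 for 9x9, close to the guessed factors of 2 and 4.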

These are mostly the same thoughts as when I implemented a custom Bayesian Elo rating system for a little competitive Go playing league with a few dozen people at my workplace a few years back, which included both handicap games and a mix of 13x13 and 19x19 games. It worked pretty well.

I would also be interested to know what the ranking system does currently. Is it somehow based on the same weird formula as the automatic handicap calculation? Does it look at komi at all?

Those two things don’t need to be exactly the same. For example I would suggest that the automatic handicaps be calculated in some way that’s easy for a human to understand and check, while the ranking system would use high precision calculations and decimal ranks.