Jigo implementation? ✅

I tend to agree that we shouldn’t advertise “NZ rules”.

Affirmative action maybe? :crazy_face: Someone didn’t think to add a line of code to randomize it?

2 Likes

Draws are absolutely a possible outcome by moderator decision, just not by default by the scoring system. Glicko2 as implemented on OGS handles them correctly, too.

1 Like

I'm curious, I was expecting white.

For example: 8 x 8

Then you could simply not offer these rules in tournament games, and meanwhile solve the question for classic rated games first. Step by step.

@Maharani maybe it was explained already, but as a reminder: what is a fair result from a jigo under NZ rules? 100% of the full win points for both? 50%? 0%?

3 Likes

Hah - very interesting!

It’d be way more instructive to see what happens if:

  • The game is ranked

or

  • The game is in a tournament

Groin is right - I guess we could simply not offer jigo rulesets in places where that’d be a problem :thinking:

Here you go :slight_smile: 9 x 9

This game also showcases another bug that has its own thread: ranked NZ games for 9 x 9 and 13 x 13 use 5.5 komi instead of 7.0

2 Likes

Nice. How was this game handled by the rating process? Would it have been better to annul it?

This case is serious: we’re dealing with a ranked game.

Most komi on small boards are off/“interesting”, especially for handicap games, but also for even games. I have a half-written proposal for how to fix them, and will probably have time to finish it off and share sometime in January.

Note that NZ rules on 19x19 correctly use komi of 7 (even games) and 0 (handicap games), with draws allowed in both. It’s just small boards where it’s wrong.

3 Likes

Yep, as I said :slight_smile: Here’s the thread I mentioned: Please fix default NZ rules komi for ranked 9x9 and 13x13

2 Likes

It wasn’t a ranked game.

This last one I linked was ranked, according to the game information: 9 x 9

Here’s a ranked 19 x 19 tie: 19 x 19

1 Like

what is a fair result from a jigo

Obviously the mean of what a loss and a win would give, as in most games. It’s handled correctly by scoring systems like Glicko2 (which were developed for chess, where draws are common).

2 Likes

What actually is supposed to happen in a tie, to ratings?

I guess that if I tie with a Dan my rating should improve, and theirs worsen?

In the two linked rated tied games, the ranks change a teeny bit :face_with_monocle:

2 Likes

Integer komi, allowing the possibility of draws as a result of equal count, makes sense in most scoring systems. NZ rules happen to specify the komi; some other rulesets leave it undefined - so some tournaments use 6.5, some use 7 (allowing draws by equal count), and some use 7.5.
Rather than saying “integer komi doesn’t work for knockout tournaments”, I’d argue that knockout tournaments don’t work. A knockout tournament really only decides first place; the losing finalist could be weaker than half the players in the tournament, so it doesn’t make sense to give him/her the second (or any) prize. Swiss or McMahon tournaments are much better.
Note, by the way, that draws other than by equal count are possible in most rulesets. A scoring system which doesn’t treat draws as draws is broken.

7 Likes

It was (see the game information)

Hah - I lost track of which game was “it” :stuck_out_tongue_closed_eyes:

There are certainly two linked games that are ranked, and the question is whether the rank calculation for them was what we’d expect.

That being said, overall it’s confirmed that basic support seems to be there, so if we took jigo rulesets out of tournaments and ladders, maybe we could have them sooner.

3 Likes

Yeah, in Elo there’s a modification you can do to introduce draws, though I feel like to do it “properly” you need some assumption about the probability of draws between players of a given rank, so that you know how much to change their ratings, and then adjust, say, the Glicko formulas.

I’m sure they must do it in places like chesscom and lichess, but I suppose the frequency of draws in Go might be much lower than in chess at certain rating ranges, so maybe it could just be estimated empirically? Or I don’t know :slight_smile:

There was an example here for Elo with draws

I actually did some informal study of this in the past, playing with different mathematical Elo models as well as Bayesian inference with different draw models, looking at their properties, and consulting a little with a friend who also has some knowledge of statistical inference.

It turns out you can do fancy draw modeling where you explicitly try to predict the probability of a draw, but if you get the model wrong, you can get something that behaves poorly. In particular, unless the draw model is sufficiently true to reality, it’s actually quite hard to maintain the desirable property that if A and B play a large number of games and achieve a match score of X, the rating difference between A and B converges to the difference that would predict an average match score of X over that many games. The reason is that if, in achieving that match score of X, players A and B don’t draw at the rate predicted by the model, the maximum a-posteriori Elo can actually be different. Specifically, it may be skewed towards the rating that better predicts the observed draw rate according to your (possibly faulty) model, rather than the rating that best predicts the empirical match score. So if the draw model is not good, you have a problem.

The above is stated from the perspective of Bayesian-like rating modeling as in BayesElo, WHR, or KGS’s rating system, but I think it also applies to “gradient-step” based systems like naive Elo, Glicko, etc., because of how the latter systems are usually derived: you start with more or less the same underlying kinds of parametrizable predictive models, and the only difference is that rather than doing a Bayesian update upon seeing a new game, you derive a one-step gradient-like update for a single game between two players, and that becomes your update rule for ratings (and/or ratings deviation or uncertainty), possibly with some caps or limits.

But… there’s a very, very simple way to get draws to work with a rating system that didn’t formerly handle draws, one that entirely sidesteps the problem of draw modeling and guarantees the above desirable property (that over a large number of games, the rating difference will converge toward the difference that predicts the average match score, rather than away from it or to something else): don’t model draws as a separate kind of event in the rating system at all; simply treat a draw exactly as you would treat half a win and half a loss.

So for an update-based system, take the average of the ratings updates (and ratings deviation/uncertainty updates) you would perform if the game was a win and if the game was a loss, and apply that average as the update.
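
A minimal sketch of that averaging recipe, using plain Elo as the base system (the K factor and ratings below are illustrative, not anything OGS actually uses). Because Elo’s update is linear in the score, averaging the win-update and the loss-update comes out identical to just scoring the game as 0.5; for systems with nonlinear deviation updates, averaging the two full updates is the general recipe:

```python
# Illustrative sketch: treat a draw as the average of a win-update
# and a loss-update, with plain Elo as the base rating system.

def expected(r_a: float, r_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r: float, exp_score: float, actual: float, k: float = 32.0) -> float:
    """One Elo update step: r' = r + K * (S - E)."""
    return r + k * (actual - exp_score)

r_a, r_b = 1500.0, 1700.0
e = expected(r_a, r_b)

# Average the update A would get for a win and for a loss...
avg_of_win_and_loss = 0.5 * update(r_a, e, 1.0) + 0.5 * update(r_a, e, 0.0)

# ...which, since Elo's update is linear in the score, equals the
# update for an actual score of 0.5:
half_point_update = update(r_a, e, 0.5)

assert abs(avg_of_win_and_loss - half_point_update) < 1e-9
```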

And for a Bayesian system, set log P(result | rating difference) = 0.5 * log(P(win | rating difference)) + 0.5 * log(P(lose | rating difference)) and continue as usual.
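
As a quick sanity check of that likelihood (under an assumed logistic Bradley-Terry win model, not any particular server’s implementation), the half-win/half-loss draw likelihood is maximized at zero rating difference, i.e. a long run of draws pulls two ratings together exactly as a 50% match score would:

```python
import math

def p_win(d: float) -> float:
    """Assumed Bradley-Terry win probability for a rating difference d."""
    return 1.0 / (1.0 + math.exp(-d))

def draw_loglik(d: float) -> float:
    """Draw treated as half a win and half a loss."""
    return 0.5 * math.log(p_win(d)) + 0.5 * math.log(1.0 - p_win(d))

# Scan rating differences; the likelihood peaks at d = 0, the same
# difference that predicts a 50% match score.
ds = [x / 100.0 for x in range(-300, 301)]
best = max(ds, key=draw_loglik)
assert abs(best) < 1e-9
```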

This is not merely some hack, it should usually be mathematically sound if the original rating system was. You can see this intuitively by considering the new drawless game where whenever the original game would be a draw, you mandate that the players flip a coin to decide the winner. If you apply your rating system that doesn’t handle draws to this new drawless game (which is also a well-defined competitive game), you essentially get in expectation the ratings system proposed to handle draws, except that because you know in that case that the win and loss were equally likely, you can average over the two outcomes to reduce noise instead of letting the law of large numbers do it for you.

The downside of doing it this way is that by not explicitly modeling draws and instead just treating them as the average of a win and a loss, the rating system won’t make inferences based on the draw rate to adjust the players’ ratings beyond just following the empirical match score. For example, depending on your draw model, you might want to say that a 4-win-4-loss result leaves you with a different ratings uncertainty between two players than an 8-draw result. Or you might want the rating system to adjust both players’ ratings upward upon an 8-draw result but not a 4-win-4-loss result, because your model says that draws are more likely when both players are very good than when both are weaker.

But that’s kind of the only downside, and maybe you don’t even want behaviors like that anyway (e.g. adjusting both players’ ratings to be high when they draw a lot opens up more avenues for ratings manipulation).

12 Likes

The EGF doesn’t seem to have any trouble handling draws (some tourneys have no komi for handicap matches); a jigo counts as half a point for McMahon scores and is calculated as “0.5 wins” in their rating system

Ratings are updated by: r’ = r + con * (Sa - Se) + bonus
r is the old EGD rating (GoR) of the player
r’ is the new EGD rating of the player
Sa is the actual game result (1.0 = win, 0.5 = jigo, 0.0 = loss)
Se is the expected game result as a winning probability (1.0 = 100%, 0.5 = 50%, 0.0 = 0%). See below for its computation.
con is a factor that determines rating volatility (similar to K in regular Elo rating systems): con = ((3300 - r) / 200)^1.6
bonus (not found in regular Elo rating systems) is a term included to counter rating deflation: bonus = ln(1 + exp((2300 - r) / 80)) / 5

Se is computed by the Bradley-Terry formula: Se = 1 / (1 + exp(β(r2) - β(r1)))
r1 is the EGD rating of the player
r2 is the EGD rating of the opponent
β is a mapping function for EGD ratings: β(r) = -7 * ln(3300 - r)
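
The quoted formulas transcribe directly into code. The sketch below just follows the post (ratings chosen for illustration) and shows that for two equally rated players, a jigo (Sa = 0.5) changes the rating only through the anti-deflation bonus term, since Sa - Se = 0:

```python
import math

def beta(r: float) -> float:
    """EGD rating mapping: beta(r) = -7 * ln(3300 - r)."""
    return -7.0 * math.log(3300.0 - r)

def expected_score(r1: float, r2: float) -> float:
    """Bradley-Terry winning probability of player 1 against player 2."""
    return 1.0 / (1.0 + math.exp(beta(r2) - beta(r1)))

def egf_update(r: float, opponent: float, sa: float) -> float:
    """One EGD update: r' = r + con * (Sa - Se) + bonus."""
    con = ((3300.0 - r) / 200.0) ** 1.6
    bonus = math.log(1.0 + math.exp((2300.0 - r) / 80.0)) / 5.0
    se = expected_score(r, opponent)
    return r + con * (sa - se) + bonus

# Two equally rated players drawing (Sa = 0.5 for both): Se = 0.5,
# so the whole rating change comes from the bonus term.
r1 = r2 = 2100.0
new_r1 = egf_update(r1, r2, 0.5)
assert new_r1 > r1  # bonus term is always positive
```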

2 Likes