Current autoscore failures (2025)

Groin · August 4, 2025, 6:04pm

You underestimate the black stone (like the players did)
I see that autoscore V3 uses KataGo and makes it check “if black plays next”. I would be curious about the depth of its analysis here in determining that it’s white territory.

Feijoa · August 4, 2025, 6:14pm

My thoughts are irrelevant; all that matters is what the actual players thought.

I don’t think it’s helpful to judge the OGS algorithm against how we imagine a hypothetical algorithm might behave. The goal here really is to guess the players’ thoughts as closely as possible in as many situations as possible.

Also, when it fails, we should ask whether the problem is easy for the players to correct. In this case it’s not easy since it has revealed some hidden information and there’s no way to take that back.

Groin · August 4, 2025, 6:23pm

I completely agree that we want to capture the players’ opinion, even if it’s wrong, or not so simple once you look a bit deeper.

So here I would translate it as: if it’s white’s turn, white annihilates the stone in one move; if it’s black’s turn, it’s complicated, but we don’t care because black doesn’t want to try it. OK. Put that in the autoscore.
(The first point would be the proof that it’s white territory.)

siimphh · August 4, 2025, 10:41pm

I very much agree with your approach, @Feijoa. If both players passed then they’ve demonstrated they think they couldn’t kill or save any groups, even with the first move advantage. So any groups that can be saved with first move advantage or that can be killed with first move advantage must be unavoidably in that state. Ideally, I think you should also only reason locally to make this determination?

If life/death status depends on first move advantage, then the group is truly unsettled. And that doesn’t seem to be the case for either of these examples.

Groin · August 4, 2025, 10:55pm

I’m not saying there is no refutation. I’m saying that it’s not obvious for an autoscore to score the way we would like it to (i.e., treating that black stone as a prisoner inside white territory).

siimphh · August 5, 2025, 8:05am

Just to be super clear, I think what we’re talking about here is scoring games once both players have passed, acknowledging that they have no more useful moves to make. So we are not scoring arbitrary board positions, only the final position on the board once the game has ended. Maybe this was already super obvious, but this is what gives us the license to make the assumptions about life and death that we otherwise couldn’t make, so it’s an important assumption to state.

But with that assumption, when both players have passed, indicating they don’t have any useful moves to make, then that means all groups must be obviously dead, obviously live or obviously seki. If there is a group where the status is not obvious then the players must have made a mistake and if they made a mistake, then we can’t use logical reasoning to figure out what their intention was. And of course, what’s obvious to pro players isn’t obvious to you or me, so this is not a perfect assumption.

But that assumption gives us a really nice simplified way of deciding life/death that only deviates from the human intentions when at least one of the humans made a mistake. We need to check whether the group lives or dies when each player in turn is given the first move.

black first \ white first    alive        dead         complicated
alive                        alive        unsettled    alive
dead                         unsettled    dead         dead
complicated                  alive        dead         unsettled
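The table can be read as a small combining function. This is a hypothetical sketch: `Status` and `combine` are illustrative names, not part of OGS or KataGo, and it assumes some reader (for example KataGo) has already classified the group once with black moving first and once with white moving first.

```python
from enum import Enum


class Status(Enum):
    ALIVE = "alive"
    DEAD = "dead"
    COMPLICATED = "complicated"  # reading did not reach a clear result
    UNSETTLED = "unsettled"      # only produced as an output of combine()


def combine(black_first: Status, white_first: Status) -> Status:
    """Combine the result with black moving first and with white moving first."""
    clear = {Status.ALIVE, Status.DEAD}
    if black_first == white_first:
        # Same clear result either way is settled; complicated both ways is not.
        return black_first if black_first in clear else Status.UNSETTLED
    if black_first in clear and white_first in clear:
        # Alive with one player first, dead with the other: genuinely unsettled.
        return Status.UNSETTLED
    # Exactly one side gave a clear result; assume the players saw and accepted it.
    return black_first if black_first in clear else white_first
```

Every cell of the table corresponds to one call, e.g. `combine(Status.DEAD, Status.COMPLICATED)` gives `Status.DEAD`.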

If we get a conflicting result or we’re unable to make a determination in either direction, then we really don’t know either way and might just as well pronounce the group unsettled. This would happen if the humans made a mistake in passing, or if the humans were better at seeing life/death than our algorithm (but if the algorithm is KataGo then we are pretty sure it was a human mistake).

If either player playing first gets a straightforward result, then the players must have seen that result and accepted it, otherwise one of them would have played out the complicated result when it was their turn to move (but they instead chose to pass). Or, the humans made a mistake and misjudged the situation.

For the conflicting life/death case, we could probably compare how many moves each line requires reading out. If one solution is easier to see than the other, then we could assume that’s the one the humans saw and accepted.
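One way to express that comparison, as a rough sketch: assume each line of play comes with the number of moves needed to demonstrate it, and only trust the shorter one when it is shorter by some margin. `Line`, `guess_intended` and the default `margin` of 2 are all hypothetical choices for illustration, not anything the autoscorer is known to do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    result: str         # "alive" or "dead"
    moves_to_read: int  # length of the sequence needed to demonstrate the result


def guess_intended(black_line: Line, white_line: Line, margin: int = 2) -> Optional[str]:
    """Return the result of the clearly easier line, or None if neither stands out."""
    diff = black_line.moves_to_read - white_line.moves_to_read
    if diff <= -margin:
        return black_line.result  # black's line is much shorter to read
    if diff >= margin:
        return white_line.result  # white's line is much shorter to read
    return None  # too close to call: leave the group unsettled
```

Returning `None` for close cases keeps the "unsettled" fallback from the table instead of forcing a guess.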

And for the cases where the humans made a mistake, they are not going to be disappointed with the autoscorer producing a different result from what they intended. They can hardly expect a program to guess the specific way that they were wrong and propose that as the scoring result, can they?

It’s the kind of move sometimes pulled in computer science to simplify handling potentially difficult corner cases, to great practical effect. E.g., check out the “Benefits” section of the Wikipedia article on undefined behavior. In this case, humans mistakenly passing when they should have played out a situation is the “undefined behavior”. I just want to point out that it can be a good idea to allow yourselves assumptions about correctness, of course only if they align very well with what happens in practice and with the other assumptions/rules you have.

Groin · August 5, 2025, 10:49am

The autoscore must give the players’ answer or they will be disappointed. That’s the whole problem here. Otherwise, we ask players to manually mark the status of the groups if this is not possible, or we warn them that if they choose autoscore, the result may not be what they expect.

genbeart · August 6, 2025, 2:22am

Of course, the concept of undefined behavior has lots of critics, including Linus Torvalds.

square_defender · August 21, 2025, 3:29am

Unusual bug:

Everything is painted correctly, but the score is not correct.

jlt · August 21, 2025, 7:37am

I reported the game to see what happened.

The autoscore didn’t start at all.

Feijoa · August 21, 2025, 3:59pm

Yes, this is not the kind of failure I’m trying to focus on in this thread. It seems like the servers have just been misbehaving recently.

It’s interesting and confusing, though, that there are apparently separate autoscore runs used for the score calculation and the marking of dead stones!

_KoBa · August 21, 2025, 4:05pm

Yeah, a ton of problems today and yesterday. I hope it will get fixed soon!

Conrad_Melville · August 21, 2025, 4:14pm

This looks like another case of the mysterious Autoscore Bug, dating back to August 2020. I don’t understand why this situation is constantly described as “the autoscore didn’t run.” It did run, it just didn’t finish the job. If it hadn’t run, there would be no markings at all on the board. Moreover, since this is a game versus a bot, the human certainly didn’t mark the board. The autoscore marked it, and died before it finished.

Feijoa · August 22, 2025, 3:08pm

Well it’s mostly just me describing it like that. My assumption is that it works like this:

1. autoscore marks dead stones (and teire points)
2. score gets counted based on the markings
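To make the two steps concrete, here is a minimal sketch of step 2, assuming the markings from step 1 arrive as a set of dead-stone coordinates. `area_score` and the board encoding are hypothetical, not OGS code; it uses plain area counting (stones plus empty regions touched by only one colour), with no komi and no handling of teire points. It shows how the same board gives a different count when step 2 runs with an empty dead set, as if step 1 never happened.

```python
from typing import Dict, Iterator, List, Set, Tuple

Point = Tuple[int, int]  # (row, column)


def neighbors(p: Point, rows: int, cols: int) -> Iterator[Point]:
    """Yield the orthogonally adjacent points that lie on the board."""
    r, c = p
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < rows and 0 <= c + dc < cols:
            yield (r + dr, c + dc)


def area_score(board: List[str], dead: Set[Point]) -> Dict[str, int]:
    """Remove the stones marked dead, then count stones plus single-colour regions."""
    rows, cols = len(board), len(board[0])
    grid = {(r, c): board[r][c] for r in range(rows) for c in range(cols)}
    for p in dead:
        grid[p] = "."  # stones marked dead in step 1 are taken off the board
    score = {"B": 0, "W": 0}
    seen: Set[Point] = set()
    for p, v in grid.items():
        if v in score:
            score[v] += 1
            continue
        if p in seen:
            continue
        # Flood-fill the empty region containing p and note which colours touch it.
        size, border, stack = 0, set(), [p]
        seen.add(p)
        while stack:
            q = stack.pop()
            size += 1
            for n in neighbors(q, rows, cols):
                if grid[n] == ".":
                    if n not in seen:
                        seen.add(n)
                        stack.append(n)
                else:
                    border.add(grid[n])
        if len(border) == 1:  # touched by only one colour: that colour's area
            score[border.pop()] += size
    return score


board = [
    "B.W..",
    "B.W.B",  # the lone black stone at (1, 4) sits in white's area
    "B.W..",
]
print(area_score(board, set()))     # {'B': 4, 'W': 3}  as if step 1 never ran
print(area_score(board, {(1, 4)}))  # {'B': 3, 'W': 9}  with the stone marked dead
```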

So when I see a situation where nothing is marked dead, and the score is counted consistently with that, I say it looks like “autoscore didn’t run”. It’s as if step 1 didn’t happen. In normal games I understand that step 1 relies on the players’ browsers sending a request to a server and getting a response so I can kind of imagine that the request fails sometimes.

Now we are seeing bot games where on one hand, the board shows the normal dead markings, but the score got counted as if step 1 never happened. That’s weird.

Are 1 and 2 somehow running out of sync? Or is there a separate copy of 1 + 2 that happens behind the scenes on a server somewhere and that’s failing, while we see markings from a completely different run of step 1?

I don’t think we can figure out much from the outside; getting this kind of thing working well has to do with arcane details of server management. There are all kinds of ways that it could fail that do not have anything to do with the game of Go.

My interest in this thread is the step 1 algorithm itself: seeing how it works when it does actually run and mark some stones dead.

Feijoa · September 9, 2025, 7:11am

I added this new kind of autoscore failure, forked from a real game. Is it debatable?

jlt · September 9, 2025, 7:20am

The question in general is: how should the autoscore react if both players missed a move? For instance suppose both players passed after this:

Which stones should be marked dead or alive by the Autoscore?

martin3141 · September 9, 2025, 7:51am

Yes, let me try to give a counter argument.

I think we can safely assume that both players think the position is settled. But you assert more, namely that both players think that white is dead. Why can we rule out that one or both players think that white is alive?

Groin · September 9, 2025, 8:59am

Annul the game automatically without any scoring? That’s the only solution I see.

Feijoa · September 9, 2025, 1:33pm

There’s a group in atari. That seems like it should almost always be dead, though I’m sure there are unusual counterexamples.
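The atari heuristic rests on counting a group's liberties. A minimal sketch, with hypothetical names and the same string-per-row board encoding assumed for illustration: flood-fill the group from one stone and collect the empty points next to it. Note that this only detects atari; it says nothing about whether the group is actually dead, which is exactly what the reply below disputes.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]  # (row, column)


def group_and_liberties(board: List[str], start: Point) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the group containing `start`; return its stones and its liberties."""
    rows, cols = len(board), len(board[0])
    color = board[start[0]][start[1]]
    group, liberties, stack = {start}, set(), [start]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            v = board[nr][nc]
            if v == ".":
                liberties.add((nr, nc))
            elif v == color and (nr, nc) not in group:
                group.add((nr, nc))
                stack.append((nr, nc))
    return group, liberties


def in_atari(board: List[str], start: Point) -> bool:
    """True if the group at `start` has exactly one liberty left."""
    return len(group_and_liberties(board, start)[1]) == 1
```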

Groin · September 9, 2025, 1:35pm

The game is clearly not finished. An atari is far from enough to draw any conclusion.