Testing autoscore algorithms

There has been a lot of discussion of how the OGS autoscore behaves poorly, especially by identifying invasions that the players had missed. The general feeling I get from experienced players here is that autoscore should go away and games should be scored manually, but I really don’t think that’s a good way to go for Internet Go; it would cause a huge increase in the number of scoring issues.

Instead, I’ve been a proponent of @Vsotvep’s algorithm for replacing the current autoscore:

I don’t think anyone has actually tested it yet, so I wrote a quick and extremely rough JavaScript implementation here:

https://pdg137.github.io/autoscore/?game=52240703

It runs the OGS AI score estimate on a given finished game, trying both Black and White to play, then goes through the steps of the algorithm and shows the final result. You can change the game number in the URL to try applying the algorithm to different games.

(And it’s really, really rough since I don’t know what I’m doing. For example, I’m using a third-party service to get around CORS issues with the OGS API, and that seems to fail sometimes; I also sometimes still get CORS errors that I’m not expecting. It is also locked to AGA rules, since I didn’t want to make an extra API call to determine the ruleset.)

Most importantly, the “Combined AI estimate” pane shows the key step, which is distinguishing alive, dead, and unsettled stones:

And “Algorithm result” resolves each unsettled region based on which color has the most unsettled stones in it:

image

Anyway, maybe this will be interesting to try against some of the poorly-handled cases. Please let me know what you think!

UPDATE:

It turns out this was not quite what Vsotvep had in mind, and anyway there are some weaknesses in this algorithm, so I’m now pushing a new “v3” version described here:

The basic idea is to assign ownership as black/white/dame based on the stones along the border of each region, rather than counting unsettled stones.

Here’s a game that’s interesting to test:

It has an unsettled stone at H9, but also a double atari at the bottom, so White can’t defend both at the same time. As a result, the algorithm currently marks part of the bottom as Black regardless of whose move it is.

I’m not entirely sure how to interpret the graphs you generate: which points are marked as Black or White, and which are marked as dame?

Here’s another very much unsettled one:

Or what about players leaving the borders open:

I’m using the https://ai.online-go.com/api/score endpoint (EDIT: now I’m using beta-ai.online-go.com, which seems to more reliably allow cross-site scripts). It returns positive or negative fractional values depending on whether the region is expected to be “owned” by black or white, with larger magnitudes representing higher certainty, I think. An “unsettled” stone is then one that might be dead (owned by the opposite color) or alive depending on who plays first. My only goal here is to settle the unsettled stones as either dead or alive, so I’m not trying to actually count the score or make any decisions about dame vs. territory.
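To make the alive/dead/unsettled distinction concrete, here is a minimal Python sketch of the comparison step. It is not the code behind the demo page. The function name, the board encoding (`"B"`, `"W"`, `"."`), and the sign convention (positive means Black) are all assumptions for illustration, and it expects the two ownership grids to have been fetched already.

```python
def stone_status(board, own_black_first, own_white_first):
    """Classify every stone as 'alive', 'dead' or 'unsettled'.

    board: list of strings using 'B', 'W' and '.' (hypothetical encoding).
    own_black_first / own_white_first: grids of ownership values, one
    estimated with Black to play and one with White to play.
    Assumption: positive values mean Black owns the point, anything
    else (including exactly 0) is treated as White.
    Empty points get None, since only stones are classified here.
    """
    def owner(value):
        return "B" if value > 0 else "W"

    status = []
    for y, row in enumerate(board):
        out = []
        for x, colour in enumerate(row):
            if colour == ".":
                out.append(None)
                continue
            first = owner(own_black_first[y][x])
            second = owner(own_white_first[y][x])
            if first != second:
                # The result depends on who plays first.
                out.append("unsettled")
            elif first == colour:
                out.append("alive")
            else:
                out.append("dead")
        status.append(out)
    return status
```

Because only the sign is checked, a point can never come out as seki or neutral in this sketch; every stone ends up in one of the three classes.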

I’m assuming that once alive/dead is determined, we could just count the score using the normal rules of Go. That means this algorithm also isn’t going to do anything to resolve open borders; it seems fine to leave that to the players to resolve by resuming play.

By the way, I think the autoscore AI must be doing something a little different:

Since I’m just checking positive/negative “ownership”, I don’t ever end up with any seki points:

https://pdg137.github.io/autoscore/?game=45247279

Only marking the dead or alive ones doesn’t work, and is also not the algorithm I proposed (although I’m not certain my proposed algorithm would be able to solve the situation below either).

For example, in the first game I linked above, the stone at J1 should be counted as alive, and the surrounding empty area should be marked white. But, since this stone is dead under both Black playing first and White playing first (since KataGo would defend a different area), it gets marked as dead in your system:

I don’t understand how to get J1 to be alive from the algorithm you proposed:

If KataGo marks it dead in both scenarios, it’s “settled” as dead, right? What else could it be?

Yes, I don’t think my algorithm works for this case either.

I’m still wondering why/whether you think what I’m doing differs from your proposal.

It is unfortunate that multiple weaknesses on opposite sides of the board can leave it in a bad state. But I don’t see how we can get away from that without getting beyond the AI estimate based approach. Perhaps we could identify weirdly dead stones like J1 (it is connected by open points to live white stones) and declare them unsettled too.

Well, the main purpose of my proposal is to mark the empty parts, not only the dead / alive stones.

Aha, I was so focused on getting to something resembling normal Go scoring that I ignored the wording of your last three points. So really what I am implementing is this:

  • KataGo checks the position with White having next move
  • KataGo checks the position with Black having next move
  • Any discrepancy between these two is marked as “unsettled”, the rest is marked as “settled”.
  • Chains of unsettled area (open intersections + unsettled stones + dead stones) are considered as a whole.
  • If the number of unsettled Black stones is larger than the number of unsettled White stones, mark only the unsettled White stones dead
  • If the number of unsettled White stones is larger than the number of unsettled Black stones, mark only the unsettled Black stones dead
  • If the numbers are equal, leave the unsettled stones alive (it will be dame)
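The steps above can be sketched roughly as follows, in Python rather than the JavaScript of the demo. This is only my reading of the list. The function name and the data layout (a board of `"B"`/`"W"`/`"."` strings plus a grid of `"alive"`/`"dead"`/`"unsettled"`/`None` statuses) are hypothetical, and I am assuming every open intersection can be part of a chain.

```python
from collections import deque


def resolve_by_vote(board, status):
    """Resolve unsettled stones region by region using a stone-count vote.

    A region is a connected chain of open intersections, unsettled stones
    and dead stones. Within each region the colour with more unsettled
    stones keeps them alive and the other colour's unsettled stones die.
    On a tie, all unsettled stones in the region stay alive.
    """
    height, width = len(board), len(board[0])
    result = [row[:] for row in status]
    seen = set()

    def in_region(y, x):
        return board[y][x] == "." or status[y][x] in ("unsettled", "dead")

    for sy in range(height):
        for sx in range(width):
            if (sy, sx) in seen or not in_region(sy, sx):
                continue
            # Flood fill one region, collecting its unsettled stones.
            seen.add((sy, sx))
            queue = deque([(sy, sx)])
            unsettled = []
            while queue:
                y, x = queue.popleft()
                if status[y][x] == "unsettled":
                    unsettled.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and (ny, nx) not in seen and in_region(ny, nx)):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            black = sum(1 for y, x in unsettled if board[y][x] == "B")
            white = len(unsettled) - black
            for y, x in unsettled:
                if black == white:
                    result[y][x] = "alive"
                else:
                    winner = "B" if black > white else "W"
                    result[y][x] = "alive" if board[y][x] == winner else "dead"
    return result
```

Stones that were already settled keep their status; only the `"unsettled"` entries change.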

Here’s a position that I think might show the difference:

image

The black stones are unsettled, and there are no unsettled white stones, so you would mark that whole area as belonging to Black? I would call the stones alive and then following the normal rules of Go, the area becomes dame.

In action:

image
link

The current proposal uses unsettled stones to “vote” on which color wins the region. This is subject to strange results since unsettled stones are themselves a strange feature.

What if instead we looked at the color of (settled, live) stones on the boundary of the region? Then in the 7x7 example above, the five black stones would be dead, which is probably what the players expected.
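A minimal Python sketch of that border idea, assuming the same hypothetical layout as before (a board of `"B"`/`"W"`/`"."` strings and a grid of `"alive"`/`"dead"`/`"unsettled"`/`None` statuses). The function name and the "mixed border means dame" rule are my own guesses at the details, not a settled design.

```python
from collections import deque


def region_owner_by_border(board, status, start):
    """Return 'B', 'W' or 'dame' for the region containing `start`.

    The region is the connected set of open intersections, unsettled
    stones and dead stones around `start`, given as (row, column).
    Only settled, live stones on its boundary decide the owner.
    """
    height, width = len(board), len(board[0])

    def in_region(y, x):
        return board[y][x] == "." or status[y][x] in ("unsettled", "dead")

    if not in_region(*start):
        raise ValueError("start must be an open point or an unsettled/dead stone")

    seen = {start}
    queue = deque([start])
    border_colours = set()
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if not (0 <= ny < height and 0 <= nx < width):
                continue
            if in_region(ny, nx):
                if (ny, nx) not in seen:
                    seen.add((ny, nx))
                    queue.append((ny, nx))
            else:
                # Anything outside the region is a settled, live stone.
                border_colours.add(board[ny][nx])

    if border_colours == {"B"}:
        return "B"
    if border_colours == {"W"}:
        return "W"
    return "dame"
```

Under this reading, unsettled stones sitting inside a region owned by the opposite colour would be treated as dead, which is what happens to the black stones in the 7x7 example.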

The trouble with this approach is that it places too much weight on future play by KataGo. Go scoring is, at its heart, a static problem of topology: identifying graph edges and regions and colouring them, not a dynamic playout problem. Incidentally, the human brain is really good at this without a procedural algorithm, because we do it all the time to parse the signal from our retinas into an object view of the world. Playouts from KataGo can be used as a stone status solver, but only as a secondary decision maker if the graph theory approach gives ambiguous regions.

I suppose this would work to autoscore games that are actually finished, which is usually the case when intermediate players or stronger go to scoring. There will rarely be unsettled stones encountered at step 3.

The problem is what to do when both (usually weaker) players want to score a game prematurely, i.e. the algorithm finds unsettled stones at step 3.
For such cases I think the only proper solution is to break off the algorithm and have the players mark dead stones by themselves, with no hints given by the scoring system about which stones might be dead. Then hope both players agree on which stones are dead and, if they do, score the game by flood-filling areas that don’t touch living stones of both colors (excluding seki areas under Japanese rules).

Any algorithm that attempts to guess which unsettled stones both players implicitly agree are dead is bound to fail in many cases, by giving (illegal) hints to the players and/or guessing wrongly what the players are thinking. To avoid any such ambiguities, IMO marking dead stones should be left fully to the players. And if the players can’t agree on which stones are dead, they should resume the game or call an arbiter to help out.

I’m aware that I’m repeating statements that have been made countless times before on the OGF, but if an autoscore algorithm will only work properly within a “happy flow” (the game is actually finished), I just can’t consider it to be sufficient.

To avoid confusing players by sometimes requiring them to mark dead stones manually and sometimes having an algorithm do it for them automatically, I think it’s best to always require players to mark dead stones (without giving any hints about which stones might be dead). I’m aware that this would be sort of a breaking change for OGS, but I don’t see any way to “fix” autoscoring for both weaker and stronger players.

That’s all that I’m trying to demonstrate here.

Do you mean something more than regions bordered by both white and black? That’s almost always going to happen in games, since there are usually some dead stones.

A region that touches both black and white stones is neutral, unless all stones of one color touching the region are dead.

So as long as the status of all stones touching a region is unambiguous (either 100% dead or 100% alive, as determined by an algorithm or by both players), the status of that region is unambiguous.

But when the status of some stones is unsettled (players disagree on those stones’ status), the region that touches those stones becomes ambiguous and the game can’t be scored.
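As an illustration of the rule above, here is a short Python sketch of the flood fill once every stone has an agreed status. The function name and the inputs (a board of `"B"`/`"W"`/`"."` strings and a set of `(row, column)` positions for dead stones) are hypothetical. It only counts points per region; it does not handle prisoners, komi, seki, or ruleset differences.

```python
from collections import deque


def count_territory(board, dead):
    """Count open points owned by each colour after dead stones are agreed.

    Dead stones are treated as removed, so their points count as open.
    A region belongs to a colour only if every live stone touching it
    has that colour; otherwise it is counted as neutral.
    """
    height, width = len(board), len(board[0])

    def is_open(y, x):
        return board[y][x] == "." or (y, x) in dead

    totals = {"B": 0, "W": 0, "neutral": 0}
    seen = set()
    for sy in range(height):
        for sx in range(width):
            if (sy, sx) in seen or not is_open(sy, sx):
                continue
            seen.add((sy, sx))
            queue = deque([(sy, sx)])
            size = 0
            colours = set()
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue
                    if is_open(ny, nx):
                        if (ny, nx) not in seen:
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                    else:
                        # A live stone touching this region.
                        colours.add(board[ny][nx])
            key = next(iter(colours)) if len(colours) == 1 else "neutral"
            totals[key] += size
    return totals
```

No AI is involved at this stage: given the same board and the same set of dead stones, the result is always the same.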

Right, but Uberdude seemed to be suggesting that graph theory alone could score a game, without solving the status of any stones.

I understand his post as stating that once the status of all stones has been agreed upon unambiguously to be dead or alive (by the players or some algorithm or AI), scoring is a relatively trivial procedure involving only static topology (no AI needed).

If that’s the case, I think he’s really arguing against the current way autoscore is applied, with playouts determining not just stone status but actual territory assignment, and I agree that that’s a mess. To follow the rules of Go, stone status should be determined first, then scoring should follow according to simple graph theory rules.

In OTB scoring, agreement on the status of stones is implicit by the players removing all dead stones from the board between passing and rearranging the board to count. So all regions are already topologically unambiguous when counting the score.
So in my view it makes perfect sense to require players to click dead stones in online games before going to the next step of counting the score.

It’s a tradeoff between the two problems, isn’t it?

Of course if we had no score cheaters or players who stubbornly refuse to learn about special situations like seki, maybe the manual method would be great. But this is the Internet, and in my limited experience at least, player-scoring problems are 5-10x more common than the autoscore-hint problem. I’m hoping that we can get something that works at least a few times better than the current autoscore, making the issue of illegal hints almost entirely theoretical.

And even if we go with the manual method first, we still need that arbiter. Consider that we’ll never have sufficient moderation for all the scoring disputes, so if we’re really going to go this way, we still need some kind of automatic system for resolving the status of groups.

Or, if you already have some solution for that (make everyone play with Chinese rules!), maybe you can consider this topic just a theoretical curiosity.

I don’t think that playing with Chinese rules would solve the bigger issue. It only helps to settle status disagreements without moderator intervention.

I think the issue we’re discussing here is not so much status disagreement between the players, but both players agreeing to go to scoring while the status of some stones is unsettled and then trying to guess which status both players might be implicitly agreeing on (without requiring players to explicitly express their assessments).