Testing autoscore algorithms

I agree. This new system sounds quite complicated. I appreciate the intent to innovate, but I’m not sure this would achieve the stated goal of reducing confused posts.

1 Like

Maybe you guys can at least think of it as a possible algorithm for scoring bot games, if that helps.

The advantage of letting some unsettled stones die is that surprise invasions would be revealed less often.

Can you find a game that breaks it?

I’m in the camp that favours making bot games unranked for human opponents, unless the human opponent has a provisional rank.
Besides that, I think scoring bot games should work the same as scoring games between humans.
If bots are too weak to mark dead stones sensibly or to settle group status properly before scoring, I feel they are a potential detriment to educating their novice opponents.

I don’t quite understand this statement. Can you give an example?

1 Like

A reminder that most bots can mark dead stones during the scoring phase, and GTP supports this. OGS has just historically not bothered to implement it. It would make much more sense for anoek to spend, say, 10 hours implementing that than 20 hours implementing an autoscore feature as a workaround for not doing it, with us forum users spending 1000 hours talking about it whilst legions of beginners get misled about how to score by the current poor autoscore.
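For anyone curious what that could look like: in GTP the relevant command is `final_status_list dead`, and the engine answers with a list of vertices. Below is a minimal sketch of parsing such a reply. The function name `parseFinalStatusList` is made up for illustration, and this says nothing about how OGS actually talks to bots.

```javascript
// Parse the reply to a GTP "final_status_list dead" command into vertices.
// Assumes a GTP version 2 style reply: "=" (optionally followed by a numeric
// command id), then whitespace-separated vertices, ended by a blank line.
function parseFinalStatusList(reply) {
    const text = reply.trim();
    if (!text.startsWith("=")) {
        // Failure replies start with "?" followed by an error message.
        throw new Error("GTP command failed: " + text);
    }
    // Drop the "=" and the optional command id.
    const body = text.replace(/^=\d*/, "");
    return body.split(/\s+/).filter((v) => v.length > 0);
}
```

The engine may put each dead group on its own line, which is why the sketch splits on any whitespace rather than only on spaces.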

1 Like

Imagine the players play out a couple of moves of the surprise invasion before giving up and passing:

image

The single white stone and a bunch of black ones are considered alive or dead depending on whose turn it is. If we call the black ones alive and the white one dead, it settles the game without a weird huge dame.

In such a situation, I think it’s justified to pull out the adage “garbage in, garbage out”. I’d prefer for an autoscore algorithm to declare the status of all those unsettled stones around M3 as alive. As a result, the whole lower right region would become a neutral area.

When players are not required to claim dead stones and group status determination is left to an autoscore algorithm that can’t read the players’ minds, any outcome of the algorithm is bound to be problematic in many cases.

The more problematic situations of unsettled stones I see in this thread, the less I think it’s worth the effort to try to figure out some obscure algorithm to come up with an arbitrary game result. I’m feeling increasingly uncomfortable in my role of Devil’s advocate for autoscore and becoming more in favour of not having an autoscore algorithm at all, always requiring players to claim dead stones before scoring. It’s not a huge effort for players (another benefit is keeping it simple for OGS and players).
If players fail to claim dead stones, all stones are alive and the board is scored accordingly (potentially having large neutral areas). Those are just the basic mechanics of the game.

Edit: If one insists that OGS must have an autoscore feature, I think autoscore should at least have the power to draw or annul games in case flipping the status of unsettled stones to dead/alive has a large enough effect on the score to change the winner. I feel it’s better than declaring a winner more or less randomly, and it’s not as harsh as “both players lose”.
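A rough sketch of that idea, assuming the scorer can produce the score margin under each plausible resolution of the unsettled stones (for example all alive versus all dead). The name `decideResult` and the return values are hypothetical.

```javascript
// margins: Black's score minus White's score (komi included) under each
// plausible resolution of the unsettled stones. Hypothetical input format.
function decideResult(margins) {
    if (margins.length === 0) {
        throw new Error("need at least one scored resolution");
    }
    if (margins.every((m) => m > 0)) return "B";
    if (margins.every((m) => m < 0)) return "W";
    if (margins.every((m) => m === 0)) return "draw";
    // The winner depends on how the unsettled stones are resolved.
    return "annul";
}
```

So `decideResult([5.5, 12.5])` gives Black the win either way, while `decideResult([4.5, -7.5])` annuls the game instead of picking a winner.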

1 Like

Is there a big problem with incorrectly scored bot games? As in, does it happen frequently?

Either way, I hope you don’t misunderstand me. I think you’ve done some excellent work @Feijoa , and I find the results interesting.

I don’t know if it’s common, but I think I’ve seen posts with problems - it must be hard to make a low-rank bot that still always secures its borders perfectly, right?

1 Like

I don’t feel that closing borders is really a problem. As long as stones are not left unsettled after passing (while players are not claiming dead stones), the game can be autoscored.

If you’re not satisfied with a Chinese-scoring bot that turns its entire area into stones and single-point eyes before passing, and instead want a bot that passes not too much later than a human would, then yes it can be tricky sometimes.

A good practical method is to outsource the problem to an existing bot that solves the problem. For example, for writing a bot for beginners, an easy method is “if GnuGo would pass, then pass, and echo GnuGo’s claims of dead stones, otherwise play your own bot’s favorite non-pass move”. I’ve done this sort of thing before and it works well.
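In code, the rule is roughly the following. This is only a sketch: `referenceBot` and `ownBot` are hypothetical wrapper objects (not a real library), and a real integration would also have to keep both engines’ board states in sync and make sure that asking the reference bot for a move does not commit that move.

```javascript
// Hypothetical wrapper interfaces assumed by this sketch:
//   referenceBot.genmove(color)   -> a vertex such as "D4", or "pass"
//   referenceBot.deadStones()     -> array of vertices it considers dead
//   ownBot.bestNonPassMove(color) -> a vertex
function chooseMove(referenceBot, ownBot, color) {
    const suggestion = referenceBot.genmove(color);
    if (suggestion.toLowerCase() === "pass") {
        // The reference bot thinks the game is over: pass too and
        // echo its dead-stone claims for the scoring phase.
        return { move: "pass", dead: referenceBot.deadStones() };
    }
    // Otherwise ignore the suggestion and play our own bot's move.
    return { move: ownBot.bestNonPassMove(color), dead: null };
}
```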

5 Likes

I appreciated the feedback about the algorithm being arbitrary and that we should leave stones alive in more situations. Consider this example:

There’s an unsettled black stone at the top, and no matter how that gets resolved, the area behind it is dame due to the unclosed border below. The area happens to have more white than black stones on its border, but that does seem like a really arbitrary reason to call the stone dead. I think it’s better to call unsettled stones alive if they are in dame.

So, here’s my autoscore v3 proposal:

  • Chains of unsettled area (open/unsettled/dead) are considered as a whole.
  • Count the white/black stones on the boundary of each area, not counting stones touching only unsettled stones; consider “dead” stones that would be in dame to be unsettled.
  • If the whole boundary is one color, mark unsettled stones of the opposite color dead.
  • Otherwise leave all unsettled stones alive (the area will be dame).
Some code details
function run_algorithm() {
    compute_combined_ownership()
    find_regions()

    // use combined_ownership to compute a temporary resolution of regions
    // as black, white, or dame
    resolve_regions(combined_ownership)

    unsettle_dead_stones_in_dame() // sets combined_ownership2

    // do it again with the newly unsettled stones!
    resolve_regions(combined_ownership2)

    compute_algorithm_ownership() // uses combined_ownership2
}
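For readers who want something more concrete than the outline, here is a simplified sketch of the region-finding and boundary-counting step. It is not the actual v3 code: the board encoding, the name `findRegions`, and the returned shape are assumptions for illustration, and the second pass that unsettles dead stones in dame is left out.

```javascript
// Assumed cell encoding for this sketch:
//   "." open, "B"/"W" live stones, "b"/"w" unsettled stones, "x" dead stone.
// A region is a connected chain of open, unsettled, and dead intersections.
function findRegions(rows) {
    const height = rows.length;
    const width = rows[0].length;
    const inRegion = (c) => c !== "B" && c !== "W";
    const seen = rows.map(() => new Array(width).fill(false));
    const regions = [];
    for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
            if (seen[y][x] || !inRegion(rows[y][x])) continue;
            const region = { cells: [], black: 0, white: 0 };
            const counted = new Set(); // live boundary stones already counted
            const stack = [[x, y]];
            seen[y][x] = true;
            while (stack.length > 0) {
                const [cx, cy] = stack.pop();
                const cell = rows[cy][cx];
                region.cells.push([cx, cy]);
                for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
                    const nx = cx + dx;
                    const ny = cy + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    const next = rows[ny][nx];
                    const key = nx + "," + ny;
                    if (inRegion(next)) {
                        if (!seen[ny][nx]) {
                            seen[ny][nx] = true;
                            stack.push([nx, ny]);
                        }
                    } else if (cell !== "b" && cell !== "w" && !counted.has(key)) {
                        // Live stone touching an open point or a dead stone.
                        // Stones touching only unsettled stones are never counted.
                        counted.add(key);
                        if (next === "B") region.black++;
                        else region.white++;
                    }
                }
            }
            regions.push(region);
        }
    }
    return regions;
}
```

On the tiny board `["..bW", "BBBW"]` this finds a single region of three intersections whose counted boundary is two black stones and no white ones, because the white stones touch only the unsettled black stone or other live stones.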

You can see how v3 does on the above board position here. (Also, I added something so you can edit the board position without having to create an actual game for it.)

Some more examples (clickable)

image
Unsettled stones surrounded by white. (@Vsotvep I still want to know, did you intend to award that entire area to Black with your proposal?)

image
KataGo considers the top black stone dead no matter who plays first. That would be a dead stone in dame, so we call it unsettled. But except for the one next to it, there are no other live white stones bordering the region, so we call that a Black region. Let’s see what happens if the White group is alive instead:

image
The unsettled stone at the top stays alive because it’s in dame.

image
The surprise seki is not revealed.

Of course it also works with the latest one from the other thread:
image

Now can anyone find a really surprising or arbitrary result with this v3 algorithm?

1 Like

This would likely require further testing to identify edge cases, but all in all I do think this new version is an improvement.

1 Like

If I understand correctly, you start from determining area status (open/unsettled/dead), and from there determine the status of stones in those regions?
To me this feels like the reverse of the standard scoring process.

In the standard scoring process, you first determine which stones are dead, remove those from the board, and from there you determine the status of all regions (black/white/neutral).

I don’t think that the concepts of areas with statuses open/unsettled/dead are defined in any rule set. So I feel you’re sort of inventing a new rule set here, that is distinct from rule sets already out there.

Putting that aside, I’m not sure what you mean exactly by an “open intersection”, a “boundary”, an “area” and “being in dame”.

  • Is an “open intersection” just the same as an empty intersection (i.e. not occupied by a stone)?
  • Is an “area” a contiguous region of empty intersections? First I thought it was, but then later on you seem to consider dead and unsettled stones as being inside an area (i.e. not part of the boundary).
  • Is a “boundary” the set of stones of either color touching the area under scrutiny, regardless of their status (dead/alive/unsettled)? First I thought it was, but then later on it doesn’t seem to include unsettled and dead stones. In that case, wouldn’t a boundary usually consist only of living stones of one color (attacking the unsettled group inside the area)? If that is true, most unsettled stones would be declared dead, right?
    I’m probably missing something here.
  • Does a boundary include all stones connected to the stones that directly touch an area, even when they don’t touch it themselves? If yes, how deeply? What counts as connected? Or is this up to an AI that determines which stones further away have the same status as the stones touching the area under scrutiny?
  • Does “being in dame” mean unsettled stones touching (or inside of) an area that has living stones of both colors in its boundary?

  • Chains of unsettled area (open/unsettled/dead) are considered as a whole.
  • Count the white/black stones on the boundary of each area, not counting stones touching only unsettled stones; consider “dead” stones that would be in dame to be unsettled.
  • If the whole boundary is one color, mark unsettled stones of the opposite color dead.
  • Otherwise leave all unsettled stones alive (the area will be dame).

I thought you were counting the black/white stones in a boundary as a tie-breaker to determine the status of unsettled stones, but you don’t mention this now (or perhaps I missed it). Is it still the case? If not, why count those stones?

My goal is to automatically determine which stones are dead from a couple of AI estimates that give you dead/live/unsettled. The idea that I took from Vsotvep (though it was different from what he had intended) was to resolve the unsettled ones with an estimate of the ownership of “unsettled area”:

(I think he meant to also include the dead stones in that definition.) I guess I prefer to say “region” since “area” already has a particular meaning in Go.

The boundary can only have live stones, since the region expands to include all others. Look at the eggplant region from one of the links above for an example:

image image

I said “not counting stones touching only unsettled stones”. I don’t know about “most”, but in the one above the unsettled group dies, since the whole eggplant boundary consists of black stones, specifically the seven highlighted here:

image

(The white stone above is initially part of the boundary, but it gets promoted to unsettled as described. And yes, it’s just direct touching.)

But consider this situation (which actually had a bug that I fixed just now :flushed:):


image

Since we don’t count ones that only touch unsettled stones, there are NO live boundary stones at all. With no black or white stones to count, the region is considered dame and all unsettled stones live.

I call a region dame if it has both colors in its boundary or an empty boundary (like the rectangular six above).

After your earlier comments I decided that the tie-breaker idea was too arbitrary. So all that v3 cares about is whether the “count” is positive. We only assign a winner to the region if the count is positive for one color and zero for the other. Otherwise, everything lives.
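Written out as code, the rule is tiny. This is a sketch with made-up names (`resolveRegion`, and the string results), taking the boundary counts for one region as input:

```javascript
// blackCount / whiteCount: number of counted live boundary stones of each
// color for one region (stones touching only unsettled stones excluded).
function resolveRegion(blackCount, whiteCount) {
    if (blackCount > 0 && whiteCount === 0) return "black"; // unsettled white stones die
    if (whiteCount > 0 && blackCount === 0) return "white"; // unsettled black stones die
    // Both colors present, or an empty boundary: the region is dame
    // and all unsettled stones in it stay alive.
    return "dame";
}
```

The actual numbers no longer matter beyond zero versus non-zero, so `resolveRegion(7, 0)` is a Black region while `resolveRegion(7, 1)` and `resolveRegion(0, 0)` are both dame.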

I hope that clears it up a bit.

Now, is it like a different ruleset? Since I’m proposing to just use this for life/death and then follow the normal rules of go for doing the actual score counting, I think it gets us closer to the traditional rules than what we have now. And I hope that there aren’t any properly finished games that the algorithm will mark incorrectly. I do think that there are things about Internet Go that are just necessarily going to make it different than real life games, and I’d like to work on making it different in the right way.

2 Likes

First I thought it made sense to detect unsettled stones by having an AI show whether a group’s status changes depending on who plays first.
But in this case the AI apparently says black’s stone on the upper edge is dead regardless of who plays first. I suppose that is because the AI determines that if it’s black’s turn, black will kill white in the lower right first instead of connecting that black stone at the top, because killing white is bigger. So the status of that black stone at the top is not evaluated locally, but globally (under assumption of continued alternating play). For me this phenomenon more or less invalidates this AI method of detection of unsettled stones.
I would say that (locally) black’s stone is unsettled or even alive, not dead. The fact that black can kill a white group elsewhere on the board (globally, under assumption of continued alternating play) shouldn’t affect the status of that black stone at the top.
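To make the detection method under discussion concrete, here is a minimal sketch. It assumes the AI gives one ownership value per intersection in the range -1 to 1 (positive meaning Black is expected to own the point) for each of the two “who plays first” cases. The function name, the sign convention, and the 0.5 threshold are all assumptions, not the real implementation.

```javascript
// color: "B" or "W" for the stone being classified.
// ownershipBlackFirst / ownershipWhiteFirst: assumed ownership estimates in
// [-1, 1] at the stone's intersection, positive = Black expected to own it.
function classifyStone(color, ownershipBlackFirst, ownershipWhiteFirst, threshold = 0.5) {
    const sign = color === "B" ? 1 : -1;
    const statusOf = (o) => {
        if (sign * o >= threshold) return "live";
        if (sign * o <= -threshold) return "dead";
        return "unclear";
    };
    const ifBlackFirst = statusOf(ownershipBlackFirst);
    const ifWhiteFirst = statusOf(ownershipWhiteFirst);
    // Only call a stone live or dead when both estimates agree clearly.
    if (ifBlackFirst === ifWhiteFirst && ifBlackFirst !== "unclear") {
        return ifBlackFirst;
    }
    return "unsettled";
}
```

Because both estimates come from whole-board reading, a black stone with ownership around -0.9 in both cases comes out as “dead” here even if it could be saved locally, which is exactly the objection above.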

I’m somewhat surprised that a boundary can just end somewhere in the middle of the board. So are different regions not always fully separated by a boundary between them?


If this algorithm were used to score novice games that still have unsettled stones at scoring time, I’m worried about the novices who’d inevitably ask on the OGS forums (or Reddit) “Why was my game scored like this?” The explanation would be hard to give and hard to follow, much harder than explaining life and death.

1 Like

I’m not quite following everything in this discussion, but I’m a big fan already of the introduction of fruit and vegetable emoji to help label territory and such.

3 Likes

I thought of this but have sort of changed my mind.

The question at hand isn’t so much “How to explain the AI scoring algorithm to beginners”, but rather “How to reduce the situations where AI scoring causes beginners to be confused”, and those are not the same question.

If we look at this for example, which is a real example where a beginner was confused and complained, @Feijoa’s proposal does solve it and said beginner wouldn’t have posted about it.

https://pdg137.github.io/autoscore/v3.html?game=53021339

These kinds of simple situations seem to be handled well, and that would already remove many cases of confused beginners.

Now we would still have weird situations where a beginner might be confused, but then we can just say that the game was ended prematurely, that the computer had to guess at the players’ intent, and that you should correct it manually in such cases (exactly as we do now, so no worse than what we have).

There’s really no point in trying to explain exactly why it made such and such guesses instead of others (even under the current algo it’s often far from clear).

Less sure of this argument, but throwing it in as well: the fact that the computer’s actual reasoning is harder to understand could even be seen as positive, because it makes it harder for the players to exploit it. A common issue now is that the scoring AI can reveal a specific weakness that you did not spot (like in the example above); in a way, having a weirder algorithm makes it harder to spot exactly what it spotted.

4 Likes

As expected @yebellz has seen past all the trivial implementation details to identify the key kernel of innovation here. This is the food emoji function:

function region_emoji(r) {
    return String.fromCodePoint(r+0x1f344)
}

It seems to give a decent variety of colorful symbols for a range of small integers r.
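A quick way to see what comes out for the first few regions (the function is repeated here so the snippet runs on its own; the loop and the `labels` array are just for illustration):

```javascript
function region_emoji(r) {
    return String.fromCodePoint(r + 0x1f344);
}

// Collect the labels for regions 0..3: code points U+1F344 to U+1F347.
const labels = [];
for (let r = 0; r < 4; r++) {
    labels.push(region_emoji(r));
}
console.log(labels.join(" "));
```

Region 2 maps to U+1F346, which is the eggplant from the example earlier in the thread.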

2 Likes

I think a big part of reducing confused posts is eliminating huge surprise dame areas, and I was considering completely changing this to just be about minimizing dame. We could try all the life/death options and pick the one with fewest dame, for example.

Maybe that’s almost equivalent to what I have here or close, not sure.
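A brute-force version of the “try everything, keep the fewest dame” idea could look like the sketch below. Everything here is hypothetical: `groups` is a list of unsettled group ids, and `countDame` stands in for a scorer that returns the number of dame points once the given groups are treated as dead.

```javascript
// groups: ids of the unsettled groups (hypothetical representation).
// countDame(deadGroups): hypothetical callback returning the number of dame
// intersections when exactly those groups are marked dead.
function pickFewestDame(groups, countDame) {
    let best = null;
    // Each bit of mask says whether the corresponding group is dead.
    for (let mask = 0; mask < (1 << groups.length); mask++) {
        const dead = groups.filter((_, i) => (mask >> i) & 1);
        const dame = countDame(dead);
        if (best === null || dame < best.dame) {
            best = { dead, dame };
        }
    }
    return best;
}
```

This tries 2^n combinations for n unsettled groups, so it is only practical when there are few of them, and ties go to whichever combination is enumerated first.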

I wonder how your algorithm v3 would score this unfinished position in the corner:

image
(Practice game: Dead, unsettled or alive? - #87 by ArsenLapin1)

1 Like