A compendium of OGS's terrible scoring system confusing beginners

Uberdude · July 30, 2021, 6:52am

I’ve participated in several threads here recently about scoring and how the current design is deficient and should be improved. Here’s yet another example of the problems caused.

Kosh · July 30, 2021, 7:17am

From my own observations, the current scoring system struggles with three issues.

Incomplete games. Close your borders to solve this problem. Use resume if need be.
Slight issue with eyes being counted in Japanese seki situations. This usually cancels out (one each) but when it doesn’t, manual adjustment of the score is probably the easiest fix.
Unsettled positions confuse the scoring system because its assessment is based on playouts. That’s the situation in the example above. Again, resume is the best remedy.

shinuito · July 30, 2021, 7:21am

There’s no resume when playing against bots though right?

Kosh · July 30, 2021, 7:22am

Fair point, and people are often tricked into passing prematurely because the bot passed first. Such games can only be annulled (if the winner is incorrect). The score in the OP is wrong. Since Black didn’t invade, it’s White’s territory. No argument from me.

shinuito · July 30, 2021, 7:27am

I think what the examples are showing is that having KataGo score the game for beginners against bots has issues. Since there’s no chance to fix it (because the bot can’t argue if you just mark all their stones as dead), you get games that are probably scored more incorrectly than if a non-intelligent estimator had scored the game.

anon32248403 · July 30, 2021, 7:57am

I heard Fox does scoring really well. Any truth to that?
Just bringing this up, because someone on that reddit thread suggested KGS for beginners because they have better scoring.
If Fox does scoring better and has more players, I guess beginners should just play there.

Uberdude · July 30, 2021, 7:57am

Right. A super simple “players manually mark the territories” approach works better here than a clever bot barging in with insight that there are cuts which mean the lower right is unsettled: if both players agree to finish the game now then the borders are closed and it’s simply white territory and there should be no resumption. And a really basic algorithm from the 1960s to automatically identify territories based on being areas surrounded by only 1 colour would save them those 4 clicks as there are no dead stones in this game. In general there are dead stones so you need a cleverer algorithm from the 1990s (like on KGS) to identify those correctly and deal with life and death questions. Such algorithms aren’t perfect, but they work better than the too-clever-for-its-own-good AI on OGS, and the scoring tool should then allow players to manually mark dead stones / territories to correct the result, including in bot games. You could then use AI to compare its result vs game result to bring possible score cheaters vs bots to the attention of the mods, but the auto-apply-AI-score-to-bots-game rule is demonstrably failing.
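As a rough illustration of the "compare the AI result with the game result" idea, here is a minimal sketch. The function name `should_flag`, the convention that a margin is Black's score minus White's, and the 10-point `tolerance` are all assumptions made for the example; nothing here reflects how OGS actually works.

```python
def should_flag(agreed_margin: float, ai_margin: float, tolerance: float = 10.0) -> bool:
    """Decide whether a finished bot game deserves a moderator's attention.

    Both margins are Black's score minus White's score, so a positive
    value means Black wins. The tolerance is an arbitrary assumption.
    """
    # The accepted result and the AI's opinion name different winners.
    winner_differs = (agreed_margin > 0) != (ai_margin > 0)
    # Same winner, but the two scores are suspiciously far apart.
    large_gap = abs(agreed_margin - ai_margin) > tolerance
    return winner_differs or large_gap
```

A check like this only brings a game to a human's attention; it never changes the result itself.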

And that screenshot with some intersections in the lower right territory and some dame is total nonsense that violates the scoring rules of Go and should be obliterated with extreme prejudice of nuclear warfare.

marshmn · July 30, 2021, 7:59am

Wow, that escalated.

gennan · July 30, 2021, 8:08am

The players should not even mark their territories. They should only need to mark dead stones, if any.

teapoweredrobot · July 30, 2021, 8:11am

The final scoring system should at least obey the rules of Go!
I can understand a situation like this arising for a score estimator while a game is in progress (and it would be a good example of SE giving too much information), but for a final score this sort of display shouldn’t even arise.

Uberdude · July 30, 2021, 8:23am

Right. A score estimator answers the question “What will be the likely score / territory status if players continue playing sensibly (for a strong estimator) / stupidly (for a weak one) from now?”, which is a very different question to “What is the score of this position right now?”. That distinction is lost on so many people that I consider score estimators quite possibly do more harm than good for beginners.

shinuito · July 30, 2021, 8:29am

I think this is the potential issue with human vs bot games: a bot like amybot can’t argue if the player suddenly decides that in fact all its stones are alive and some invasions that didn’t work should just be considered seki.

This probably could be the solution to that, but it might still flag a) a lot of games, and b) games like the one in the OP, since KataGo potentially will disagree with the score, and the result might depend on whose turn it is and who passed last.

But yeah the randomly marking points as dame inside a territory should be sorted v quick.

square_fuseki · July 30, 2021, 8:41am

There is a difference between estimating and scoring. In scoring, 100% sealed territory should be 100% painted even if an insertion or cut is possible. By passing, the opponents say that they will not continue, so simulating a future battle is just an error by the system.

square_fuseki · July 30, 2021, 8:54am


I see an easy fix algorithm there:

If a blue point is connected to white points only, it should be painted white. Repeat (black, white, black …). Then all blue points that are not near black will become white. Then paint empty points that are connected to white from ALL sides.

gennan · July 30, 2021, 9:48am

And if there is a black group inside, all intersections connecting to both black and white should be marked as neutral, unless the black group is marked dead (agreed by both players).
It’s just a simple flood-fill algorithm which doesn’t require any intelligence by the program.
A simple tutorial with perhaps some practice puzzles should be sufficient to teach novices how to score a final position.
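For readers who want to see what such a flood fill could look like, here is a minimal sketch. It assumes the board is given as a list of strings using `B`, `W` and `.`, and that the dead stones agreed by both players are passed in as a set of `(row, col)` pairs. The name `score_territory` and this input format are made up for the example, and prisoners and komi are left out.

```python
from collections import deque

def score_territory(board, dead=frozenset()):
    """Label every empty point as 'B', 'W' or 'N' (neutral).

    board: list of equal-length strings using 'B', 'W' and '.'.
    dead:  set of (row, col) stones that both players agreed are dead.
    Dead stones are treated as empty points.
    """
    rows, cols = len(board), len(board[0])

    def colour(r, c):
        return '.' if (r, c) in dead else board[r][c]

    owner = {}
    for r in range(rows):
        for c in range(cols):
            if colour(r, c) != '.' or (r, c) in owner:
                continue
            # Flood-fill one connected empty region and record which
            # stone colours it touches.
            region, borders = [], set()
            queue, seen = deque([(r, c)]), {(r, c)}
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < rows and 0 <= nx < cols):
                        continue
                    k = colour(ny, nx)
                    if k == '.':
                        if (ny, nx) not in seen:
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                    else:
                        borders.add(k)
            # Territory only if exactly one colour surrounds the region.
            label = next(iter(borders)) if len(borders) == 1 else 'N'
            for p in region:
                owner[p] = label
    return owner
```

With a lone white stone inside a black area, the whole area comes out neutral until that stone is passed in `dead`, which matches the rule described above: no life-and-death judgement is made by the program itself.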

I don’t think that the algorithm to score a final position should go out of its way trying to accommodate players who are so new that they don’t know anything about life and death. There is just no way to fix that problem, other than teaching them the basics of life and death.

bugcat · July 30, 2021, 10:22am

I don’t think it should be taken for granted that scoring should even use an estimator.

It’s not a bad idea, imo, to have all stones marked alive at the beginning of scoring and to require all dead stones to be manually marked.

If one’s spent 30, 60, or 90 minutes playing the game then surely thirty seconds can be afforded to the scoring.

DVbS78rkR7NVe · July 30, 2021, 10:24am

Beginners are confused from the very beginning; it’s their natural state. Looking at it that way, the scoring system helps them figure out scoring. A mistake here and there isn’t a big deal.

Vsotvep · July 30, 2021, 10:26am

We had a different scoring algorithm before that was not based on AI, and my experience is that we got a similar number of reports of incorrectly scored games, if not more. At least the current problems with scoring make sense to me, while the previous problems were just very random or weird. The main issue back then was seki being marked as dead, or dead stones being marked as alive. It’s very hard to write an algorithm that can decide life & death or seki without playouts.

Since the AI algorithm launched, the moderator team has been keeping track of incorrectly scored games in an internal thread, and I believe I have found an improvement to the current algorithm that solves 95% of the current mistakes it makes (which are mostly about territory with open borders or KataGo reading out L / D beyond the players’ abilities), but that’s not implemented (yet).

This is currently how the score can be marked by two players, but this has problems with a lot of beginners who try to score games with open territory (and don’t see the problem, thus don’t resume the game).

The AI scoring only kicks in when one of the two players requests it, when one of the players keeps refusing the score, or when a player is playing against a bot.

You’d be surprised how common it is that people leave immediately after passing.

Vsotvep · July 30, 2021, 10:40am

For those interested, here are some snippets from that thread demonstrating my improvement.

Here's an example of a group that looks alive, but can be killed by continuing play

Here's an example of open borders

On another topic

The players are trying to score before the territory is closed off. (screenshot, since I’ve advised the players to resume the game, so it may not be visible later)

[image]

In this case, my proposed algorithm would do the following. The difference between Black’s turn and White’s turn is only M13 and M14. Since these touch settled Black stones (L14) and settled white stones (L13), the auto-score would mark them as dame. As a result, unless one of the players notices the problem, the score would be the average between what would happen if White went first and if Black went first.
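A rough sketch of that first step, under assumptions: the estimator is taken to return an ownership value in [-1, 1] per intersection (positive for Black, negative for White), once with Black to move and once with White to move. This input format and the names `unsettled_points` and `dame_regions` are hypothetical, not KataGo's or OGS's actual API, and treating a connected unsettled area as one unit is only one reading of the description above.

```python
def neighbours(p):
    x, y = p
    return ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))

def owner(value):
    """Map an ownership estimate to 'B', 'W' or None (exactly zero)."""
    if value > 0:
        return 'B'
    if value < 0:
        return 'W'
    return None

def unsettled_points(own_if_black_moves, own_if_white_moves):
    """Points whose predicted owner depends on who moves first."""
    return {p for p, v in own_if_black_moves.items()
            if owner(v) != owner(own_if_white_moves[p])}

def dame_regions(unsettled, stones):
    """Unsettled areas that touch settled stones of both colours.

    stones maps (x, y) -> 'B' or 'W'; a stone counts as settled when
    it is not itself in the unsettled set.
    """
    dame, seen = set(), set()
    for start in unsettled:
        if start in seen:
            continue
        # Collect one connected unsettled area and the colours of the
        # settled stones next to it.
        region, stack, touching = set(), [start], set()
        seen.add(start)
        while stack:
            p = stack.pop()
            region.add(p)
            for q in neighbours(p):
                if q in unsettled:
                    if q not in seen:
                        seen.add(q)
                        stack.append(q)
                elif q in stones:
                    touching.add(stones[q])
        if touching == {'B', 'W'}:
            dame |= region
    return dame
```

In the position above, M13 and M14 would form one unsettled area touching L13 (White) and L14 (Black), so both points would end up as dame.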

[image]

Note that under Japanese rules (if the winner depended on it), neither player would want to resume the game, since the opponent may move first and hence “win” the territory. Officially, in such a position both players would be declared to have lost the game.

Here's one with a huge unsettled group

Here's an example of how the algorithm could fail, but it can be improved a bit by allowing a margin of error

On another topic

[image]

In this one, it doesn’t work as well. Somehow, H9 and G8 are always marked as White, even though there is the H8 cut which should kill either the H9 or the G8 stone. H8 is therefore the only “unsettled” point.

J1 and K1 switch colour, but J2 is always considered Black, which makes it a Black settled point, which is of course incorrect…

As a result, H8 is considered unsettled area without a majority of stones of either colour, thus marked as dame, and J1 and K1 are marked as White, since these do have a majority of White stones.

Proposal to fix this

Consider any intersection of which KataGo is very uncertain to be unsettled anyways. The Black / White squares that the estimator gives at these disputed intersections are so small that they could easily switch to Black or White just by random process in KataGo’s way of thinking.

E.g., the smaller squares in the following picture would qualify for being unsettled:

This would increase the number of unsettled stones, but that’s good. As a result:

[image]

Here both the J1 unsettled group and the G8 unsettled group would qualify as White territory, since they both have more White stones than Black stones inside the unsettled area.
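The margin idea and the majority rule could be sketched roughly as follows. The ownership format (a value in [-1, 1] per point, positive for Black), the `margin` of 0.3 and both function names are assumptions chosen for illustration; the proposal does not specify any of them.

```python
def unsettled_with_margin(own_if_black_moves, own_if_white_moves, margin=0.3):
    """Points that flip owner between the two estimates, or where
    either estimate is too weak to be trusted."""
    result = set()
    for p, a in own_if_black_moves.items():
        b = own_if_white_moves[p]
        flips = (a > 0) != (b > 0)
        weak = abs(a) < margin or abs(b) < margin
        if flips or weak:
            result.add(p)
    return result

def resolve_region(region, stones):
    """Give an unsettled area to the colour with more stones inside it.

    region: set of points forming one connected unsettled area.
    stones: dict mapping point -> 'B' or 'W'.
    Returns 'B', 'W', or 'dame' when neither colour has a majority.
    """
    black = sum(1 for p in region if stones.get(p) == 'B')
    white = sum(1 for p in region if stones.get(p) == 'W')
    if black > white:
        return 'B'
    if white > black:
        return 'W'
    return 'dame'
```

A larger margin makes more points unsettled, which gives the majority rule bigger areas to work with; how large it should be would have to be tuned against real games.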

Finally let's also include a failure, although I would like to defend the AI here, since this seems impossible to score automatically without resuming the game

Vsotvep · July 30, 2021, 10:56am

I’ve thought of a solution for bot games being scored incorrectly as well: how about we just disable the human player from changing the score, but still activate the scoring phase? They have only two options: accept the score as is, or continue play to fix the problem in scoring.