Use KataGo for Scoring

yebellz · July 7, 2020, 8:48pm

When a game reaches the scoring phase, the system generates an initial estimate of which stones are dead and which are alive. However, it seems that the current algorithm sometimes makes mistakes, which occasionally leads to the players mistakenly accepting the wrong life/death status and messing up the outcome of their games.

Even worse, for games against bots, the scoring is completely automated, with no ability for the human player to correct scoring mistakes, and the only way to avoid scoring errors is to prolong the game by first capturing the bot’s dead stones. Some bots might not even be programmed to clean up and capture all of their opponent’s dead stones, which could lead to more scoring errors going against that bot. Further, if Japanese-rules bot games are eventually added, having to play on to capture dead stones could incorrectly change the score.

Suggestions

Use the KataGo bot to estimate life/death status for scoring
Add some sort of prominent UI element (like a warning symbol or red text on the page) that appears if one player changes the life/death status away from what is judged by KataGo (to make score cheating more obvious)
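A minimal sketch of how the two suggestions could fit together, in Python. It assumes the bot’s output has already been turned into a per-point ownership map with values in [-1, 1], where +1 means Black is expected to own the point. KataGo’s analysis engine can report ownership (the `includeOwnership` query option), but parsing its output and its sign convention are left out here. The `threshold` value and all names below are hypothetical. Stones the bot is confident about are suggested as alive or dead, the rest are left unsettled, and any stone whose player marking contradicts a confident suggestion is flagged for the warning UI.

```python
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[int, int]  # (row, col)


class Status(Enum):
    ALIVE = "alive"
    DEAD = "dead"
    UNSETTLED = "unsettled"


def suggest_status(
    board: List[List[Optional[str]]],  # "B", "W", or None for an empty point
    ownership: List[List[float]],      # assumed: +1.0 = Black owns, -1.0 = White owns
    threshold: float = 0.6,            # hypothetical confidence cut-off
) -> Dict[Point, Status]:
    """Suggest a life/death status for every stone from an ownership map."""
    suggestion: Dict[Point, Status] = {}
    for r, row in enumerate(board):
        for c, stone in enumerate(row):
            if stone is None:
                continue
            # Re-express ownership from the stone owner's point of view:
            # positive means the point is expected to end up belonging to them.
            own = ownership[r][c] if stone == "B" else -ownership[r][c]
            if own >= threshold:
                suggestion[(r, c)] = Status.ALIVE
            elif own <= -threshold:
                suggestion[(r, c)] = Status.DEAD
            else:
                suggestion[(r, c)] = Status.UNSETTLED
    return suggestion


def flag_overrides(suggestion: Dict[Point, Status], marked_dead: Set[Point]) -> List[Point]:
    """Stones whose player marking contradicts a confident bot suggestion."""
    flagged: List[Point] = []
    for point, status in sorted(suggestion.items()):
        if status is Status.UNSETTLED:
            continue  # the bot is unsure, so no warning either way
        if (point in marked_dead) != (status is Status.DEAD):
            flagged.append(point)
    return flagged
```

A real implementation would probably decide status per group (chain) rather than per stone, and would need to handle seki carefully, but the split into “confident” and “unsettled” is what lets the UI warn only when a player overrides a clear judgement.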

shinuito · July 9, 2020, 7:58pm

Is the following an issue?

In principle, when the current system does its initial territory estimation, it does so in an intelligent way, in that it kind of knows which stones are alive and dead according to the rules (seki can be tricky, etc.). (It does some Monte Carlo playouts or something, doesn’t it?)

When KataGo does its territory count and says which stones are alive and dead, it’s probably correct, sure, with some superhuman ability.

What if, at the end of the game, KataGo declares some corner seki or dead, etc., but neither of the players has noticed, and KataGo is 100% certain? Will both players want to go back to the game and play it out, since they’ve gained some knowledge about the position?

Is the assumption with the current system that (unless you’re a beginner/fairly new) you’ll be able to score the game better than it does, and that it’s just being helpful by giving a rough breakdown of territory and alive and dead stones, etc., which you’ll then fix?

I mean, I’m sure KataGo would be fine if it’s person vs bot; my question is more about person vs person.

Vsotvep · July 9, 2020, 8:12pm

I think the same can be said about the current system.

I think either there should be no automatic scoring at all, with all territory and life/death status selected manually by the players, or there should be a scoring algorithm that doesn’t mess up the scoring or confuse/trick the players into accepting the wrong score.

Samraku · July 9, 2020, 8:16pm

Isn’t what we have now basically that? Unless I’m mistaken, there is a delay after the opponent changes the board during which you can’t accidentally accept the score, which prevents tricking one into accepting the wrong score? Is it really that much to ask that the players be able to tell the life and death status of the groups in the game they just played? If they can’t tell that, then the game wasn’t finished yet.
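The lock-out described here can be sketched as a small state object. This is only an illustration under assumptions: OGS’s actual implementation isn’t shown in the thread, the 5-second delay is made up, and the caller passes in the current time so the behaviour is easy to test.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Point = Tuple[int, int]


@dataclass
class ScoringPhase:
    cooldown: float = 5.0                 # hypothetical lock-out, in seconds
    dead: FrozenSet[Point] = frozenset()  # stones currently marked dead
    last_change: float = float("-inf")    # time of the most recent edit

    def toggle(self, point: Point, now: float) -> None:
        """Mark or unmark a stone as dead; any edit restarts the lock-out."""
        self.dead = self.dead ^ {point}
        self.last_change = now

    def can_accept(self, now: float) -> bool:
        """The accept button is only enabled once the lock-out has elapsed."""
        return now - self.last_change >= self.cooldown
```

The point of the design is that a last-second change by one player can never be accepted blindly by the other, because accepting is impossible until they have had time to look at the new marking.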

Vsotvep · July 9, 2020, 8:20pm

If I’ve discovered one thing in these two days of being a moderator, it’s that the answer to that question is “yes”.

Fabrício · July 9, 2020, 8:53pm

Whatever you have seen, I disagree with having a scoring system that doesn’t let the humans have the final word. I support what @Samraku said. But if score cheating is a big issue, then I am in favor of an extra warning, as suggested.

Vsotvep · July 9, 2020, 8:55pm

The humans always have a final word (in human-human games). Using KataGo the only difference is that the score is divided initially by KataGo, instead of by OGS’ current score estimator. The players are free to change it before accepting the score.

shinuito · July 9, 2020, 9:31pm

It’s not really a requirement to be a master of life and death to play Go, though. It feels like an unintentional mistake in counting (mislabelling a group as alive/dead/seki) should be treated like any other mistake in the game (as long as it’s not score cheating, hence “unintentional”), no?

If both players don’t know that the life-and-death status of a pyramid-four shape changes with one move by either player, and they just mark it alive, then passing to end the game was a mistake, and I presume you or KataGo would call it a mistake at any other point in the game (assuming no bigger move elsewhere).

I don’t know if it’s fair to say the game is unfinished just because the players don’t have the ability to win or avoid a loss when they need to. Would you also call it an unfinished game when someone resigns early because they don’t see a way to win (but maybe KataGo does)?

I imagine that in the case of disputes, that’s where another person (a moderator or tournament director) gets out the rules to settle it. That, or you just tell the players to play it out, which would probably work a good bit of the time under Chinese rules?

I mean, what happens in the case where player A thinks a group is dead, player B disagrees that it’s dead, but player A can’t demonstrate the correct way to kill it?

I’m not sure (since I’m not a mod) what takes priority: scoring the game correctly according to the god of Go, or scoring it correctly according to the ability of the players?

Samraku · July 9, 2020, 9:47pm

If both players agree (incorrectly) on that status, then ending the game was correct and it should be scored as such. Part of the game is knowing when you no longer need to continue playing. This is one major reason I prefer area scoring: Just play it out. If you think it’s dead but you can’t kill it when the game resumes, your loss; it’s not the AI’s responsibility to save you from that.

shinuito · July 9, 2020, 10:10pm

Yeah, as much as I enjoy that territory scoring feels easier/quicker to count, area scoring seems better in that you don’t lose points by just playing out captures.
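A toy count shows why playing out captures is free under area scoring but costs a point per move under territory scoring. The `Side` type and the numbers are hypothetical, and komi, dame and the opponent’s score are ignored, since none of them change between the two counts.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Side:
    stones: int     # this player's stones on the board
    territory: int  # empty points surrounded only by this player
    prisoners: int  # opponent stones captured, incl. dead stones removed at the end

    def area(self) -> int:
        # Area scoring (e.g. Chinese rules): stones plus surrounded empty points.
        return self.stones + self.territory

    def territory_score(self) -> int:
        # Territory scoring (e.g. Japanese rules): empty points plus prisoners.
        return self.territory + self.prisoners


def fill_own_point(side: Side) -> Side:
    # One extra move inside your own territory, e.g. to capture a dead stone.
    # That dead stone would have been removed as a prisoner at the end anyway,
    # so only the stone you place changes the count: one empty point becomes a stone.
    return replace(side, stones=side.stones + 1, territory=side.territory - 1)


before = Side(stones=60, territory=25, prisoners=4)
after = fill_own_point(before)
print(before.area(), after.area())                        # 85 85
print(before.territory_score(), after.territory_score())  # 29 28
```

The same arithmetic is behind the earlier concern that, in Japanese-rules bot games, having to play on to capture dead stones could change the result.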

I suppose having KataGo tell you something is dead or seki is different from you being able to demonstrate that it is dead or seki, etc. However, I still think some Go problems are easier when they say “White to kill/live/make seki” than when they just say “judge the status of the group yourself.” With KataGo, it’s more or less doing the former: telling you there is a way to live/kill/make seki, because “I’ve” (KataGo) seen it/read it out.

yebellz · July 15, 2020, 3:35pm

There are two separate contexts to consider this suggestion within:

Human vs Bot games
Human vs Human games

They each have different circumstances that affect considerations, so I will discuss them separately.

Human vs Bot Games

Currently, these games are automatically scored by the system after both players pass. Neither the human nor the bot can provide any input to the life/death status of the stones. I understand that this is to prevent humans from abusing manual scoring against bots, and also because the bots might not even be programmed or set up to suggest life and death status at the end of the game.

Since bot games are currently restricted to Chinese rules, the players should play out at least part of an encore to capture their opponent’s dead stones in order to avoid the system mistakenly believing that they are alive. However, often players choose not to play out the encore, maybe since they don’t know that they should, or they just find it boring to play a bunch of strategically unnecessary moves.

For these bot games, since the system already performs a fully automated life and death estimation, which occasionally causes issues, I think it is reasonable and uncontroversial to use a strong bot like KataGo instead to estimate life and death to score such games.

There is already an open GitHub issue suggesting better automated scoring for bot games:

Further, with the current system, there is some potential for humans to abuse the imperfections in the current automated scoring to get some unfair wins against bots that don’t clean up all of the dead stones. If a bot does not always capture all of the human’s dead stones, occasionally the scoring system will make errors in favor of the human, while the human can avoid such errors by cleaning up. The human player could even make various failed invasions and pointless throw-ins at the end of the game, just to give the scoring system more chances to make an error in their favor.

Human vs Human Games

Human vs human games are different since the players can and should provide input to mark dead stones for scoring. My suggestion is NOT to automate away the marking of dead stones, but rather just to use KataGo as an alternative tool for estimating life and death at the end of the game, instead of the current estimation provided by the “auto-score” button (which by the way only generates a suggestion and does not take away control from the humans). Basically, the score estimation by the bot would just be a suggestion for the life and death status, as a matter of convenience, which the humans must approve and could potentially change.

As discussed earlier, I think whether to use a strong bot like KataGo to assist in scoring in this manner ultimately comes down to a trade-off between two priorities:

Competition purity, i.e., the desire to not have AI tools play any role in the outcome of the game.
User convenience, i.e., the desire to have some help marking dead stones and potentially detect and highlight cases of score cheating.

I think it is a valid stance to put the first item at a higher priority, but such a stance should also support removing any automated life/death suggestions, and also even remove the in-game score estimation feature. However, I think that user convenience is also a pressing concern, especially for helping beginners and potentially detecting cheaters.

@Vsotvep seemed to make the point above that the current system is an unhappy compromise that does not satisfy either priority:

Games are already potentially influenced by the life and death estimation at the end of the game (and players might choose to resume and change strategy as a result of it).
The current scoring estimation can mess up and confuse the players into accepting the wrong outcome.

I agree with their proposition that there should either be no life/death estimation at all (fully manual scoring) or the system should be improved to use a strong bot like KataGo. That way, it is a matter of choosing between the two priorities rather than falling short of satisfying either. Maybe the availability of this tool could even be a custom choice for each game (just like disabling analysis).

shinuito · July 15, 2020, 6:10pm

I think it would be easier to be convinced if a trial were set up on the beta site.

For instance, how might KataGo score a game like the following, where both players passed? (I was going to try to find a game from the main site, but searching the forums for recent posts seemed easier.)

KataGo will probably correctly say that the upper right is dead, but what would it say, for instance, about the white stones on the left side? Would it auto-mark some of the stones as dead, and only the stones it could save as alive?

When you know that a superhuman bot is telling you some stones are dead, you should probably at least consider a way to kill them, unless of course the game isn’t close, or maybe play another move to save them?

When the current score calculator marks stones incorrectly, you get told on the forums: don’t trust the estimator, ask for help, and you’ll learn to judge these things yourself in the future.

Anyway, I’d say it would be easier to judge once there were some visible examples. Does KataGo have an option to display the status of each group and give a score estimate? I thought I read something about its training, that it can predict the status of each group: https://arxiv.org/pdf/1902.10565.pdf

4.2 Game-specific Features
In addition to raw features indicating the stones on the board, the history, and the rules and komi in effect, KataGo includes a few game-specific higher-level features in the input to its neural net, similar to those in earlier work [4, 3, 12]. These features are liberties, komi parity, pass-alive regions, and features indicating ladders (a particular kind of capture tactic). See Appendix A for details.

I mean, I know it can estimate the score, since we have that feature as a toggle in reviews, but is that the one you’d want to display in the scoring phase?

shinuito · September 14, 2020, 11:48am

An example game posted in another thread: https://online-go.com/game/26850052

In this human vs bot game, KataGo predicts White+13.2, which I’d say gets rounded to 13.5 in practice. In reality, both players accepted the game end as is, and it ended B+17.5.
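With a half-point komi, every real final margin has the form n + 0.5, while KataGo’s score lead is an expected value and can land anywhere in between. Reading an estimate like 13.2 as the nearest possible margin is a one-liner; a minimal sketch, assuming half-point komi, with a hypothetical function name:

```python
import math


def nearest_half_point(lead: float) -> float:
    """Snap a fractional score estimate to the nearest value of the form n + 0.5.

    Assumes half-point komi, so every real final margin ends in .5.
    For any x in [n, n + 1), the closest n + 0.5 is floor(x) + 0.5;
    an exact integer (a tie between two candidates) is rounded up.
    """
    return math.floor(lead) + 0.5


print(nearest_half_point(13.2))  # 13.5
```

The function is sign-agnostic, so it works the same whether the lead is written from Black’s or White’s point of view.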

I’m just wondering what would happen in a situation like this.

martin3141 · September 14, 2020, 12:01pm

Interesting. Is the explanation possibly that KataGo is not 100% sure whether an invasion at c3 can live or not?

yebellz · September 14, 2020, 12:13pm

I think the analysis indicates that KataGo believes a 3-3 invasion should work. Hence, the score swings depending on whose turn it is, since Black should want to prevent it with another move, while White should invade.

shinuito · September 14, 2020, 12:24pm

Yeah, that was my interpretation too. That, and the conventional wisdom that it’s fairly tough to kill a 3-3 invasion unless you’re really strong nearby.

martin3141 · September 14, 2020, 12:38pm

Right. Somehow I thought KataGo gave a smaller score difference between Black to move and White to move, but the difference is 29 points, suggesting that an invasion at c3 should live. My bad.
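The reasoning in these last few posts is a standard comparison: evaluate the same position twice, once with each side to move, and look at the gap. A sketch, assuming both estimates are expressed as Black’s lead; the function names, the threshold and the example values are hypothetical and not taken from the game above.

```python
def tempo_swing(lead_black_to_move: float, lead_white_to_move: float) -> float:
    """Gap between two estimates of one position, both given as Black's lead."""
    return lead_black_to_move - lead_white_to_move


def looks_unsettled(swing: float, threshold: float = 10.0) -> bool:
    """Hypothetical heuristic: in a truly finished game, having the next move is
    worth almost nothing, so a large swing points to an area that is still open
    (such as an unplayed 3-3 invasion). In miai counting, a gote move is worth
    about swing / 2."""
    return abs(swing) > threshold


# Made-up estimates: B+16 with Black to move, W+13 (i.e. -13) with White to move.
swing = tempo_swing(16.0, -13.0)
print(swing, looks_unsettled(swing))  # 29.0 True
```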

Jirogo36 · November 6, 2020, 1:22pm

Only partially relevant to the main topic but…

Against bots, how about an option to cancel scoring, after which one could manually capture the incorrectly marked stones?

Top-middle white group is dead:

The playing bot also seemed to consider the group alive. I messaged the maintainer 2 days ago but haven’t received a response; maybe that’s not uncommon for a 1k bot.

Conrad_Melville · November 6, 2020, 2:46pm

Allowing players to adjust the scoring against bots would result in rampant score cheating, even beyond what we see versus humans, because the bots can’t defend themselves from score cheating or call a moderator.

All reports get addressed, but priorities based on urgency exist. Score cheating is highly urgent. Refusing to accept a score is also fairly urgent (in part because it is sometimes a precursor to score cheating). Escaping, stalling, and abusive chat are more urgent than bot misscores because the latter is a one-off, whereas an escaper, staller, or abuser will almost certainly continue these violations at least until warned.

yebellz · November 6, 2020, 2:52pm

I think what @Jirogo36 is suggesting is not the ability to adjust the auto-scoring directly at the end of a bot game, but rather instead, giving the human player the option to back out of scoring and resume the game in cases where the auto-scoring is getting something wrong.

I think, for a lot of players, it is not at all obvious that one needs to capture many dead groups/stones when playing against a bot in order to avoid scoring problems.