Use KataGo for Scoring

If I discovered one thing in these two days of being a moderator, the answer to that question is “yes”


Whatever you have seen, I disagree with having a scoring system that doesn’t let the humans have the final word. I support what @Samraku said. But if score cheating is a big issue, then I am in favor of an extra warning, as suggested.

The humans always have the final word (in human-human games). With KataGo, the only difference is that the initial score is proposed by KataGo instead of by OGS’ current score estimator. The players are free to change it before accepting the score.


It’s not really a requirement to be a master of life and death to play go though. It feels like an unintentional mistake in counting (mislabelling a group as alive/dead/seki) should be treated like any other mistake in the game (as long as it’s not score cheating, hence unintentional), no?

If both players don’t know that a pyramid four shape’s life and death status changes with one move by either player and they just mark it alive, then passing to end the game was a mistake, and you or katago would call it a mistake at any other point in the game I presume (assuming no bigger move elsewhere).

I don’t know if it’s fair to say the game is unfinished because the players don’t have the ability to win or avoid a loss when they need to, unless you would also call resigning early when you don’t see a way for you to win (but maybe katago sees a way) also an unfinished game?

I imagine in the case of disputes that’s where another player (moderator/tournament director) gets out the rules to settle a dispute. That or you just tell the players to play it out - which would probably work a good bit of the time in Chinese rules?

I mean what happens in the case where player A thinks a group is dead, player B disagrees that it’s dead, but player A can’t demonstrate the correct way to kill it?

I’m not sure what takes priority (since I’m not a mod) between scoring the game correctly according to the god of go, or scoring the game correctly to the ability of the players?

If both players agree (incorrectly) on that status, then ending the game was correct and it should be scored as such. Part of the game is knowing when you no longer need to continue playing. This is one major reason I prefer area scoring: Just play it out. If you think it’s dead but you can’t kill it when the game resumes, your loss; it’s not the AI’s responsibility to save you from that.


Yeah, as much as I like that territory scoring feels easier/quicker to count, area scoring seems to be better in that you don’t lose points by just playing out captures.

I suppose having katago tell you something is dead or a seki is different to you being able to demonstrate that it is dead or seki etc. However I still think some go problems are easier when they say “white to kill/live/make a seki” than when they just say “judge the status of the group yourself”. With katago, it’s more or less doing the former: telling you there is a way to live/kill/seki, as I’ve (katago) seen it/read it out :slight_smile:

There are two separate contexts to consider this suggestion within:

  1. Human vs Bot games
  2. Human vs Human games

They each have different circumstances that affect considerations, so I will discuss them separately.

Human vs Bot Games

Currently, these games are automatically scored by the system after both players pass. Neither the human nor the bot can provide any input on the life/death status of the stones. I understand that this is to prevent humans from abusing manual scoring against bots, and also because the bots might not even be programmed or set up to suggest life and death status at the end of the game.

Since bot games are currently restricted to Chinese rules, the players should play out at least part of an encore to capture their opponent’s dead stones in order to avoid the system mistakenly believing that they are alive. However, often players choose not to play out the encore, maybe since they don’t know that they should, or they just find it boring to play a bunch of strategically unnecessary moves.

For these bot games, since the system already performs a fully automated life and death estimation, which occasionally causes issues, I think it is reasonable and uncontroversial to use a strong bot like KataGo instead to estimate life and death to score such games.

There is already an open GitHub issue suggesting better automated scoring for bot games:

Further, with the current system, there is some potential for humans to abuse the imperfections in the current automated scoring to get some unfair wins against bots that don’t clean up all of the dead stones. If a bot does not always capture all of the human’s dead stones, occasionally the scoring system will make errors in favor of the human, while the human can avoid such errors by cleaning up. The human player could even make various failed invasions and pointless throw-ins at the end of the game, just to give the scoring system more chances to make an error in their favor.

Human vs Human Games

Human vs human games are different since the players can and should provide input to mark dead stones for scoring. My suggestion is NOT to automate away the marking of dead stones, but rather just to use KataGo as an alternative tool for estimating life and death at the end of the game, instead of the current estimation provided by the “auto-score” button (which by the way only generates a suggestion and does not take away control from the humans). Basically, the score estimation by the bot would just be a suggestion for the life and death status, as a matter of convenience, which the humans must approve and could potentially change.

As similarly discussed earlier, I think the use of a strong bot like KataGo to assist in scoring in this manner ultimately comes down to a consideration between two priorities:

  1. Competition purity, i.e., the desire to not have AI tools play any role in the outcome of the game.
  2. User convenience, i.e., the desire to have some help marking dead stones and potentially detect and highlight cases of score cheating.

I think it is a valid stance to put the first item at a higher priority, but such a stance should also support removing any automated life/death suggestions, and even removing the in-game score estimation feature. However, I think that user convenience is also a pressing concern, especially for helping beginners and potentially detecting cheaters.

@Vsotvep seemed to make the point above that the current system is an unhappy compromise that does not satisfy either priority:

  1. Games are already potentially influenced by the life and death estimation at the end of the game (and players might choose to resume and change strategy as a result of it).
  2. The current scoring estimation can mess up and confuse the players into accepting the wrong outcome.

I agree with their proposition that there should either be no life/death estimation at all (fully manual scoring) or the system should be improved to use a strong bot like KataGo. That way, it is a matter of choosing between the two priorities rather than falling short of satisfying either. Maybe the availability of this tool could even be a custom choice for each game (just like disabling analysis).


I think if maybe a trial was set up on the beta site it would be easier to be convinced.

For instance how might katago score a game like the following if asked, where both players passed (I was gonna try find a game from the main site, but searching the forums for recent posts seemed easier).

Katago will probably say correctly upper right is dead, but what would it say for instance about white stones on the left side? Would it auto-mark some of the stones dead, and only the stones it could save as alive?

When you know that a superhuman bot is telling you some stones are dead, you should probably at least consider a way to kill them, unless of course the game isn’t close, or maybe play another move to save them?

When the current score calculator marks stones incorrectly, you get told on the forums, don’t trust the estimator, and ask for help, and you’ll learn to judge these things yourself in the future.

Anyway, I’d say it’d be easier to judge once there were some visible examples. Does katago have an option to display the status of each group and give a score estimate? I thought I read something about its training that it can predict the status of each group

4.2 Game-specific Features
In addition to raw features indicating the stones on the board, the history, and the rules and komi
in effect, KataGo includes a few game-specific higher-level features in the input to its neural net,
similar to those in earlier work [4, 3, 12]. These features are liberties, komi parity, pass-alive regions,
and features indicating ladders (a particular kind of capture tactic). See Appendix A for details.

I mean I know it can estimate score, since we have that feature in the reviews as a toggle, but is that the one you’d want to display in the scoring phase?
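As a rough illustration of what asking KataGo for per-point status could look like, here is a minimal sketch that builds one request for KataGo's JSON analysis engine with ownership turned on. The field names are written as I understand that protocol and should be checked against the KataGo documentation; the helper name `build_scoring_query` and its defaults are made up for this example, and nothing here reflects how OGS actually does it.

```python
import json


def build_scoring_query(moves, rules="chinese", komi=7.5, size=19, max_visits=200):
    """Build one analysis request for the final position of a game.

    `moves` is a list like [["B", "Q16"], ["W", "D4"]].
    The field names below are assumed to match KataGo's analysis-engine
    protocol; verify them against the official docs before relying on this.
    """
    return {
        "id": "scoring-query",          # arbitrary label echoed back in the reply
        "moves": moves,
        "rules": rules,
        "komi": komi,
        "boardXSize": size,
        "boardYSize": size,
        "analyzeTurns": [len(moves)],   # only look at the position after the last move
        "maxVisits": max_visits,
        "includeOwnership": True,       # ask for a per-point ownership estimate
    }


if __name__ == "__main__":
    query = build_scoring_query([["B", "Q16"], ["W", "D4"]])
    # The engine reads one JSON object per line on stdin.
    print(json.dumps(query))
```

The reply should then carry both a score lead and an ownership array, which are the two things being compared later in this thread.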


An example game posted in another thread

Katago in this human vs bot game predicts white+13.2 which I’d say gets rounded to 13.5 in practice. Actually both accepted the game end as is and it ended up B+17.5.

I’m just wondering what would happen in this situation for example.

Interesting. Is the explanation possibly that katago is not 100% sure if an invasion at c3 can live or not?

I think the analysis indicates that KataGo believes a 33 invasion should work. Hence, the score swings depending on whose turn it is, since Black should want to prevent it with another move, while White should invade.

Yeah that was my interpretation too. That and the conventional wisdom that it’s fairly tough to kill a 3-3 invasion unless you’re really strong nearby :slight_smile:


Right. Somehow I thought katago gave a smaller score difference between Black / White to move, but the difference is 29 points, suggesting that an invasion at c3 should live. My bad.


Only partially relevant to the main topic but…

Vs Bot, how about an option to cancel scoring, after which one can manually capture the incorrectly-marked stones?

Top-middle white group is dead:

The playing bot also seemed to consider the group alive. I messaged the maintainer 2 days ago but haven’t received a response; maybe it’s not uncommon for a 1k bot.

Allowing players to adjust the scoring against bots would result in rampant score cheating, even beyond what we see versus humans, because the bots can’t defend themselves from score cheating or call a moderator.

All reports get addressed, but priorities based on urgency exist. Score cheating is highly urgent. Refusing to accept a score is also fairly urgent (in part because it is sometimes a precursor to score cheating). Escaping, stalling, and abusive chat are more urgent than bot misscores because the latter is a one-off, whereas an escaper, staller, or abuser will almost certainly continue these violations at least until warned.


I think what @Jirogo36 is suggesting is not the ability to adjust the auto-scoring directly at the end of a bot game, but rather giving the human player the option to back out of scoring and resume the game in cases where the auto-scoring is getting something wrong.

I think, for a lot of players, it is not at all obvious that one needs to capture many dead groups/stones when playing against a bot in order to avoid scoring problems.


I think this is a good solution, but KataGo scoring is on the way, which should also solve this problem.


Keep in mind that the score estimate and territory estimate are somewhat independent in KataGo, i.e. counting up the 99% certain white squares and 99% certain black squares does not always add up to the score. I’ve seen it off by 1.5 points in fairly normal situations before.
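A toy sketch of why the two numbers can drift apart: counting only the "certain" squares throws away every point that is merely likely, while a score estimate still reflects them. The ownership values below are invented, the sign convention (positive means Black) is an assumption, and komi and captures are ignored.

```python
def confident_point_margin(ownership, threshold=0.99):
    """Black-minus-White count of points whose ownership passes the threshold.

    Assumes ownership values lie in [-1, 1] with positive meaning Black.
    """
    black = sum(1 for v in ownership if v >= threshold)
    white = sum(1 for v in ownership if v <= -threshold)
    return black - white


# Made-up ownership values for nine points, not taken from a real game.
toy_ownership = [0.995, 0.999, 0.97, 0.40, -0.60, -0.992, -0.998, 1.0, -1.0]

hard_count = confident_point_margin(toy_ownership)   # only the 99% certain points
soft_sum = sum(toy_ownership)                        # keeps the uncertain points too

if __name__ == "__main__":
    print("thresholded margin:", hard_count)
    print("soft ownership sum:", round(soft_sum, 3))
```

Here the thresholded count sees an even position, while the soft sum still leans slightly towards Black because of the three uncertain points in the middle of the list.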


Just wanted to cross-reference this recent announcement into this thread as well (for people who might stumble upon this later).

Glad to hear that is happening! Thanks @anoek


@anoek @sanderl
Noticed this while browsing over this old thread:

Yes, it is definitely true that the score estimate and ownership estimate are a bit independent. You’ll need to be a bit careful and not trust KataGo at face value if you want accurate scoring. There are a few different ways to do it, you might need to play around with it. Here are some things to consider:

  • If you’re trying to use it for marking of live and dead stones, you probably want some thresholding on the confidence level of ownership on each spot, or perhaps the average confidence level within each connected chain of stones (in case there is “noise” and one stone in the corner of some group happens to be more or less confident than others). Ideally try testing various kinds of complex and messy/bizarre sekis (perhaps also double-ko-seki, and such) to find the right thresholds.

  • You probably also want to run it at least twice, once with each side to move. If the ownership prediction varies greatly with each different side to move, that’s a sign that the region is probably unsettled, and then your dead stone or scoring algo should do whatever you want to happen in that case.

  • You may want to implement your own logic to override what KataGo says in some cases. For example, if there is a subtle weakness in someone’s territory that allows cutting and killing a part of it, then such regions should show up as differing by player to move and therefore unsettled. But if both players have passed and there aren’t any dead stones or anything near the unsettled spots, and the region is clearly enclosed, then in some sense, both players by passing have “accepted” that the territory there is settled regardless of the weakness, so probably you’d still want to fully count the territory for the player for scoring purposes.

  • As an aside, KataGo’s own internal gtp “final scoring” function in Japanese rules (which just tries to directly find the score, rather than by doing anything with ownership or dead groups), uses a binary-like search to find the komi such that the winrate is as close to 50-50 (“draw”) as possible, as determined by a low-playout analysis for each different komi. Although not perfect, this seems to be a bit more accurate than just querying the “lead” output directly.
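The first two suggestions above (average the ownership over each connected chain, and compare a run with Black to move against a run with White to move) can be sketched roughly as follows. This is only an illustration under stated assumptions: the board is a list of strings using `X`, `O` and `.`, the two ownership grids are assumed to already be available with positive values meaning Black, and the `threshold` and `max_swing` defaults are placeholders that would need tuning on real sekis and messy positions.

```python
from collections import deque

SIGN = {"X": 1.0, "O": -1.0}  # assumption: positive ownership means Black ("X")


def chains(board):
    """Yield (color, [(row, col), ...]) for each connected chain of stones."""
    rows, cols = len(board), len(board[0])
    seen = set()
    for r in range(rows):
        for c in range(cols):
            color = board[r][c]
            if color not in SIGN or (r, c) in seen:
                continue
            chain, queue = [], deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill over same-colored neighbors
                y, x = queue.popleft()
                chain.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen and board[ny][nx] == color):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            yield color, chain


def classify_chains(board, own_black_to_move, own_white_to_move,
                    threshold=0.7, max_swing=0.5):
    """Label each chain 'alive', 'dead' or 'unsettled' from two ownership grids."""
    results = []
    for color, chain in chains(board):
        avg_b = sum(own_black_to_move[r][c] for r, c in chain) / len(chain)
        avg_w = sum(own_white_to_move[r][c] for r, c in chain) / len(chain)
        # Re-express both averages as "how strongly the chain's owner keeps it".
        keep_b, keep_w = avg_b * SIGN[color], avg_w * SIGN[color]
        if abs(keep_b - keep_w) > max_swing:
            status = "unsettled"   # result depends on who moves first
        elif min(keep_b, keep_w) >= threshold:
            status = "alive"
        elif max(keep_b, keep_w) <= -threshold:
            status = "dead"
        else:
            status = "unsettled"   # not confident either way
        results.append((color, sorted(chain), status))
    return results
```

Anything labelled `unsettled` is where the third suggestion comes in: the site would have to decide for itself what to do, for example leaving the stones unmarked for the players to settle.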