Poll: how would you score if you were the auto-score algorithm?

Vsotvep · September 15, 2021, 10:29am

Then:

Uberdude · September 15, 2021, 10:55am

I would prefer marking the dead stones in a human vs bot game. I am aware that some naughty humans (am I naive to think a small minority?) use this to score cheat, but I would attempt to solve that problem in a targeted way rather than perverting the scoring phase of human vs human games, which are presumably far more common. For human vs bot it could make sense to compare the human marked result vs some AI score and flag to moderators if vastly different. If that’s too much work for mods, then consider an auto-score, but ONLY for bot games, not for human vs human.
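A minimal sketch of that comparison, assuming both results are expressed as Black's margin (Black's score minus White's). The function name `needs_moderator_review` and the 10-point default tolerance are hypothetical, chosen only for illustration; nothing here reflects how OGS actually works.

```python
def needs_moderator_review(human_margin: float, ai_margin: float,
                           tolerance: float = 10.0) -> bool:
    """Decide whether a human-marked result should be flagged for review.

    Both margins are Black's score minus White's score.
    `tolerance` is a hypothetical threshold, not an OGS setting.
    """
    # A flipped winner is suspicious regardless of the size of the gap.
    winner_flipped = (human_margin > 0) != (ai_margin > 0)
    # Otherwise flag only when the two results are far apart.
    return winner_flipped or abs(human_margin - ai_margin) > tolerance
```

A result that agrees with the AI estimate to within a few points passes silently, so moderators would only see the games where the marking changed the winner or moved the margin a lot.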

hexahedron · September 15, 2021, 12:38pm

I don’t entirely understand why autoscore should be needed even for bots. Many bots could also mark stones they believe are dead, if only OGS allowed them to do so.

See: Specification of the Go Text Protocol, version 2, draft 2. Although probably not all bots have actually implemented it, GTP has long defined a command, final_status_list, that bots can implement to indicate what they believe is dead/alive at the end of the game.

Some bots probably also implement KGS’s kgs-genmove_cleanup extension. A bot that supports this command should, when given it, continue playing until all dead stones have been removed from the board, and pass only when it believes that is done.
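As a rough illustration of what a server would have to do with the bot's answer, here is a sketch that parses the reply to a GTP `final_status_list dead` command into a list of vertices. It assumes the usual GTP response framing (a success reply starts with `=`, optionally followed by a numeric command id; a failure starts with `?`). The function name `parse_final_status_list` is made up for this example.

```python
def parse_final_status_list(response: str) -> list[str]:
    """Parse a GTP reply to `final_status_list dead` into vertices.

    Assumes standard GTP framing: "=" (plus optional numeric id) on
    success, "?" on failure. Vertices may be spread over several lines.
    """
    text = response.strip()
    if not text.startswith("="):
        # "?" replies mean the command failed or is not supported.
        raise ValueError(f"GTP command failed: {text}")
    # Drop the "=" and an optional numeric command id.
    body = text[1:].lstrip("0123456789")
    # Vertices are separated by spaces and newlines.
    return [vertex.upper() for vertex in body.split()]
```

For example, a reply of `= A1 B2` followed by a second line `C3` would yield the three vertices A1, B2 and C3, which the server could then pre-mark as dead for the human to accept or dispute.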

Excerpted below from KGS’s bot configuration options (kgsGtp):


hint.noArguing: If your engine does not support the kgs-genmove_cleanup command (or the game uses Japanese rules, where kgs-genmove_cleanup can’t be used), and your opponent does not agree with the dead stones that the engine settled on, then this message will be written to your opponent. It should tell them that they must either agree with the engine or leave the game. The default message does exactly that, in English.

hint.cleanup: If kgs-genmove_cleanup is usable in this game, and your opponent disagrees on which stones are dead, then this message is sent to them. It should tell them to undo if they want to play out the game to determine which stones are dead. The default message says exactly that, in English.

So a bot can implement only final_status_list and demand the human player can’t argue, or they can be friendly and allow the opponent to argue and accept what the opponent wants (and thereby even play Japanese rules!). Or for Chinese rules it can implement kgs-genmove_cleanup as well and thereby allow playing out of disputes, and for bots that don’t have good status determination, they can even implement final_status_list as “all stones are alive” and let the opponent do all the marking, or refuse to argue and just play out all dead stone capturing. There is a convenient mechanism for the bot author to specify a chat message by which the bot can explain this. As an escape hatch, on KGS players may also just adjourn the game with no penalty if a no-argue bot that normally doesn’t misbehave does happen to score something wrong (since players are allowed a small buffer of adjournments before games will be auto-forfeited against them). It’s all up to the choice of the person who writes and/or runs the bot, and OGS can then be hands-off and direct responsibility towards the bot author.
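The choices described above can be sketched as a small decision function. This is only an illustration of the logic in this post: the function name `dispute_policy`, its parameters and the returned labels are all hypothetical and are not part of kgsGtp or OGS.

```python
def dispute_policy(supports_cleanup: bool, rules: str,
                   allow_arguing: bool) -> str:
    """Pick how a bot handles a dead-stone dispute (illustrative only).

    supports_cleanup: the engine implements kgs-genmove_cleanup.
    rules: ruleset name; cleanup cannot be used under Japanese rules.
    allow_arguing: the bot operator lets the opponent overrule the bot.
    """
    if supports_cleanup and rules.lower() != "japanese":
        # Opponent undoes, and the disputed stones are captured on the board.
        return "play_out"
    if allow_arguing:
        # Friendly bot: accept whatever the opponent marks.
        return "accept_opponent"
    # Opponent must agree with the bot's marking or leave the game.
    return "no_arguing"
```

The point of the ordering is that playing out a dispute is preferred whenever the rules allow it, and the no-arguing stance is only the fallback.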

A bot author could abuse this by making a bot always mark all of the opponent’s stones as dead and preventing arguing, but then you can just ban the bot until it’s fixed - it’s already the case that bot accounts require rubber-stamp approval by a mod. I doubt many people offering bots would try in the first place.

Uberdude · September 15, 2021, 12:42pm

I’ve not had problems with Fox’s autoscore either, but I am a sensible player who closes borders and doesn’t leave unsettled positions at the end of the game. Many in the OGS community seem to want an autoscore that doesn’t just deal with sensibly finished games, but prematurely finished ones with open borders, trivially unsettled positions (that even a 25k can spot, not just some clever tesuji only strong players like KataGo can find). I’ve no idea how Fox copes with these. I don’t think we should even attempt to autoscore these with a result that assumes continued sensible play. The players should continue until they can mark dead stones and territories and agree.

square_fuseki · September 15, 2021, 1:11pm

Uberdude · September 15, 2021, 5:27pm

Ask a silly question, get a silly answer.

dragon-devourer · September 15, 2021, 6:57pm

Good question

As has been discussed by @uberdude and others above, the answer here is that the bot needs to be able to mark dead stones. The dumb “auto-score” I have proposed is really just an auto-count. The score comes from the count which relies on life and death. The program does the count but the players are the ones who agree life and death so the program knows what to count. If one of the players is a bot, then the bot needs to agree life and death.
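To make the "auto-count" idea concrete, here is a minimal sketch of territory counting once the players have agreed which stones are dead. It assumes a board given as rows of `B`, `W` and `.`, ignores komi and stones captured during play, and treats an empty region as territory only if it touches stones of a single colour. The name `auto_count` and the data layout are assumptions made for this example.

```python
from collections import deque


def auto_count(board: list[str], dead: set[tuple[int, int]]) -> dict[str, int]:
    """Count territory plus agreed dead stones for each colour.

    board: rows of 'B', 'W' or '.'.
    dead: (row, col) of stones both players agreed are dead.
    Komi and prisoners taken during play are not included.
    """
    rows, cols = len(board), len(board[0])
    grid = [list(row) for row in board]
    score = {"B": 0, "W": 0}
    # Remove agreed dead stones; each is a prisoner for the other side.
    for r, c in dead:
        owner = grid[r][c]
        if owner in score:
            score["W" if owner == "B" else "B"] += 1
            grid[r][c] = "."
    seen: set[tuple[int, int]] = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "." or (r, c) in seen:
                continue
            # Flood-fill one empty region, recording bordering colours.
            size, borders = 0, set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if not (0 <= ny < rows and 0 <= nx < cols):
                        continue
                    if grid[ny][nx] == ".":
                        if (ny, nx) not in seen:
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                    else:
                        borders.add(grid[ny][nx])
            # Only regions touching a single colour count as territory.
            if len(borders) == 1:
                score[borders.pop()] += size
    return score
```

Note that the program makes no life-and-death judgement at all: an unclosed border simply leaves the region touching both colours, so it counts for nobody until the players close it or mark stones dead.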

If the bot can’t do that, then the bot owner needs to step in. For example, on DGS bot FAQs page it says:

And on the profile of the DGS GnuGo bot:

Can’t be very often though as the bot has > 31,000 games.

Point is, if GnuGo can do it, then most bots can do it. And if human intervention is needed on rare occasions, then that is probably OK too.

teapoweredrobot · September 15, 2021, 7:03pm

But that’s not how it works on OGS AFAIK. We just have the auto scoring system that rules with an iron rod and accepts no help from anyone. Because previously the human could mark stones and not necessarily do it honestly because the bot couldn’t argue…

Vsotvep · September 15, 2021, 7:14pm

So, then we prefer amybot marking the stones and the counting be done by OGS over KataGo marking the stones and then the counting being done by OGS?

gennan · September 15, 2021, 7:40pm

If amybot doesn’t know to close borders and mark dead stones, then should amybot even play ranked games? I mean, it’s giving beginners a bad example of how to play and finish the game properly. Bots like that may be more of a liability than an asset for OGS.

yebellz · September 15, 2021, 8:34pm

I guess that’s part of a larger question of what we should do about misbehaving bots.

For example, what if someone set up a sandbagging bot? Such a thing could be programmed to play poorly and resign early in the majority of its games, but then play normally and strongly in the rest of its games.

Would it matter if such programming was purely an unintentional bug?

dragon-devourer · September 15, 2021, 8:54pm

That is taking commitment to cheating to a whole new level!

If amybot plays the game, then yes.

I understand where you’re coming from gennan. On the one hand, you could say it’s no different from a beginner playing another beginner where scoring problems can arise and so just treat amybot like any other low ranked account. Except it’s not the same. The fact that it’s a bot conveys some sense of correctness (whether it should is a separate issue). So, as a minimum, I think bots need to be able to finish and score their own game properly, even if they might miss cuts that katago would see. This can be level appropriate, e.g. weak bots might think stuff is dead / alive when it’s not, in which case resume to prove it.

As @Uberdude said, most important is human vs human games, as that is the majority.

Uberdude · September 15, 2021, 9:05pm

This whole need for autoscore does seem like a case of the tail wagging the dog:

because OGS hasn’t implemented (?) the gtp protocol for bots marking dead stones
a minority of people would score cheat against bots, as the bot can’t disagree
a minority of games are with bots
so the majority of human vs human games get a messed up scoring system.

I bet anoek has spent longer on building autoscore and all its changes over time than it would take to let bots mark dead stones via gtp.

Groin · September 16, 2021, 1:24am

Poor beginners, they already suffer from the initial pairing system, having to play as a 12k (or 6k?) while the system adjusts itself, and now they face tricky end-of-game scoring…
We could take a different approach, with more care over good pairing and more external human help (could use AI if workable?) for finishing these boundaries.

And we could replace the call for moderator button with a call for help button, or whatever sweeter words. Freezing the clocks when that button is pressed wouldn’t hurt either. That call for help could also reach more people than only the moderators btw, who could then keep a good focus on respect of the TOS.

Feijoa · September 16, 2021, 1:57am

In real life isn’t it acceptable to ask an experienced player to help you score a game? To me that’s what auto-score is.

teapoweredrobot · September 16, 2021, 5:11am

Sort of. A stronger player (I think/hope) would score the game, determine the winner and only then have a conversation like “of course if white had played here the result might have been different as these black stones are not completely safe…” Or some such.
Auto score seems to do it the other way round and hence affect the scoring itself in a way that goes beyond “helping score”.

Uberdude · September 16, 2021, 6:42am

In real life an experienced player would tell the beginners they haven’t finished the game yet (in the case of unfinished boundaries).

Feijoa · September 16, 2021, 7:17am

Okay, so it seems like independent scoring advice is a valuable feature, not just for bot games or for handling abandoned games. I don’t think we should give up on it so quickly.

While thinking about how @Vsotvep’s algorithm might fail I came up with this. Assume Japanese rules:

White [15k] just passed. What should Black [10k] do?

Play B2 or H2
Play B1 or H1
Pass
Resign


If both players have passed, how do you score it?

Tell them to finish the game
Both of Black’s bottom groups are dead
Both of Black’s bottom groups are alive
One is alive, one is dead (or both are half-dead?)
Both players lose


Groin · September 16, 2021, 7:27am

I suppose black can read that at 10k
(Answer 1)
If they agree, then my answer stands. If not, they play it out and one group dies and one lives.

After the game is scored and validated, they can discuss more if they want.

I dunno but playing B1 vs B2 makes a difference for white not for black?

Feijoa · September 16, 2021, 7:34am

Why play and reveal the weakness, if you could have won by passing? At least with B1 there’s a better chance White will mess up by mirroring you.