Why would you want to mark dead stones if the autoscore can do it correctly all the time?
So far, the only objection I have heard to having all games decided by a near-perfect autoscore after two passes is that players might have missed an open border and would be deprived of a means to fix that by resuming the game.
I didn’t answer the poll because, IMHO, there should be two different processes: one for beginners who are still learning to close their boundaries (with an arbitrary limit, let’s say 20k?) and one for other players.
Well, there are technical limits for sure; it seems difficult to mimic the help a human could give, like saying “I’m not sure you closed everywhere” and judging (for teaching purposes) when to give a few more hints. The procedure needs to strike a balance between reasonable involvement of the players and their lack of experience.
The reliable but heavy way is to encourage beginners to call a human for help when the result makes no sense, with enough flexibility that they don’t feel penalized by an AI.
I could imagine a “pause and call for help” button as an alternative to “confirm the SE”.
This would pause the game until help arrives. Only for 20k to 25k.
For higher levels, I’d be satisfied with a scoring procedure like the one on KGS.
Providing such a button seems to be a question independent of the poll question, which is about what positional information the scoring system should show to the players during the marking phase.
I think you’d need a pretty smart AI to automate assistance to beginners in bite-sized chunks. I think that in the foreseeable future, only human teachers are capable of doing that, and I suspect that there are not enough of those around all the time on OGS to assist all beginners who might click that button and expect to be helped within a couple of minutes.
Well, 90% of OGS players could help beginners if OGS rethought the communication system between both worlds (for example, a new global alert system, plus the possibility to step into a beginner’s game if they ask and agree).
My estimate is that the current autoscore works 99% of the time assuming closed borders, and it should be easy to get it to 99.9% by doing anything sensible with both black to play and white to play.
But 99.99% seems tough. Don’t you get into weird scenarios at that point with Japanese rules, for example, that KataGo doesn’t even understand? And when there are multiple invasions possible, a group might be surprisingly dead no matter who plays first, which would be hard to code for.
Even without going that far: I remember a casual game where both 6d players misread a life-and-death situation just because it looked like a usual shape, when in fact one can fail on something that looks simple. Nothing as weird as the remote corners of the Japanese rules.
Well, I don’t think a proper autoscore algorithm should work by assuming a resumption of the game. An unclosed border should result in a neutral area. That’s just the rules of the game. It would be incorrect scoring if an unclosed area scored points.
I think the only thing that an autoscore algorithm should do is decide which groups are dead.
From there I think it should just be floodfilling, except for eyes in seki under Japanese rules.
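The floodfilling step can be sketched in a few lines. This is only an illustration, not the OGS implementation: the board encoding (`'B'`, `'W'`, `'.'`), the function name `score_territory` and the `dead` set are all made up for the example. It counts territory points only (no prisoners, no komi) and does not handle the eyes-in-seki exception under Japanese rules.

```python
from collections import deque

def score_territory(board, dead):
    """Flood-fill territory count (hypothetical helper, not OGS code).

    board: list of equal-length strings using 'B', 'W' and '.'.
    dead:  set of (row, col) stones marked dead; they are treated as empty.
    Returns territory points per colour plus the neutral points.
    """
    rows, cols = len(board), len(board[0])

    def colour(r, c):
        # Dead stones count as empty points for the purpose of the fill.
        if (r, c) in dead:
            return '.'
        return board[r][c]

    seen = set()
    result = {'B': 0, 'W': 0, 'neutral': 0}
    for r in range(rows):
        for c in range(cols):
            if colour(r, c) != '.' or (r, c) in seen:
                continue
            # Breadth-first fill of one empty region, noting bordering colours.
            size, borders = 0, set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols:
                        col = colour(ny, nx)
                        if col == '.':
                            if (ny, nx) not in seen:
                                seen.add((ny, nx))
                                queue.append((ny, nx))
                        else:
                            borders.add(col)
            # A region touching exactly one colour is that colour's territory;
            # anything else (both colours, or none) is neutral.
            owner = borders.pop() if len(borders) == 1 else 'neutral'
            result[owner] += size
    return result
```

With this sketch, an area that touches stones of both colours comes out as neutral, so an unclosed border scores nothing for either side, and the only input that needs any judgement is the `dead` set.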
If it is impossible to create a highly reliable group status adjudicating algorithm, I’d vote for leaving all markings to the players.
Right, but I’m saying deciding which groups are dead perfectly 10,000 times in a row is really hard. There can be a coincidence of two very rare situations in that many games, like maybe bent four plus unremovable ko threats? Or just two unseen weaknesses.
I think KataGo is capable of adjudicating bent four plus unremovable ko threats, because that is just a situation where it doesn’t matter who gets to play first. If the players pass with that situation on the board, it should be scored as a seki.
I think the harder cases are games that aren’t properly finished, and the status of some group depends on who gets to play first. In that case the algorithm needs to be clever and do some topological boundary analysis in combination with consulting KataGo about group status.
But I think that is more or less what you created, right?
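To make the “who gets to play first” check concrete, here is a minimal sketch. It assumes we already have two per-point ownership estimates for the same position, one with Black to move and one with White to move, as numbers from -1 (surely White) to +1 (surely Black). The function name, the threshold and the input format are hypothetical, and this is not a description of the actual OGS algorithm.

```python
def settle_ownership(black_first, white_first, threshold=0.8):
    """Combine two ownership estimates of the same position.

    black_first / white_first: per-point values in [-1, 1], positive = Black,
    estimated with Black to move and with White to move respectively
    (assumed inputs, e.g. from an engine's ownership output).
    A point is settled only if both estimates agree confidently.
    """
    settled = []
    for b, w in zip(black_first, white_first):
        if b >= threshold and w >= threshold:
            settled.append('B')   # Black's no matter who moves first
        elif b <= -threshold and w <= -threshold:
            settled.append('W')   # White's no matter who moves first
        else:
            settled.append('?')   # depends on who plays first, or unclear
    return settled
```

Points that come out as `'?'` are the ones where the game probably wasn’t properly finished; those are the places where extra boundary analysis, or a human, would have to decide.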
Did you test your algorithm on a large random sample of OGS games, compare its results with the actual game results and verify it did better than the players in most cases where it differs?
I don’t know how many games are played daily on OGS, but I assume it is in the tens of thousands?
I suppose it would be undesirable to have dozens of daily reports from users having a valid complaint about the autoscore algorithm incorrectly adjudicating group status, if it cannot be corrected by the players.
That’s why I stated the 99.99% correctness requirement before moving to fully automatic scoring. I think 99.9% won’t be enough for that.
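As a rough back-of-the-envelope check of those percentages: the 20,000 games per day below is purely an assumed figure standing in for “tens of thousands”, and it treats every game as ending by scoring, so the numbers are upper bounds.

```python
def expected_misscored(games_per_day, accuracy):
    """Expected number of wrongly scored games per day.

    games_per_day: assumed daily number of games that reach scoring.
    accuracy: fraction of games the autoscore gets right (e.g. 0.999).
    """
    return games_per_day * (1 - accuracy)

if __name__ == "__main__":
    GAMES_PER_DAY = 20_000  # assumption, not an OGS statistic
    for accuracy in (0.99, 0.999, 0.9999):
        wrong = expected_misscored(GAMES_PER_DAY, accuracy)
        print(f"{accuracy:.2%} correct: about {wrong:.0f} wrong results per day")
```

Under that assumption, 99% means about 200 wrong results a day, 99.9% about 20, and 99.99% about 2, which is why 99.9% still looks like “dozens of daily reports” if players cannot correct the result.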