I like the OGS software quite a lot generally speaking, but the scoring system is a major exception. It relies on the players to agree which stones are alive and which are dead, and this leads to a particularly annoying form of trolling in which a player stubbornly refuses to mark the stones correctly at the end of the game.
It seems to me that there could be some sort of automated solution to this issue built into the software. For example, Fox uses a system in which the players agree to allow the game to be automatically scored by AI, and this alone would be a big improvement. I don’t know if it’s possible to game such a system, but I haven’t run into any problems so far.
At the very minimum, some sort of system that would allow me to get away from the troll without losing the game would be valuable - even if I just go off and start a new game, the trolls just resume the game and let my timer run down. Perhaps the software could give me the option to pause the game pending moderation, with the threat of a ban or some other punishment if I abuse the system. I raise a flag for moderation in these games anyway, so it wouldn’t actually create new work for moderators.
This and the OP’s suggestion have been discussed before. The ideas are excellent, but they do have drawbacks.
An adjudication pause initiated by a player—which might be automatic when a score-cheating report is filed or might generate the report when an adjudication button is clicked—could invite trolling. However, trolls can file false reports now, so this may not significantly change the report load.
On the other hand, an automatic report and pause initiated by a set number of scoring alterations would certainly increase the report load enormously, because the large majority of score-cheating incidents are currently unreported. Of course, it would have the advantage of catching many cheaters who currently get away with cheating. The best exact number for the trigger is also problematic: if too low, it would increase the generation of unnecessary reports; if too high, it would not be triggered because the cheated player would have already left in disgust.
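To make the trigger trade-off concrete, here is a minimal sketch of an alteration counter. Everything in it is hypothetical: the class name, the per-player counting, and the default threshold of 3 are assumptions for illustration, not anything OGS actually implements.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class StoneRemovalMonitor:
    """Counts stone-marking changes per player during the scoring phase."""

    # Assumed value: too low generates unnecessary reports, too high
    # never fires because the cheated player has already left.
    threshold: int = 3
    alterations: Counter = field(default_factory=Counter)

    def record_alteration(self, player: str) -> bool:
        """Record one change; return True if the game should pause and report."""
        self.alterations[player] += 1
        return self.alterations[player] >= self.threshold


monitor = StoneRemovalMonitor(threshold=3)
for _ in range(3):
    should_pause = monitor.record_alteration("suspected_troll")
print(should_pause)
```

Counting per player rather than per game is a design choice here: it stops one player's honest corrections from pushing the other over the limit.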
I prefer the pause initiated by a cheating report. BTW, this could also be applied to stalling by perpetual restarts.
Interesting. Is it known how widespread the problem is? Specifically, if most or all examples of score cheating were reported, would the current moderator team be able to handle it? If so, then perhaps a “pause game and report score cheating” button might be the correct solution.
Otherwise, I come back to some sort of automated scoring system. It could be triggered when both players pass as in the status quo, and then users would be given no choice but to accept the automated count or continue the game. There could still be a mechanism for flagging scoring errors, but I would guess that legitimate flags would be very rare.
I think at least two-thirds of score-cheating incidents are not reported, and the percentage is probably much larger. If all were reported, I believe it would overwhelm the report system.
A completely automated scoring system would be the ideal solution, but the AI has a number of scoring problems discussed in two other threads. In particular, an autoscore bug, in which closed territory is unscored, has existed since August of 2020. If that bug were resolved and a flagging button for a bad score were installed as you suggest, that would be a good solution, I think. In that case, the number of score-cheating incidents prevented would probably exceed the number of flags, thereby reducing the reporting load.
I’m not sure what OGS is using currently to autoscore, but the current AI analysis system is already producing fairly good score estimates. I’ve yet to run into any significant bugs in it. It would also allow scoring games that haven’t technically finished yet, as is often the case with beginner games, while also being able to adjudicate most obscure life and death states correctly.
I was thinking more along the lines of unenclosed territory, but yes the issue is that this might assume that some end game moves are played which neither player is seeing.
One solution could be to analyze the position both with W to move and with B to move and check that the winner does not change; if it does, refuse to count and resume the game.
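A rough sketch of that both-sides check, assuming some estimator that returns the score margin from Black's point of view for a given side to move. The `estimate` callable, its signature, and the `margin` buffer are hypothetical; this is not an OGS or KataGo API.

```python
from typing import Callable, Optional

# Hypothetical estimator: takes a position and the side to move ("B" or "W")
# and returns the margin from Black's point of view (positive = Black ahead).
Estimator = Callable[[object, str], float]


def stable_winner(position: object, estimate: Estimator,
                  margin: float = 0.0) -> Optional[str]:
    """Return "B" or "W" if the winner is the same whoever moves next.

    Return None when the two estimates disagree or fall within `margin`,
    meaning the game should be resumed rather than counted.
    """
    black_first = estimate(position, "B")
    white_first = estimate(position, "W")
    if black_first > margin and white_first > margin:
        return "B"
    if black_first < -margin and white_first < -margin:
        return "W"
    return None


# Toy estimator: Black wins by 2.5 moving first, loses by 1.5 otherwise,
# so the result depends on who moves and the game is not counted.
print(stable_winner(None, lambda pos, side: 2.5 if side == "B" else -1.5))
```

A non-zero `margin` would make the check stricter by also refusing to count very close games.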
I’d still see it as a viable prevention system for most cases of score cheating and dispute resolution.
Edit: This system is already practically in place through the new anti-stalling system.
Autoscore (the button at the end of the game) is just KataGo.
The score-estimator repo shinuito shared is the one that is used during the game when you click “Estimate Score”. IIRC it’s just a Monte Carlo algorithm, and as you can probably tell from using it, much less accurate than Kata.
Periodically buggy, though.
Once, I used the estimator because I thought I was losing badly, in every corner, middle and side, and was considering ending things. The estimator said “yikes, you lose really badly!”, so I took the game to the stone removal phase without checking, assuming that the score would be more or less correct (yes, I am lazy, can’t count yet), and then I saw the game was resumed because the autoscore gave me a win, and quite a good one! Luckily the other player was more careful.
Therefore the score estimator was more accurate than the final autoscore thingy.
Not even periodically… I would just say “purposefully bad”. The Monte Carlo search (at any reasonable depth) is terrible at L&D. The reason it’s used is that at some point anoek made it Kata and there was pushback from the community because of too much outside assistance during the game.
Edit: wait did I read this wrong? You’re saying Kata (Autoscore) did worse than Monte Carlo (in-game)? I guess anything is possible but I’m curious if the Autoscore just didn’t update correctly or if Kata actually gave a worse estimate!
Most likely, under the assumption of perfect play, you were winning. But if neither of the players sees that perfect play, the autoscore can indeed give incorrect results - a case similar to the one pointed out by jlt before.
The only thing we can be sure about: if the score estimator shows something that is correct, it is more or less by coincidence.
I was just playing a game in which my opponent was playing a bunch of useless moves after all of the points had been settled, and I was given the option to force the game to end with the result “W+Server Decision”. This is great news! I’m not sure if the criteria for triggering it will mean that it can be used for score cheating, but at any rate it’s great progress.