Improving OGS' scoring system

I mean the obvious choice would be to automate the process using AI. All other servers seem to be doing that, and the technology is there, so I’m not sure why OGS doesn’t just go with it.

If you, for some reason, really want the players to come to an agreement, then “score change requests” should at least happen in turns. For example, you could have one button “agree to the AI scoring” and one button “disagree and make a proposal”. I think in almost all cases people will just click “agree to the AI scoring” and then everything is fine. You could then track people who keep clicking “disagree and make a proposal” and see what they are doing and why. Also, when a person clicks “disagree and make a proposal”, that proposal should be sent to the other player, who then needs to “confirm”, “decline” or “amend” it. This way players will not confirm anything by accident.
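A toy sketch of that turn-based flow as a state machine (all names are hypothetical, just to illustrate the proposal):

```python
from enum import Enum, auto

class Phase(Enum):
    AI_SCORED = auto()   # AI result shown, waiting for the first player
    PROPOSED = auto()    # a counter-proposal is pending with the opponent
    FINISHED = auto()    # both sides have agreed

class ScoringFlow:
    """Hypothetical turn-based scoring flow; nothing is ever auto-confirmed."""

    def __init__(self):
        self.phase = Phase.AI_SCORED
        self.pending = None

    def agree_to_ai_scoring(self):
        # The common case: accept the AI result and finish immediately.
        assert self.phase == Phase.AI_SCORED
        self.phase = Phase.FINISHED

    def disagree_and_propose(self, proposal):
        # The counter-proposal goes to the opponent for an explicit response.
        assert self.phase == Phase.AI_SCORED
        self.pending = proposal
        self.phase = Phase.PROPOSED

    def respond(self, action, amended=None):
        # The opponent must explicitly confirm, decline, or amend in turn.
        assert self.phase == Phase.PROPOSED
        if action == "confirm":
            self.phase = Phase.FINISHED
        elif action == "decline":
            self.pending = None
            self.phase = Phase.AI_SCORED  # back to the AI result
        elif action == "amend":
            self.pending = amended        # turn passes back with a new proposal
```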

This is very true. I used to play on OGS via the “Just Go” client because of its nice graphics. That client will just auto-confirm any proposal made by the opponent, so you’re completely at the mercy of your opponent.

2 Likes

It’s a big problem when a group both players believed to be alive can actually be invaded to force a seki. The AI will find it and score accordingly, but this is not in the spirit of human-vs-human play, especially when potential game resumptions are factored in.

8 Likes

^^ This is it. It’s not even as subtle as “invade to force a seki”; lower-ranked players can completely misread life and death.

So it comes down to a fundamental skill: is it, or is it not, part of the game to assess the final position and agree on the score?

We also get requests to annul games when the players agreed to a score that the AI then shows is wrong. “Oh, it was a mistake.” Yes: you made a mistake, so you lose. Why would the scoring phase be different in this respect?

These are not in fact rhetorical questions. I know at least one sport where navigation used to be an integral skill, and now it simply isn’t: you follow the GPS instead.

We could declare similarly that scoring simply isn’t interesting, and the score is what the AI says it is… :woman_shrugging:

10 Likes

Usually in such cases it matters who goes first, and a good autoscore would consider both perspectives, choosing a conservative combination of the results.

I haven’t seen a case yet that the simple-minded autoscore v3 doesn’t handle in a reasonable way. Want to try to break it?
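For illustration, a minimal sketch of the “both perspectives” idea (this is not the actual autoscore v3 code; the ownership maps would hypothetically come from two engine queries, one with each side to move):

```python
def conservative_status(black_first, white_first, threshold=0.8):
    """Combine two per-point ownership maps conservatively.

    black_first / white_first: dicts of point -> ownership in [-1, 1]
    (positive = black), hypothetically from engine queries with each
    player to move first.
    """
    status = {}
    for point, a in black_first.items():
        b = white_first[point]
        if a > threshold and b > threshold:
            status[point] = "black"
        elif a < -threshold and b < -threshold:
            status[point] = "white"
        else:
            # The perspectives disagree or are unsure: leave the point
            # unsettled rather than reveal a weakness neither player saw.
            status[point] = "undecided"
    return status
```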

8 Likes

I haven’t been following the autoscoreV3 conversation… any plans to merge into OGS proper?

1 Like

Well, it’s really just a proof of concept and I’m kind of hoping someone will look at it and point out the obvious, much simpler thing we could do instead. There are probably also some fundamentally bad assumptions baked into it. But I’d be excited to help get something like that in, yeah!

1 Like

Well, it seems like an improvement on the current system! I just tried it on one of my games (oops, shouldn’t have picked a live one, mea culpa) - the autoscore may have revealed a weakness, but V3 keeps it concealed :slight_smile:

3 Likes

What does it say about this game?
https://forums.online-go.com/t/ogs-paid-supporter-being-rude/51561/14
That GitHub page doesn’t work for me for some reason.

If I keep reloading, sometimes the AI does indeed have some doubt about the status:

[image]

But in any case, since there are no fully-alive white stones in that area, it resolves the doubt by calling the uncertain ones dead:

[image]
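A toy illustration of that resolution rule (hypothetical names, not the actual implementation): uncertain white stones are called dead exactly when the area contains no fully-alive white stones.

```python
def resolve_uncertain(region_stones, ownership, alive_threshold=0.9):
    """Resolve uncertain stones in one contested area.

    region_stones: list of (point, color) pairs in the area.
    ownership: dict point -> ownership in [-1, 1] (positive = black).
    """
    # Is any white stone in this area fully alive according to the engine?
    has_alive_white = any(
        color == "white" and ownership[pt] < -alive_threshold
        for pt, color in region_stones
    )
    result = {}
    for pt, color in region_stones:
        if color == "white" and not has_alive_white:
            result[pt] = "dead"   # no fully-alive white support: call it dead
        else:
            result[pt] = "alive"
    return result
```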

4 Likes

I think it is. It’s like when you make mistakes in the game: you own them. You don’t want an AI to take your seat and play its own game instead of yours, so the result should be yours too.

Up to the final result, I want the game to be played by my opponent and me. Only after the game is finished and the score validated can I accept, if I so desire, external comments from other players or an AI.

If I am a bit confused because I lack the knowledge to end the game, and my opponent is too, then I would even prefer the help of a human over an AI, because the human will be mindful of teaching, fairness and such, and will try in some way to let us finish the game in good spirit. An AI might instead show us unrelated things that my opponent and I had already agreed on, even if those assessments were wrong.

Do you ask to annul a game because the AI shows you made a mistake during the game?

Not at all. Take IGS or KGS, for example: both players have to mark dead stones, and those are not the only servers that work like this.

4 Likes

I’ve never used IGS/KGS. The larger servers, Fox/Tygem, have auto scoring implemented.

The thing is that OGS uses neither manual scoring nor auto-scoring. OGS shows players the auto-score result, then lets them change that (initially correct) result and agree on it, only for a moderator to annul the game afterwards and send a warning message to one of the players. That player then opens a thread on the OGS forum asking why this happened to him, the thread gains more than 70 replies, and in the end no one changes the current logic or accepts that it is flawed.

4 Likes

It’s not necessarily initially correct, nor should it be.

4 Likes

I agree with @Regenwasser that OGS’ scoring system seems to purposefully stick to a gray zone between manual scoring and auto-scoring, where the system can give hints to the players (which it shouldn’t, according to OGS’ own rules), but the players also can’t rely on the correctness of those hints. This may approximate some compliance with OGS’ rules, but it can be a source of confusion, especially for weaker players.

5 Likes

And I believe this poor hybrid system is a key contributor to the confusion around scoring seen among many OGS players. It could be that KGS simply doesn’t have a forum, so we can’t see all the confused KGS players; but in my experience helping people, including beginners, on KGS, that community does not have the same confusion about scoring as OGS does. Making it the player’s clear responsibility to mark dead stones leads to better results than a computer trying to help you, sometimes getting it right and sometimes wrong (and maybe I don’t even need to finish the game and close the territories, because the computer will try to guess how the game would continue and score that).

9 Likes

While I think assessing dead groups is a fundamental skill, I don’t mind leaving the scoring process to AI.

Here is my reasoning:

  • Scoring (OTB or online) is one of the hardest things for beginners to figure out. If we could remove this one point of friction, I think that could be a win for the community as a whole.
    • Of course, people can always learn to count later. Worst case, it’s at their first tournament, and what better place to learn than an event with a lot of experts in attendance?!
  • UX for scoring is not an easy problem to solve.
    • I started on DGS, which is less “hybrid” than OGS, and it was still hard to understand without guidance IMO.
    • At govariants.com, we skipped the problem altogether by forcing users to capture dead stones :joy:
  • It seems there exists an algorithm to avoid revealing unseen weaknesses (see @Feijoa’s v3)
  • L&D and score evaluation is still part of the game. Players will still practice the core skill during the game, just not at the very end.

4 Likes

I’d be fine with either fully manual scoring (as implemented on KGS) or fully automatic scoring (as implemented on FlyOrDie and apparently Fox and Tygem).

About accommodating novices with automatic scoring: I don’t know if fully automatic scoring will actually help them. I suppose it will be convenient for them in the short term when they can at least finish their games without understanding anything about life & death and closing territories. But fully automatic scoring may also result in novices remaining longer in this stage, never actually learning how scoring works, or what the game is even about.
I think automatic scoring is a double-edged sword for novice education.

I see clearer benefits for (somewhat) stronger players: saving a few seconds of their time when scoring their games, and even more so thwarting score cheaters.

7 Likes

How about we flip a coin or roll dice? Let the RNG decide what scoring experience the user gets…

While we’re developing changes to the scoring procedure, why not test multiple options and collect data on their effectiveness?
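A minimal sketch of how that assignment could work, hashing the user id so each user keeps the same scoring experience across games (the variant names are made up):

```python
import hashlib

VARIANTS = ["manual", "hybrid", "auto"]  # hypothetical experiment arms

def scoring_variant(user_id: int) -> str:
    """Deterministically assign a user to one scoring experience."""
    digest = hashlib.sha256(f"scoring-experiment:{user_id}".encode()).digest()
    return VARIANTS[int.from_bytes(digest[:4], "big") % len(VARIANTS)]
```

Hashing instead of a plain random roll keeps the assignment stable, so a user isn’t bounced between scoring experiences from game to game.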

3 Likes

That technique is known as A/B testing (perhaps you already knew that). [A/B testing - Wikipedia]

How would you define this effectiveness criterion, and how would you measure it so that it allows a comparison between fully manual scoring and fully automatic scoring?

I think that the first challenge is to even create autoscoring algorithms that agree with correct manual scoring. I suppose OGS has more than enough games in its database to test autoscore algorithms by comparing their results with the results accepted by the players who actually scored the game (perhaps restricted to players above a certain rank, to ensure they scored their games correctly).
I suppose that would require a good test data set from which mis-scored games have been weeded out. That may also be a challenge (or at least a lot of work) when the dataset is large (which I think it should be).
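A sketch of what that offline comparison could look like; the game fields (`final_position`, `accepted_result`, `weaker_player_rank`) are hypothetical stand-ins for whatever the real dataset provides:

```python
def backtest(games, autoscore, min_rank=None):
    """Compare an autoscore algorithm against player-accepted results.

    games: iterable of records with hypothetical fields
           .final_position, .accepted_result, .weaker_player_rank.
    autoscore: the algorithm under test, mapping a position to a result.
    """
    agree = total = 0
    disagreements = []
    for game in games:
        if min_rank is not None and game.weaker_player_rank < min_rank:
            continue  # only trust scoring done by sufficiently strong players
        total += 1
        if autoscore(game.final_position) == game.accepted_result:
            agree += 1
        else:
            disagreements.append(game)  # candidates for manual review
    agreement_rate = agree / total if total else 0.0
    return agreement_rate, disagreements
```

The `disagreements` list doubles as a starting point for weeding mis-scored games out of the test set.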

3 Likes

There are several angles from which we could collect data. Some could be fully automated, while others require some degree of subjective feedback:

  1. Difference/discrepancy between the KataGo score estimate and the scoring outcome. This could be measured both as the numerical difference in the score and as the rate at which there is a discrepancy in who should have won.
  2. Rate of moderator reports regarding scoring issues.
  3. Time spent in scoring mode. Maybe even the number of user actions performed, or rounds of stones being marked/changed/rejected.
  4. User feedback via surveys and polls.
  5. Endless forum discussions and debates about the various options.

To be able to account for other factors, we should also collect other features, like ruleset, board size, timing settings, ranks, etc.
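A sketch of a per-game record capturing both the metrics above and those covariates (all field names hypothetical):

```python
from dataclasses import dataclass

@dataclass
class ScoringMetrics:
    # Experiment arm and covariates
    variant: str              # e.g. "manual", "hybrid", "auto"
    ruleset: str
    board_size: int
    time_control: str
    black_rank: float
    white_rank: float
    # Outcome metrics (items 1-3 above)
    score_discrepancy: float  # |KataGo estimate - recorded score|
    winner_flipped: bool      # did the discrepancy change who won?
    moderator_reported: bool
    seconds_in_scoring: float
    scoring_actions: int      # stones marked/changed/rejected
```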

I think that KataGo is a useful reference for checking the validity of scoring. It almost always gets endgame positions correct, and I think that it can even be leveraged to weed out positions that are not quite finished. However, beyond just testing autoscoring algorithms of various forms, I think that the overall user interface design and scoring-phase experience could be looked at.
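For instance, one simple way to flag “not quite finished” positions would be to look at how much of the board the engine is still unsure about (a sketch, assuming an ownership map from an engine query; the thresholds are made up):

```python
def looks_unfinished(ownership_map, unsure=0.5, max_unsure_fraction=0.05):
    """Flag positions where too many points still have uncertain ownership.

    ownership_map: dict point -> ownership in [-1, 1] (positive = black).
    """
    unsure_points = sum(1 for v in ownership_map.values() if abs(v) < unsure)
    return unsure_points / len(ownership_map) > max_unsure_fraction
```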

3 Likes
  1. I think this would test the correctness of autoscoring algorithms more than the online user experience. In my mind, the correctness of autoscoring algorithms can (and should) already be determined offline with historical data, before A/B testing anything in production.
  2. I’m sort of assuming that autoscore algorithms already give very good results before they are taken into production, so the number of reports on scoring issues should be very low, unless perhaps from beginners who are confused by scoring results of their unfinished games, but I suppose they would also be confused by manual scoring results of such games. I’m now wondering how unfinished games are handled on Tygem and Fox.
  3. When comparing fully automatic scoring with fully manual scoring, wouldn’t the time spent in scoring be almost zero with automatic scoring? Or do you have some user involvement in mind even with fully automatic scoring?

I’d assume that fully automatic scoring does not involve much interface design, so I suppose you mean the user interface of manual scoring could be looked at? Do you want to compare user interfaces of different manual scoring systems?

1 Like