[poll] Balancing the new AI score estimator

gennan · June 2, 2021, 5:52am

I think it is the new score estimator, but the review graph evaluation is fluctuating wildly between W+15 and B+15 around move 194, so I guess it might turn out somewhere in between if it tries to ignore whose turn it is.

I suppose the reason for this score swinging is that the status of black’s upper left group is left indeterminate by both players from move 170 until move 239. During that stretch black can live with A15 and white can kill with A15, so A15 has a value of about 35 points. White kills it with A15 on move 239 and from then on, the evaluation graph stops swinging wildly.

Perhaps the score estimator uses an average between passing and playing to give a more static evaluation (independent from whose turn it is) than the more dynamic evaluation method used for the review graph (which takes into account whose turn it is)?
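To make that guess concrete, here is a minimal sketch of what such averaging would look like. It is purely hypothetical: the function name and the sign convention (positive means Black is ahead) are my own assumptions, not anything taken from OGS or KataGo.

```python
def turn_independent_score(score_black_to_move: float,
                           score_white_to_move: float) -> float:
    """Average two score estimates of the same position.

    Hypothetical helper: each input is the estimated final margin
    (positive = Black ahead) assuming that colour moves first.
    """
    return (score_black_to_move + score_white_to_move) / 2.0


# Numbers from the graph around move 194: B+15 if Black plays first,
# W+15 if White plays first.
print(turn_independent_score(15.0, -15.0))
```

With those two numbers the average lands at an even game, which is the "somewhere in between" value I had in mind.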

Vsotvep · June 2, 2021, 6:21am

It’s probably because the score estimation uses a deeper readout than the level 0 AI review does, thus the two may disagree. So, the score estimator is more correct, in this case, assuming you play like KataGo.

See also AI based score estimation and auto-scoring is now live - #98 by stone_defender, where I tested this hypothesis.

flovo · June 2, 2021, 6:28am

With a full strength review both scores are about the same.

StevenageTony · June 3, 2021, 12:42am

Gennan, that sounds like a plausible explanation, but surely the KataGo SE knows whose turn it is, so should also evaluate the score on the basis that whoever’s turn it is could play A15. (Yes, I had wrongly decided the group couldn’t be saved and was surprised when W played A15.)

gennan · June 3, 2021, 5:31am

Yes, KataGo would read that, but only if it is given sufficient playouts.

And programmers can tweak how they use KataGo and its output. Tooling built around KataGo can add some features that are not part of KataGo itself (see Katrain, Lizzie, etc). I thought that maybe @Anoek could be doing such a thing to improve the user experience of score estimation.

anoek · June 3, 2021, 1:30pm

ftr we don’t do anything fancy, we just hand katago the board state, whose turn it is, and present the results. Whenever I try to get fancy with these sorts of things, it usually doesn’t work out so well

Maharani · June 3, 2021, 5:54pm

StevenageTony · June 4, 2021, 12:48am

So is gennan @165 wrong in what he says about the sample game I put up @164 and if so, what is your explanation for the discrepancy? Is it to do with the depth of readout, as suggested by Vsotvep @166 (and flovo @167, I think)? I’m afraid my understanding of AI is very poor, so “ideas of one syllable” would be ideal to help me understand.

gennan · June 28, 2021, 7:44pm

@StevenageTony: For the record, my speculations in post #165 were incorrect.
@Vsotvep’s assumption in post #166 was correct and confirmed by @flovo in post #167.

HHG · June 29, 2021, 12:11am

From my understanding, IPs are specific to your internet. If two people share the same network / provider, their IPs may be in the same range. For example, if two people have the same ISP (say Optimum), their IPs may be similar, e.g. 188.45.137.67 vs. 188.45.185.28.
If two people share the same network / ISP, they may have a similar IP, but it will not be identical. However, if two people (or more) share the same internet (i.e. same internet name, password, subscription plan, etc.), their IPs will be the same, such as for members of a family.
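As a rough illustration of "same range" versus "same address", here is a small sketch using Python's standard `ipaddress` module. The helper name `same_range` and the /16 prefix length are assumptions chosen to fit the example addresses; real ISPs allocate blocks of many different sizes.

```python
import ipaddress


def same_range(a: str, b: str, prefix_len: int = 16) -> bool:
    """Return True if b falls in the network containing a.

    Hypothetical helper; prefix_len=16 is an assumed block size.
    """
    # strict=False lets us pass a host address and get its enclosing network.
    net = ipaddress.ip_network(f"{a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(b) in net


# The two example addresses share their first two octets ...
print(same_range("188.45.137.67", "188.45.185.28"))
# ... but they are not in the same /24, and they are not identical.
print(same_range("188.45.137.67", "188.45.185.28", prefix_len=24))
print(ipaddress.ip_address("188.45.137.67") == ipaddress.ip_address("188.45.185.28"))
```

The first check succeeds and the other two do not, which matches the "similar but not identical" case.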

Uberdude · June 29, 2021, 1:10am

Sometimes large organisations share the same single external IP address. Once a naughty player from the Seattle Microsoft campus got banned from KGS, and it banned the entire campus. Or another time at a go camp one of the pro teachers (MilanMilan 9d if you remember him) was playing on KGS, a student was kibitzing from another computer in the same apartment and got banned, banning MilanMilan too, to the disappointment of hundreds of spectators. So IP can be overly broad.

StevenageTony · June 29, 2021, 1:21am

Cheers, gennan.

HHG · June 29, 2021, 1:36am

Good point; my district’s IP address is shared, well, through the whole district. For example, when I go on Wikipedia while at school, sometimes I check the IP address’s Contributions page. Well…
Let’s just say that not all edits have been very useful.
I think, however, that this is because the district uses “the same internet”, with the same internet name, password (wait, do we even have a password?), payment, company, etc. Although the internet is spread throughout different buildings and routers, the IP address remains the same.

KAOSkonfused · July 1, 2021, 11:27am

Correct.
There are ways to distinguish users more finely than by IP address alone, for example by browser fingerprint.
But in this case I guess it’s fine to say, if you have the same IP address, you can’t access the score estimator, because there won’t be much of a loss and it’s an easy way to prevent cheating.

Groin · October 14, 2021, 3:36am

Coming back, I find it a bit disturbing that the button looks the same whether you are using the old SE or the new one. So one might think the SE is very strong in-game after using it in a review, or conversely think that the OGS SE is too dumb to use in a review.

Maybe something like
estimate score (old)
estimate score (new)

I mean, not everyone will come here and read 180 posts to know how the system works.

Lys · October 18, 2021, 6:19am

I must’ve missed something and I wouldn’t read back the thread to find it: are there two SEs?
Where?

BHydden · October 18, 2021, 6:21am

The score estimator you see during a game / scoring is the same one we’ve had for years. Finished games / reviews use KataGo.

Lys · October 18, 2021, 6:28am

So, I don’t get what button @Groin is talking about.

Oh, wait, maybe I got it: text is the same (estimate score) but environment is different: game vs review.
Is that right?

So we could change them to:

“take a dumb guess at score” in game page
“have a solid score evaluation” in review page

Groin · October 18, 2021, 6:49am

You got it

Feijoa · October 18, 2021, 7:03am

There’s also a different (even stronger?) one used for the score graph.