The integrated AI Review feature for OGS

Not quite sure what you’re looking for Animiral, I can say that the service provided is the service advertised, but beyond that, you’re better at this game than I am. I took the liberty of re-analyzing that game with the 40x256 network in case you find that interesting and/or useful.

1 Like

Thanks for taking a look :slight_smile:
The new graph looks almost, but not exactly, like the previous one.


(new one above in blue, old one slightly lower in red)

The very interesting thing is that both runs show the same weakness, especially visible in the “problem area” around move 81. The drop shows up way too late.

Since I already described the symptoms and reproduction steps in depth, I can only reiterate: move 81 is a severe blunder that should be highlighted by the analysis, but isn’t.

We can see that the implementation is broken because

  1. it severely misjudges the win rate
  2. its results differ wildly from my local Leela
  3. positive percentage changes are much too frequent and too large

If you still insist after all the evidence, I must give up here.
Fellow users of OGS, do you see what I see? Please tell me I’m not crazy :slight_smile:

4 Likes

I’ve wondered about this myself. LZ is well-known to be superhuman in strength and can beat high dans, including original Leela, on just 1 playout (excluding ladder issues). Its value network has been honed on millions of self-play games and, in my experience, stays more or less constant when following expected paths. Breaks from the expected path are rarely if ever better than LZ’s move. Consequently, any adjustment in win rate is almost always down. That’s how good it is.

THEREFORE, it’s strange that, following Animiral vs. vegmandu starting at, say, move 24, LZ is constantly swinging the win rate in favor of the player who just played, by margins greater than 3%, suggesting both players routinely shocked it with good plays. It really does seem in the nature of a bug. The LZ on my local machine casts much more shade on my moves…
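The anomaly described above can be made mechanical. A minimal sketch (not OGS or Leela code, and the helper name is mine): with a strong engine, the side that just played should rarely gain win rate relative to the engine’s previous estimate, so flagging every mover gain above 3pp should produce almost no hits on a healthy review.

```python
# Sketch only: flag moves where the player who just moved *gained* win
# rate versus the engine's previous estimate, which should be rare for
# a strong engine like LZ.

def suspicious_swings(rates, threshold=3.0):
    """rates: list of (move_number, black_winrate_percent) after each move.
    Returns (move, gain) pairs where the player who just moved gained more
    than `threshold` percentage points, assuming Black plays the
    odd-numbered moves (no handicap)."""
    flagged = []
    for (_, prev), (move, cur) in zip(rates, rates[1:]):
        delta = cur - prev                        # change from Black's view
        mover_gain = delta if move % 2 == 1 else -delta
        if mover_gain > threshold:
            flagged.append((move, round(mover_gain, 1)))
    return flagged
```

The move numbers and percentages used with this are toy values; a graph that triggers this check on many moves, for both players, is what looks like a bug rather than two players repeatedly surprising the network.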

6 Likes

@mark5000 @Animiral I’m keen to understand (for my own benefit) and maybe even help, but regrettably I appear to be thick in understanding the problem you are describing.

The good part of this is that if you can describe it clearly enough for someone dumb like me to understand, then certainly someone smart like anoek will be able to tackle it.

Could you clarify whether the problem is in the analysis that LZ itself is providing (which OGS is making available), or whether OGS appears to be introducing a problem?

When I run LZ under Lizzie on this game I get the same results that OGS displays.

If we take one concrete place in the game:

Move 26 and Move 27.

  • On Move 26 white places Q4.
    OGS tells us that LZ tells us that the current win rate for black is 39.7%, and that black’s next turn will improve black’s position by 1pp.

  • On Move 27 black places R3.
    OGS tells us that LZ tells us that the win rate for black did improve, and is now 40.7%. OGS also tells us that white’s next turn will make things worse for black, by 3.7pp.

I think you are saying that this is not reasonable, but I don’t see why not.

Unfortunately people who can code can’t necessarily play or understand Go that well, so people who can understand Go really well have to explain it to people who can code slowly and clearly so that people who can code can make it happen :slight_smile: :smiley:

1 Like

I was wondering: when the program reviews the game, does it start from the beginning or the end? I typically review from the end first, as I do for my own research projects with backpropagation.

So I was wondering how much the graph would differ if you did the reverse?

It doesn’t matter whether it goes forwards or backwards - each position is evaluated on its merits (as I understand it).

I think this is true because I can go to any position and do “start analysis” (in Lizzie/LZ) and it comes up with the same result for that position, irrespective of what has been analysed up till now.
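The point above can be illustrated with a trivial sketch (my own toy code, not Lizzie or LZ): if each position’s evaluation is a pure function of the board state, review order can’t affect the per-position numbers. Real MCTS has some randomness, so results are only approximately reproducible in practice.

```python
# Toy illustration: mapping each position to its evaluation is
# independent of the order in which positions are visited.

def review(positions, evaluate):
    # One evaluation per position, regardless of iteration order.
    return {pos: evaluate(pos) for pos in positions}

evaluate = len  # stand-in evaluator: any pure function of the position
positions = ["pos-after-move-80", "pos-after-move-81", "pos-after-move-82"]
forward = review(positions, evaluate)
backward = review(list(reversed(positions)), evaluate)
```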

4 Likes

To my understanding (and I could be wrong), they are saying the problem is that at one point the AI thinks black is winning, and at another it thinks white is winning, with a steep drop-off between them, implying a blunder. I THINK the problem they’re reporting is that black played what the computer regarded as “best moves” through this “blunder period”, and thus the REAL blunder must have happened earlier; for some reason the computer is either not picking it up or not displaying it.

Going backwards from about move 180 or so, I didn’t get the crazy scores the AI got in the game position at move 82

(if I’m reading the game scores correctly)


Tested again to make sure, and added the earlier moves too; same result.

1 Like

… if what you are saying is correct, then the criticism is of “Use of AI for analysis” rather than “OGS implementation of AI analysis”.

^^ This is what I have been trying to flush out: where exactly is the problem?

1 Like

As I understand it, in each board position Leela takes a bunch of candidate moves into consideration and estimates a winning probability for each. The winning probability of the original position is then estimated as the best of those probabilities (best for the player to move).

Now consider a position where it is Black’s move, and Leela tells us that after the move played, Black’s winning percentage increases by a large amount. By my understanding, this would imply that Leela didn’t consider this move (or didn’t see how strong it is). I can certainly see this happening sometimes, but it baffles me how often it happens in the analysis given by OGS.
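The evaluation rule described above can be sketched in a few lines (a simplification of MCTS backup, not Leela’s actual code, and the function name is mine):

```python
# Sketch of the rule: a position's estimate is the best win rate, for
# the player to move, among the candidate moves the search considered.

def position_estimate(candidates):
    """candidates: dict of move -> win rate (%) for the player to move."""
    return max(candidates.values())

# If the move actually played was among the candidates, the mover's
# post-move win rate cannot exceed this pre-move estimate (it was either
# the max itself or a worse option). A large upward jump after the move
# therefore means the search missed, or badly misjudged, the played move.
```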

1 Like

This is the part I am trying to understand. Why did you say “by OGS”?

Are you saying that OGS’s delivery of the analysis is flawed?

If things are working properly, OGS simply renders the analysis of Leela.

1 Like

I don’t know, I was just trying to convey why I’m confused and suspicious of the analysis given.

1 Like

Really? Could you please check again and make sure? It’s very different for me. I’ve reproduced it many times, I’m using the same network as OGS and I posted a screenshot.

Also check out @Deep_Scholar’s move 82 Lizzie screenshot, with quality LZ output (black 13%). This is the proper result.

NO! The criticism is entirely about the OGS implementation.

I meant every word as written. OGS advertises Leela 9006c708 with 1600 visits, but the actual OGS review is much weaker. Anyone can verify this with a little effort.

This is a misunderstanding. The fact that move 81 is a blunder is prior knowledge. I don’t blame you if it’s not obvious; I failed to read it during the game :wink: But in the game, white captures a large group. If black plays correctly, the group survives.

Leela should see this, and it does! But not on OGS.

Indeed. But that is not what’s happening as far as I can see.

3 Likes

Fantastic. With this clarity, let’s focus on that.

I’m going to make sure I have the same network, then look at the same moves and see if we are seeing the same thing.

2 Likes

By focusing on Move 80, I found the problem you are describing.

@anoek I can confirm that running LZ with the stated network gives a -54% swing for black on Move 80, M6 (as shown in @Deep_Scholar’s pictures referred to above).

However, the OGS graph shows black’s Move 80, M6, as +9.4.


Because the shape of the graph OGS gives is “basically” right, it appears there’s some “showing the wrong move” issue going on, i.e. it looks like the %-to-win data has got out of sync.

(If this is the case, we should be able to find where in the game that happened … it’s a little hard because at the beginning of the game the %-to-win of each move is similar, but … I will see what I can see there, maybe others will too)
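A toy illustration of this out-of-sync theory (the numbers are invented, not from the game): if the win-rate series is shifted one move relative to the move list, each swing gets attributed to the wrong move, and the sign of the mover’s gain flips with it.

```python
# Hypothetical off-by-one: the same win-rate data, misaligned by one
# move, pins the big swing on a different (wrong) move.

def deltas(rates):
    # Per-move change in Black's win rate, in percentage points.
    return [round(b - a, 1) for a, b in zip(rates, rates[1:])]

true_rates = [40.0, 43.0, 12.0, 15.0]        # blunder at the second delta
shifted = true_rates[1:] + true_rates[-1:]   # same data, off by one move
# deltas(true_rates) and deltas(shifted) place the -31pp swing on
# different moves, matching the "basically right shape, wrong move" look.
```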

EuG

2 Likes

Alrighty I think I’m understanding things a little better now, sorry Animiral, been a busy week :slight_smile: I’ll look into it more in the coming days to figure out why there’s the apparent shift there.

I’m not exactly sure what I’m looking at in Lizzie. When I go from move 80 to 81, there’s a huge drop in the graph, yet the bar graphs up top don’t change until the next move.

3 Likes

I think that Lizzie has its own problem in this respect, confounding our analysis of this bug.

As you say, the bar charts in Lizzie are for the turn prior. Seems like a bug to me.

Not sure if this has been addressed already, just leaving two quick suggestions here:

  1. If the game times out at move 0, before anybody plays a move, an AI review is really not necessary, and should be disabled by default
  2. If 2 supporters are playing each other, 2 separate reviews are queued, which seems excessive. Only one (the better one) should be done.
7 Likes

I have a suggestion:
after a game is finished, players should instantly see who Leela Zero thinks has won, especially on timeouts. Sometimes people assume they won just because the opponent mistakenly resigned.

Just the win percentage for the last move, with no move suggestions, for both supporters and non-supporters, before the full review.

2 Likes

Is this suggestion like “finally, we can quickly get an AI to tell the players how the scoring should go?”

I wonder if LZ can be fired up quickly enough to contribute to the scoring phase with this judgement. It’d be awesome.

Or did you mean after the scoring, as an indication of whether a resignation was a “good call”? Even that would be useful, I agree.

1 Like