How does LZ decide which move to play?

flovo · May 18, 2019, 5:41am

As I understand it, LZ chooses the move it explored most. But many comments on this forum do distinguish between the move which is explored most and the move LZ would play, implying they can be different.

My main sources for LZs move selection algorithm are the AlphaGo paper

Mastering the game of Go with deep neural networks and tree search

and a visuslization of it

LZ should work the same way as AlphaGO since the LZ github page states:

For all intents and purposes, it is an open source AlphaGo Zero.

Missed I something?

DVbS78rkR7NVe · May 18, 2019, 6:33am

I’m pretty sure LZ plays most explored move.

Lys · May 18, 2019, 6:51am

Our doubts are generated by the OGS implementation.
Letters A through F are on OGS.
What LZ does in other implementations can be used as comparison.

Vsotvep · May 18, 2019, 6:54am

It should play the most explored move, and generally this will also be the move with the highest winrate estimation. If a node has a higher winrate, but is explored less, then it will have priority over the other node anyway.

What can happen is that it finds out only relatively late that a move actually is bad (e.g. because of a ladder), and in that case it first needs to do more playouts in this bad note to get the score down before it starts considering playing less visited nodes. Hence after the score has lowered, it could be that there are nodes that are less explored, but got a higher score.

The thing is, those less visited nodes have not been read out as deeply, so they could be equally bad or even worse than the original high scoring node. It has more certainty that it’s evaluation of the most explored node is accurate. On top of that, the much explored node must have been better than the less explored node if you only read as far as the less explored node had been read out at the time the much explored node still looked good.

hqrpie · May 18, 2019, 7:12pm

I do not know anything about the subject, but just for your information in a recent review of a game of mine, at a certain point of the game Leela’s most explored move (bearing letter C) was a ladder that did not work, and the full subsequent sequence clearly showed that the stones of the running ladder were dead. So I guess that Leela would have picked the move A:

Edit: although in this position Black 1 is Leela’s A move, which I don’t really understand as white can immediately capture :

In this position, Black 1 (Leela’s A move) does not make sense to me either. Later in the review, Leela’s A move was E15 or F16 repeatedly, which I see as wasting ko threats:

On the other hand, seems to me that Leela’s A is right in this position. If one were to play the brightest suggestion, black would play F6 right?

All this to say that I really don’t get it.

SanDiego · May 18, 2019, 8:44pm

AI suggestions only make sense for balanced games. When the win probability for one player is far above 90%, then the AI starts hoping for a miracle.

Maybe the OGS implementation should come with a warning on this, for people who are not familiar with LZ?

hqrpie · May 18, 2019, 9:03pm

Oh that makes sense, thanks!

Vsotvep · May 19, 2019, 1:02am

I guess this technically also holds true for humans

Lys · May 19, 2019, 9:30am

I used Leela 11 for some time to review my games. Very often its best moves (in my games) was just wasting ko threats. This was also confirmed in a thread here in the forum.

Don’t know if that applies also to Leela Zero.

flovo · May 19, 2019, 12:35pm

For AI ko fights are hard to read.

As a human you simply count the ko threats for each colour.
The AI has to play out all variants for this fight. So it seems reasonable to assume that AIs tend to avoid ko fights.

To read out a ko fight involving 3 ko threats for black and 3 for white, the AI has to read through at least 36 variations with 18 playouts each. The resulting board positions look the same for a human, but for the exploring AI they are 36 different branches (board positions). All 36 branches have the same chance to win (they are the same board position) so the AI has to spend its remaining playouts equaly on all 36 branches.

For LZ with 1600 playous this results in 1600/36-18≈26 playouts per branch.
To read out a ko fight involfing 4 thrats per colour creates 576 branches and needs 1728 playouts (more than LZ has to spent).

A ko threat is a move plus a response which doesn’t really change the win rate for both player (ignoring possible future ko fights), so to play a ko threat immediately could look like the best move on the board. A move can only reduce ones win chance after all and the ko threat doesn’t seem to reduce it.
Besides if the AI explores the ko threat it halves it’s playouts by exploring 2 almost similar board positions (similar by chance to win).

Thinking about it, if its estimated chance to win goes down with each move it reads deeper, to play the ko thread looks better since the effective reading deeph is 2 moves lower.

On the other hand, LZs neuronal net could learn to play kos only if a ko threat is needed (if a ko is active), but it doesn’t seem to care.

Lys · May 19, 2019, 12:51pm

Well explained, thank you

Vsotvep · May 19, 2019, 1:14pm

Doesn’t LZ have a way to merge equal game states when they arise in two branches? If it doesn’t it should surely be implemented, as that will be a big improvement in general, not just for dealing with ko threats.

flovo · May 19, 2019, 1:36pm

I don’t think LZ merges branches. As far as I know, the Alpha Go papers don’t mention such a method, and LZ is an AGZ reimplementation.

For the NN the game state includes the last 8 moves (to enable the NN to detect small super ko cycles), so there would be a difference as well.

And one has to ensure that the merge doesn’t affect any active and future kos.

One need some good heuristics to identify branches which can safely be merged. Such a heuristic bears the risk to misread ko cycles. (AIs play with super ko rules)

SanDiego · May 19, 2019, 4:40pm

Having studied a few AI games, I strongly disagree with this. AlphaGo Zero and LZ handle ko fights at a scale that is far beyond what humans can read. I am not an expert but my understanding is that those AIs don’t have to play out all variants to reach a conclusion.
Check out my post here, where I share an ELF vs. LZ game I ran on my computer:

I don’t believe this either, and even if it was the case we humans would not be able to judge. What I believe is that when the AI throws away a ko threat it’s because it estimates that it won’t have an impact on the result.