I’m pretty certain this is a LeelaZero question, because the blue move in Lizzie is just the one LeelaZero would play.
I think the answer has something to do with confidence bounds. (Search “confidence” here: https://github.com/leela-zero/leela-zero/blob/3f297889563bcbec671982c655996ccff63fa253/src/UCTNode.cpp) LeelaZero would behave erratically if it switched to the move with the highest winrate all the time. Instead, it favors variations it has already explored, because it’s confident those moves can’t be too bad, while switching to a less-explored move is risky. It’s a bit like the folk saying: “Better the devil you know than the devil you don’t.” I might be wrong.
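For intuition, here’s a toy sketch (not LZ’s actual code; the moves and numbers are made up) of the difference between picking the final move by raw winrate and picking it by visit count, which is what MCTS engines typically do:

```python
# Hypothetical search results: (move, visits, winrate).
candidates = [("D4", 900, 0.52), ("Q16", 10, 0.60)]

# Highest raw winrate: the barely explored Q16, whose 60% could easily
# be noise from just 10 playouts.
best_by_winrate = max(candidates, key=lambda m: m[2])[0]

# Most visits: D4, whose 52% is backed by 900 playouts. This is the
# "devil you know" choice.
best_by_visits = max(candidates, key=lambda m: m[1])[0]
```

Engines generally report and play `best_by_visits`, which is why the blue move tracks exploration rather than the momentary top winrate.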
Imagine you want to get to the store quickly and ask for directions. The person you ask says, “I’ve driven route A a thousand times and the worst time I’ve done is 10 minutes; I drove route B once and it took 9 minutes.” Which route is faster? What if his one drive down route B was an anomaly and it usually takes 12 minutes?
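To put numbers on the analogy (made-up trip times):

```python
import statistics

# Many trips down route A: worst time 10 minutes, average well known.
route_a = [8, 9, 10, 8, 7, 9, 8, 10, 9, 8]
# One trip down route B: a single sample tells us almost nothing about
# the true average; it might have been a lucky run.
route_b = [9]

mean_a = statistics.mean(route_a)   # a trustworthy estimate
mean_b = statistics.mean(route_b)   # 9.0, but with huge uncertainty
```

Route B looks faster on paper, but only route A’s average is actually backed by evidence, just like a heavily searched move’s winrate.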
The number of playouts is an important factor in determining the best move. And frankly, if both moves give a better than 50% chance to win, then the one with the most playouts, and therefore the most certainty about its percentage value, is the better choice.
In theory, if you let LZ run for long enough, it’ll converge on the “best” option, since search is weighted, in part, by the strength of the evaluation.
In practice, I think this is a weakness in LZ. Estimating error bounds based on current evaluation and number of playouts and picking the move with the highest 10% confidence value, for example, seems like it’d give better moves. That being said, if the difference in evaluation is usually small, then it probably doesn’t matter all that much. LZ is strong enough that even the 2nd best move is likely a great move, certainly more than good enough to beat any human player (unless there’s a forced move, or something).
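As a rough sketch of that idea (my own toy formula, not anything from LZ’s code): treat each playout as a Bernoulli trial and rank moves by a lower confidence bound on the winrate. I believe newer LZ versions did eventually add an LCB-based move selection along these lines.

```python
import math

def lcb(winrate, visits, z=1.28):
    """Lower confidence bound on a move's winrate, treating each playout
    as a Bernoulli trial. z ~ 1.28 gives a one-sided ~90% bound, i.e.
    the "10% confidence value" mentioned above."""
    if visits == 0:
        return 0.0
    stderr = math.sqrt(winrate * (1.0 - winrate) / visits)
    return winrate - z * stderr

moves = {"A": (0.55, 2000), "B": (0.58, 50)}  # move -> (winrate, visits)
# B has the higher raw winrate, but A wins on the lower bound because
# its estimate is backed by far more playouts.
best = max(moves, key=lambda m: lcb(*moves[m]))
```

With enough visits the bound converges to the raw winrate, so this mostly changes behavior for lightly explored moves, which is exactly where raw winrates are least trustworthy.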
Everything said here sounds reasonable, but in the only example I have, the most explored move isn’t even among the top 6 moves. So this is confusing me.
Here’s a screenshot:
My only experience is with Leela 0.11.0
Leela uses terms like “Effort%”, “Simulations”, “Nodes” that I believe are similar to the concept of “explored move” here.
At the beginning of analyzing a board position, the best move (the first one in the “Analysis” window) isn’t necessarily the one with the maximum effort or simulations. Later on, various moves compete with each other and eventually the effort seems to concentrate on the “last” best move.
So we could say that, granted a sufficient number of simulations, the most explored move is also the best one.
But here we have a most-explored move which isn’t even in the top 6, so what happened?
Too few simulations?
Something not working?
Leela Zero is different from Leela?
I know that LZ has no training based on human games.
My question was about the number of playouts. I’ve seen Mark suggesting something like 800 or 1000 playouts on LZ, while with Leela 11 I used to let it do many more: sometimes the best move is still changing after tens of thousands simulations.
That is one of the hard parts of using programs: knowing when to rely on their search of a position, as relying too much on what a program says can hurt your learning. Chess programs are the same way: some moves are realistic for a human to play but second best, while the program may prefer a move that makes no sense to you and then tenuki for the next 5 moves because it has a better score. When I use a program, it is usually about the future it projects and the comfort of the position, rather than a line that relies on a ko far down the road to survive, since life and death can be a blind spot for the program.
I’m not an AI expert, so this is just how I understand it.
Simulations in leela and playouts in LZ are approximately the same. (A playout is a unique game state)
The reason why LZ only needs a few thousand playouts (1600 for self-play training) while leela needs playouts in the tens of thousands is that they work quite differently.
leela relies heavily on MCTS (Monte Carlo Tree Search). Its positional judgement is rather poor, so it has to simulate (read) the probable variations to the end of the game (or at least very deep) to get a good estimate. Therefore leela needs quite a lot of playouts to get good estimates. Luckily, playouts in leela are quite fast, so you get many of them in a reasonable time.
LZ’s positional judgement is much better. It uses a NN (neural network) which can reliably evaluate where to play and who is ahead. For a given board position (actually the last 8 or so positions, because of ko) the NN gives you 362 (19×19 + pass) move probabilities and a win probability. For a human we would probably call this positional judgement.
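In code terms, the network’s output for one position looks roughly like this (shapes only; a uniform dummy policy stands in for a real NN’s output):

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Dummy logits: 361 board points plus pass = 362 entries.
policy_logits = [0.0] * 362
policy = softmax(policy_logits)   # move probabilities, summing to 1
value = 0.0                       # scalar win estimate for the side to move
```

So a single NN evaluation already answers both “where would a strong player look?” (policy) and “who is ahead?” (value).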
LZ then does something similar to MCTS by playing out the most probable variations, evaluating the NN for every board position along the way. Let’s call this reading.
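The selection rule that makes this work is, as far as I understand it, an AlphaZero-style PUCT score: the move’s value estimate plus an exploration bonus weighted by the NN’s prior. A toy version (my own illustrative numbers):

```python
import math

def puct_score(q, prior, parent_visits, visits, c_puct=1.0):
    """Value estimate q plus an exploration term scaled by the NN's
    prior probability for the move (AlphaZero-style PUCT)."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)

# With equal value estimates, the move the NN considers probable gets
# searched first:
score_probable = puct_score(q=0.5, prior=0.40, parent_visits=100, visits=5)
score_unlikely = puct_score(q=0.5, prior=0.01, parent_visits=100, visits=5)
```

This is why LZ’s few thousand playouts go so much further than leela’s: the prior focuses the search on a handful of plausible moves instead of spreading it everywhere.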
Since LZ has a good estimate of who is ahead along the way, it doesn’t have to read as deep as leela to judge whether a variation is good.
Since evaluating a board position with the NN takes a lot of computing power, LZ has to spend much more time on one playout than leela, so the total number of playouts is much lower.
In the end it’s about quality versus quantity, and it seems that quality is the way to go right now.