I’m pretty certain this is a LeelaZero question, because the blue move in Lizzie is just the one LeelaZero would play.
I think the answer has something to do with confidence bounds. (Search “confidence” here: https://github.com/leela-zero/leela-zero/blob/3f297889563bcbec671982c655996ccff63fa253/src/UCTNode.cpp) LeelaZero would behave erratically if it switched to the move with the highest winrate all the time. Instead, it favors variations it has already explored, because it’s confident those moves can’t be too bad, while switching to a less-explored move is risky. It’s a bit like the folk saying: “Better the devil you know than the devil you don’t.” I might be wrong.
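For intuition, here’s a toy sketch (not LZ’s actual code; the moves and numbers are made up) of the difference between picking the final move by raw winrate and picking it by visit count, which is what MCTS engines typically do:

```python
# Hypothetical search results: (move, visits, winrate).
candidates = [("D4", 900, 0.52), ("Q16", 10, 0.60)]

# Highest raw winrate: the barely explored Q16, whose 60% could easily
# be noise from just 10 playouts.
best_by_winrate = max(candidates, key=lambda m: m[2])[0]

# Most visits: D4, whose 52% is backed by 900 playouts. This is the
# "devil you know" choice.
best_by_visits = max(candidates, key=lambda m: m[1])[0]
```

Engines generally report and play `best_by_visits`, which is why the blue move tracks exploration rather than the momentary top winrate.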
Imagine you want to get to the store quickly and ask for directions. The person you ask says, “I’ve driven route A a thousand times and the worst time I’ve done is 10 minutes; I drove route B once and it took 9 minutes.” Which route is faster? What if his one drive down route B was an anomaly and it usually takes 12 minutes?
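To put numbers on the analogy (made-up trip times):

```python
import statistics

# Many trips down route A: worst time 10 minutes, average well known.
route_a = [8, 9, 10, 8, 7, 9, 8, 10, 9, 8]
# One trip down route B: a single sample tells us almost nothing about
# the true average; it might have been a lucky run.
route_b = [9]

mean_a = statistics.mean(route_a)   # a trustworthy estimate
mean_b = statistics.mean(route_b)   # 9.0, but with huge uncertainty
```

Route B looks faster on paper, but only route A’s average is actually backed by evidence, just like a heavily searched move’s winrate.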
The number of playouts is an important factor in determining the best move. And frankly, if both moves give a better than 50% chance to win, then the one with the most playouts, and therefore the most certainty about its percentage value, is the better choice.
In theory, if you let LZ run for long enough, it’ll converge on the “best” option, since search is weighted, in part, by the strength of the evaluation.
In practice, I think this is a weakness in LZ. Estimating error bounds based on current evaluation and number of playouts and picking the move with the highest 10% confidence value, for example, seems like it’d give better moves. That being said, if the difference in evaluation is usually small, then it probably doesn’t matter all that much. LZ is strong enough that even the 2nd best move is likely a great move, certainly more than good enough to beat any human player (unless there’s a forced move, or something).
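As a rough sketch of that idea (my own toy formula, not anything from LZ’s code): treat each playout as a Bernoulli trial and rank moves by a lower confidence bound on the winrate. I believe newer LZ versions did eventually add an LCB-based move selection along these lines.

```python
import math

def lcb(winrate, visits, z=1.28):
    """Lower confidence bound on a move's winrate, treating each playout
    as a Bernoulli trial. z ~ 1.28 gives a one-sided ~90% bound, i.e.
    the "10% confidence value" mentioned above."""
    if visits == 0:
        return 0.0
    stderr = math.sqrt(winrate * (1.0 - winrate) / visits)
    return winrate - z * stderr

moves = {"A": (0.55, 2000), "B": (0.58, 50)}  # move -> (winrate, visits)
# B has the higher raw winrate, but A wins on the lower bound because
# its estimate is backed by far more playouts.
best = max(moves, key=lambda m: lcb(*moves[m]))
```

With enough visits the bound converges to the raw winrate, so this mostly changes behavior for lightly explored moves, which is exactly where raw winrates are least trustworthy.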
Everything said here sounds reasonable, but in the only example I have, the most explored move isn’t even among the top 6 moves. So this is confusing me.
Here’s a screenshot:
My only experience is with Leela 0.11.0
Leela uses terms like “Effort%”, “Simulations”, “Nodes” that I believe are similar to the concept of “explored move” here.
At the beginning of analyzing a board position, the best move (the first one in the “Analysis” window) isn’t necessarily the one with the maximum effort or simulations. Later on, various moves compete with each other and eventually the effort seems to concentrate on the “last” best move.
So we could say that, granted a sufficient number of simulations, the most explored move is also the best one.
But here we have a most-explored move which isn’t even in the top 6, so what happened?
Too few simulations?
Something not working?
Leela Zero is different from Leela?
I know that LZ has no training based on human games.
My question was about the number of playouts. I’ve seen Mark suggesting something like 800 or 1000 playouts on LZ, while with Leela 11 I used to let it do many more: sometimes the best move is still changing after tens of thousands simulations.
That is one of the hard parts of using programs: knowing when to rely on their search of a position, as relying too much on what a program says can hurt your learning. Chess programs are the same way: some moves are realistic for a human to play but second best, while the program may prefer a move that makes no sense to you and then tenuki for the next 5 moves because it has a better score. When I use a program, it is usually about the future it projects and the comfort of the position, rather than a line that relies on a ko far down the road to survive, since life and death can be a blind spot for the program.
I’m not an AI expert, so this is just how I understand it.
Simulations in leela and playouts in LZ are approximately the same. (A playout is a unique game state)
The reason why LZ only needs a few thousand playouts (1600 for self-play training) while leela needs playouts in the tens of thousands is that they work quite differently.
leela relies heavily on MCTS (Monte Carlo Tree Search). Its positional judgement is rather poor, so it has to simulate (read) the probable variations to the end of the game (or at least very deep) to get a good estimate. Therefore leela needs quite a lot of playouts to get good estimates. Luckily, playouts in leela are quite fast, so you get many of them in a reasonable time.
LZ’s positional judgement is much better. It uses a NN (neural network) which can reliably evaluate where to play and who is ahead. For a given board position (actually the last 8 or so positions, because of ko) the NN gives you 362 (19×19 + pass) move probabilities and a win probability. For a human we would probably call this positional judgement.
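In code terms, the network’s output for one position looks roughly like this (shapes only; a uniform dummy policy stands in for a real NN’s output):

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Dummy logits: 361 board points plus pass = 362 entries.
policy_logits = [0.0] * 362
policy = softmax(policy_logits)   # move probabilities, summing to 1
value = 0.0                       # scalar win estimate for the side to move
```

So a single NN evaluation already answers both “where would a strong player look?” (policy) and “who is ahead?” (value).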
LZ then does something similar to MCTS by playing out the most probable variations, evaluating the NN for every board position along the way. Let’s call this reading.
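The selection rule that makes this work is, as far as I understand it, an AlphaZero-style PUCT score: the move’s value estimate plus an exploration bonus weighted by the NN’s prior. A toy version (my own illustrative numbers):

```python
import math

def puct_score(q, prior, parent_visits, visits, c_puct=1.0):
    """Value estimate q plus an exploration term scaled by the NN's
    prior probability for the move (AlphaZero-style PUCT)."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)

# With equal value estimates, the move the NN considers probable gets
# searched first:
score_probable = puct_score(q=0.5, prior=0.40, parent_visits=100, visits=5)
score_unlikely = puct_score(q=0.5, prior=0.01, parent_visits=100, visits=5)
```

This is why LZ’s few thousand playouts go so much further than leela’s: the prior focuses the search on a handful of plausible moves instead of spreading it everywhere.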
Since LZ has a good estimate of who is ahead along the way, it doesn’t have to read as deep as leela to judge whether a variation is good.
Since evaluating a board position with the NN takes a lot of computing power, LZ has to spend much more time on one playout than leela, so the total number of playouts is much lower.
In the end it’s about quality versus quantity, and it seems that quality is the way to go right now.