I have a question for those who know Leela Zero and also Leela 0.11
I asked in another thread too but had no response.
So, if you think it’s OT here, please post your answer there:
What exactly does “playouts” mean for LZ? Is it the same as “nodes” or “simulations” in Leela?
I can see Leela change its mind about the “best move” as it runs more and more simulations, so I let it run for tens of thousands (30k, 50k, even more) on a single move before considering it done.
On LZ I can see people talking about 200 / 800 / 1000 playouts, so I’m not sure I understand correctly.
I believe the only difference from old-fashioned Monte Carlo Tree Search is the method by which it selects the next node to explore. In “classic” MCTS this is done by a relatively simple formula that looks at how much a node has been explored and how good its chances are, but in LZ this is done by a neural network trained to spot the best move. It also uses the neural network for the simulation step, where it keeps playing until the game is decided and then updates the parent nodes. In classic MCTS this is often done randomly, or using heuristics that aren’t based on deep learning.
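To make the contrast concrete: the “relatively simple formula” in classic MCTS is usually UCB1, while AlphaGo-style engines use a PUCT variant that mixes in the policy network’s prior for each move. A minimal sketch, where the `Node` fields and constants are illustrative and not LZ’s actual code:

```python
import math
from dataclasses import dataclass

@dataclass
class Node:
    visits: int = 0
    wins: float = 0.0       # accumulated wins (classic MCTS)
    value_sum: float = 0.0  # accumulated value estimates (AlphaGo-style)

def ucb1_score(node, parent_visits, c=1.41):
    """Classic MCTS selection: balance exploitation (observed win rate)
    against exploration (how rarely this child has been tried)."""
    if node.visits == 0:
        return float("inf")  # always try unvisited children first
    exploit = node.wins / node.visits
    explore = c * math.sqrt(math.log(parent_visits) / node.visits)
    return exploit + explore

def puct_score(node, parent_visits, prior, c_puct=1.0):
    """AlphaGo-style selection: the policy network's prior probability
    for the move steers exploration instead of a pure visit-count term."""
    q = node.value_sum / node.visits if node.visits else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + node.visits)
    return q + u
```

The key difference is that in PUCT an unpromising move (low prior) is explored far less, which is why the network-guided search needs so many fewer playouts.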
So a playout is essentially the same as a simulation; the difference with LZ is that it uses a much more complicated neural net, so one playout takes a lot more time (but is far more accurate).
Yeah, playout is very close to simulation. In playout procedure LZ tries a sequence of moves from root position and evaluates the result of the sequence.
As the fancy new neural networks get smarter and larger, they don’t need as many playouts to evaluate a position correctly; they just “know”.
Just play around with LZ. If you see the win% change significantly with new playouts, then it clearly hasn’t decided yet. If it stays more or less the same, then what’s the point? Sure, it might change its mind after a billion playouts, but is the waiting really worth it?
Does anybody know the command line options used for the 400 game validations? In particular, if these run on one thread without randomization, why do they not play the exact same game 400 times over? There must be a random element, but I do not know what it is.
Leela Zero does not do any Monte Carlo rollouts. Instead of rolling out a game to the end N times and then looking at the resulting winning percentage, it just asks the value head of the network what it expects the winrate to be from the current position. It expands the tree one move at a time, always expanding from positions where winrate is high and previous exploration is low. Every time a position is expanded by one move, there is one call to the network. This is called UCT search (Upper Confidence bounds applied to Trees). AlphaGo Zero works the same way. I do not think Monte Carlo Tree Search is a good name for this algorithm, given that it does no random rollouts at all.
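That loop can be sketched roughly as follows. Here `net(state)` is a stand-in interface returning move priors and a value-head winrate estimate; this is an illustration of the idea, not LZ’s actual implementation, and sign flipping between the two players is omitted for brevity:

```python
import math

def uct_search(root_state, net, num_playouts=800, c_puct=1.0):
    """One 'playout' = walk from the root to a leaf, expand the leaf
    with a single network call, and back the value head's estimate up
    the path. No random rollouts to the end of the game occur."""
    root = {"children": {}, "N": 0, "W": 0.0, "state": root_state}
    for _ in range(num_playouts):
        node, path = root, [root]
        # Select: descend toward children with high value / high prior
        # and low visit counts (PUCT rule).
        while node["children"]:
            parent = node
            node = max(parent["children"].values(), key=lambda ch:
                       (ch["W"] / ch["N"] if ch["N"] else 0.0)
                       + c_puct * ch["P"] * math.sqrt(parent["N"]) / (1 + ch["N"]))
            path.append(node)
        # Expand + evaluate: exactly one network call per playout.
        priors, value = net(node["state"])
        for move, p in priors.items():
            node["children"][move] = {"children": {}, "N": 0, "W": 0.0,
                                      "P": p, "state": node["state"] + (move,)}
        # Backup: credit the value estimate to every node on the path.
        for n in path:
            n["N"] += 1
            n["W"] += value
    return root
```

Note how the “winning percentage” comes straight from the network’s value output rather than from counting random-game results, which is exactly why the rollout-free search still produces MCTS-style visit statistics.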