Leela Zero progress thread

Thanks @mark5000 and great point about comparing the number of games. But isn’t the axis the number of days?

And weren’t the Facebook games fed to LZ? This might also impact the progress.

I was also wondering if the Elo of the AlphaGo versions has been recalculated on the same scale as LZ.

Strange, even after the end of LZ 9 months ago it has gained about 2400 Elo points. :smiley:

5 Likes

Golaxy vs LZ today:
https://home.yikeweiqi.com/#/live/room/17514/1/14564807

Pairings (1 vs 2, 3 vs 4 and so on):

3 Likes

image

LZ won.

https://home.yikeweiqi.com/#/live/board/17523

I watched with my home low-playout LZ.

image

Chinese experts in the chat agreed that preliminaries are round-robin, so we should see some more games today. But we’ll see.

Here it is:
https://home.yikeweiqi.com/#/live/room/17526/1/14586414

3 Likes

@S_Alexander I got you fam, have 3 more replies on me! :wink: love the updates keep them coming!!!

2 Likes

I have a question for those who know Leela Zero and also Leela 0.11

I asked in another thread too but had no response.
So, if you think it’s OT here, please post your answer there:

What exactly does “playouts” mean for LZ? Is it the same as “nodes” or “simulations” in Leela?

I can see Leela change its mind about the “best move” while doing more and more simulations, so I let it run for tens of thousands (30k, 50k, even more) on a single move before considering that done.
On LZ I can see people talking about 200 / 800 / 1000 playouts, so I’m not sure I understand correctly.

1 Like

[disclaimer: I’m no expert]

I believe the only difference from old-fashioned Monte Carlo Tree Search is the method by which it selects the next node to explore. In “classic” MCTS this is done by a relatively simple function that looks at how much a node has been explored and how good its chances are, but in LZ this is done by a function that uses a neural network trained to spot the best move. It also uses the neural network to do the simulation, where it keeps playing until the game is decided and then updates the parent nodes. In classic MCTS this is often done randomly, or using heuristics that aren’t based on deep learning.

So then a playout is just the same as a simulation, but the thing with LZ is that it uses a much more complicated neural net, and thus one playout takes a lot more time (but is way more accurate).

2 Likes

Yeah, a playout is very close to a simulation. In a playout, LZ tries a sequence of moves from the root position and evaluates the result of the sequence.

As the fancy new neural networks get smarter and larger, they don’t need as many playouts to evaluate the position correctly, they just “know”.

Just play around with LZ. If you see the win% change significantly with new playouts then it clearly hasn’t decided yet. If it stays more or less the same then what’s the point? Yeah, it might change its mind after a billion playouts but is the waiting really worth it?
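A rough way to put that rule of thumb into code: sample the win% at increasing playout counts and stop once the last few readings stay within a small band. This is only a sketch. `has_settled`, the window of 4 readings, and the 1-point tolerance are made-up names and numbers, not anything LZ provides.

```python
def has_settled(winrates, window=4, tolerance=1.0):
    """Return True if the last `window` win% readings differ by at most `tolerance` points."""
    if len(winrates) < window:
        return False  # not enough readings to judge yet
    recent = winrates[-window:]
    return max(recent) - min(recent) <= tolerance


# Hypothetical win% readings taken as the playout count grows.
readings = [48.0, 55.0, 61.0, 60.5, 60.8, 61.1]
print(has_settled(readings[:4]))  # still moving a lot
print(has_settled(readings))      # last four readings within about one point
```

How strict to be is a matter of taste: a wider window or a tighter tolerance means more waiting before you call the move done.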

1 Like

I found this in the Leela Zero Readme but I can’t fully understand it:

Using MCTS (but without Monte Carlo playouts)

Does that mean it uses the tree search but without checking each branch until the end of the game?

I would read the AlphaGo Zero paper. It goes into a lot of detail on the process, and as far as I know it’s what LeelaZero uses as well.

It does play each branch until the game has been decided, but instead of random play it uses a NN to find the next move, and it uses another network to judge when the game is over.

1 Like

Monte Carlo means that it involves some sort of randomisation.

3 Likes

Last time I checked, it doesn’t play each branch until the game is decided. And it uses one network to find both the probabilities of the possible moves and the value (win%). Just to clarify.

Does anybody know the command line options used for the 400 game validations? In particular, if these run on one thread without randomization, why do they not play the exact same game 400 times over? There must be a random element, but I do not know what it is.

Leela Zero does not do any Monte Carlo rollouts. Instead of rolling out a game to the end N times and then looking at the resulting winning percentage, it just asks the value head of the network what it expects the winrate to be from the current position. It expands the tree one move at a time, always expanding from positions where the winrate is high and previous exploration is low. Every time a position is expanded by one move, there is one call to the network. This is called UCT search (upper confidence bounds applied to trees). AlphaGo Zero works the same way. I do not think Monte Carlo Tree Search is a good name for this algorithm, given that it does no random rollouts at all.
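To make the description above concrete, here is a minimal Python sketch of that kind of search: one network call per expanded position, and the network's winrate estimate is backed up instead of a random rollout. It is not Leela Zero's actual code. `Node`, `select_child`, `run_playout`, the constant `c_puct = 1.5`, and the toy game and `toy_network` at the bottom are all made up for illustration. The selection rule is a PUCT-style formula of the kind described for AlphaGo Zero, and the exact formula LZ uses may differ.

```python
import math


class Node:
    """One position in the search tree (hypothetical structure)."""

    def __init__(self, prior):
        self.prior = prior      # network's move probability for the move leading here
        self.visits = 0
        self.value_sum = 0.0    # summed evaluations, seen by the player who just moved
        self.children = {}      # move -> Node

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=1.5):
    """Pick the child with a high mean value and/or little exploration so far."""
    total = sum(child.visits for child in node.children.values())

    def score(child):
        bonus = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.mean_value() + bonus

    return max(node.children.items(), key=lambda item: score(item[1]))


def run_playout(root, state, network, legal_moves, play):
    """One playout: walk down to a leaf, call the network once, back up the value."""
    node, path = root, [root]
    while node.children:
        move, node = select_child(node)
        state = play(state, move)
        path.append(node)

    # Single network call: move priors plus a value in [-1, 1] for the side to move.
    priors, value = network(state)
    for move in legal_moves(state):
        node.children[move] = Node(priors[move])

    # No rollout to the end of the game: the network's value is backed up directly.
    value = -value  # flip to the view of the player who moved into the leaf
    for visited in reversed(path):
        visited.visits += 1
        visited.value_sum += value
        value = -value


# Toy stand-ins so the sketch runs; a real engine would plug in Go rules and a trained net.
def legal_moves(state):
    return [0, 1, 2] if len(state) < 4 else []


def play(state, move):
    return state + (move,)


network_calls = 0


def toy_network(state):
    global network_calls
    network_calls += 1
    priors = {0: 0.2, 1: 0.5, 2: 0.3}
    value = 0.0 if not state else (-0.5 if state[-1] == 1 else 0.5)
    return priors, value


root = Node(prior=1.0)
for _ in range(200):
    run_playout(root, (), toy_network, legal_moves, play)

print(network_calls, root.visits)
```

In this sketch 200 playouts cost exactly 200 network calls, which is why a few hundred or a thousand playouts in LZ is not comparable to tens of thousands of cheap random simulations in an older engine.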

4 Likes

FYI, CloudyGo just updated, providing third-party verification that LeelaZero is still improving.

3 Likes

Is anyone familiar with this program?

No.

1 Like

LeelaZero continues improving. See https://cloudygo.com/leela-zero-eval/eval-graphs

6 Likes

How does Leela Zero compare to other top bots?

I think she’s in the top 8 but not the top 4.

2 Likes