Leela Zero progress thread

ckersch · May 29, 2018, 7:58pm

In terms of LZ’s progress stalling, a new network just promoted, after around 88k games. For reference, that’s not even the most games required to promote a new 192x15 network. It’s far from the number of games seen on some of the smaller networks. It’s something we would reasonably expect to see on a regular basis if the ELF games weren’t being used, which substantially increased the speed of improvement. I’m not sure why the new net took longer than the few before it; perhaps they didn’t use ELF games, perhaps it was a statistical anomaly, or perhaps the network is saturating. It doesn’t seem like cause for alarm, and definitely doesn’t mean the LZ project is at “rock bottom.”

mark5000 · May 29, 2018, 8:34pm

A graph of training games per network over time, supporting this point, was just posted on r/cbaduk:

[image: graph of training games per network over time]

I don’t think one can read the graph and have many illusions left about progress being stalled. LZ seems to be doing better than ever since the introduction of ELF OpenGo games and/or the new temperature parameter. I am very much looking forward to seeing many stronger 15x192 networks.

TheBeginer · June 5, 2018, 2:00am

Hard stalled and drifting worse than Need For Speed

After FIVE more promotions there was a re-test against ELF and it got 86.17%, but it was already at 89% five networks ago… so 5 new promotions of 5% each yielding only a 3% improvement is NOT GOOD!

Based on third-party testing, it has regressed back to network 139 or thereabouts…

Though in other similar news, Chess is not doing that hot either: http://lczero.org/
and Minigo has stalled too, again: http://cloudygo.com/v7-19x19/eval-graphs

DVbS78rkR7NVe · June 5, 2018, 2:14am

Chess is doing fine. Self-play elo doesn’t make much sense for chess. It was observed multiple times that when self-play elo stalls, elo calculated from independent matches against other engines actually rises, so it progresses. And that’s why they always promote their nets. There’re large google sheets with real elo estimates but they’re confusing so I’m not going to link any of that stuff, but lczero progresses.

mark5000 · June 5, 2018, 4:25am

This is good news. We scored 3.5% more wins against the strongest available AI while using a smaller network trained on less than half as many games.

Why the perpetual “doom and gloom”?

ovijol · June 5, 2018, 7:31am

I am following Leela Zero Chess. She is doing quite well. Playing at about 3100 elo at slow time control.
Progress could be (we wish) faster, but the project started just 3 months ago and only about 15 million games have been played (compared to 44 million in AlphaZero). She is also producing some beautiful games.
These are projects to be patient with and enjoy along the way…

TheBeginer · June 7, 2018, 2:53am

I try to be objective to the best of my ability. I will present the facts and math, supplemented by my opinion and supported by the evidence proffered below, that the situation is moribund.

Chaining strength differences under an Elo-like model works by multiplying odds, so the correct formula to use here would be 1/(1+(1/0.55-1)^N)

We don’t always get an exact 55% of course. In this case, since the last “ELF” benchmark on or about May 20th, which scored 89.62% (ELF against LZ), we have had six additional network promotions.

The formula is therefore:
1/(1+(1/0.5298-1)*(1/0.5659-1)*(1/0.5484-1)*(1/0.5613-1)*(1/0.5375-1)*(1/0.5661-1))

Which is about 0.775802494, i.e. a cumulative win rate of roughly 77.58%
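
The odds-chaining arithmetic above can be checked with a few lines of Python. This is only a sketch of the formula as the post states it; the function name `chained_winrate` is made up for illustration, and the six win rates are the gating results quoted in the post.

```python
from math import prod

def chained_winrate(step_winrates):
    """Combine per-promotion win rates by multiplying odds (Elo-like model)."""
    # Odds against the newer net at each step: (1 - p) / p, i.e. 1/p - 1.
    odds_against = prod(1.0 / p - 1.0 for p in step_winrates)
    return 1.0 / (1.0 + odds_against)

# Gating win rates quoted in the post above.
gating = [0.5298, 0.5659, 0.5484, 0.5613, 0.5375, 0.5661]
print(f"{chained_winrate(gating):.4f}")
```

With a single step the function returns that step's win rate unchanged, which is a quick sanity check on the formula.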

So using an ELO calc (http://www.3dkingdoms.com/chess/elo.htm) to convert winrate to other metrics, we can calculate the following:

We get a +216 ELO boost from network 142 to 147.
But by using the site’s own stats ( http://zero.sjeng.org/ ) we get a self train ELO difference of approx 11401 - 11206 = +195
By rescaling it into actual Go ratings ELO we get an improvement of at least 64 ELO (expected)

However, the actual result of the re-benchmark of LZ vs ELF on June 3rd yielded winrate of 86.17%

86.17% -> 318
89.62% -> 374

Only a 56 ELO ACTUAL improvement over a period of 13 days, an average of only about 4 ELO improvement per day.
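
The win rate to rating conversions above are consistent with the standard logistic Elo formula, where a win rate p corresponds to a rating gap of 400·log10(p/(1−p)). A minimal sketch, assuming the linked calculator uses that same formula (the function names here are illustrative only):

```python
import math

def winrate_to_elo(p):
    """Rating difference implied by win rate p under the logistic Elo model."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_to_winrate(diff):
    """Inverse: expected win rate for a given rating difference."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# Win rates quoted in the post: the two ELF benchmarks and the chained estimate.
for p in (0.8617, 0.8962, 0.7758):
    print(f"{p:.4f} -> {winrate_to_elo(p):.0f}")
```

Note that the two benchmark figures are ELF's win rate against LZ, so a smaller number means LZ has closed part of the gap.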

On the present course, LZ is at about 11401/3-430 = 3370 true ELO, compared to AlphaGo Zero at 5500+ ELO. It would take another 532 days to catch up at this rate…
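
The extrapolation above is simple enough to reproduce. The sketch below takes the post's numbers at face value: the divide-by-3 and minus-430 rescaling, the 5500 target, and the assumption of a constant daily gain are all the poster's, not established facts. The helper name `days_to_catch_up` is hypothetical.

```python
def days_to_catch_up(current_elo, target_elo, elo_per_day):
    """Days needed to close a rating gap at a constant (assumed) daily gain."""
    return (target_elo - current_elo) / elo_per_day

# Poster's rescaling of the self-play rating (an assumption, not an official scale).
lz_elo = 11401 / 3 - 430

# Observed gain between the two ELF benchmarks: 374 - 318 Elo over 13 days.
gain_per_day = (374 - 318) / 13

print(f"estimated LZ rating: {lz_elo:.0f}")
print(f"gain per day: {gain_per_day:.2f}")
print(f"days at 4 Elo/day (post's rounding): {days_to_catch_up(3370, 5500, 4):.1f}")
print(f"days at unrounded rate: {days_to_catch_up(lz_elo, 5500, gain_per_day):.1f}")
```

The result is sensitive to rounding the daily gain down to 4, so the estimate is rough even if the linear assumption is accepted.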

Many believe the strict 55% gating is now hurting the project and simulations support this evidence:

Countless times we wait days/weeks later to end up promoting a network that is actually weaker than one that could have been promoted days/weeks ago…
I previously spoke to this at some length: https://github.com/gcp/leela-zero/issues/1229

The other problem is that Facebook already maxed out ELF at 20 blocks, so LZ, still using 15 blocks, has no chance to catch up, and it’s hopeless unless and until we move to a larger block count. But there are no contender 20-block networks on the horizon to speak of.

Last, of concern is that the method used for training is entirely suboptimal by a large stretch:

There is a far superior method that solves ladders, ko, seki, accurate counting, perfect endgame play, and even high handicap, but it would require very time consuming retraining from scratch

mark5000 · June 7, 2018, 7:21pm

@TheBeginer These ideas are beyond the scope of the Leela Zero project.

From the README.

stone.defender · June 9, 2018, 3:20pm

How to install LZ on android mobile phone and use Elf network

install
https://play.google.com/store/apps/details?id=nl.tengen.gridmaster&hl=en
download ELF network http://zero.sjeng.org/networks/d13c40993740cb77d85c838b82c08cc9c3f0fbc7d8c3761366e5d59e8f371cbd.gz
unpack it to a Download folder that is not on the SD card, and rename it (“elf”, for example)
in the app, open menu by pressing 3rd functional button -> Settings
board 19x19, komi 7.5, uncheck cap speed, check slow timeouts
press “+” , download from internet, there will be list of engines to download, choose any “leelaz”
(there are other versions if you press “parent directory”, I didn’t try them)
press “edit”, change Args line to
-g -w/storage/emulated/0/Download/elf -t1 -v1
I don’t know how many threads (-t1) your phone supports. You can also change the number of visits (-v1); with 30 it can beat normal Leela on a strong PC, and it takes 30 seconds per move on my phone
new game, white player -> program-> Leela Zero, Rules Chinese, OK

DVbS78rkR7NVe · June 10, 2018, 9:39pm

@TheBeginer Leela Chess Zero is having a great time. I know it’s a result of an update, but looks very funny.

ovijol · June 11, 2018, 10:14am

There were some bugs in the code, etc., so the developers considered it best to roll back and discard “contaminated” games from training. Nevertheless, the “real Elo” drop is not that huge. At the same time, they are training a second network from scratch that is learning quite fast.
When reading Alpha Zero papers, everything seems so easy… they solved chess, shogi, just in several hours!!
Of course they do not mention how many tests were performed before that final run.
A future development is to force the use of lc0, which is optimized to run on GPU and will speed up calculations 10x.
Funny times for experiments.

ovijol · June 11, 2018, 10:16am

This is the graph showing self elo from the beginning. It puts the former one in context.

Pond_Turtle · June 11, 2018, 9:24pm

I am sure having those fancy, custom made TPUs did not hurt either…

Pond_Turtle · June 12, 2018, 6:22am

Incidentally, a new generation of GPUs should be upon us soon. Let’s keep our fingers crossed and hope for a huge cryptocurrency crash so we can get them at a reasonable price and increase training speed

mark5000 · June 14, 2018, 5:43pm

Github user bjiyxo trained another 20x256 network (V15) training up to LZ network #146 (including Elf OpenGo selfplay). Github user ryouiki tested the network at 400 visits for 200 games against LZ networks #133, #144, and #148. Here are the results:

These results could mean the 15x192 network is reaching saturation. They also suggest that the current net (#148) is a real monster, which could be why there has been no new network in the past few days.

github.com/leela-zero/leela-zero

I trained a 20b 256f network (93229e)

opened 08:58AM - 27 Mar 18 UTC

closed 05:16PM - 05 Aug 18 UTC

bjiyxo

Currently I train a network(20 blocks and 256 filters). It initialized from b8ad…b7da(78) n2n to 20b 256f. And it is now being trained to b3a80524(99). According to unofficial test by others, it is much stronger than current weights (85c6f2ad) on 1080Ti. Enjoy it!

V2 (93229e) training up to b3a80524(99) https://drive.google.com/file/d/1m4rK068Kiky1sKbMrwCcR49GBI0krhSu/view?usp=sharing

Edit: 3/29
V3 (8d9c5e) training up to 18827fa7(101) https://drive.google.com/file/d/1Z1GP0IjCNJUaVEFJQ__97b2I-wf8IAjX/view?usp=sharing

Edit: 4/5
V4 (5b48caa1) training up to 193437be(104) https://drive.google.com/file/d/1y1A-ijeE1f50EuT0haeWUEtNdtUhJuON/view?usp=sharing
V4 swa (00335835) https://drive.google.com/file/d/11GizcSzrr3ZTx8GPOOdlSn8b5YdWdN4R/view?usp=sharing
The following versions all use swa.

Edit: 4/7
V5 (1629cb) training up to b7768081(107) https://drive.google.com/file/d/1TikFfZkrtzfs0jBu1Y7-4IE-T6TMXRRY/view?usp=sharing

Edit: 4/13
V6 () training up to 1ccb7342 (108) https://drive.google.com/file/d/1sWfBzeXJsuK6xCGnCrpfNTx6R1mHs66F/view?usp=sharing

Edit: 4/17
V7 (00045d1d) training up to 2f4d7274 (112) https://drive.google.com/file/d/1fx4DeqLDUuC2BlyWqY4ubgFtUDL6GNi9/view?usp=sharing
V8 (b269a6c7) training up to 8ed44722 (115) https://drive.google.com/file/d/1HpEQwzhDlgp0v636aSkY7ha_hDbtX7u0/view?usp=sharing
V9 (32179366) training up to 39d46507 (116) (10b finished) https://drive.google.com/file/d/1KayAwWUqkaz8eM9M6YPO84Nt7vHMCOw9/view?usp=sharing

Edit: 4/28
V11 (90767fc9) training up to 8a045bce(124) https://drive.google.com/file/d/1Ljt0anATrtfdtpH2xPkLs-4NP7pIgy8i/view?usp=sharing
V12 (bef701e8) training up to 59bb7337 (126) https://drive.google.com/file/d/1PdbC3hLH89q9Ysb5gaNgfI0Xvp2-dTfn/view?usp=sharing

Edit: 5/11
V13 (756542bf) training up to ecab83bb (131) (before ELF's selfplay) https://drive.google.com/file/d/1o1chP-ohKI1o_1kjvqCsbpo-UxgT6Vmy/view?usp=sharing

Edit: 5/28
V14 (9ee6ab54) training up to 7c6588ce (142) (including ELF's selfplay) https://drive.google.com/file/d/1WLVOqoEeWIg8gSyNrgEMJs7PuLBuNmu7/view?usp=sharing

Edit: 6/12
V15 (fee6830f) training up to 0cb74be2 (146) (including ELF's selfplay) https://drive.google.com/file/d/1lSeBRVVhHWr2Za3VPvfuf4oPi1M622EM/view?usp=sharing

Edit: 6/25
V16 (c010f034) training up to d0187996 (148) (including ELF's selfplay) https://drive.google.com/file/d/1Zz-2Ktku0R86Le0o1VuGrNLCEF_4NcFj/view?usp=sharing

Edit: 7/2
V17 (0589016c) training up to 2b80a9db (150) (including ELF's selfplay) https://drive.google.com/file/d/19ogANudaRHdi9iF9B1GzFHMzV8NaSXr_/view?usp=sharing

Edit: 7/7
V18 (fb01adbb) training up to e1d466aa (153 incomplete, only ~40k games) (including ELF's selfplay) https://drive.google.com/file/d/1quBKBS68b3v8J-EnQxW2MAZSSve-RJc3/view?usp=sharing

Edit: 7/18
V19 (48437672) training up to e1d466aa (153) (excluding ELF's selfplay) https://drive.google.com/file/d/1PGTm2LLSW9vEtUoBXF1OKeHfWoj40PvU/view?usp=sharing

Edit: 7/22
V20 (0c77d215) training up to 050375ce (156) https://drive.google.com/file/d/15pwwnaDhgprRhN0F18xa-ldp5Vk9vo9N/view?usp=sharing

Edit: 7/28
V20-2 (dc011d01) training up to 050375ce (156) (including ELF's selfplay) https://drive.google.com/file/d/16OmFwwvdJuHgwuVVzuTUYQ7gaTCruHzj/view?usp=sharing

TheBeginer · June 15, 2018, 2:33am

There is the problem. Total ELF self-play games have now exceeded 200,000, which means that very shortly the ELF magic sauce is going to stop pumping. This is because the main developer @gcp stated in no uncertain terms that he will not allow ELF games to make up more than half of the training window; to wit, they shall not exceed a lifetime total of 250,000 games, since the entire training window is half a million. As we can see, even with the ELF injections the progress curve has flattened quite a bit. So soon the ELF magic is going bye-bye, and unless someone contends that ELF didn’t really make much of a difference at all, it’s certain that the training will become even weaker…

Today @gcp also stated that the newest V15 20-block networks are “no stronger” than the current 15 blocks. While a ~57% win rate would indeed have gotten a PASS in terms of gating, when comparing disparate network sizes it’s important to adjust for time parity in order to do a fair benchmark; adjusted for time parity, the very newest 20 blocks are weaker than the current 15 blocks… Thus there does not seem to be a clear path forward. Previously, every time we had stalled with no new nets for almost a week, what saved the day was the move to a higher block count, which continued the cycle of progression… But there isn’t a 20-block contender that can take the baton, and the ELF juices are just about to run out (not to mention they have already started running out of steam)…

It’s not clear whether the results of the 20-block V15 against the 15-block nets 133, 148, etc. came about because net 148 truly was that much stronger than net 144 (if so, why wasn’t that strength reflected on the self-play elo chart? Shouldn’t it have been tremendously stronger than a mere 57% win rate?) or because the experimental 20 blocks can no longer keep up with or maintain the lead against the 15 blocks. In the latter case the test results would be less indicative of net 148’s strength gain over net 144 and more indicative of the 20 block simply underperforming the 15 blocks, for reasons I have enumerated in the past (the author actually net2net’d the 20 block from a very old, weak 6 block, which I believe may have caused permanent long-term growth caps). Again, if we are to believe that there was more or less a sharp 20% improvement from net 144 to net 148, then how is it that dontbtme’s status report on reddit indicates that net 140 could beat Leela 11 unlimited on just 55 blocks and yet net 148 still cannot? This wouldn’t make any sense, and there are plenty of counterindications suggesting that progress has dramatically slowed.

Facebook maxed out ELF at its 20 block size, so logically it would make sense that at 15 blocks there is no way LZ can ever hope to catch up to ELF, because if it could, that would mean Facebook didn’t really reach 20 blocks full potential and I for one find that very hard to believe.

I’ll go ahead and make the official written prediction that net 148 is the “last of the mohicans”

TheBeginer · June 15, 2018, 2:51am

Chess has passed the point of no return; it doesn’t seem like the project can come back from this loss. It went from over 600 clients to now just barely 200… it’s coming apart

DVbS78rkR7NVe · June 15, 2018, 7:07am

Is that your post on reddit? I liked it.

Leela is quite rapidly falling apart. They used to have over twice as many people contributing. Then they updated and experimented and accidentally introduced bugs that broke the rules of the game and went untouched for 1/4 of the total generated games. So after they hit a wall, after a month of telling people “it’s still improving, trust us”, they delete those 4 million and try to continue, because IT’S NOT. However, half the devs seem to want to delete all the games and start from scratch because they think the rest was corrupted by those 4 million games. So if we hit a wall again anytime soon, which we will, then leela is going to get reset. And then the users will drop to next to nothing and leela will be dead.

The fact that they use github instead of an actual website, and have no blog or news updating the process or anything of the sort, is crushing what could have been huge. Leela has been mismanaged from the beginning and is going to fail quite soon.

Whereas you look at something like Lichess where a single person with an open source product who does something as a hobby can do something amazing when done properly.

Automate LC0 before it launches and DO NOT TOUCH IT.

And seriously, quit the A0 wannabe club and train it with a 7-man tablebase. Humans might be flawed, but the 7-man tablebase isn’t, and it will reap huge rewards for LC0 to know where to go.

Only a matter of time before it all falls apart.

Link on reddit: link

ovijol · June 15, 2018, 9:49am

I am not going into details or controversy: Leela chess is going quite well.

I do not understand this kind of catastrophic message, unless it is a deliberate attempt to harm that project.

(if such is the case it would be better to publish them in Leela chess forum, just a suggestion).

TheBeginer · June 15, 2018, 10:23pm

The situation is moribund if no decisive corrective action is taken soon…

Math does not lie…