In terms of LZ’s progress stalling, a new network was just promoted, after around 88k games. For reference, that’s not even the most games required to promote a new 192x15 network. It’s far from the number of games seen on some of the smaller networks. It’s something we would reasonably expect to see on a regular basis if the ELF games, which substantially increased the speed of improvement, weren’t being used. I’m not sure why the new net took longer than the few before it; perhaps they didn’t use ELF games, perhaps it was a statistical anomaly, or perhaps the network is saturating. It doesn’t seem like cause for alarm, and definitely doesn’t mean the LZ project is at “rock bottom.”
A graph of training games per network over time, supporting this point, was just posted on r/cbaduk:
I don’t think one can read the graph and have many illusions left about progress being stalled. LZ seems to be doing better than ever since the introduction of ELF OpenGo games and/or the new temperature parameter. I am very much looking forward to seeing many stronger 15x192 networks.
Hard stalled and drifting worse than Need For Speed
After FIVE more promotions there was a re-test against ELF and it got 86.17%, but it was already at 89% five networks ago… so 5 new promotions of 5% each yielding only a 3% improvement is NOT GOOD!
Based on third party testing, it has regressed back to network 139 or thereabouts…
Chess is doing fine. Self-play elo doesn’t make much sense for chess. It was observed multiple times that when self-play elo stalls, elo calculated from independent matches against other engines actually rises, so it progresses. And that’s why they always promote their nets. There’re large google sheets with real elo estimates but they’re confusing so I’m not going to link any of that stuff, but lczero progresses.
This is good news. We scored 3.5% more wins against the strongest available AI while using a smaller network trained on less than half as many games.
Why the perpetual “doom and gloom”?
I am following Leela Zero Chess. She is doing quite well, playing at about 3100 elo at slow time control.
Progress could be faster (we wish), but the project started just 3 months ago and only about 15 million games have been played (compared to 44 million in Alpha Zero). She is also producing some beautiful games.
These are projects to be patient and enjoy along the way…
I try to be objective to the best of my ability, and I will present the facts/math and supplement them with my opinion, supported by the evidence proffered below, that the situation is moribund.
Chaining strength differences under an Elo-like model works by multiplying odds, so the correct formula to use here would be 1/(1+(1/0.55-1)^N)
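As a minimal sketch of that chaining formula (the function name `chained_winrate` is mine, and it assumes every promotion is won at the same rate `p`, which real promotions are not):

```python
def chained_winrate(p: float, n: int) -> float:
    """Expected win rate of the newest net against the oldest after n
    promotions, each won at rate p, assuming odds multiply (Elo-like model).

    Algebraically the same as 1 / (1 + (1/p - 1) ** n).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    odds = p / (1.0 - p)      # odds of a single promotion
    total_odds = odds ** n    # odds multiply along the chain
    return total_odds / (1.0 + total_odds)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, round(chained_winrate(0.55, n), 4))
```

With a flat 55% per promotion, six promotions come out at roughly 77%, which is the ballpark of the figure used in the rest of this post.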
We don’t always get exactly 55%, of course. In this case, since the last “ELF” benchmark on or about May 20th, which scored 89.62% (ELF against LZ), we have had six additional network promotions.
The formula is therefore:
Which is an expected win rate of about 0.775802494 (77.58%)
So using an ELO calc (http://www.3dkingdoms.com/chess/elo.htm) to convert winrate to other metrics, we can calculate the following:
We obtain a +216 ELO boost from network 142 to 147.
But by using the site’s own stats ( http://zero.sjeng.org/ ) we get a self train ELO difference of approx 11401 - 11206 = +195
By rescaling it into actual Go ratings ELO we get an improvement of at least 64 ELO (expected)
However, the actual result of the re-benchmark of LZ vs ELF on June 3rd yielded a win rate of 86.17%
86.17% -> 318
89.62% -> 374
Only a 56 ELO ACTUAL improvement over a period of 13 days, an average of only about 4 ELO of improvement per day.
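For anyone who wants to check these conversions without the web calculator, here is a small sketch using the standard logistic Elo formula. I am assuming the linked calculator uses the same formula; the numbers it produces match the ones quoted above to within a point.

```python
import math


def winrate_to_elo(p: float) -> float:
    """Elo difference implied by win rate p under the logistic Elo model."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


if __name__ == "__main__":
    may = winrate_to_elo(0.8962)    # ELF vs LZ, benchmark around May 20th
    june = winrate_to_elo(0.8617)   # ELF vs LZ, re-benchmark on June 3rd
    print(round(may), round(june), round(may - june))
    print(round(winrate_to_elo(0.775802494)))  # the chained estimate
```

The gap between the two benchmarks is the amount LZ closed on ELF in that period, under the assumption that Elo differences measured this way are additive.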
On the present course, LZ is at about 11401/3-430 = 3370 true ELO, compared to AlphaGoZero at 5500+ ELO; it would take another 532 days to catch up at this rate…
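The extrapolation can be reproduced with a few lines. Note that the divide-by-3 and minus-430 rescaling is my own rough mapping from self-play ELO to "true" ELO, not an established conversion, and the function names here are made up for illustration:

```python
def true_elo_estimate(self_play_elo: float, scale: float = 3.0,
                      offset: float = 430.0) -> float:
    """Rescale self-play ELO with the ad hoc mapping used in this post."""
    return self_play_elo / scale - offset


def days_to_catch_up(current: float, target: float, elo_per_day: float) -> float:
    """Days needed to close the gap at a constant rate of improvement."""
    if elo_per_day <= 0:
        raise ValueError("elo_per_day must be positive")
    return max(0.0, target - current) / elo_per_day


if __name__ == "__main__":
    current = true_elo_estimate(11401)
    print(round(current), round(days_to_catch_up(current, 5500, 4)))
```

The result is sensitive to the rate: using the unrounded 56/13 ELO per day instead of 4 brings the estimate down to roughly 494 days.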
Many believe the strict 55% gating is now hurting the project, and simulations support this:
Countless times we wait days/weeks only to end up promoting a network that is actually weaker than one that could have been promoted days/weeks earlier…
I previously spoke to this at some length: https://github.com/gcp/leela-zero/issues/1229
The other problem is that Facebook already maxed out ELF at 20 blocks, so LZ, still using 15 blocks, has no chance to catch up; it’s hopeless unless and until we move to a larger block size. But there are no contender 20 blocks on the horizon to speak of.
Lastly, of concern is that the method used for training is suboptimal by a large stretch:
There is a far superior method that solves ladders, ko, seki, accurate counting, perfect endgame play, and even high handicap, but it would require very time consuming retraining from scratch
How to install LZ on android mobile phone and use Elf network
- download ELF network http://zero.sjeng.org/networks/62b5417b64c46976795d10a6741801f15f857e5029681a42d02c9852097df4b9.gz
unpack it to a Download folder that is not on the SD card, and rename it, “elf” for example
- in the app, open menu by pressing 3rd functional button -> Settings
- board 19x19, komi 7.5, uncheck cap speed, check slow timeouts
- press “+” , download from internet, there will be list of engines to download, choose any “leelaz”
(there are other versions if you press “parent directory”, I didn’t try them)
- press “edit”, change Args line to
-g -w/storage/emulated/0/Download/elf -t1 -v1
I don’t know how many threads (-t1) your phone supports. You can also change the number of visits (-v1); with 30 it can beat normal Leela on a strong PC, and it takes 30 seconds / move on my phone
- new game, white player -> program-> Leela Zero, Rules Chinese, OK
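If you want to try different thread and visit counts, the Args line from the steps above can be generated rather than typed by hand. This is just a sketch: `leelaz_args` is a made-up helper, the path is the one from my phone and may differ on yours, and it only builds the string that you paste into the "edit" dialog.

```python
def leelaz_args(weights_path: str, threads: int = 1, visits: int = 1) -> str:
    """Build the Args line: GTP mode (-g), weights file (-w),
    threads (-t) and visits per move (-v)."""
    if threads < 1 or visits < 1:
        raise ValueError("threads and visits must be at least 1")
    return f"-g -w{weights_path} -t{threads} -v{visits}"


if __name__ == "__main__":
    # Path is an assumption: the renamed "elf" file in the Download folder.
    print(leelaz_args("/storage/emulated/0/Download/elf", threads=1, visits=30))
```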
@TheBeginer Leela Chess Zero is having a great time. I know it’s a result of an update, but looks very funny.
There were some bugs in the code, etc., so the developers considered it best to roll back and discard “contaminated” games from training. Nevertheless, the “real Elo” drop is not that huge. At the same time, they are training a second network from scratch that is learning quite fast.
When reading Alpha Zero papers, everything seems so easy… they solved chess, shogi, just in several hours!!
Of course they do not mention how many tests were performed before that final run.
A future development is to force the use of lc0, which is optimized to run on GPU and will speed up calculations 10x.
Funny times for experiments.
I am sure having those fancy, custom made TPUs did not hurt either…
Incidentally, a new generation of GPUs should be upon us soon. Let’s keep our fingers crossed and hope for a huge cryptocurrency crash so we can get them at a reasonable price and increase training speed
Github user bjiyxo trained another 20x256 network (V15) training up to LZ network #146 (including Elf OpenGo selfplay). Github user ryouiki tested the network at 400 visits for 200 games against LZ networks #133, #144, and #148. Here are the results:
These results could mean the 15x192 network is reaching saturation. It also suggests that the current net (#148) is a real monster, which could be why there has been no new network in the past few days.
There is the problem. Total ELF self-play games have now exceeded 200,000, which means that very shortly the ELF magic sauce is going to stop pumping. This is because the main developer @gcp previously stated in no uncertain terms that he will not allow ELF games to make up more than half the training window, to wit: they shall not exceed a lifetime total of 250,000 games, since the entire training window is half a million. As we can see, even with the ELF injections the progress curve has flattened quite a bit, so soon the ELF magic is going bye-bye, and unless someone contends that ELF didn’t really make much of a difference at all, it’s certain that the training will become even weaker…
Today @gcp also stated that the newest V15 20 blocks are “no stronger” than the current 15 blocks. A ~57% win rate would indeed have gotten a PASS in terms of gating, but when comparing disparate network sizes it’s important to adjust for time parity in order to do a fair benchmark; adjusted for time parity, the newest 20 blocks are weaker than the current 15 blocks… Thus there does not seem to be a clear path forward. Previously, every time we had stalled with no new nets for almost a week, what saved the day was the move to a higher block count, which continued the cycle of progression… but there isn’t a 20 block contender that can take the baton, and the ELF juices are just about to run out (not to mention they have already started running out of steam)…
It’s not clear whether the results of the 20 block V15 vs the 15 block nets 133, 148, etc. came about because net 148 truly was that much stronger than net 144 (if so, why wasn’t that strength reflected on the self play elo chart? Shouldn’t it have been tremendously stronger than a mere 57% wr???) or because the experimental 20 blocks are no longer able to keep up with, or maintain the lead against, the 15 blocks. In the latter case the test results would be less indicative of net 148’s strength gain over net 144 and more indicative of the fact that the 20 block is simply underperforming the 15 blocks, for reasons I have enumerated in the past (because the guy actually net2net’d the 20 block from a very old/weak 6 block, which I believe may have caused permanent long term growth caps, etc.)… Again, if we are to believe that there was more or less a sharp 20% improvement from net 144 to net 148, then how is it that dontbtme’s status report on reddit indicates that net 140 could beat Leela 11 unlimited on just 55 blocks and yet net 148 still cannot? This wouldn’t make any sense, and there are plenty of counterindications that the progress has dramatically slowed.
Facebook maxed out ELF at its 20 block size, so logically it would make sense that at 15 blocks there is no way LZ can ever hope to catch up to ELF, because if it could, that would mean Facebook didn’t really reach 20 blocks full potential and I for one find that very hard to believe.
I’ll go ahead and make the official written prediction that net 148 is the “last of the mohicans”
Chess has passed the point of no return; it doesn’t seem like the project can come back from this loss. It went from over 600 to now just barely 200 clients… it’s coming apart
Is that your post on reddit? I liked it.
Leela is quite rapidly falling apart. They used to have over twice as many people contributing. Then they updated and experimented and accidentally introduced bugs that broke the rules of the game and went unfixed for 1/4 of the total generated games. So, having hit a wall after a month of telling people “it’s still improving, trust us,” they deleted those 4 million games and tried to continue, because IT’S NOT. However, half the devs seem to want to delete all the games and start from scratch because they think the rest was corrupted by those 4 million games. So if we hit a wall again anytime soon, which we will, then leela is going to get reset. And then the users will drop to next to nothing and leela will be dead.
The fact that they use github instead of an actual website, and have no blog or news updates on the process or anything of the sort, is crushing what could have been huge. Leela has been mismanaged from the beginning and is going to fail quite soon.
Whereas you look at something like Lichess where a single person with an open source product who does something as a hobby can do something amazing when done properly.
Automate LC0 before it launches and DO NOT TOUCH IT.
And seriously, quit the A0 wannabe club and train it with a 7 man tablebase. Humans might be flawed, but the 7 man tablebase isn’t, and knowing where to go will reap huge rewards for LC0.
Only a matter of time before it all falls apart.
Link on reddit: link
I am not going into details or controversy: Leela chess is going quite well.
I do not understand this kind of catastrophic message, unless it is a deliberate attempt to harm that project.
(If such is the case, it would be better to publish them in the Leela chess forum, just a suggestion.)