Leela Zero progress thread


#62

The demigods/presidents accounts were the Facebook bot, the one that got incorporated into LZ-ELF.


#63

We have had exactly the same thing happen in chess before, so it’s nice to be able to see what happens when computers start beating everybody:

  • Commercial chess engines (“AI”) still exist, even though the free one (Stockfish) is on the same level
  • Commercial chess software is now geared towards better user interfaces and game database management, and still sells very well
  • Chess schools and instructors aren’t out of jobs, and if anything engines help with their work
  • More beginners think they can learn from an engine without a teacher for free - without realising that correctly interpreting what the engine says still needs playing strength

Now replace “chess” with “Go” and we’re in the future.


#64

Okay, so I’m trying to think this out logically. Facebook releases ELF OpenGo, instantly becoming by far the strongest public open-source/open-weights Go AI, and overnight it gets adopted as the baseline by many other programs… so going forward, most programs are going to be more or less the same strength, if not simply identical white-label products altogether. So what is left? The “engine” part now seems more or less solved and, I would say, undifferentiated. Superhuman on a GTX 970 is basically the end of the road.

The remaining areas for innovation are GUIs, analysis, high handicap, different komi, teaching tools, etc.

And marketing/branding/mindshare/PR for the Go bots (including Leela Zero) will perhaps be more important than ever before. We’re going to see a consolidation of Go AI bots, and my guess is only one or two will survive this, and that’s if they are lucky.

One immediate implication is that this has essentially killed commercial Go, at least from the standpoint of selling Go engines. We can’t compare to chess, because not only does chess have an order of magnitude larger userbase, especially in the West, but chess also enjoyed a good two decades in which classical algorithms and programming sustained a healthy ecosystem of different engines competing with one another for the top listings. With the advent of the “zero” method, all zero programs converge to the same ultimate state, and it’s just a matter of compute. There is really nothing left to do. More or less.

This also means there is little to no point left in Go AI engine competitions and matches. We already see that CGOS is defunct and its benchmark is less and less useful, the UEC Cup ended, Zen pulled the plug and called it quits, I seriously doubt we’ll see another version of CrazyStone, and now, with so many engines adopting the Facebook weights, what’s the point? I see this as portending the demise of Go AI competitions and engine-vs-engine games as well. Think about it: LZ beat DolBaram in that last competition match, now DolBaram adopts the ELF weights, and ELF is stronger than both Phoenix and FineArt… it doesn’t take much to put two and two together and see where this is headed. Didn’t Golaxy just beat Ke Jie last week? I’ll bet that was the shortest-lived triumph ever. And whatever air of exclusivity FineArt enjoyed prior to the Facebook event has now been obliterated: top pros in China no longer need FineArt to get a competitive advantage in training when everyone in the world with half a decent graphics card can run the same or better. The implications are indeed far-reaching.

Let’s examine the distributed, community-based crowd-computing angle. It took the public six months to get LZ to top pro level from scratch, yet Facebook only needed two weeks and arguably far surpassed top pro level, going deep into the superhuman arena. Not that I know it is going to happen, but there is nothing to prevent Facebook from doing it again: say, another couple of months down the road, it could suddenly drop a new weight file that becomes the new state of the art, far surpassing anything any community effort could have hoped to come up with in that amount of time. Who knows, maybe Google will see all this and publish the AGZ weights, or maybe in another few months the second round of weights Facebook puts out will far surpass AGZ altogether! In light of recent developments, these are all realistic possibilities now! But none of these possibilities fosters morale for community initiatives.

I’m thankful that, prior to Facebook dropping ELF onto the world, LZ had already reached and imho surpassed top pro level with its latest network, 131 (I see 132 just came out, hours after the Haylee game 2, and is 60% stronger!), and that the LZ project was able to convert the ELF weights into the native LZ format so they can be used just like any other weight file; it’s even working great in Lizzie now.

I hope the Leela Zero project finds a way to position itself to best take advantage of this new and changing landscape. It enjoys by far the most mindshare in the Go community at large right now, and I hope it continues to evolve and find ways of remaining relevant and bringing value to people’s lives.


#65

I think the arrival of these engines raises some interesting questions, which are far from solved.

Namely: we’ve created Go AI that is superhuman, but that doesn’t mean that it’s the only option for superhuman Go play. Is it possible to train a Takemiya-style AI that makes giant moyos at 4000+ Elo? What about an extra-fighty Go AI? The next step won’t just be a stronger AI, but an AI that a pro can tune to play their specific style, or perhaps their anti-style for training purposes. Getting ready to fight a moyo builder? Download moyo-bot and play it at a “mere” 3600 Elo level.

Commercial entities will continue to have a part in that development, mainly because developing those engines, at least until consumer-available TPUs become a thing, will be expensive in terms of the amount of computation required. This will likely be the domain of Chinese/Japanese/Korean companies, since they’re the ones with a large enough user base for it to actually matter, unless Facebook/Google decide to push forward further on it, because they’re so huge they can do the research/development just for fun.


#66

Please forgive me if I’m pushing the chess/Go comparison too far, but the former is by far my domain of expertise, and I know very well what happens when computers get strong.

@TheBeginer is raising a lot of good points that I will address, but not completely contradict, because much of it makes sense and is indeed true in the chess community. I tried to make the last paragraph/sentence (well, I tried) of each section relate to Go, and Leela also appears quite a bit in the text that follows!

Death of commercial engines?
Not quite. The top “classically programmed” engines are well-known: Stockfish, Komodo and Houdini [citation further down]. Only Stockfish is free; the other two are commercial and doing well. I would say they are surviving commercially because they come with sleek interfaces and genuinely useful functionalities (guess-the-move, database search, adjustable engine strength, etc) - branding and marketing is indeed important. Two more examples are Fritz and Chessmaster: slightly weaker engines, but still selling well because they’ve been around so long. And all of them beat humans easily.

For Go, there’s a space in the market for that. A strong AI is its own selling point, but users want a simple-to-install, easy-to-use, all-in-one program to wrap it up. As rightly pointed out, marketing matters.

Death of engine competition matches?
How do we know the top engines are the three named above? The TCEC (Top Chess Engine Competition), where submitted chess engines slug it out 24/7 over the course of several months in a gruelling standard time control league format event with a 100-game superfinal. The fact that there is competition and a “title” at stake drives the development of the top engines (and a whole host of weaker ones), and everything is broadcast live over the internet. If you want to watch superhuman chess, it’s always available on the website - great for those who want to learn from top-flight engine play. (Some ardent spectators even claim the engines have different styles, but I’m not one to judge.) It’d be nice to have something similar for Go - visible, well-known and even a little gladiatorial. Everyone likes a good public fight, after all.

"One true engine" end of the road
With the TCEC in mind, not completely true. The “state of the art” changes quite often, and progress is not a bad thing. One might argue that the top chess engines are all similar - but then AlphaZero Chess came along and demonstrated another style of computing, beating Stockfish in a head-to-head. (Caveat: hardware specifications and time controls left some controversy as to whether the match was fair, but what Google demonstrated was essentially that there’s still room for a different style in chess and a different approach to engines.) Then AlphaZero vanished from the chess scene, but the point was proven.

Enter LeelaZero Chess. She travelled from East to West then caught the first rocket into the stratosphere. Before her latest conquest (beating a Grandmaster very convincingly at blitz controls), she had defeated several master players on video and on stream. Now that people know who she is, everyone’s wondering if she can take on the world’s top players, or even eventually the strongest engines.

In Go, one might also expect different training for neural-network bots, or even (gasp) the return of the classically programmed AI in the future, just as the AlphaZero approach had once been discarded decades ago as a dead end. The future holds promise.

Everyone has the top engine!
Once again, it’s not how strong the engine is, it’s how you use it. I will speak from personal experience, as one of the formerly strongest atomic chess (a chess variant) players on lichess. (Consistently among the top 50 active players, and just maybe among all players - imagine mid-dan level if the top players were ama 8-dan.) One day, the free chess engine Stockfish got an atomic chess version integrated into lichess - suddenly, everyone had easy access to a strong engine on the level of top human players.

And yet, it was evident when people followed the engine blindly. They could blitz out long variations in the most hotly-analysed openings and middlegames and rise in rating rapidly - but they quickly got lost against the actual top players (who play good moves because they understand the game, not because the engine said so) and against the weaker me (who plays barely-playable openings and leads people into contests of actual strength - think black hole fuseki). Granted, they probably did improve somewhat from spending time with the engine, but I could also see the bad habits it ingrained in them (and of course I exploited those).

The fact remains - the stronger you are, the more mileage you get out of the engine. If you know what areas of your game need work (for me, it was tactical blunderchecking), you can use the engine as one tool to fix those. Blindly following its moves and interpreting them doesn’t lead far - at some point you’ll run into a position that will show the shortcoming in your understanding. (This paragraph should apply to Go too.)

Community effort trumped by big names
Stockfish is free (like Linux), developed by a community that cares, and is continuously being developed and pushed beyond its limits. The commercial engines are racing alongside, but the difference between them and Stockfish isn’t significant. (Admittedly, Google didn’t seriously approach chess like it did Go, and Facebook isn’t in the equation, but you still have companies developing their engines and tweaking them to remain competitive against the free one.)

And recently, LeelaZero Chess entered the scene as mentioned above, and we all know where she came from. Give her more time to build up weights and she may just ascend above the current top chess engines. Why would she not be capable of that in Go too, against the big competitors?

The future of LeelaZero
There’s still room in the development of Leela. Can she be trained on different sets of games, to acquire a different style, as ckersch88 points out? Can she match the success of her spiritual cousin Stockfish, loved by the chess community? A lot depends on how dedicated the community is. Admittedly the big companies’ hardware resources matter quite a bit more in the Go AI scene, but the passion and the community drive will sustain LeelaZero longer than their glory-chasing - see how Google just wasn’t at all interested in chess after one paper? Slow and steady wins the race, and maybe LeelaZero can one day be the open-source community-developed AI that’s free for everyone. If the Go community can own its best tool, that’s a victory.


#67

Beginer, please don’t go wrapping up entire projects and movements and drawing conclusions when nothing has even concluded, yet. I, for one, have a significant interest in seeing top engines continue playing each other and continue improving their strengths.

It may be that, for the moment, all of the current open source engines are more or less forced to adopt ELF weights and biases, but that by no means indicates that there’s only 1 path forward from that point, or that interest in continuing to develop the engines will be blunted as a result of the release or even by the specter of a possible 2nd weight file release from FB.

More than simply being a race to the top, development of go engines as parallel lines of research allows us to gain insights and draw conclusions about both machine learning and the game of go, not to mention it gives people the opportunity to engage in a community and also experience the personal and professional growth of working on a large project in a team environment.

I don’t feel as though a single remark you made in your comment is prescient, definitive, or even accurate in any way.


#68

Things hit rock bottom recently (the upside is that hopefully there’s nowhere to go but up from ground zero). We haven’t gotten a new network in a while and seem to be stuck on 144 for the foreseeable future. Today new LZ vs Leela 11 matches were published by dontbtme on reddit/r/cbaduk, and apparently net 144 is a regression from net 140… So not only have we not been getting new networks, apparently “drift” is at play and transitivity no longer holds; ergo, instead of hitting the target goal of under 55 visits, progress seems to have stalled and gone backwards. On top of that, for the very first time, a bjyixo 20-block net actually performed WORSE than the current official net when matched up, with a winrate of only 41.83% (FAIL), even though the newly created 20-block net actually contained the ELF games! Not good! Recall that previously, each new 20-block net would outperform the current networks, sometimes even at time parity… this time it straight up flunked and got an F-. Bad omens all around. Speaking of which, for the first time ever LZ lost a game to Haylee, out of the six games played so far, and it lost badly, very badly… playing moves that went from bad to 50-kyu horrible. Things are not looking good, folks.

This is a major hit to computer Go. Hope things start turning around soon.


#69

I don’t think the reason LZ lost to Haylee is that it has gotten worse, but that it is not optimised for playing handicap games. Besides that, a 3-stone handicap is absolutely huge for a pro player. I don’t believe there are many professional players who could easily win against another pro with a 3-stone handicap.

The reason LZ played badly is that its winrate was below 5% for most of the game, which means there is almost no margin to distinguish a bad move from a good move. The program doesn’t know how to play with confidence or how to overplay (in an inconspicuous manner). If it were up to LZ, she would have resigned as soon as she was allowed to.
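A toy back-of-the-envelope calculation illustrates the "no margin" point. The numbers below are made up for illustration (not taken from the actual game), and this treats each visit as a simple noisy win/loss sample, which is a crude model of what the search actually does:

```python
import math

# Two hypothetical candidate moves whose true winrates are both tiny,
# each receiving a few hundred visits. The standard error of the
# winrate estimate is larger than the gap between the moves, so the
# search has essentially no signal left to rank them with.
p_good, p_bad = 0.015, 0.010  # hypothetical true winrates near the floor
visits = 200                   # visits per candidate move

gap = p_good - p_bad
stderr = math.sqrt(p_good * (1 - p_good) / visits)
print(round(stderr, 4), round(gap, 4))  # noise vs. signal
```

With numbers like these the sampling noise on each estimate exceeds the true difference between the moves, which is one way of seeing why "good" and "bad" moves become indistinguishable once every option is near-hopeless.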

You see similar behaviour in any bot that you force to keep playing an impossible-to-win game.


#70

I like your “this is the end of the world” writing style.

My guess is that Leela got bitten by ELF. We don’t know the history of the ELF weights, and instead of just doing our own thing, we decided to use games from ELF. And now Leela has changed her joseki. Basically we want Leela to be ELF, and it’s not working, naturally. Leela is going to turn into an unstable mess.


#71

From the stats here, it appears it may not all be related to the recent ELF injections, as the data seem to suggest that LZ has been shedding its repertoire for quite some time now, especially for certain josekis.

http://zero.sjeng.org/opening

It seems Facebook already maxed out ELF at 20 blocks. So I can’t imagine how doing a 25% to 50% mixture at 15 blocks will ever get LZ as strong as ELF… upping the blocks and adding more layers was the only realistic hope for moving forward, but today’s 20-block test put a real damper on that too. Many people expected that with the addition of the ELF data the newly trained 20-block net would be much stronger, hopefully stronger at time parity so as to allow an official switch to the larger size… but it didn’t even pass in regular matches. Make no mistake: staying at 15 blocks, there is no chance it can ever catch up to ELF, never mind surpass it.

As for the handicap game it lost to Haylee: for all intents and purposes, from the perspective of the exponential weakness of an AI giving each incremental handicap stone, it was a 3-stone handicap (3H). Whether komi was used didn’t even matter and was never a lasting point of contention, since the game never got out of the gate and LZ’s winrate never rose above 10 percent (the default resign threshold).

Yet baduk1 was able to get a decisive victory in a similar “3H” situation against another similar pro… the only things I can think of are that Black started not on the star points (hence making it a bit easier on LZ) and that twice as much horsepower was used as in the match against Haylee. Since the exact same net 144 was used, these were the only differences that could have mattered.

I’m sure that had 8xV100 been used instead of a single GTX 1080 Ti, it would likely have made a big difference. At the 3H level, it’s imperative to grab the upper hand as soon as possible; basically, if LZ can’t turn the tide before the middle game, it’s essentially never going to happen, the game becomes a lost cause and a foregone conclusion, and LZ proceeds to melt down from there. There is a small early window of opportunity that LZ must grasp in order to have any hope or chance, and having subpar hardware surely didn’t help its chances. (Recall that AlphaGo used hundreds of GPUs against Fan Hui on even, and later 4 TPUs against Lee and Ke on even; Golaxy used 10x GTX 1080 Ti; and even LZ’s first match with Haylee used 4x 1080. So for a 3H game, using a single GTX 1080 Ti was basically asking a sports car to win a race on one cylinder while hauling an 18-wheeler behind it.)

I do concur that anything above 3H will require some new strategy, like retraining with dual network heads etc., and that beyond 3H, with the current LZ networks, no amount of hardware will help and it becomes exponentially hopeless. But I’m of the opinion that 3H could have been won had the best publicly available hardware (currently 8xV100 on AWS) been used instead of a single GTX 1080 Ti. Contrast this with game 2 of 8, in which 4xV100 was used against Haylee with LZ on even: the horsepower was reduced by a factor of EIGHT (each V100 is roughly twice the performance of a GTX 1080 Ti) AND three extra handicap stones were given to start. If that isn’t a stark, illogical juxtaposition, I’m not sure what is. That’s like saying your 800-horsepower car only has 100 horsepower now, and then on top of that you’re expected to tow three more cars behind it… no doubt it will melt down catastrophically.

For high handicap (4H or above), despite what some folks such as dorus have opined on github, I assert that a brand-new strategy is needed, because even a perfectly strong engine, something stronger than AGZ, will not be able to win conventionally. It isn’t that the engine is too weak to beat a pro at 4H; rather, it anticipates opponent moves as strong as its own, evaluates every possible move as zero/hopeless, and it is that ‘horizon’ effect that messes with its DCNN, blinds it, and causes the meltdown. This has to be remedied by some new method that currently doesn’t exist. Simply “business as usual, play as normal” isn’t going to cut it at higher handicap levels, no matter how strong the network is; something new is needed. There is no doubt about this.


#72

But the lack of “horsepower” does not explain the moves that Leela played (like M12 or L2 or T5). Surely one of the problems is that Leela had to search through too many branches, as she was probably very confused due to the low score, but a larger problem is that her search was so broad to begin with. I don’t think my version of Leela would ever consider playing those moves, even with about 5000 playouts, as long as Leela thinks she has a chance.


#73

I get what you are saying, and I don’t dispute that past a certain point even infinite horsepower won’t help. Again, there is a very small window of opportunity to tilt the game in LZ’s favor at the early onset, which, if missed, is never gained back later on. The nonsensical moves we saw would never have been made had LZ had enough power to get a decisive advantage early on and bend the odds back into its favor; once it went into meltdown mode, of course nothing could help it recover. It’s all about not allowing LZ to slip into that position, and my contention is that had 8xV100 been used, it would have won game 5.

Otherwise, if moving from 2H to 3H was already in the territory of “impossible” regardless of the hardware used, then how does one explain baduk1’s good result?


#74

Another interesting thing to consider is that there is an upper bound on how well you can play, even for an AI (as a game of complete information for both players, Go has to have a perfect strategy). Roughly speaking, with 100 handicap stones you can win the game 100% of the time, whatever your opponent might try. We don’t know any perfect strategies for low handicaps, but that doesn’t mean they don’t exist and that you can always keep getting better.

I wonder where that boundary is. How much smarter do AIs need to get to be able to win against the perfect player with a certain number of handicap stones? Or, dually, one could ask how much handicap we would need to give AIs now to let them win against a perfect player. Sadly we don’t have any perfect players…

There is also the question of whether the current deep learning approach could ever be strong enough for such a thing as perfect play.


#75

Speaking of upper bounds, even using the same zero technique, the bound is different for different games. For example, there has been speculation that the upper bound for chess is far lower than that for Go. This may be why DeepMind spent only 4 hours training chess but 40 days training Go, and why its showcase of chess was less scientific and less satisfactory compared to its unquestionable strength in Go: https://groups.google.com/forum/#!topic/fishcooking/ExSnY8xy7sY

Based on ELF’s development time, my guess is that ELF already reached its max potential for 20 blocks and has already hit its “upper bound”, which means that even if LZ moves to 20 blocks, it can’t really get stronger than ELF is right now without going to a larger size.

As for strength in general, there is a point past which, with enough handicap stones, even “perfect” play won’t be able to beat the lowest average rank of players… think of it like a game tree fully enumerated, as if in a much larger universe all of Go were deterministically printed out and one merely branched into winning or losing positions: at some point, no matter what, even with the answers right in front of you, the tide cannot be turned.

But 3H and 4H and even 5H should be easily doable.


#76

Wouldn’t the opposite be true? All of the strangest moves had a very low network prior and were only found after thousands of playouts. For example, the M12 blunder had a prior of (N: 0.18%). The “best” move from a human perspective, D14, had a prior of (N: 26.10%) and was preferred by LZ from 1 visit all the way to 50,324 visits. It was only at 51,482 visits that the blunder was chosen, if I read the log correctly.

http://termbin.com/1qzw

Here is the mechanism by which low playouts may be better in handicap games: the network is extremely strong. At low playouts, the network move is likelier to be chosen. At high playouts, the network’s preferences are scanned exhaustively and found (correctly) not to lead to a win (assuming best play by the opponent). So the search begins looking at blunders for that elusive win and gets into trouble at that point.
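This mechanism can be sketched with a toy PUCT selection loop. Everything here is hypothetical: the priors loosely echo the logged 26.10% vs 0.18% numbers, the "true" values are invented, and this is not LZ's actual code, just the standard Q + c·P·√N/(1+n) selection rule applied to a two-move root:

```python
import math

def puct_select(children, n_total, c_puct=1.0, fpu=0.0):
    """Return the child maximizing Q + c * P * sqrt(N) / (1 + n)."""
    def score(ch):
        q = ch["q"] if ch["n"] > 0 else fpu  # unvisited: first-play urgency
        u = c_puct * ch["p"] * math.sqrt(max(n_total, 1)) / (1 + ch["n"])
        return q + u
    return max(children, key=score)

# High-prior move (like D14) whose value has settled near zero, and a
# low-prior "blunder" (like M12) that the value net misreads as slightly
# better once it is finally explored.
normal = {"p": 0.26, "n": 0, "q": 0.0, "true_q": 0.02}
blunder = {"p": 0.002, "n": 0, "q": 0.0, "true_q": 0.06}

first_blunder_visit = None
for t in range(200_000):
    ch = puct_select([normal, blunder], t)
    ch["n"] += 1
    ch["q"] = ch["true_q"]  # value estimate once the move is searched
    if ch is blunder and first_blunder_visit is None:
        first_blunder_visit = t

print(first_blunder_visit, normal["n"], blunder["n"])
```

Early on, the high-prior move soaks up every visit; only after hundreds of playouts does the growing exploration term force the search into the low-prior move, and once its (mis)estimated value comes back higher, it ends up dominating the visit counts, the same shape as D14 being preferred up to ~50k visits before the blunder takes over.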


#77

It would not be difficult to configure a high-handicap version of LZ to start out relying predominantly on its network, giving much less weight to playouts, and smoothly transition back to normal as the move number and its own winrate increase. This might work against average human players, but against a pro or a very strong player I’m not sure how regressing to what would still be a weaker state (at 100 playouts it’s basically back to Leela 11 status) would help it turn the tide early on, especially since it starts at a huge disadvantage against a pro; so it’s a catch-22 at best. Not to mention it would fall for all kinds of easy ladders early on.
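For concreteness, such a schedule might look something like the sketch below. To be clear, nothing like this exists in LZ; the function name, the move-number cutoff, and the playout floor/ceiling are all made-up illustrations of the "network-heavy early, full search later" idea:

```python
def playout_budget(winrate, move_number, floor=100, ceiling=3200):
    """Hypothetical playout schedule for handicap play.

    At a hopeless opening winrate the engine mostly trusts the raw
    network (few playouts); as the game progresses or its winrate
    climbs back toward even, it smoothly returns to a full search.
    """
    progress = min(1.0, move_number / 100)   # crude game-phase proxy
    confidence = min(1.0, winrate / 0.5)     # 50% winrate = full search
    blend = max(progress, confidence)
    return int(floor + (ceiling - floor) * blend)
```

The catch-22 in the post shows up directly in this sketch: exactly when the engine most needs to outplay a pro (low winrate, early game), the schedule hands it the smallest search budget.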

Also, if the opposite were true, then baduk1, using twice the horsepower, should have performed worse than LZ did during Haylee game 5, but it scored a decisive victory.


#78

Yeah, this may be a lasting issue with the current LZ architecture. LZ was not trained on or for handicap games. Using your car analogy, LZ playing handicap is like a Tesla carrying a trailer. It’s an otherwise great AI doing something it was not designed to do.

There is an issue on github about this very dilemma. I have not contributed to it because I am not a programmer and have nothing to contribute other than goodwill and autogtp on a macbook. :slight_smile:


#79

Assuming this Tesla is programmed to drive off a cliff when it’s almost out of battery power.


#80

Tesla uses Nvidia GPUs for its self-driving NN, and recently one did drive off a cliff, and another drove into a highway divider and blew itself up, LZ style.

But that is perhaps where the analogy ends, because a Tesla can actually tow a trailer quite well.


#81

Seems like at least 4xV100 will be used next round. https://old.reddit.com/r/cbaduk/comments/8ml3le/haylees_leela_zero_match_game_58_is_now_up_also/dzq4cmh/

If LZ takes another beating, maybe they can change the rules and go back to playing even games.
I’d like to see Haylee vs LZ with 8xV100 or more on even; it would be super interesting, since LZ should already be stronger than AlphaGo Lee in that configuration.