Benchmark Go software

Hi guys,

Because we had this discussion here, I decided to benchmark two of the engines I have installed.

I wonder how the results below compare with your own, running on your preferred personal computer or laptop.

Mine is running on an Intel i7-8550U (4 cores, 8 threads) with 20 GB RAM, plus an NVIDIA MX150 GPU with 3 compute units and 4 GB of memory.

The engines below are called Leela and LeelaZero.

One of the questions in the other topic was whether they are similar. From what I could read, both were designed by the same author, Belgian programmer Gian-Carlo Pascutto, and both use deep learning and neural networks; so as far as I can tell, they are different versions of the same software, with small improvements in the algorithm itself (as versions grow) and some big improvements in the network weights fed into them.

While Leela is fixed to the network weights that come with it, LeelaZero allows me to download a different file and use the best current network available. You can download the best current network weights here: https://zero.sjeng.org/best-network

I’m still not sure whether Leela can use different weights. Its text interface is the same as LeelaZero’s, and LeelaZero looks by default in ~/.local/share/leela-zero/best-network, so I placed the file there to see if Leela would log anything. It seems to ignore the file, but it always prints “Transferring weights to GPU...done”, so it clearly has weights embedded.
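For reference, Leela Zero can also be pointed at an explicit weights file with its -w/--weights option, instead of relying on the default path above. A minimal sketch that just builds the command line (the binary location and paths are examples, not guaranteed to match your install):

```python
from pathlib import Path

def leelaz_command(weights: Path, binary: str = "./leelaz") -> list:
    """Build a command line that launches Leela Zero with an explicit
    weights file, assuming the --weights option of Leela Zero 0.17."""
    return [binary, "--weights", str(weights)]

# Default location LeelaZero checks, as mentioned above:
cmd = leelaz_command(Path.home() / ".local/share/leela-zero/best-network")
print(" ".join(cmd))
```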

So, here is the STDOUT of both when run without the GUI, directly from a terminal. If you want to test yours, please run it from a terminal and, when the “Leela:” prompt appears, type the two commands that run the benchmark:

  • benchmark : tests the speed of the algorithm without making use of the network. (Doesn’t work on LeelaZero)
  • netbench : tests the speed of the algorithm using the network.
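If you want to script this instead of typing at the prompt, the commands can be piped into the engine’s stdin. A minimal Python sketch (the binary name, and the assumption that these commands are accepted on stdin, come only from my runs below):

```python
import subprocess

def run_gtp(binary_args, commands, timeout=300):
    """Send newline-terminated console/GTP commands to an engine's stdin
    and return everything it printed (stdout and stderr combined)."""
    script = "".join(c + "\n" for c in commands)
    proc = subprocess.run(
        binary_args,
        input=script,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.stdout + proc.stderr

# Example (assumes the engine binary is in the current directory):
# print(run_gtp(["./leela_gtp_opencl"], ["benchmark", "netbench", "quit"]))
```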

Here are the results.

Leela 0.11.0, by Gian-Carlo Pascutto (GCP)

  • GUI: LeelaGUI Sjeng by GCP
  • Engine: leela_gtp_opencl, by GCP
  • Released: 12 Oct 2017
  • Link: https://sjeng.org/dl/leela_0.11.0_amd64.deb
  • Features:
    • Strong Go engine including support for multiple processors and GPU acceleration
    • 2008 Computer Olympiad Silver (9x9) and Bronze (19x19) medalist
    • Featuring Deep Learning technology
    • Easy to use graphical interface
    • Adjustable board size (up to 25x25!), playing levels, handicap and komi
    • Fixed strength and time based difficulty levels
    • Chinese rules with positional superko
    • Leela is an older and weaker program by the same author, not open source, trained on high-level games (per the Sensei's Library website)
    • As of May 2016, the website states that Leela plays at the 4 dan level on a 19x19 board, and at a high dan level on a 9x9 board (per Sensei's Library)
    • At 9x9, her strength is around 1 dan (per Sensei's Library)
    • Strength over 9 dan on 19x19, depending on hardware (per the Sjeng website; not updated)
% ./leela_gtp_opencl # [2020-01-22 18:32:45 !4206]
110 feature weights loaded, 2608 patterns
Initializing OpenCL
Detected 1 OpenCL platforms
Platform version: OpenCL 1.2 CUDA 10.1.120
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:     0
Device name:   GeForce MX150
Device type:   GPU
Device vendor: NVIDIA Corporation
Device driver: 430.34
Device speed:  1531 MHz
Device cores:  3 CU
Device score:  1112
Selected platform: NVIDIA CUDA
Selected device: GeForce MX150
with OpenCL 1.2 capability
Wavefront/Warp size: 32
Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 64 
Transferring weights to GPU...done
OpenCL self-test: passed.

Passes: 0            Black (X) Prisoners: 0
Black (X) to move    White (O) Prisoners: 0

   a b c d e f g h j k l m n o p q r s t 
19 . . . . . . . . . . . . . . . . . . . 19
18 . . . . . . . . . . . . . . . . . . . 18
17 . . . . . . . . . . . . . . . . . . . 17
16 . . . + . . . . . + . . . . . + . . . 16
15 . . . . . . . . . . . . . . . . . . . 15
14 . . . . . . . . . . . . . . . . . . . 14
13 . . . . . . . . . . . . . . . . . . . 13
12 . . . . . . . . . . . . . . . . . . . 12
11 . . . . . . . . . . . . . . . . . . . 11
10 . . . + . . . . . + . . . . . + . . . 10
 9 . . . . . . . . . . . . . . . . . . .  9
 8 . . . . . . . . . . . . . . . . . . .  8
 7 . . . . . . . . . . . . . . . . . . .  7
 6 . . . . . . . . . . . . . . . . . . .  6
 5 . . . . . . . . . . . . . . . . . . .  5
 4 . . . + . . . . . + . . . . . + . . .  4
 3 . . . . . . . . . . . . . . . . . . .  3
 2 . . . . . . . . . . . . . . . . . . .  2
 1 . . . . . . . . . . . . . . . . . . .  1
   a b c d e f g h j k l m n o p q r s t 

Hash: D3D4AF73792A4C5C Ko-Hash: CDEF4E01EF8C42AC

Black time: 00:30:00
White time: 00:30:00

Leela: benchmark
200000 games in 17.16 seconds -> 11655 g/s (1456 g/s per thread)
Avg Len: 418.65 Score: -2.483845

Leela: netbench
 2000 predictions in 22.28 seconds -> 89 p/s
10000 evaluations in 12.82 seconds -> 780 p/s
= 

Leela: quit
=
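As a side note, the throughput figures in these logs can be pulled out programmatically if anyone wants to collect results from several machines. A small parsing sketch; the regex is only an assumption based on the exact `benchmark` line Leela printed above:

```python
import re

# The 'benchmark' line printed by Leela 0.11.0 looks like:
#   200000 games in 17.16 seconds -> 11655 g/s (1456 g/s per thread)
BENCH_RE = re.compile(r"(\d+) games in ([\d.]+) seconds -> (\d+) g/s")

def parse_benchmark(line):
    """Return (games, seconds, games_per_second), or None if no match."""
    m = BENCH_RE.search(line)
    if not m:
        return None
    return int(m.group(1)), float(m.group(2)), int(m.group(3))

games, secs, rate = parse_benchmark(
    "200000 games in 17.16 seconds -> 11655 g/s (1456 g/s per thread)"
)
print(games, secs, rate)  # → 200000 17.16 11655
```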

Leela Zero, by Gian-Carlo Pascutto (GCP)

  • GUI: None
  • Engine: Leela Zero 0.17 + AutoGTP v18, by GCP and contributors
  • Released: 4 Apr 2019
  • Link: https://github.com/leela-zero/leela-zero/releases/tag/v0.17
  • Features:
    • Strong Go engine including support for multiple processors and GPU acceleration
    • Project attempting to replicate the approach of AlphaGo Zero
    • Implemented batching for GPUs, increasing speed.
    • A Go program with no human provided knowledge.
    • Using MCTS (but without Monte Carlo playouts)
    • Deep residual convolutional neural network stack.
    • Leela Zero is not meant to be used directly. You need a graphical interface for it, which will interface with Leela Zero through the GTP protocol, like Lizzie or Sabaki.
    • Leela Zero finished third at the BerryGenomics Cup World AI Go Tournament in Fuzhou, China on 28 April 2018.
    • The New Yorker at the end of 2018 characterized Leela and Leela Zero as “the world’s most successful open-source Go engines”.
    • With almost 20,000,000 self-play games so far, since version 0.1 (Oct 2017), it has a current self-play Elo of approximately 15681
    • It played approx. 12,000 games in the last 24 hours (22 Jan 2020)
    • The best network hash today is 3819f2a1 (codename LZ259, from 19 Jan 2020 16:33:03)
    • It has reached superhuman strength (per Sensei's Library)
% ./leelaz       #    [2020-01-22 18:40:52 !4206]
Using OpenCL batch size of 5
Using 10 thread(s).
RNG seed: 4079330378111192179
Leela Zero 0.17  Copyright (C) 2017-2019  Gian-Carlo Pascutto and contributors
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions; see the COPYING file for details.

BLAS Core: built-in Eigen 3.3.7 library.
Detecting residual layers...v1...256 channels...40 blocks.
Initializing OpenCL (autodetecting precision).
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 10.1.120
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:     0
Device name:   GeForce MX150
Device type:   GPU
Device vendor: NVIDIA Corporation
Device driver: 430.34
Device speed:  1531 MHz
Device cores:  3 CU
Device score:  1112
Selected platform: NVIDIA CUDA
Selected device: GeForce MX150
with OpenCL 1.2 capability.
Half precision compute support: No.
Tensor Core support: No.
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 10.1.120
Platform profile: FULL_PROFILE
Platform name:    NVIDIA CUDA
Platform vendor:  NVIDIA Corporation
Device ID:     0
Device name:   GeForce MX150
Device type:   GPU
Device vendor: NVIDIA Corporation
Device driver: 430.34
Device speed:  1531 MHz
Device cores:  3 CU
Device score:  1112
Selected platform: NVIDIA CUDA
Selected device: GeForce MX150
with OpenCL 1.2 capability.
Half precision compute support: No.
Tensor Core support: No.

Started OpenCL SGEMM tuner.
Will try 290 valid configurations.
(1/290) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=2 VWN=2 1.1011 ms (535.7 GFLOPS)
(6/290) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=16 NDIMC=16 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=2 0.9618 ms (613.3 GFLOPS)
(14/290) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=4 0.9226 ms (639.3 GFLOPS)
(19/290) KWG=32 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=2 0.8909 ms (662.1 GFLOPS)
(23/290) KWG=16 KWI=8 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=2 VWN=2 0.8722 ms (676.3 GFLOPS)
(37/290) KWG=32 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=2 0.7268 ms (811.6 GFLOPS)
(78/290) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=2 0.7117 ms (828.8 GFLOPS)
(172/290) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=2 0.7035 ms (838.4 GFLOPS)
Wavefront/Warp size: 32
Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 64 

Started OpenCL SGEMM tuner.
Will try 290 valid configurations.
(1/290) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=2 VWN=2 0.8008 ms (736.6 GFLOPS)
(13/290) KWG=32 KWI=8 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=2 0.6710 ms (879.1 GFLOPS)
(198/290) KWG=32 KWI=8 MDIMA=8 MDIMC=8 MWG=64 NDIMB=8 NDIMC=8 NWG=64 SA=1 SB=1 STRM=0 STRN=0 TCE=0 VWM=4 VWN=4 0.6630 ms (889.6 GFLOPS)
Wavefront/Warp size: 32
Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 64 
Using OpenCL half precision (at least 5% faster than single).
Setting max tree size to 3736 MiB and cache size to 415 MiB.

Passes: 0            Black (X) Prisoners: 0
Black (X) to move    White (O) Prisoners: 0

   a b c d e f g h j k l m n o p q r s t 
19 . . . . . . . . . . . . . . . . . . . 19
18 . . . . . . . . . . . . . . . . . . . 18
17 . . . . . . . . . . . . . . . . . . . 17
16 . . . + . . . . . + . . . . . + . . . 16
15 . . . . . . . . . . . . . . . . . . . 15
14 . . . . . . . . . . . . . . . . . . . 14
13 . . . . . . . . . . . . . . . . . . . 13
12 . . . . . . . . . . . . . . . . . . . 12
11 . . . . . . . . . . . . . . . . . . . 11
10 . . . + . . . . . + . . . . . + . . . 10
 9 . . . . . . . . . . . . . . . . . . .  9
 8 . . . . . . . . . . . . . . . . . . .  8
 7 . . . . . . . . . . . . . . . . . . .  7
 6 . . . . . . . . . . . . . . . . . . .  6
 5 . . . . . . . . . . . . . . . . . . .  5
 4 . . . + . . . . . + . . . . . + . . .  4
 3 . . . . . . . . . . . . . . . . . . .  3
 2 . . . . . . . . . . . . . . . . . . .  2
 1 . . . . . . . . . . . . . . . . . . .  1
   a b c d e f g h j k l m n o p q r s t 

Hash: 9A930BE1616C538E Ko-Hash: A14C933E7669946D

Black time: 01:00:00
White time: 01:00:00

Leela: benchmark
? unknown command

Leela: netbench
 1600 evaluations in 21.95 seconds -> 72 n/s
= 

Leela: quit    
= 

Conclusion:

Given that we don’t know which network version the Leela software has embedded, but we do have the build date of the file, we can infer that it uses one of the first networks ever trained, compiled together with the code into the engine binary; it also comes with a separate binary for the GUI.

I can’t rely on personal experience to evaluate the strength of the programs, since I am too new to Go to be able to “feel” it, but based on the research I did, both programs are very strong compared to humans, and LeelaZero is clearly the stronger of the two.

What is your benchmark?

Can you run it on your computer and give some feedback on the benchmark? Also, if you have experience with these programs or others, what strength do you believe they have?

It seems that your benchmark is focusing on some sort of speed measurements. I’m guessing position evaluation rates? However, speed is not the same thing as playing strength.

What are you trying to show?

I believe Leela and Leela Zero work in very different ways, and it would be misleading to compare them in some sort of speed benchmark. Leela seems to evaluate positions much faster, but the Leela Zero positional evaluations should be of a much higher quality, which leads to much stronger play, since it can do more with fewer high quality evaluations.
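To put rough numbers on that, taking the netbench rates from the logs above (780 evaluations/s for Leela vs 72 n/s for Leela Zero), the gap in raw evaluation throughput is roughly 11x; that says nothing about strength, only how much heavier the Leela Zero network is per evaluation:

```python
# Raw evaluation rates from the netbench logs in the original post.
leela_evals_per_s = 780   # Leela 0.11.0 "netbench" evaluations
lz_evals_per_s = 72       # Leela Zero 0.17 "netbench" (40-block network)

ratio = leela_evals_per_s / lz_evals_per_s
print(f"Leela evaluates ~{ratio:.1f}x more positions per second")  # ~10.8x
```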


Hi Yebellz,

I guess my post shows too much focus on speed, yes. But that is just because I am very curious about this aspect, and because the “benchmark” command only tests for that.

Don’t let this single item obscure the other points I made. It took me a while to gather all that information besides speed.

There are a lot of other things: features, and opinions on strength from other sites. Also, you can see that both algorithms load weights for networks and patterns. Both use deep learning. They are basically both connectionist approaches, very different from old classical AI.

The code is closed for Leela, so I can’t say for sure, but I believe it is just the author’s first attempt, before starting LeelaZero.

I guess two points:

For this thread, I’m just curious how these different algorithms differ in strength. We could compare other classical algorithms as well; with more help and collaboration, maybe we could list and make sense of a bit of history and how we got where we are, from the first old attempts and how weak they were.

For the other thread, we were discussing whether Leela is too weak or very different. My point in that thread is that they are not so different. Considering the “dan” levels reported on the sites I pointed out above (in the feature list for each), the competitions they won, and so on (leaving speed aside, since that is hardware-specific, and one could argue my machine is what makes it weak), they are both really strong. One is of course much stronger than the other, but both are a lot stronger than many (most?) humans.

The main difference is that the original Leela comes with many pre-programmed human ideas. Josekis and other sequences/patterns…

LeelaZero learned only by self-play, following the release of the AlphaGo Zero paper.

That said, I am not sure there is much sense in benchmarking the speed of the two programs - or rather, the speed of what, exactly? Leela might give you more evaluations or whatever in a given time, I guess, but the LZ move is likely to be better even though it managed fewer evaluations in that time... Not sure what we can conclude from that comparison.

And to prevent potential confusion - in the other thread I claimed Leela (not Leela Zero) is pretty rubbish on 9x9. Anyone around my level will be a match for Leela (at least the latest version) on 9x9, which is a lot of players.

On 19x19, Leela has long been above most amateur players, but never really achieved pro level as far as I am aware, while Leela Zero has “above human” strength.
