I just bought a new computer with an AMD Radeon RX 9060 XT GPU, and was hyped to finally be able to run Go AI at full speed.
My first reflex was to use KaTrain, but the results were disappointing: it is quite slow, taking 5 to 10 seconds to reach 1k evaluations. There are no error messages, and the GPU is actually under heavy load according to the task manager, but the performance doesn't seem to match.
I tried Lizzie, and there it is much faster: with the Leela Zero engine it climbs very quickly (a few seconds) to 10k, 20k visits, and with the KataGo engine it is slower (about 2 seconds to reach 1k) but still a lot better than KaTrain.
Is this normal? Have you had similar issues?
If the Leela Zero neural net is smaller than the KataGo net, you will get more Leela playouts in the same time. But KataGo may still be stronger. Did you try setting up a game of Leela Zero vs KataGo with the same time-per-move setting? Who wins?
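If you want to try such a match, one common way is gogui-twogtp, which pits two GTP engines against each other. A minimal sketch, assuming both engines and gogui-twogtp are installed; all model/config file names below are placeholders for your own files, and time-per-move should be set in each engine's own config or options:

```shell
# Play 10 games between KataGo and Leela Zero, alternating colors,
# saving results as SGF files. Paths and model names are placeholders.
gogui-twogtp \
  -black "katago gtp -model kata-model.bin.gz -config gtp.cfg" \
  -white "leelaz --gtp -w lz-weights.gz" \
  -games 10 -auto -alternate \
  -sgffile katago-vs-lz
```

gogui-twogtp also writes a summary table of results, which makes it easy to see who wins at equal time settings.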
Playouts don't equal strength. LZ stopped self-training years ago, and no matter how many playouts it gets, it would be beaten by the current strongest KataGo network by a long shot, even when KataGo uses very low playouts. Smaller and older network models simply won't cut it.
Here is a game between the latest KataGo (model kata1-b28c512nbt-adam-s11165M-d5387M) and the last version of Leela Zero (model 0e9ea880), with KataGo using just 1,600 playouts while LZ has 32,000 playouts (and LZ takes even longer in real time: KataGo at 1.6k finishes almost instantly, while LZ at 32k takes at least a few seconds). And you can see LZ still has the ladder problem.
Also, AMD GPUs only support the OpenCL backend, while the TensorRT backend likely gives at least a 50%+ performance boost on the same range of Nvidia GPU (5090ti maybe?).
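If you want to see what your own hardware and backend actually deliver, KataGo has a built-in benchmark command. A sketch, assuming KataGo is installed and the model/config file names (placeholders here) point at your own files:

```shell
# Measure visits/second for this KataGo build, model, and config.
# Run it once with the OpenCL build and once with a CUDA/TensorRT build
# (on Nvidia hardware) to compare backends directly.
katago benchmark \
  -model kata1-b28c512nbt.bin.gz \
  -config gtp.cfg
```

The benchmark also suggests a good number of search threads for your machine, which is worth copying into your config.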
I just bought a new computer these days too, and chose a Tesla V100 as the GPU. It can reach 200-500 visits/s using OpenCL and 1,200-2,500 visits/s using TensorRT. The tensor-core (TensorRT) version brings a significant improvement compared with the OpenCL version.
I'm using a ‘b28c512‘ network, while a ‘b40c256‘ network could reach 3,000-9,000 visits/s with worse Go performance.