KataGo speeds of different hardware

RobertJasiek · July 7, 2023, 4:34pm

Here are some rounded KataGo speeds in visits/s on my RTX 4070 for different libraries, compared with other people’s older hardware (whose owners sometimes measure playouts instead of visits). I use:

RTX 4070 (Asus TUF 12G, Quiet mode, 200W TDP, 100% power target)
Ryzen 7700 (8C, 16T)
64 GB DDR5-RAM JEDEC
Nvidia Studio driver 531.61
KataGo 1_13_0 (OpenCL, CUDA) or 1_13_1 (TensorRT)
18-Block-Model = kata1-b18c384nbt-s6386600960-d3368371862
default values for RAM cache = 3GB and time = 5s
Nvidia libraries CUDA_11_6_2 + CUDNN_8_9_1_23 + TensorRT_8_5_2_2.

Note that a good versus bad combination of Nvidia libraries can cause a 5.3x speed difference.

Speed   Hardware

6500    RTX 4070 TensorRT, 140,000 visits, 80 threads (recommended)

4000    RTX 4070 CUDA, 100,000 visits, 64 threads (recommended)

3000    2 * RTX 2080TI, CUDA, b40, 64GB, 100,000 visits, 1s, 40 threads (recommended) ~= 2800 visits/s, 80 threads ~= 3000 visits/s

2200    RTX 4070 OpenCL, 100,000 visits, 40 threads (recommended)

0580    5700XT, b40, 12GB, 16 threads

0300    iPad_Pro/M1

0200    iPhone 13 pro, b40

0170    iPad/A12X
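
The table can be turned into relative speed factors with a few lines of Python. This is only a minimal sketch: the numbers are the rounded visits/s values from the table, the names `speeds` and `relative_to` are made up for illustration, and the rows mix different networks (b18 and b40) and settings, so the ratios are a rough guide rather than a like-for-like benchmark.

```python
# Rounded visits/s copied from the table above (labels shortened).
speeds = {
    "RTX 4070 TensorRT": 6500,
    "RTX 4070 CUDA": 4000,
    "2 * RTX 2080TI CUDA (b40)": 3000,
    "RTX 4070 OpenCL": 2200,
    "5700XT (b40)": 580,
    "iPad_Pro/M1": 300,
    "iPhone 13 pro (b40)": 200,
    "iPad/A12X": 170,
}


def relative_to(baseline: str) -> dict[str, float]:
    """Return every entry's speed as a multiple of the baseline entry's speed."""
    base = speeds[baseline]
    return {name: round(value / base, 2) for name, value in speeds.items()}


# Hypothetical choice of baseline: the slowest library on the RTX 4070.
ratios = relative_to("RTX 4070 OpenCL")
for name, factor in ratios.items():
    print(f"{factor:5.2f}x  {name}")
```

With OpenCL on the RTX 4070 as the baseline, TensorRT on the same card comes out at roughly 2.95 times the speed and CUDA at roughly 1.82 times.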

Where do your RTX 1000, RTX 3000, RTX 4000, RTX Laptop cards and Macs fit?

1weigo · March 4, 2024, 1:48pm

Hi,
Are you satisfied with your GPU?
I want to buy a new PC and use it to analyse my games.
I’d like the analysis to be quick.
Regards

RobertJasiek · March 4, 2024, 5:16pm

Yes. As you might know, it is an RTX 4070 Asus TUF (non-OC) desktop card, which I cannot hear relative to the other fans. Every 4 months, the fans should be cleaned with a paintbrush.

The GPU is fast enough for serious study. Of course, 4090 would be 2.1 ~ 2.2 times faster but costs much more, consumes ca. 2 times the power, can become noisy and involves risks related to the power plug. 4070 (or now maybe 4070 Super or a similar card) is a reasonable choice as to speed, cost, power bill and longevity.
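
As a rough illustration of that factor, the sketch below scales the 6500 visits/s measured for the RTX 4070 with TensorRT by 2.1 to 2.2. The result is an extrapolation under the assumption that the factor applies to this exact setup, not a measurement, and the names are hypothetical.

```python
RTX_4070_TENSORRT = 6500            # rounded visits/s from the first post
FACTOR_LOW, FACTOR_HIGH = 2.1, 2.2  # estimated 4090 speedup range quoted above


def estimate_range(base: float, low: float, high: float) -> tuple[int, int]:
    """Scale a measured speed by a lower and an upper speedup factor."""
    return round(base * low), round(base * high)


low, high = estimate_range(RTX_4070_TENSORRT, FACTOR_LOW, FACTOR_HIGH)
print(f"Estimated RTX 4090 speed: {low} to {high} visits/s")
```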

For me, 4060TI would be too slow while 4070 is just good enough, but your needs might differ.

It is essential to use Nvidia TensorRT libraries and tune. See my manual for installation. Otherwise, you might lose much speed.

A 4070 pairs very well with an 8C/16T CPU, such as Ryzen 7700, and 64GB RAM if you want very serious study for a few hours of search. In theory, 16GB might work but is no fun because there is always the danger of insufficient RAM. 32GB should be the minimum RAM. If you consider more than 64GB, 128GB makes no sense - 256GB would be the next meaningful upgrade for the rarest query once every three months. I really see no reason for more than 64GB yet.

If you get a 4090, a 16C/32T CPU might make some sense. For my 4070, my CPU is usually fast enough. For infrequently occurring many reversions, firstly higher single-thread speed and secondly more cores would help a bit but are a luxury. Rather, I am very pleased with how balanced my choice of GPU, CPU and RAM is for research-like search.

For KataGo inference, I have not even reached 1GB of VRAM usage yet. So the 12GB VRAM of my GPU is total overkill as long as KataGo is the only application. Only if you train your own neural net locally can you never have enough VRAM.

Avoid my mistake: do not overpay for overclockable RAM. Just get a good pair of JEDEC sticks.

Most importantly: protect your GPU with stands! Do not believe any of those PC builder Youtubers inserting a GPU as if it weighed nothing. PCI cards were meant to be 50g or 100g but current GPUs weigh over 1kg.

There are different study modes: playing, analysing and - easily overlooked - permanent pondering with user interaction.

1weigo · March 4, 2024, 7:25pm

Thank you very much for this detailed answer.

Feijoa · March 4, 2024, 8:19pm

How much speedup do you actually get from this? Also, could you provide a link to the manual you referred to?

I was looking into the CUDA and TensorRT options but couldn’t understand the installation process on Windows and am wondering if it is worth it.

RobertJasiek · March 4, 2024, 9:45pm

Accelerations depend on several factors but can be huge. For me, TensorRT with tuning has been 14.43 times faster than OpenCL without tuning! See

AI Computer section Factors Comparing Different Speeds
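
A speed factor translates directly into waiting time for a fixed visit budget. The sketch below is only an illustration: the 100,000-visit budget is hypothetical, the 6500 visits/s comes from the first post, and the untuned OpenCL speed is derived by dividing by 14.43 under the assumption that the factor applies to that figure. It is not a measured value.

```python
SPEED_FACTOR = 14.43    # tuned TensorRT vs. untuned OpenCL, as quoted above
TUNED_SPEED = 6500      # visits/s, rounded TensorRT figure from the first post
VISIT_BUDGET = 100_000  # hypothetical number of visits for one analysis


def seconds_for(visits: int, visits_per_second: float) -> float:
    """Time needed to search a given number of visits at a constant speed."""
    return visits / visits_per_second


# Assumption: the untuned speed is the tuned speed divided by the factor.
untuned_speed = TUNED_SPEED / SPEED_FACTOR

tuned_seconds = seconds_for(VISIT_BUDGET, TUNED_SPEED)
untuned_seconds = seconds_for(VISIT_BUDGET, untuned_speed)

print(f"Tuned TensorRT:  {tuned_seconds:6.1f} s")
print(f"Untuned OpenCL:  {untuned_seconds:6.1f} s")
```

Under these assumptions the same search takes about a quarter of a minute instead of several minutes.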

Invest some days or weeks in proper installation and tuning! It is absolutely worth the effort! Reading my manual carefully should make it easier for you than the almost 2 months of installation I needed.

Note that newer KataGo versions might work with CUDA 12 rather than the CUDA 11 I still use.