What is the strength of KataGo?

hexahedron · June 14, 2024, 12:35pm

KataGo, using a relatively modern network and the absolute minimum amount of computing power with that network (1 visit, i.e. essentially no search), has in the past been able to maintain 8d on KGS when someone ran it there. So, if you disregard situations very specially engineered to be exploits or blind spots (like those an adversarial attack would find) and consider just “general purpose” play, KataGo is always at least at the level of the very top amateur dans, and it probably becomes superhuman quickly once you add even a little bit of search.

People often perceive it to be weaker than that when it provides analysis at low playouts, but I think this comes down to a big mix of things:

- The amount of compute needed to accurately and non-noisily quantify how good the best move is can easily be 5x-10x larger than the amount needed to usually just choose the best move, or a good enough move.
  - Yet more compute is needed if you also want accurate judgments of multiple alternative moves.
  - When playing, you can easily avoid a bad move without knowing how bad it is (since you never spent much time thinking about it), so long as pure instinct still makes you “sus” enough of the move not to play it. The same goes for bots.
- People’s expectations are, quite reasonably, very high when they ask for analysis they want to be able to trust, and higher than what it would take to merely play at a given level, although they don’t always realize this. (I’ve seen this before when working with AI for other games too!)
  - Perhaps some of this is due to people having higher standards of consistency for analysis than for play. We reasonably want the analysis of every move to be accurate to a given level, and we home in on any point where interesting things seem to be happening. And almost never making a mistake of a given “level” requires being vastly stronger than that level.
  - Perhaps some of this is that as soon as actual numbers are displayed, it’s easy to presume a level of precision, so you need another big multiplier on compute to actually reach the precision that people presume the numbers have.
    - (Minor note: people usually interpret a number like -2 as an objective claim like “the bot says the player is behind by 2 points”, but it should usually be interpreted as “if you give the player 2 points for free, the bot is not sure which side it prefers”.)
- Bots at a given level have a hugely different balance of strengths and weaknesses than humans at that level. A mid-pro-level bot will make many mistakes that even amateur dan players can perceive, because it compensates with areas where its instinct and holistic sense of the direction of play are massively beyond human pro level. A bot’s overall playing strength needs to be much higher before even its weakest spots are beyond what people can usually perceive.
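
The gap between choosing a move and pricing it, and the extra multiplier that displayed numbers demand, can be illustrated with a toy noise model. This is a minimal sketch assuming each visit is an independent noisy sample of a move's value. Real MCTS visits are not independent, and the per-visit spread `SIGMA` and the gap `GAP` are made-up illustrative numbers, not KataGo measurements. Under that assumption, the noise in an estimate shrinks only as 1/√n, so halving the error bar costs about four times the visits, while telling apart two moves with a clear gap is already likely at much lower counts.

```python
import math

def standard_error(sigma: float, visits: int) -> float:
    """Noise in the mean of `visits` independent samples, each with spread `sigma`."""
    return sigma / math.sqrt(visits)

def prob_correct_choice(gap: float, sigma: float, visits: int) -> float:
    """Chance that the truly better of two moves also has the higher sample mean,
    when each move gets `visits` independent samples and their true values differ by `gap`."""
    # The difference of two independent sample means has spread sigma * sqrt(2 / visits).
    z = gap / (sigma * math.sqrt(2.0 / visits))
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def visits_needed(sigma: float, target_error: float) -> int:
    """Smallest visit count whose standard error is at or below `target_error`."""
    return math.ceil((sigma / target_error) ** 2)

SIGMA = 10.0  # hypothetical per-visit spread of a score estimate, in points
GAP = 5.0     # hypothetical true gap between the best move and the runner-up, in points

print(f"25 visits: picks the better move {prob_correct_choice(GAP, SIGMA, 25):.0%} of the time, "
      f"but the score estimate is still ±{standard_error(SIGMA, 25):.1f} points")
for target in (2.0, 1.0, 0.5):
    print(f"±{target} points of precision needs ~{visits_needed(SIGMA, target)} visits")
```

With these toy numbers, 25 visits already pick the better move about 96% of the time while the reported score is still uncertain by ±2 points, and tightening that to ±0.5 points takes 400 visits.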
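
The claim that almost never making a mistake of a given level requires being far stronger than that level follows from simple compounding. Assuming, purely for illustration, that mistakes on different moves are independent with a fixed per-move rate p, the chance of reviewing N moves without a single mistake is (1 - p)^N. The 200-move review length and the error rates below are hypothetical.

```python
def prob_no_mistake(per_move_error: float, moves: int) -> float:
    """Chance of zero mistakes across `moves` independent moves."""
    return (1.0 - per_move_error) ** moves

def max_error_rate(moves: int, target_clean: float) -> float:
    """Largest per-move error rate that still gives probability `target_clean`
    of a mistake-free review of `moves` moves: solve (1 - p)^moves = target_clean."""
    return 1.0 - target_clean ** (1.0 / moves)

MOVES = 200  # hypothetical number of positions examined in one game review

for p in (0.05, 0.01, 0.001):
    print(f"per-move error {p:.1%}: mistake-free review {prob_no_mistake(p, MOVES):.1%} of the time")
print(f"for a 90% chance of a clean review: per-move error <= {max_error_rate(MOVES, 0.9):.3%}")
```

Under these assumptions, a 1% per-move error rate leaves only about 13% of 200-move reviews mistake-free, and a 90% chance of a clean review needs a per-move rate of roughly 0.05%.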
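
The minor note on reading a score like -2 describes the number as the gift of points at which the bot becomes indifferent. A sketch of that reading: given some function mapping free bonus points to the player's winrate, the displayed score is the negative of the bonus at which the winrate crosses 50%, found here by bisection. The logistic `winrate_with_bonus` model and its `scale` are hypothetical stand-ins; this is not meant to describe how KataGo actually computes its estimate.

```python
import math
from typing import Callable

def winrate_with_bonus(bonus: float, true_lead: float = -2.0, scale: float = 3.0) -> float:
    """Hypothetical model: the player's winrate if they were handed `bonus` free points,
    in a position where they are really ahead by `true_lead` (negative = behind)."""
    return 1.0 / (1.0 + math.exp(-(true_lead + bonus) / scale))

def indifference_bonus(winrate: Callable[[float], float],
                       lo: float = -50.0, hi: float = 50.0, iters: int = 60) -> float:
    """Bisect for the bonus at which the winrate is exactly 50%.
    Assumes winrate(bonus) is increasing in bonus and crosses 0.5 inside [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if winrate(mid) < 0.5:
            lo = mid  # still losing at this bonus, so the crossing is higher
        else:
            hi = mid  # at or above 50%, so the crossing is at or below mid
    return (lo + hi) / 2.0

bonus = indifference_bonus(winrate_with_bonus)
print(f"indifferent after a gift of {bonus:.2f} points, so the displayed score is {-bonus:.2f}")
```

In this toy model the search lands on a 2-point gift, i.e. a displayed score of -2, which is a statement about where the bot stops preferring either side rather than a count of territory.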

Not applicable to OGS, but there’s also the detail that the Lizzie GitHub repository has been dormant for a few years, so if you download Lizzie, it comes packaged with a very old network that is still very strong in absolute terms but much weaker than any modern net, and a user might not realize this.