Potential rank inflation on OGS or How To Beat KataGo With One Simple Trick (free)!

You guys are ahead of me because I didn’t read the entire discussion. But as far as I understand, we are talking about humans beating a low-readout (dumbed-down) version of KataGo. And they call that a blind spot.

Nope. I don’t buy that argument.
You can read in every Katago-Micro game chat, that Katago-micro can and will blunder, because of its limitations. So how is this news?

Holon, I suggest you learn about things before you get angry based on your incorrect assumptions about them.

2 Likes

You are probably right. I am not angry though. Just very confused.

Perhaps look at it like this:

AltGo is the game of Go with the same rules, except that it has a different capturing rule: a group that has no liberties will be captured and removed from the board just like in Go, except that in AltGo it will not be captured if the group without liberties itself surrounds a living group.

Now, when training KataGo to play Go, and AltKataGo to play AltGo, either version becomes extremely strong, but since the situation mentioned above is ultimately so rare, it doesn’t come up in the training games at all. Essentially the (Alt)KataGo network has virtually no idea about what happens in the particular situation that distinguishes AltGo from Go.

In this context, it is not surprising that KataGo thinks it’s playing AltGo or that AltKataGo thinks it plays Go, and thus that the network makes huge blunders based on not recognising the situation.


What the second paper shows, is that such situations can be found (with relative ease) using adversarial networks: networks that are specifically trained to find the flaws and maximise the ability to exploit them.

It shows that despite being superhumanly good at usual games of Go, the network can contain flaws that are easy to exploit and would not be confusing to average human players, hence they can be exploited by humans in particular.

Outside of Go this is a very good warning message: although AI (or at least the kind that is being used right now) may seem brilliant at completing a task in comparison to humans, there may always be critical points where the AI may do a completely stupid action that no human would ever do. This can lead to huge security risks, dangerous situations and gives malicious parties a chance to use AI to their benefit.

It also raises the question of in what capacity the AI actually understands Go, if it can make mistakes like this. Compare that to the capacity in which, for example, ChatGPT understands what it is writing.

7 Likes

Is this an AlphaGo versus Lee Sedol moment kinda? A human plays a move that totally makes sense to humans and in the game, but the AI didn’t even consider that move?
Only in this case, an AI was needed to find that blind spot.

It might be a combination of Lee Sedol being extremely good at Go himself and playing a style that AlphaGo indeed was not very familiar with. Note that the moves that Lee Sedol played were surprising to the pros commentating the game as well, not just to AlphaGo.

But the thing with this flaw in KataGo is that it doesn’t require being extremely good at Go to pull it off: any average kyu player could win against KataGo with this strategy.

5 Likes

That’s wild. So the entire discussion of how “smart” AIs really are just got a new spin.

1 Like

In discussions about large language models and such, one thing that often comes up is the distinction between the agent or personality or author that the model is simulating at a given moment, and the model itself, which is simply trying to make the best prediction of the text that follows. Sometimes, a model might actually “know” the correct answer to a question, but give the wrong answer or answer in a different way than intended because of how it’s asked, particularly models that aren’t finetuned to be helpful chat agents. E.g. a simple common sense question like “Why does an X weigh more than a Y”, for some choices of X and Y, could sound like the lead-in to a joke, prompting the model to follow with a joke answer because, absent any further context, if you had seen this text randomly on the internet, that would in fact be the most likely kind of answer to follow, rather than an accurate or scientific answer. A lot of prompt engineering is about managing this kind of phenomenon. The model is not trying to output the correct answer even if it “knows” or could easily know or produce the correct answer; it’s trying to predict the answer that would most likely occur in the training data.

In following KataGo’s learning on cyclic groups, I’ve gotten the distinct impression that this phenomenon is happening too. The model is not trying to predict the correct move and correct evaluation, it’s trying to predict the moves and values in the training data. For example, you can see that the model sometimes separately has to learn that a group in a given situation is alive at komi 7, and that the same group in the same situation is alive at komi -7 (i.e. you are playing as Black instead of White), and might actually predict the group as dead in one case and alive in the other.

Why? Well, maybe at komi -7 you would be losing in this kind of position, so earlier in the run, selfplay had incentive to try more risky moves in the hopes of turning the game around, and found the tricky tesuji for making the group live. Whereas at komi 7, selfplay consistently won without trying to make the group live because, while already winning, it had little incentive to look for risky tesuji that could lose the game if it fails.

And after enough games, the tesuji is so consistently found at komi -7 that the model is correctly very confident the group should live, while at komi 7 the selfplay games have still gotten unlucky and never happened to find the move, so the model is very confident in predicting that the group will die.

I really want to say to the net “are you serious? You already have a circuit in you that recognizes this shape! Just activate that circuit in the komi 7 games too!”. But doing so would not be correct from the model’s perspective. Even if it would be easy for the model to output the correct result, even if that would only require tweaking a few weights to activate that circuit and even if doing so would be a strict improvement in quality of play with no negative side effects, the model’s job is to predict the training data, not to play correctly. And the training data currently shows this group consistently dying at komi 7, and that might not change until at least one game closer to komi 7 also randomly discovers the same move due to noise and exploration and starts having the group not die.

The above story is one in which this only affects the score achieved, not the win/loss result, but there are also different permutations of this story where win/loss is affected too. And to be clear, a lot of permutations of this story probably also don’t involve komi, so it’s not komi in particular that’s a problem, this is just perhaps one of the easier to explain examples.
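As a toy illustration of the komi story above, here is a caricature of a "model" that does nothing except match the outcomes in its training games. The game counts are invented for illustration, and a real network generalises across komi far more than a lookup table like this does, so treat it only as a sketch of the incentive, not of how KataGo is trained.

```python
from collections import defaultdict


def fit_life_predictor(games):
    """Estimate P(group lives | komi) as the empirical frequency in the games."""
    lived = defaultdict(int)
    total = defaultdict(int)
    for komi, group_lived in games:
        total[komi] += 1
        lived[komi] += int(group_lived)
    return {komi: lived[komi] / total[komi] for komi in total}


# Hypothetical self-play records: (komi, did the group live?).
# At komi -7 self-play found the tesuji almost every time;
# at komi 7 it never happened to try it.
games = [(-7, True)] * 98 + [(-7, False)] * 2 + [(7, False)] * 100

predictor = fit_life_predictor(games)
print(predictor)
```

The position on the board is the same in both cases, yet the predictor is confident of life at komi -7 and of death at komi 7, because that is what minimises its error on the data it was given.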

13 Likes

I disagree about the first paper, though that was my first impression too. The first paper was interesting because it used the adversarial technique to exploit a flaw in KataGo without humans pointing the way. That is interesting as an approach to finding other flaws in NN-based AIs, and led to the second exploit. The capture problem was a flaw in KataGo, because KataGo was intended to be able to play well by strict Tromp-Taylor rules, and that meant it should have captured everything.

1 Like

Yes, that’s what the reporting/marketing of the result was, but that’s not quite right either; it’s misleading to state the result that way. What they showed was a weakness in KataGo’s raw policy, i.e. “1-playout” KataGo, and/or very low playout search. But KataGo uses MCTS, and typically many playouts, for a reason. I think it’s much more likely than not that with more work you could still adversarially engineer positions extreme enough to remain a problem even with many playouts, but that’s not what was achieved.

Just like if someone were to find a method to win against Stockfish in chess that only worked when it was restricted to depth 5 or a maximum of some milli/microseconds of thinking time, it would be deceptive for me to unqualifiedly claim the method could “beat Stockfish”. And it would still be deceptive even if on a technical level a report accurately disclosed those parameters, if it used language that would make it easy for readers to overlook that fact and encourage reporters to publish articles lacking mention of that detail.

Slightly complicating things :sweat_smile: was the fact that at the time, KataGo also had a couple of bugs or issues in the code that made it so that bad raw policy could cause blunders in low playout searches to take several times more playouts to fix than it should have - this part of course was my fault in coding KataGo, not theirs. Those have since been patched.

It is and was already well known among experienced users not to trust the policy or low-to-mid-playout MCTS. For example, here’s the raw policy of a 40b net from near the end of last year, on one of the less-fortunate samples from the net:

The unique correct move is P14 (bP14, wO15, bN14 drive white in a loose ladder towards the top), which only has 12.4% while a losing move holds the top spot with 51.4%. This is obviously more complex than passing with dead stones, but on the other hand, it’s still amateur level and I also didn’t have to try very hard or even do any adversarial exploitation to find it, I just had to look at some positions. The raw policy simply isn’t robust at all, and there isn’t a strong theoretical reason to expect it to be (in particular, by intentional design it’s not trained with a loss function weighted to avoiding bad outlier outcomes).


Of course, the exploitability of a raw policy prediction of a net still has implications for machine-learning systems that do sample a raw neural net uncritically, and for systems in places where search is infeasible. But it goes to show how poorly the first paper was communicated: even when trying to take care, in this kind of “second impression”, about in what way the result is interesting, it’s still easy to come away with misleading impressions.

10 Likes

Thank you for yet another interesting post. The missed loose ladder is striking, just the sort of thing I (6k) might sometimes miss in practice (though I can read it reasonably easily)! And that in a recent network! Thus a clear demonstration of the importance of tree search.

I have tried to revisit the original paper to see where I got a wrong impression, and have ended up more confused than ever.

The section about the pass-based attack on their site as I write (2023-05-29) suggests that 8 visits were the most their pass-attack could beat. This is odd, because their original site, as of 2022-11-11¹, said it could beat 64-visit-KataGo (top-20-pro level) by using 8192 visits. But, incidentally, how did more visits help KataGo if it evaluated those positions incorrectly? Does it sometimes see the dead stones as a problem?

Could it be that their apparently inconsistent claims depend on how their adversary was configured? I believe they sometimes, but not always, improved their tree search in actual play by using the network of their opponent to predict its responses. But I have already spent far longer on this than I intended, so I think I had better post this without trying to find out any more.

I am also not sure which part of my previous post was “not quite right”, but I suppose that I did not state (because I had forgotten or not realised) that the passing exploit only worked against low visits. But on reviewing some of the evidence, their original claim does seem to have been somewhat stronger. In any case I feel it was interesting that the adversarial approach got that far, let alone that it later led to the encirclement attack.

¹ I was having trouble finding the original pages, but they are on the Wayback Machine for goattack.alignmentfund.org, e.g. for 2022-11-11 at https://web.archive.org/web/20221111132451/https://goattack.alignmentfund.org/. They currently seem to redirect that to a page about all their exploits, https://goattack.far.ai/pass-based-attack#8_visits. The old pages had, from their appearance at 2022-11-02 00:35:39, the title “Adversarial Policies Beat Professional-Level Go AIs”, which seems fair.
The cyclic attack is also described in the PDF on arxiv at https://arxiv.org/pdf/2211.00241.pdf.
I suppose there is probably also a paper on the pass-attack on arxiv, but I have decided to post without looking for it.

Without verifying closely, I assume the 64 visit vs 8 visit thing relates to the timing of the bugfixes for KataGo’s handling of low-playout searches to override a bad policy. The authors of the paper were great and responsive to talk to in private correspondence, which included some investigation of this bug, and I’m thankful to them for helping to discover it so that I could fix it.

Unlike the cyclic group attack, which gives positions that are consistently misevaluated by the net where the net is entirely blind to the concept, the pass attack was sort of just reliant on the raw policy being a bit fuzzy and non-robust and sometimes putting some mass on the pass move even though the evaluations were fine. For example, passing would instantly be evaluated as much worse than playing a move, as far as I was aware. But if you’re doing a very low playout search, even if the algorithm “instantly” knows the correct values, you still might choose the bad move.

Why? Well, the most plain-vanilla MCTS implementation possible simply chooses moves with probability proportional to their visit counts raised to some power. So perhaps you do 6 visits and you put 3 into the pass move and 2 into move A and 1 into move B. And suppose both move A and B are correctly evaluated as totally winning while passing is evaluated as totally losing. Nonetheless, passing received the most visits (3, instead of 2 or 1), so you’re still most likely going to pass.

Obviously this is stupid, yes you could do something much better (and KataGo does do something a little better, although pre-bugfix occasionally it could actually do worse than vanilla instead!), but still it’s not worth adding a lot of complexity to try to tune exactly the right logic for very low visits. Good move selection algorithms for MCTS mathematically are often derived from statistical methods - estimating variance of the values on moves along with modeling MCTS as a multi-armed-bandit problem, etc. But of course if you know anything about statistics you know that they generally are optimized for the case of many samples and give meaningless results with very few samples. For example, if there were a poll to predict the outcome of a local government election, what kind of statistics would you try to do to analyze the poll if the poll only sampled and asked a grand total of 6 random people, instead of thousands?

The answer is you wouldn’t, you would just get more samples (i.e. use more visits). If you do more visits, moves A and B will trivially overcome passing due to their winning evaluations, and then when further raised to a decent power the chance to pass becomes negligible.
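Here is a minimal sketch of that vanilla selection rule, using the 6-visit numbers from above. The function name, the higher-visit counts, and the power of 4 are made up for illustration; this is not KataGo's actual move selection code, which as mentioned does something a little better.

```python
def selection_probabilities(visits, power=1.0):
    """Chance of picking each move when sampling proportional to visits ** power."""
    weights = {move: count ** power for move, count in visits.items()}
    total = sum(weights.values())
    return {move: weight / total for move, weight in weights.items()}


# The 6-visit example: the fuzzy raw policy happened to send 3 visits to pass.
low = selection_probabilities({"pass": 3, "A": 2, "B": 1}, power=1.0)

# Hypothetical deeper search: pass keeps its 3 early visits, while the
# winning evaluations of A and B attract nearly all of the later visits.
high = selection_probabilities({"pass": 3, "A": 400, "B": 197}, power=4.0)

print(low)
print(high)
```

With only 6 visits, pass is the single most likely choice even though its evaluation is terrible, because the rule only looks at visit counts. With the hypothetical 600 visits, the weight on pass is dwarfed by A and B, and raising to a power shrinks it further.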

Anyways, this is why trying to exactly optimize against and beat very low-visit MCTS is a slightly weird thing to do - the algorithm was never intended to be good in that case, and it’s very sensitive to the rate at which exact parameters of your algorithm break down as you get to too-low-sample sizes for statistics to work well. Even if in “normal” board positions it just so happens to still be “professional” level.

You do have to take some nuance and care in interpreting results from down-tuning a system to a given “level” (e.g. the number of visits that produces “professional” level play) and then showing flaws at that level. Like, suppose with more training a bot became “10x stronger” (whatever that means, however you measure that) uniformly across every fixed number of visits for normal board positions, but only “2x more robust” in how it handles outlier weird positions. Then, in the big picture, this is a strict improvement in everything - both normal strength, and in robustness no matter your compute budget. But the bot may have gotten worse if you are measuring “how robust is it at the amount of compute that results in it being such and such level” if the 10x causes the amount of compute at which that level is reached to drop by more than the robustness improved.
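To put numbers on that, here is a toy calculation. It assumes, purely for illustration, that both strength and robustness scale linearly with visits; the function names and all the numbers are invented and are not measurements of any real bot.

```python
def visits_for_level(target_strength, strength_per_visit):
    """Visits needed to reach a target strength under the linear toy model."""
    return target_strength / strength_per_visit


def robustness(visits, robustness_per_visit):
    """Robustness achieved with a given number of visits under the toy model."""
    return visits * robustness_per_visit


TARGET = 1000.0  # hypothetical "professional level" strength

# Old net: 1 unit of strength and 1 unit of robustness per visit.
old_visits = visits_for_level(TARGET, 1.0)
old_robustness_at_level = robustness(old_visits, 1.0)

# New net: "10x stronger" and "2x more robust" at every fixed visit count.
new_visits = visits_for_level(TARGET, 10.0)
new_robustness_at_level = robustness(new_visits, 2.0)
new_robustness_same_visits = robustness(old_visits, 2.0)

print(old_visits, old_robustness_at_level)   # old net at the target level
print(new_visits, new_robustness_at_level)   # new net at the same level
print(new_robustness_same_visits)            # new net at the old visit count
```

With these made-up numbers the new net is twice as robust at equal visits, but only a fifth as robust once it is tuned down to the visit count that gives the same playing level.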

At least, it’s been fun to think about and talk things over with people, the authors of this work heartily included, and I also think it’s important to figure this stuff out especially for deploying automated systems in the real world. Even if you deploy an AI system that is “human level” in some overall metric, it might be worse than “human level” in individual areas, and if you correspondingly try to cut costs as it improves, you may even turn improvements into regressions that you don’t realize after the cut.

9 Likes

This last month, cyclic group understanding on the most recent nets has improved what seems like a lot more on some test positions! Although without more feedback it’s still hard for me to tell what that translates to in terms of overall robustness/consistency/reliability in exploit games or in analysis quality in novel cycle group situations not in the training data.

Anyways, seems like a fine time to advertise a bit more widely - reddit post:

KataGo should be partially resistant to cyclic groups now
by u/icosaplex in baduk

A lot of the learning at this point is on eye-fight tactics - the policy is not very high on many kinds of multi-move tactics that create “false” eyes that become real due to the cycle. Such “false” eyes are almost never useful in ordinary life and death fights, so they have to be learned newly and specially in the case of cycle group fights. But it seems like deep enough search is now often capable of recognizing and finding them. For example in this OGS game https://online-go.com/game/53373372, white can connect at C6, which in conjunction with the throw-in ko at E1, threatens black’s corner.

And the result is that white can gain a “useless”, “false” eye at B7… except that this eye is actually real due to the cycle!

You can see KataGo is still “nervous” about relying on these “false” eyes for life though. In the principal variation above move 9 is about as bad as passing for white! Due to the cycle white’s still alive even if black throws in at J9, so this move “shouldn’t” really be part of this PV and neither should black be playing move 8, which carries no threat. Both moves in this PV are nonsense.

But I think KataGo will usually hesitantly trust the status enough not to play these kinds of moves if doing so would clearly fall behind on territory. You see them a bit more when the territory gap is big enough to tolerate an unnecessary defense. In the above example, black is hopelessly behind on territory if they can’t kill, so in that sense it doesn’t “hurt” black to struggle with a useless non-threat, and for white to answer it just-in-case even when they don’t need to. Given that white’s far ahead, white’s move 9 is just making absolutely sure that at least one of their eyes is “real”.

Cases where the whole board is involved in the race are hardest. Here, for example, white is completely dead because big eye beats small eye and all other liberties are shared. But the predicted winrate for black winning is only 30%ish:

And actually, it really is the case that sometimes black loses in self-play - sometimes black misses the kill in self-play games in this kind of shape by playing the self-harming ko threat of H10 or J9, shrinking their 6-point eye to be equal to white’s 5-point eye, and maybe even later shrinking their own eye to be smaller than 5 points by prematurely capturing and losing.

But at least it’s way better than nets prior to this month, which reported winrates for black of 1-3%:

Reaching 1% winrate for black is itself a substantial “improvement” over yet earlier nets, since 1% is 100x more than the winrate of 0.01% that a net might give in a “normal” late endgame where it thought White was ahead by 50+ points with a big kill. But of course, merely thinking that there is a 100x increased chance of trouble is still nowhere near good enough if in absolute terms that still only translates into 1%.

Anyways, nice to see the learning has moved on past that here. If anyone can find or construct more cycle group positions that still have misevaluations that extreme, I’d be curious to know about them. Maybe with some more artificial positions it’s still not too hard, I’m not sure.

8 Likes

“white can connect at C6, which in conjunction with the throw-in ko at E1”
No, that is incorrect. White plays C6, black plays B3, white plays C7, black plays A7.

4 Likes

Ooh nice, yes there is still a bit of blind spot here. :slight_smile:

@fallingsnow88 - by the way, your games have been very impressive. You’ve been really good at finding a lot of these bad evaluations, just like the mistake you pointed out here right now! It’s been great to have your help, thank you again for playing them.

7 Likes

update

https://arxiv.org/pdf/2406.12843

2 Likes

Gift attack

We also discovered a new non-cyclic attack, which we call the “gift attack”, that defeats dec23-victim at 512 visits of search in 91% of games. In this attack, the adversary sets up a “sending-two-receiving-one” situation where, for no valid reason, the victim gifts the adversary two stones and then needs to capture one back. However, the victim’s recapture is blocked by positional superko rules. The adversary sets up the position to have the resurrection of one of its dead groups at stake, leading to a disaster for the victim.
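For anyone unfamiliar with the rule involved, here is a minimal sketch of the difference between positional and situational superko. It assumes the usual sending-two-returning-one cycle, in which the recapture would recreate an earlier board but with the other player to move. The board labels "X", "Y", "Z" are placeholders rather than real positions, and the function names are made up for illustration.

```python
def violates_positional_superko(history, board):
    """Positional superko: illegal if the board alone repeats any earlier board."""
    return any(board == past_board for past_board, _ in history)


def violates_situational_superko(history, board, to_move):
    """Situational superko: illegal only if board AND player to move both repeat."""
    return (board, to_move) in history


# Each entry is (board, player to move next); labels are placeholders.
history = [
    ("X", "victim"),     # before the gift, victim to move
    ("Y", "adversary"),  # victim has played the self-atari ("sends two")
    ("Z", "victim"),     # adversary has captured the two stones
]

# Recapturing one stone would recreate board X, now with the adversary to move.
blocked_positional = violates_positional_superko(history, "X")
blocked_situational = violates_situational_superko(history, "X", "adversary")

print(blocked_positional, blocked_situational)
```

Under the positional rule the recapture is rejected because the board alone repeats, while a situational rule would allow it since the player to move differs.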


I tried to make an SGF that works better on OGS. This time superko works, and it can be seen why it’s not allowed to take the stones back.

But it looks like they use rules where self-capture is possible (New Zealand?)

4 Likes

I love the cappuccino shape

Screenshot_20240620-070410_Chrome

and the among us shape is very cute too

Screenshot_20240620-070517_Chrome

But I really don’t get why a superhuman player would throw stones inside that shape.
The neural network suggests that move and the MC algorithm confirms that. Why? It’s completely useless. Self-atari for no reason.

3 Likes

If this exploit relies on positional superko, then I’d say it exposes more a flaw of those (rarely used) rules and a potential lack of training by the victim AI with those rules than something actually interesting from a go perspective.

1 Like