The researchers did that already according to their preprint (page 8; I’m somehow unable to copy and paste on my phone) to confirm that the early passes are the actual issue. But they also explained that this is somewhat undesirable, because a) humans don’t like to play against someone who doesn’t pass, and b) from a scientific point of view it would be better if KataGo learned not to pass without adding such external knowledge.
This seems like kind of a weird value judgement. As I understand it, one of the “innovations” KataGo made was to incorporate external knowledge (like ladders and nets) for a better result. Without that external knowledge, it would perform similarly to LZ (Leela Zero).
I don’t completely disagree with you, but aren’t you being a bit too harsh? (That said, I’ve not read the paper.) Let me describe another situation, which I find similar and which has actually happened many times in an AI context.
Researchers train an AI to be superhuman in a video game, and by trial and error, the AI finds a glitch that breaks the game, to the point that it completes the stated objective much better than anyone could without the glitch. I don’t have a citation with me, but I remember something like that happening in a baseball game, with really funny output.
Now “AI finds a glitch in [baseball game]” would be a better title than “AI beats every human being at [baseball game]”, but I wouldn’t call the second one blatant lying; technically correct clickbait would be my description. Don’t you think the situations are pretty similar?
I think their point is valid when considering future development of artificial (general) intelligence. But with this specific attack it seems weird indeed, also because KataGo is only passing early to be fun as an opponent for humans (at least I think so).
I was mainly just asking to make sure I understood in principle what was going on with this attack. From a practical perspective, I don’t think anyone cares, as it’s doubtful that many are playing with Tromp-Taylor rules, specifically without a stone removal phase.
A pedantic aside on the Tromp-Taylor rules
I think the authors are a bit imprecise when they refer to these specific rules as “Tromp-Taylor rules”, since the commentary of the Tromp-Taylor rules does give the option of using a simple stone removal phase. It should be noted that komi is also an aspect of the rules relegated to the commentary, and all of their example games appear to use komi. Thus, it seems like they are cherry-picking a bit among the commentary parts if they employ komi but not the stone removal phase. I would even contend that the phrase “Tromp-Taylor rules” should by default be interpreted to include all parts mentioned only in the commentary (such as komi, stone removal, and handicap).
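To make the strict-scoring point concrete, here is a rough sketch of Tromp-Taylor area scoring, which shows why the lack of a stone removal phase matters: every stone still on the board is scored as alive. The board encoding, komi default, and function name are my own choices for illustration, not anything from the paper:

```python
from collections import deque

EMPTY, BLACK, WHITE = ".", "X", "O"

def tromp_taylor_score(board, komi=7.5):
    """board: list of equal-length strings. Returns (black_points, white_points)."""
    n, m = len(board), len(board[0])
    # Every stone on the board counts: there is no dead-stone removal phase.
    black = sum(row.count(BLACK) for row in board)
    white = sum(row.count(WHITE) for row in board)
    seen = set()
    for i in range(n):
        for j in range(m):
            if board[i][j] != EMPTY or (i, j) in seen:
                continue
            # Flood-fill one empty region and record which colors border it.
            region, borders, queue = [], set(), deque([(i, j)])
            seen.add((i, j))
            while queue:
                x, y = queue.popleft()
                region.append((x, y))
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = x + dx, y + dy
                    if 0 <= a < n and 0 <= b < m:
                        c = board[a][b]
                        if c == EMPTY and (a, b) not in seen:
                            seen.add((a, b))
                            queue.append((a, b))
                        elif c != EMPTY:
                            borders.add(c)
            # An empty region is territory only if it touches exactly one color.
            if borders == {BLACK}:
                black += len(region)
            elif borders == {WHITE}:
                white += len(region)
    return black, white + komi
```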
A different avenue of criticism of the paper on more scientific grounds (which I am unable to judge):
Is there a way to read this commentary outside of Twitter? I only managed to read 9 very short messages, in which he basically said “This paper fails to cite some relevant prior work, and misrepresents other prior work.” But I suspect there might be more that I’m not seeing, because I have no clue how to use Twitter.
Maybe. This looks like they found a glitch in go. Except the glitch only happens when you play with a particular variant of Tromp-Taylor rules and don’t tell the victim about it.
DeepZenGo also lost a game by 0.5 points in the first world championship on a “rules glitch”, because its programmers had configured it for area counting but the tournament was played with Japanese rules. Is it really a glitch, or is it just bad parameterization?
We know that there can be real weakness with go AI. We’ve seen examples.
- AlphaGo started playing nonsensical moves after Lee Sedol’s wedge (warikomi) in game 4;
- in the AlphaGo documentary film, Fan Hui explains that he found a weakness in AlphaGo during his time working for DeepMind, and that they were unable to fix it;
- HanDol failed to read a geta (net) during the retirement matches against Lee Sedol.
Discovering, exploiting, then understanding and fixing these weaknesses would be interesting, especially since similar AI systems are used in fields like self-driving cars, where failures can have dramatic consequences.
But I’m not under the impression that these researchers found any actual weakness like that.
They claim that they have told KataGo about the rules. KataGo has some settings regarding cleaning up the board before passing, but I’m not sure whether those were used as intended.
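For reference, the knobs I mean look roughly like this in a GTP config; I’m quoting the option names from memory of KataGo’s gtp_example.cfg, so double-check them against your version:

```
rules = tromp-taylor      # strict area scoring, close to what the paper uses
conservativePass = true   # avoid passes that could cost points if the stones
                          # on the board are scored strictly as they stand
```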
I’ve revisited the discussion, where apparently lightvector says:
Primary author of KataGo here:
Wanted to say that I think this is overall good/interesting research. I have both some criticisms and some supports to offer […]
So if the “primary author of KataGo” thinks their research has some value, I will trust that judgement.
Edit: I will share some additional things here, based on what I’ve learned from the Computer Go Community on Discord (where lightvector is communicating too):
- KataGo uses 1600 visits during training. It’s impressive that it can still beat very strong humans with only 64 visits, but the adversary would be more impressive if it could successfully attack KataGo within the scope of what it was trained to do (which is being strong at go with 1600 visits).
- Back in July, lightvector told the authors that 500 visits is only the default “to be friendly to users with weaker hardware” and suggested using more visits (1600; this was without knowing much about the attack).
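For what it’s worth, raising the visit count doesn’t even require editing the config; assuming I’m remembering the `-override-config` flag correctly, something like this should do it (model and config filenames are placeholders):

```
katago gtp -model <model>.bin.gz -config default_gtp.cfg -override-config maxVisits=1600
```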
I’m just reading it as a very polite response from the katago author.
They’re claiming to have found a bug in katago. What else could he have replied? “I’m the katago author but your so-called research is crap, I refuse to admit that my software can have bugs”?
I don’t think he could have replied with anything less than “Thank you for your interest in katago, I’ll look into this potential bug”.
And after this polite introduction, he goes on with his offered “criticism and support” which sounds a bit like “I’ll give you a few hints in case you want to write an actual serious research paper in the future”.
Maybe, but I can’t know what’s going on in lightvector’s head. Anyway the researchers are working on improving their attack. Time will show what they can achieve.
This is not what they claim. Adversarial samples are not bugs; they reflect an old observation that, because NNs compose their learned function from (near-)linear pieces, special inputs can often be crafted that make the network output incorrect results by over-amplifying some of those function components. A picture of a cat that the net sees as a dog, for example.
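For anyone unfamiliar with the idea, here is a minimal sketch of that classic “cat seen as a dog” construction (the fast gradient sign method of Goodfellow et al.). To be clear, this is not the attack from the paper, which trains an adversarial policy rather than perturbing an input; `model`, `image`, and `label` are placeholder names:

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, image, label, epsilon=0.03):
    """Return a slightly perturbed copy of `image` that pushes the
    network's output away from the correct `label`."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Nudge every pixel a tiny step in whichever direction increases the
    # loss most: the change is invisible to a human, but over-amplifies
    # the (near-)linear components the network relies on.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```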
Many other comments also misunderstood the situation here. They used a ruleset that KataGo WAS trained on (modified Tromp-Taylor: stones are alive except within pass-alive territory) and WAS aware of, and found positions that still made the NN suggest the wrong move (pass) instead of first cleaning up the non-pass-alive territory.
Lightvector emphasizes that this only works in low-search conditions: more search can correct the network’s initial wrong suggestion. But the whole point here is that this is (supposed to be) an adversarial attack on the network, not on the search code.
I think the interesting question is whether the exact locations/shapes of those adversarial stones matter, or whether a human doing similar things would also trigger the error (under the same ruleset). The former would make these true adversarial samples; the latter would point to a more general training deficiency.
What they found is a situation where a stupid player claims a victory and a stupid judge agrees. That is not enough to claim to have found a loophole in KataGo. This study may not be telling a lie, but it is very misleading.
I think the adversary correctly won against KataGo under Tromp-Taylor rules. KataGo shouldn’t pass in such cases, although this isn’t relevant for normal usage of KataGo.
You could probably make a more interesting adversarial agent by picking a KataGo network and having it play and train against a frozen version of itself, where only the KataGo you picked can learn and adapt against the other one. Would be interesting to see how it would exploit its own weaknesses and biases while hopefully still playing a reasonable game of go. Might even make for useful training data to further strengthen KataGo.
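A rough sketch of what that loop might look like, purely as pseudostructure; the `play_game` and `update` callables (and everything else here) are hypothetical, not KataGo’s actual training code:

```python
import copy

def train_against_frozen(learner, num_games, play_game, update):
    """learner: a policy/value net; play_game(a, b) returns a game record;
    update(net, record) applies one learning step from that record."""
    frozen = copy.deepcopy(learner)   # opponent snapshot, never updated
    frozen.eval()                     # inference only
    for _ in range(num_games):
        record = play_game(learner, frozen)
        update(learner, record)       # only the learner adapts
    return learner
```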
I like the idea, but I’m really not sure whether KataGo is truly prone to adversarial attack. I think it would be great to apply your idea by using KataGo as an adversary itself against other, weaker bots, to figure out edge cases like missed ladders.
Not to mention, the title claims to beat “professional-level go AIs”, plural, suggesting some kind of universal method that works against several AIs.
Instead, the only AI they’ve “beaten” is a particular version of KataGo with an unreasonably low number of playouts, and it’s unclear whether KataGo was really aware that the game was supposed to be played under Tromp-Taylor rules.
I just love how this entire thread is just us trashing their paper xD
… and all I wanted was just to remove the risk of rank inflation…!
Honestly, my main problem with it is the wording and how it is being misinterpreted by people who don’t know go. Some people put “99% win rate” and “superhuman AI” in the same sentence, when the adversarial agent only gets that sort of win rate against the raw policy, which isn’t superhuman. They also mostly don’t seem to know WHY the adversarial agent wins (early passing by KataGo, not because it outplays KataGo in any way; no wonder it loses to amateurs). And the fact that KataGo wins once you give it realistic amounts of playouts also seems to be missed by most articles talking about this.