Creating a dataset to develop automatic cheating classifiers

Once again, cheating detection was brought up by someone on Reddit and L19.

This is a copy of my L19 comment:

It is not only important to successfully detect cheating (true positives); it is also important (perhaps even more so) that the detection method gives very few false positives, which lead to unjust disqualifications and public discrediting of innocent players. You want to avoid things like the Prosecutor’s Fallacy and Confirmation Bias.
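
To make the base-rate problem concrete, here is a tiny illustration (all numbers are made up, just to show the shape of the argument):

```python
# Illustrative numbers only: even a detector that sounds accurate accuses
# mostly innocent players when cheating itself is rare.
prevalence = 0.001   # assumed: 1 in 1000 games involves cheating
tpr = 0.95           # assumed: detector catches 95% of cheaters
fpr = 0.01           # assumed: detector wrongly flags 1% of clean games

p_flag_and_cheat = prevalence * tpr
p_flag_and_clean = (1 - prevalence) * fpr

# Bayes' rule: probability that a flagged player actually cheated
p_cheat_given_flag = p_flag_and_cheat / (p_flag_and_cheat + p_flag_and_clean)
print(f"P(cheated | flagged) = {p_cheat_given_flag:.1%}")  # about 8.7%
```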

I think what’s actually needed for proper automatic cheating detection is this procedure (which is quite common in machine learning and software classifier competitions):

[1] prepare a large (the larger, the better) collection of games where you know for certain whether there was cheating or not, because it includes games of various volunteers who cheated on purpose in various ways that they saw fit. This collection needs to have games from all levels of play and many different playing styles.
[2] with this annotated collection of games, developers can create and test classifiers (by machine learning or some other method).
[3] you can objectively compare the quality of various classifiers by (for example) their Matthews correlation coefficient (see the sketch after this list).
[4] you could even create a competition between classifiers from different developers by using a (perhaps undisclosed) representative subcollection of the annotated games that were not used in the creation and testing of those classifiers.
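
As a rough sketch of step [3], a comparison could look like the following; the ground-truth labels and the two classifiers’ verdicts below are entirely hypothetical:

```python
# Comparing classifiers on the annotated collection by Matthews correlation
# coefficient. Labels: 1 = cheating, 0 = clean.
from math import sqrt

def matthews_cc(y_true, y_pred):
    """Matthews correlation coefficient from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical ground truth and two competing classifiers' verdicts:
truth        = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
classifier_a = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]   # misses one cheater
classifier_b = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]   # catches all, accuses two innocents

print("A:", matthews_cc(truth, classifier_a))   # ~0.76
print("B:", matthews_cc(truth, classifier_b))   # ~0.65
```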

I think that step [1] will be a lot of work, requiring a coordinated effort to create a high quality dataset to use in the next steps. Also, I think step [1] is typically a task for an organisation and not a task for the developers of classifiers.

But once you have this data set, a public competition with prize money could be quite a cheap way to get a very good classifier (see for example the CASP14 protein folding competition that was won by DeepMind’s AlphaFold in 2020).

So I was thinking, could OGS perhaps support the creation of such a standardised dataset?
For that, OGS would need to allow specifically registered volunteers to cheat. Those volunteers would send their games to a designated person who collects them for this specific game collection.
Would this be acceptable if those registered volunteer cheaters notify the mods of games where they cheated, so mods can annul those games if they were ranked? Perhaps those volunteers should also notify their opponent after the game when they cheated and explain why they did it?

2 Likes

rumours

1 Like

Yes, I know there are various efforts, but I’m a bit bothered by the secrecy that seems to surround those. It makes me feel that their methods are not so great, otherwise they could be transparent about the performance of their methods. You don’t want to rely on security by obscurity and have people fall victim to flaws in their methods.

2 Likes

Exactly. I’m afraid very few people who search for cheaters have actually tried their methods on games of players who are certainly not cheating.

2 Likes

Since you need a lot of games (clean and with cheating), just let your volunteers play against each other. Use unranked games and you are fine.

1 Like

It depends on the context of the secrecy. If you mean why don’t they just announce “Hey, we (A, B, C teams/servers/people) are working on anti-cheating”, I don’t see why they couldn’t. If you mean “Hey, we (A, B, C) are using X, Y, Z methods to detect cheating”, then I think they’re better off not explaining those methods.

I think yes and no. If it’s people’s details or money etc., I can understand why you might want open-source/transparent security, but maybe more so in the early development of the site at hand: have as many people as possible try as many things as possible, report the issues when they find them, and fix them.

However, when you announce what methods you use to detect a cheater, they know exactly what you’re looking for (whether it’s certain patterns in AI winrate/score, killing a group you’ve no business killing, or shapes and tesuji that might be considered above your level, in the sense that one needs strong reading to follow up the possibilities; one can always just guess a shape move). They can then adapt and find more subtle ways to cheat, and you need to start all over again to detect these new ways. You’ve no idea what new methods they might use, and they’re not going to tell you!

1 Like

If those volunteers know their opponent is also a volunteer, they will be suspecting cheating more quickly, so they may play differently than they normally do. This can introduce a bias/contamination in the data set (compared to games with unsuspecting opponents).

1 Like

They don’t need to disclose their methods. Nobody in such a competition would need to disclose them. If they use machine learning, it may even be impossible to explain how it works.

I only want to know how well it works, by some proper unbiased correlation measure (such as the Matthews correlation coefficient) on a standardised high quality data set.
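
For example (purely illustrative numbers, assuming scikit-learn is available), plain accuracy can look excellent on an imbalanced set of games even when a classifier is useless, which is exactly why a correlation measure is preferable:

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# 1000 games, only 10 of them cheated (label 1) -- assumed proportions.
y_true = [1] * 10 + [0] * 990

# A useless classifier that never flags anyone still gets 99% accuracy:
y_never = [0] * 1000
print(accuracy_score(y_true, y_never))      # 0.99
print(matthews_corrcoef(y_true, y_never))   # 0.0 -- no better than chance
```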

AI is prohibited even in unranked games under usual conditions, so a pair of mutually agreeing volunteers is the only way you could run such a project.

"Hey, we think you have cheated, so you don’t get the price we promised the winner. Also we tell everyone you are a cheater. "
"I didn’t cheat. Take back your accusations, and show your evidence. "
"Sorry, we can’t do this, but we can ensure you that we feel quite sure, so please take the hit to your reputation. "

7 Likes

That’s why I’m asking you (OGS) if you are willing to make a temporary exception (under strict conditions) for this specific goal, for the greater good of all go players.

1 Like

The volunteers are already biased, since they know they are being watched and allowed to “cheat” in any way they want. The little additional bias from knowing that «their opponent is more likely to “cheat”» shouldn’t be a problem if they are able to overcome the bias of «no consequences for being caught “cheating”».

And the bit about «no consequences for being caught “cheating”» is making me very sceptical that this would work as expected at all.

3 Likes

I mean that’s kind of the way it is (or feels like it is) at the moment in some circumstances (unfortunately), but that’s not what I meant (which I hoped would be more obvious).

There’s a difference between a personal one-to-one or one-to-panel discussion of a particular game or set of results in a tournament, and just publishing your code on GitHub with a guide on how to cheat for people who want to.

One might assume that very little of the methods used would spread to the public from the accused player, or at least not with a technical understanding and an eidetic memory of the conversation, should they voice their displeasure on, say, Facebook.

I’m not even sure how far along it is, or whether it’s just done case by case so far. I don’t have any confirmed source (other than the rumour above) saying there’s a big lump of code somewhere that does this cheat detection for OGS. I know there are some efforts by Antti ([2009.01606] Derived metrics for the game of Go -- intrinsic network strength assessment and cheat-detection), but I don’t know if they’re related, or being used, etc.

I do understand, though, that one would want 100% confirmed cases of cheating rather than highly suspected cases for a rigorous test dataset. If you just took a set of reported/suspected games from a server, some people would simply never admit it.

Really, you don’t just want somebody to cheat; you want them to experiment with cheating in more and more undetectable ways, submit their games to the database, see how easily detectable they are, and then try again to dodge the current system. That way you get to know the limitations, and the games are stored for the long term, for new versions or new systems to test against. I think one would probably want volunteers from a range of levels too, rather than one person playing at what they think is a certain level.
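
Something like the following (a purely hypothetical record layout; none of these field names are an existing OGS or dataset format) could capture what needs to be stored per game:

```python
# One annotated game in the dataset -- all field names are assumptions.
annotated_game = {
    "sgf": "volunteer_game_0042.sgf",        # the game record itself
    "label": "cheating",                     # "cheating" or "clean" (ground truth)
    "cheating_method": "engine top move in key fights",  # volunteer's own description
    "moves_assisted": [34, 56, 57, 88],      # which moves used engine help, if known
    "volunteer_rank": "5k",                  # the player's honest rank
    "detection_round": 3,                    # iteration of the cheat/detect arms race
    "opponent_informed": True,               # opponent notified after the game
}
```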

2 Likes

I’m afraid normal volunteers can’t create good enough non-cheating games.
Pro dan games have the biggest percentage of moves identical to a bot’s, I think. So it’s easier to just test the OGS mods’ methods on pro games. If even one pro were labelled a cheater, then the OGS methods should be changed.
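
A minimal version of such a check might look like this sketch, where `detector` and `load_pro_games` are hypothetical stand-ins rather than any real OGS code; games from before strong Go AIs existed are guaranteed clean, so every flag is a false positive:

```python
# Run a detector over games known to be clean and count how many it flags.
def false_positive_rate(detector, clean_games):
    flagged = sum(1 for game in clean_games if detector(game) == "cheating")
    return flagged / len(clean_games)

# rate = false_positive_rate(detector, load_pro_games("pre_2016"))
# Any nonzero rate here is the rate at which innocent players would be accused.
```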

OGS mods, did you ever test your methods on pro games?
(yes / no)
@flovo , @BHydden , …

5 Likes

Yes, I think suspicious games should be pruned from the standard dataset. You don’t want such games entering the dataset labelled as definitely not cheating.
But you could use such games as test cases for classifiers that perform really well on the standard dataset. If the top classifiers turn out to be highly reliable on the standard dataset, it could even lead to retroactive decisions by organisers of past online tournaments.
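
In code this split could be as simple as the following sketch (the `label_confidence` field is an assumption, not part of any existing dataset):

```python
# Keep only games with certain labels in the standard dataset; set suspicious
# but unconfirmed games aside as a secondary probe for the top classifiers.
all_games = [
    {"sgf": "game_001.sgf", "label": "clean",    "label_confidence": "certain"},
    {"sgf": "game_002.sgf", "label": "cheating", "label_confidence": "certain"},
    {"sgf": "game_003.sgf", "label": "cheating", "label_confidence": "suspected"},
]

standard_set  = [g for g in all_games if g["label_confidence"] == "certain"]
secondary_set = [g for g in all_games if g["label_confidence"] == "suspected"]
```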

Yes, it is likely that there will be a sort of arms race between cheating and cheating detection. You could even turn that into a contest, like some companies paying hackers to find holes in their security.

2 Likes

Why should I trust AI to detect cheating? :grinning_face_with_smiling_eyes:
They’re stupid: they crush human players but they don’t know why.
Also, before becoming better than humans in some fields, they were worse! Many of them still are!

It’s easy to recognise if an AI won a game. It won. That’s it.
But how would you recognise whether an AI picked a cheater or accused an innocent?

1 Like

I think machine learning can be used in quite a broad context. I mean in some instances machine learning is a fancy word for some kind of linear regression :stuck_out_tongue:

I think deep learning would be more akin to training a computer to look at a game and decide whether or not it was played by KataGo, Leela, etc. as Black or White.

I don’t think this is what was meant. It’ll probably have to come down to human judgement and conversation with the players involved where possible; the machine learning aspect is probably just to pick out the probable cheaters from the obviously-not cheaters. I think it’s just meant to be a way of automating the process of finding these anomalies/interesting data points in huge datasets (like all of OGS’s ongoing games :stuck_out_tongue: ), similar to how astrophysicists might have large sky surveys but need to pick out a few stars or galaxies that show some pattern of interest and focus their efforts on the plausible cases, rather than finding them in the first instance.
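
As a sketch of that triage idea (the feature names and numbers are assumptions, and scikit-learn is just one convenient choice), a simple model could rank games for human review rather than issue verdicts:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-game features: [fraction of moves matching the engine's top
# choice, average point loss per move, blunder count in won positions]
X_train = np.array([
    [0.45, 3.2, 4],   # labelled clean games from the standard dataset...
    [0.50, 2.8, 3],
    [0.88, 0.4, 0],   # ...and labelled cheating games
    [0.92, 0.3, 0],
])
y_train = np.array([0, 0, 1, 1])

model = LogisticRegression().fit(X_train, y_train)

# Rank new games by suspicion and hand only the top of the list to human reviewers.
new_games = np.array([[0.55, 2.5, 2], [0.90, 0.5, 1]])
print(model.predict_proba(new_games)[:, 1])   # suspicion scores, not accusations
```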

1 Like

The game depends on the honesty and willingness to cooperate of the vast majority of us. When there are enough cheaters that anti-cheating methods become necessary, interest will diminish over time and we will move on to a different hobby. This is not a popular game to start with anyway.

I honestly don’t believe there is so much cheating. It’s when we keep talking about it that the feeling changes and the game stops being interesting. That’s what I am afraid of the most.

2 Likes

Well, it’s much more of a problem online than IRL. We will still enjoy meeting players face to face, with the same pleasure as ever.
Besides, online there will be cheaters, just as there are already sandbaggers, escapers… But I think it can stay a rather minor problem among not-too-strong players, who are the large majority, because their cheating can be detected more easily. What I fear most is that we lose nice games from very strong players who might stop playing online.

1 Like

I don’t necessarily disagree, but I’m curious what the reasons for this belief are. Have any cheaters been caught recently? Perhaps on online platforms such as KGS? I’d be interested in how many cheaters are detected on OGS, too.

2 Likes