Profanity filter filters more than it should (at least in Italian)

Hi!

Yes, I know that the profanity filter can be disabled, as described in the wonderful new online help:

Profanity filter (Censors swearwords. To disable - simply switch to a language you do not expect to be using)

and actually I disabled it exactly that way, even before the online help told me to! :wink:

But anyway some users don’t disable it and the following game chat may happen:

Me - “Will you be at the next tournament in city-name?”
My opponent - “Probably, and you?”
Me - “I think so. Also person-name would like to $ !%.”

and also:

Me - “Rank instability may occur when you $ !% a stronger opponent.”
My opponent - “The word after “you” is censored”
Me - “:smiley: bea t”

(the space is there to cheat the profanity filter)

I don’t know whether the English profanity filter censors the words “come” and “beat”, but the Italian one censors their Italian equivalents.
And that’s confusing! :smiley:
Especially when you don’t want to say that “Rank instability may occur when you the-four-letter-word a stronger opponent.” (which could also mean that you beat him, but oh, you know what I mean :smiley: ), nor would you say that “person-name is looking for intercourse”. :smiley:

So, the Italian verb “venire” means “to come”. Yeah, it could also mean “to ejaculate”, but that’s a very secondary meaning.
Similarly, the Italian verb “battere” means “to beat” or “to defeat”. Yes, it could also mean “to prostitute oneself”, but, again, that’s a secondary meaning.

We found that the Italian noun “regina” is also censored. It means “queen”. I really don’t understand why.

So, sometimes chatting in Italian is very $ !%. (hard :wink: )

Regina isn’t just a noun, it can also be a name. In fact, I happen to know a very cute Regina. :smiley:

Imagine if your name was censored by default. “Hi, you can call me $!?%”

A friend once convinced me to play some Portuguese sci-fi strategy game online with him and I got a warning for innocently trying to name my spaceship “Galactica,” because the moderators thought I was going for an obscure sexual innuendo. I get you, @lysnew.

Maybe you could post in the Italian Group and compile a list of words that you believe have an undue classification so the devs can remove them from the filter.

Like you said, just changing the profanity setting isn’t the most efficient solution, not least because the settings are not persistent. :/

Wow, you’re right! And it’s not an unusual name in Italy!

:rofl:

That would be amazing. The filter dictionaries were taken from somewhere on the internet and, as you can imagine, they are hard to hone when we do not know the language :smiley: If there were more, it would be great to be able to fix them “all” at once, but that said, if we do not come up with more in a week or so, we will fix at least those mentioned.

I also recall a cute situation when our filter used to filter “negro” under Spanish settings (which obviously just means the color black). Made it a bit difficult to discuss Go for them :smiley:

I quickly read that thread before posting this one. I think there were no distinct language settings at that time, and “negro” actually can be an insult in English while it simply means black in Spanish.

Since @AdamR is on board with the idea, we can get to work, @lysnew. Here’s the complete list of Italian words and expressions currently being filtered on OGS:

Filtered Italian Words & Expressions
ammucchiata;
anale;
arrapato;
arrusa;
arruso;
assatanato;
bagascia;
bagnarsi;
baldracca;
balle;
battere;
battona;
belino;
biga;
bocchinara;
bocchino;
bofilo;
boiata;
bordello;
brinca;
bucaiolo;
budiùlo;
buona donna;
busone;
cacca;
caccati in mano e prenditi a schiaffi;
caciocappella;
cadavere;
cagare;
cagata;
cagna;
cammello;
cappella;
carciofo;
carità;
casci;
cazzata;
cazzimma;
cazzo;
checca;
chiappa;
chiavare;
chiavata;
ciospo;
ciucciami il cazzo;
coglione;
coglioni;
cornuto;
cozza;
culattina;
culattone;
culo;
di merda;
ditalino;
duro;
fare unaĆ ;
fava;
femminuccia;
fica;
figa;
figlio di buona donna;
figlio di puttana;
figone;
finocchio;
fottere;
fottersi;
fracicone;
fregna;
frocio;
froscio;
fuori come un balcone;
goldone;
grilletto;
guanto;
guardone;
incazzarsi;
incoglionirsi;
ingoio;
l’arte bolognese;
leccaculo;
lecchino;
lofare;
loffa;
loffare;
lumaca;
manico;
mannaggia;
merda;
merdata;
merdoso;
mignotta;
minchia;
minchione;
mona;
monta;
montare;
mussa;
nave scuola;
nerchia;
nudo;
padulo;
palle;
palloso;
patacca;
patonza;
pecorina;
pesce;
picio;
pincare;
pipa;
pippone;
pipì;
pirla;
pisciare;
piscio;
pisello;
pistola;
pistolotto;
pomiciare;
pompa;
pompino;
porca;
porca madonna;
porca miseria;
porca puttana;
porco due;
porco zio;
potta;
puttana;
quaglia;
recchione;
regina;
rincoglionire;
rizzarsi;
rompiballe;
ruffiano;
sbattere;
sbattersi;
sborra;
sborrata;
sborrone;
sbrodolata;
scopare;
scopata;
scorreggiare;
sega;
slinguare;
slinguata;
smandrappata;
soccia;
socmel;
sorca;
spagnola;
spompinare;
sticchio;
stronza;
stronzata;
stronzo;
sveltina;
sverginare;
tarzanello;
terrone;
testa di cazzo;
tette;
tirare;
topa;
troia;
trombare;
uccello;
vacca;
vaffanculo;
vangare;
venire;
zinne;
zio cantante;
zoccola.

After you and your fellow Italians remove from it anything excessive, we can (re)convert the list into a regular expression, encode it to Base64, and submit an updated profanity filter to the GitHub repository as a proposed change without even having to bother @anoek.
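
The conversion step described above might look roughly like the following Python sketch. It assumes the filter is a single case-insensitive alternation of whole words and that the Base64 payload is simply the UTF-8 text of the pattern; the actual format used in the repository may differ, and the function names `build_filter_pattern`, `encode_pattern` and `decode_pattern` are hypothetical.

```python
import base64
import re


def build_filter_pattern(words):
    """Join words into one alternation that matches whole words only."""
    # Deduplicate, then put longer entries first so multi-word
    # expressions win over the shorter words they contain.
    unique = sorted({w.strip().lower() for w in words if w.strip()},
                    key=lambda w: (-len(w), w))
    alternation = "|".join(re.escape(w) for w in unique)
    return r"\b(?:" + alternation + r")\b"


def encode_pattern(pattern):
    """Base64-encode the pattern text (assumed to be UTF-8)."""
    return base64.b64encode(pattern.encode("utf-8")).decode("ascii")


def decode_pattern(encoded):
    """Reverse of encode_pattern."""
    return base64.b64decode(encoded).decode("utf-8")


# A few entries from the list, one per line as they would be submitted.
words = ["merda", "di merda", "pirla"]
pattern = build_filter_pattern(words)
encoded = encode_pattern(pattern)

# Round-trip through Base64 and compile the result for use.
regex = re.compile(decode_pattern(encoded), re.IGNORECASE)
```

With a pattern built this way, a word that was dropped from the list (for example “regina”) no longer matches, while the remaining entries are still caught regardless of letter case.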

In the meantime, I’ll see if there’s anything that sh/could be removed from the Portuguese filter as well.

Where to get the filter lists? :o

I want to see the German one.

German filter list
['fick',
 'geil',
 'nackt',
 'onanieren',
 'vögeln',
 'wichser',
 'ficken',
 'arschloch',
 'möse',
 'arschficker',
 'hurensohn',
 'hure',
 'arsch',
 'arschlecker',
 'bumsen',
 'vögeln',
 'pimmel',
 'analritter',
 'titten',
 'möpse',
 'bratze',
 'dödel',
 'fratze',
 'poppen',
 'muschi',
 'mucke',
 'hackfresse',
 'wichsen',
 'ische',
 'latte',
 'kampflesbe',
 'kackbratze',
 'kacke',
 'scheiße',
 'kimme',
 'knackwurst',
 'nippel',
 'hupen',
 'milchtüten',
 'MILF',
 'morgenlatte',
 'mufti',
 'pimpern',
 'picheln',
 'pinkeln',
 'pissen',
 'kacken',
 'pisser',
 'porno',
 'popel',
 'reudig',
 'rosette',
 'lümmel',
 'flittchen',
 'schabracke',
 'schnackeln',
 'tittchen',
 'vollpfosten']

this is hilarious :rofl:.

some 13 year old other than myself must have compiled this list.

So many fewer entries. D:

Interesting to see.

  • “vögeln” is mentioned twice. It’s a relatively benign way of saying ‘to have sex’, though. It can also be the dative plural of “Vögel” (birds); that’s easier to show with an example than to explain: “Wir geben den Vögeln Futter.” = “We give food to the birds.”
  • “Vollpfosten” (nonsensical word depicting someone who isn’t very smart) is in there, but the much more frequently used “Idiot” isn’t.
  • “Milchtueten” is probably in there because one may interpret it as a roundabout way to say ‘boobs’, but it’s more likely that someone using the expression is talking about actual milk cartons (“milk bags”) than boobs.
  • “Hupen” can be either a noun (signal-horns) or a verb (to honk), and yes, again a basically unused synonym for ‘boobs’.
  • “Picheln” just means to drink (alcohol).
  • “Knackwurst” is just a popular type of (boiled) sausage. I’m not aware of any other connotations.
  • “Mucke” is a colloquial term for music in general, not sure why it’s on the list.
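
The observations above (a duplicated entry, plus several harmless words) suggest a simple cleanup pass. Here is a minimal Python sketch on a shortened copy of the German list; the `clean_word_list` helper and the choice of which words to drop are assumptions made for illustration, not the site's actual tooling.

```python
def clean_word_list(words, to_remove):
    """Return the words without duplicates or removed entries, keeping order."""
    blocked = {w.lower() for w in to_remove}
    seen = set()
    cleaned = []
    for word in words:
        key = word.lower()
        # Skip entries flagged as harmless and any repeat of an earlier entry.
        if key in blocked or key in seen:
            continue
        seen.add(key)
        cleaned.append(word)
    return cleaned


# Shortened copy of the list; "vögeln" appears twice, as in the original.
german = ["geil", "vögeln", "hupen", "vögeln", "mucke",
          "picheln", "knackwurst", "vollpfosten"]

# Words described above as having an ordinary everyday meaning.
harmless = ["hupen", "picheln", "knackwurst", "mucke"]

cleaned = clean_word_list(german, harmless)
```

Comparing in lower case means entries that differ only in capitalisation are treated as the same word, which matches how a case-insensitive filter would behave.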

By the way, this is the best German slang dictionary I’ve found so far. (not just swearwords though) :slight_smile:

Even "cerveja" ("beer") was being filtered in Portuguese. Not to speak of "amador" ("amateur"), which, of course, is a common word when talking about Go.

I’ve removed a few things and added others, basing some of the choices on the English filter.

I can think of one. But it is on a par with any lengthy object. I can provide further groceries like “Banane” and “Gurke” and of course “Wurst”.

Well, yea, it’s probably more useful to attempt to cover the most frequently used insults, as opposed to attempting to filter all words and phrases that could potentially be interpreted as having a sexual connotation.

The German list doesn’t look that bad indeed. Not many more strange ones apart from those you’ve already listed.

Now let’s hope nobody ever invents a sexual technique called ‘Affensprung’!

Ok, I will.
What should I do?
List terms that shouldn’t be on the list or vice versa?
Shall I send to you by PM?

Remove what you deem unnecessary, add anything you think should be covered, and send me the complete updated list (PM is fine), preferably with a single word or expression per line.

I will update the profanity file, both the Italian and Portuguese filters, and submit a new pull request at the GitHub repository—unless @anoek or @matburt prefer something different.

@smurph, if you want to provide an updated list for the German filter, I can submit it along with the other proposed changes. Same for anyone interested and other languages.

Well, I’d say delete all the words I mentioned and perhaps add “idiot”. Obviously there are way more insults but that’s not the topic of this thread. :stuck_out_tongue:

It would be useful to have a guideline, though. If it was me, I’d say

  • if you want to filter something, filter the most frequently used insults
  • if you want to be family friendly, also filter vulgar slang
  • don’t filter sexual innuendo, that’s just silly

My take: if you are using sexual innuendo, you’re not insulting anyone and you’re most likely just having a chat. In that case, it’s pointless to filter. I would expect most of these terms to show up in PM or general chat as opposed to game chat.

If you want to insult someone, it’s most likely in game chat or via PM and you’ll be using insults, not sexual innuendo. Accordingly, again it doesn’t make any sense to filter innuendo.

If you want to “protect the children”, good luck. :stuck_out_tongue_winking_eye:

Bad word filters are notoriously hard to get right, to the point of being almost useless. This is known as the Scunthorpe Problem https://en.wikipedia.org/wiki/Scunthorpe_problem and it’s pretty hilarious if you’re as immature as I am.
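
A small Python sketch shows why this goes wrong. It contrasts naive substring matching with whole-word matching; the two-word list and the `censor_substring` and `censor_whole_words` functions are made up for illustration and do not reflect how any particular site implements its filter.

```python
import re

BLOCKED = ["ass", "hell"]  # tiny illustrative list


def censor_substring(text):
    """Naive filter: replaces blocked strings anywhere, even inside words."""
    for word in BLOCKED:
        text = re.sub(re.escape(word), "*" * len(word), text,
                      flags=re.IGNORECASE)
    return text


def censor_whole_words(text):
    """Stricter filter: replaces blocked strings only as standalone words."""
    pattern = r"\b(?:" + "|".join(map(re.escape, BLOCKED)) + r")\b"
    return re.sub(pattern, lambda m: "*" * len(m.group()), text,
                  flags=re.IGNORECASE)


message = "Hello, nice pass!"
print(censor_substring(message))    # mangles "Hello" and "pass"
print(censor_whole_words(message))  # leaves the message untouched
```

Whole-word matching avoids the classic false positives, but it still cannot tell an innocent use of a word from a vulgar one, which is the problem with entries like “venire” and “battere” earlier in this thread.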

Yeah, the Dutch list also needs some work. Several terms seem to come straight out of the dictionary and would never be used as such in a sentence (“reet trappen, voor zijn”). There are several words that have a more common meaning without being insulting (“utrecht” is the name of the 4th largest city, perhaps they meant “utrechtenaar”, which used to mean homosexual man; “poot” usually just means leg, but could also be used to describe a sexual organ; “nicht” commonly means cousin, but is also used to describe homosexual men). Many words have a “g” at the end which doesn’t belong there (“snolg”, “delg”, “mutsg”), and a lot of them simply aren’t insulting (“naakt” just means naked, “engerd” means creep, “balen” is to be disappointed, “schatje” means cutie).

Most of the words are rather old-fashioned as well.
