Whether you are running a dating site, classifieds site or some other type of marketplace, toxic language is likely something that you encounter and need to deal with to keep your site clean and protect your users.
This article explains how to use Implio's built-in Text Vision filters to detect and block out-of-line content before it reaches your site.
Target audience
Toxic language filters can be useful for the following types of marketplaces and content:
Type of site |
|
Type of content |
|
Supported fields and languages
Toxic language filters operate on the following API input fields:
- content.title
- content.body
The following languages are currently supported:
Language | ISO 639-1 code |
---|---|
English | en |
French | fr |
Spanish | es |
Not seeing the language you are looking for? Reach out to our support team to learn more about upcoming languages!
How to use toxic language filters
Before you start
Implio's built-in Text Vision filters each operate on specific languages, so they rely on automatic language detection to decide which filter to apply.
For optimal results, make sure you set the content.languageExpected API input field to make the language detection more reliable. See How to check the language in which users write for more information.
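As an illustration, the fields mentioned above could be assembled as follows. This is only a sketch: it assumes that the dotted field names (content.title, content.body, content.languageExpected) map to a nested JSON object, and the helper name build_payload is hypothetical. Check the Implio API documentation for the actual request format.

```python
import json


def build_payload(title, body, language_expected):
    """Assemble the fields that the toxic language filters read.

    Assumption: dotted names such as content.languageExpected
    correspond to keys nested under a "content" object.
    """
    return {
        "content": {
            "title": title,
            "body": body,
            # ISO 639-1 code of the language users are expected to write in
            "languageExpected": language_expected,
        }
    }


payload = build_payload("City bike for sale", "Barely used, pick-up only.", "en")
print(json.dumps(payload, indent=2))
```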
BLANG variables
Toxic language filters are exposed as several BLANG variables, each corresponding to a different kind of toxic language.
Each variable contains the number of terms that were found in the text:
Variable | Description | Type | Possible values |
---|---|---|---|
$text.blasphemyCount | Number of blasphemy terms detected in the text | Integer | Number of terms detected, 0 if no term matched or if the language isn't supported |
$text.sexualTermCount | Number of sexual terms detected in the text | Integer | Number of terms detected, 0 if no term matched or if the language isn't supported |
$text.badWordCount | Number of bad words detected in the text | Integer | Number of terms detected, 0 if no term matched or if the language isn't supported |
$text.violenceTermCount | Number of terms related to violence detected in the text | Integer | Number of terms detected, 0 if no term matched or if the language isn't supported |
$text.extremismTermCount | Number of terms related to extremism detected in the text | Integer | Number of terms detected, 0 if no term matched or if the language isn't supported |
$text.racismTermCount | Number of terms related to racism detected in the text | Integer | Number of terms detected, 0 if no term matched or if the language isn't supported |
Note that each term or expression is counted by at most one of the above variables. In other words, there is no overlap between the different filters.
Additionally, this variable contains the sum of all the above-listed variables:
Variable | Description | Type | Possible values |
---|---|---|---|
$text.toxicTermCount | Total number of toxic terms detected in the text | Integer | Number of terms detected, 0 if no term matched or if the language isn't supported |
Filtering toxic language using rules
This BLANG condition will pick up any occurrence of toxic language in the text:
$text.toxicTermCount>0
which is strictly equivalent to:
$text.blasphemyCount>0 OR $text.badWordCount>0 OR $text.sexualTermCount>0 OR $text.violenceTermCount>0 OR $text.extremismTermCount>0 OR $text.racismTermCount>0
You may choose to remove some of the variables from the condition, depending on which types of toxic language you wish to filter out (blasphemy, for instance, may be considered acceptable), or split the condition into several conditions with different actions.
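The relationship between the total and the per-category variables can be modelled outside Implio to reason about which conditions will match. The sketch below is not BLANG and does not call Implio. It represents the counts as a plain dictionary keyed by the variable names from the tables above (without the $text. prefix), and the function names are hypothetical.

```python
CATEGORY_VARIABLES = (
    "blasphemyCount",
    "sexualTermCount",
    "badWordCount",
    "violenceTermCount",
    "extremismTermCount",
    "racismTermCount",
)


def toxic_term_count(counts):
    """Sum of the per-category counts, mirroring $text.toxicTermCount."""
    return sum(counts.get(name, 0) for name in CATEGORY_VARIABLES)


def matches_any(counts, categories=CATEGORY_VARIABLES):
    """True if at least one of the selected categories has a count above 0."""
    return any(counts.get(name, 0) > 0 for name in categories)


counts = {"badWordCount": 2, "racismTermCount": 1}

# Counts are never negative, so "total > 0" and "any category > 0" agree.
print(toxic_term_count(counts) > 0, matches_any(counts))

# Restricting the condition to a subset, here ignoring everything but blasphemy.
print(matches_any(counts, categories=("blasphemyCount",)))
```

Because the counts are non-negative integers, the total is greater than 0 exactly when at least one category is, which is why the two conditions are interchangeable.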
Setting the rule's action
Mild profanity or other kinds of toxic language can sometimes be acceptable depending on the context in which they are used, and how much you tolerate on your site.
For instance, use of common words like 'crap' may not be reason enough to refuse a piece of content.
For this reason, it is preferable to set the corresponding rule's action to Send to manual rather than Refuse, so that the content can be reviewed by a moderator.
Finally, you may decide to refuse content that contains multiple occurrences of toxic language. You can do so by adding a rule such as:
$text.toxicTermCount>=3
and setting its action to Refuse.
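The combined effect of the two rules can be summarised with a small decision function. This is a sketch under assumptions: it supposes that the stricter Refuse rule takes precedence when both conditions match, which this article does not specify, and the function name and the None return value for unmatched content are illustrative only.

```python
def choose_action(toxic_term_count, refuse_threshold=3):
    """Map a toxic term count to the action of the first matching rule.

    Assumption: the Refuse rule wins over Send to manual when both match.
    """
    if toxic_term_count >= refuse_threshold:
        return "Refuse"
    if toxic_term_count > 0:
        return "Send to manual"
    # Neither toxic language rule matched; other rules may still apply.
    return None


for count in (0, 1, 3):
    print(count, choose_action(count))
```

The threshold of 3 comes from the example rule above; adjust it to match how much toxic language you tolerate on your site.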
Known limitations
Our Text Vision filters have been meticulously crafted by our team of linguists and data scientists and tested against large corpora of user-generated content.
However, they may sometimes produce false positives. Conversely, they may miss some terms or expressions.
We update Text Vision filters regularly. We value and welcome your feedback to help us improve Implio.