Online hate speech shows up most often on discussion forums, according to a Utopia Analytics study for the Ministry of Justice in Finland. The report found that discussion forums are home to 97% of identified hate speech messages. The next largest platform types are Twitter messages at 2.5% and Instagram messages at 0.2%. Blogs, news comments and public Facebook messages make up less than 0.02% of all identified hate speech. The data set didn’t include private discussions, for example Facebook groups or accounts.
The project, part of the Facts Against Hate program run by the Ministry of Justice, tested the ability of artificial intelligence to recognize hate speech in online environments. The approach combined human evaluation with machine learning. A key goal was to find the main channels of hate speech and to identify how hate speech differs from platform to platform.
The definition of hate speech was based on academic research in the social sciences. It was formulated as a set of hate speech categories, which were then used to manually identify examples of hate speech in a data set of online messages. These annotations served as training data for Utopia AI Moderator, a language-independent tool that uses text analytics and machine learning. The data set comprised 12 million Finnish comments and posts from September to October 2020.
The results show that about 150 000 messages that contain hate speech appear on publicly available Finnish social media platforms every month. That’s about 1.8% of all messages.
Among the public international social media platforms, Twitter seems the most prominent, with 7 450 messages identified as hate speech, or 0.14% of all tweets. Retweets play a significant role in circulating these messages: 39% of all hate-speech tweets are duplicates.
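As a rough illustration of how a duplicate share such as the 39% figure could be computed, the sketch below counts repeated message texts. The function name `duplicate_share`, the light text normalization and the sample messages are hypothetical; the report does not describe its exact method for detecting duplicates, so this is one possible reading, not the study's procedure.

```python
from collections import Counter


def duplicate_share(messages):
    """Return the fraction of messages that repeat an earlier message's text."""
    if not messages:
        return 0.0
    # Assumption: lowercase and collapse whitespace so trivial differences
    # do not hide a repeated message. The study may have used another rule.
    counts = Counter(" ".join(m.lower().split()) for m in messages)
    # Every occurrence beyond the first of a given text counts as a duplicate.
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(messages)


# Hypothetical sample: five messages, two of which repeat an earlier one.
sample = ["Example A", "example a", "Example B", "Example  A", "Example C"]
print(f"{duplicate_share(sample):.0%}")  # prints "40%"
```

Under this counting rule the first appearance of a text is treated as the original and only later copies count as duplicates; counting every copy, including the first, would give a higher share.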
“While the data set consisted of mostly Finnish messages,” says Utopia’s CEO Dr. Mari-Sanna Paukkeri, “the results would be very similar in other languages. For example, the major platform for Finnish hate speech, Ylilauta, is a peer to the commonly known 4chan. Moreover, we can build a similar AI model to identify hate speech in any language in only two weeks. We only need a skilled individual to say how hate speech should be defined in your culture and language and we need the data to analyse.”