Personalized warnings could reduce hate speech on Twitter, researchers say


A carefully crafted warning sent to the right accounts could help reduce the amount of hate speech on Twitter. That’s the conclusion of new research examining whether targeted warnings can curb hateful language on the platform.

Researchers at New York University’s Center for Social Media and Politics found that personalized warnings alerting Twitter users to the consequences of their behavior reduced the number of tweets with hateful language a week later. While further study is needed, the experiment suggests there is a “potential way forward for platforms seeking to reduce the use of hate language by users,” according to Mustafa Mikdat Yildirim, the paper’s lead author.

In the experiment, the researchers identified accounts at risk of suspension for breaking Twitter’s rules on hate speech. They looked for users who had tweeted at least one word from established “hate language dictionaries” over the previous week and who also followed at least one account that had recently been suspended after using such language.
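In rough terms, that selection method is a two-part filter: recent use of a dictionary word, plus a social tie to a recently suspended account. The sketch below is a hypothetical illustration of that logic only; the lexicon contents, function names, and data structures are stand-ins, not the researchers’ actual code or data.

```python
# Hypothetical sketch of the two-part candidate filter described above.
# The lexicon words, names, and inputs are illustrative assumptions,
# not the study's actual pipeline or data.

HATE_LEXICON = {"example_slur_1", "example_slur_2"}  # stand-in for published hate-language dictionaries

def used_hate_language(recent_tweets: list[str]) -> bool:
    """True if any tweet from the past week contains a lexicon word."""
    return any(
        word in HATE_LEXICON
        for tweet in recent_tweets
        for word in tweet.lower().split()
    )

def is_candidate(recent_tweets: list[str],
                 followed_accounts: set[str],
                 recently_suspended: set[str]) -> bool:
    """An account qualifies if it (1) used hate language in the past week
    and (2) follows at least one recently suspended account."""
    return (used_hate_language(recent_tweets)
            and bool(followed_accounts & recently_suspended))

# Example: this account tweeted a lexicon word and follows a suspended
# account, so it would be selected to receive a warning tweet.
print(is_candidate(
    recent_tweets=["some tweet containing example_slur_1"],
    followed_accounts={"@user_a", "@user_b"},
    recently_suspended={"@user_b"},
))  # True
```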

From there, the researchers created test accounts with handles like “hate speech warning” and used them to tweet warnings at those users. They tested several variations, but all carried roughly the same message: that using hate speech put the recipient at risk of suspension, and that this had already happened to someone they followed.

“The user @account you follow has been suspended, and I suspect it was because of hateful language,” read one sample tweet shared in the paper. “If you continue to use hate speech, you might get suspended temporarily.” In another variation, the account issuing the warning identified itself as a professional researcher while delivering the same caution that the recipient risked suspension. “We tried to be as credible and convincing as possible,” Yildirim told Engadget.

The researchers found that the warnings were effective, at least in the short term. “Our results show that a single warning tweet sent from an account with no more than 100 followers can reduce the ratio of tweets with hateful language by up to 10%,” the authors write. Interestingly, they found that more politely worded warnings led to even bigger declines, with reductions of up to 20 percent. “We tried to increase the politeness of our message by basically starting our warning by saying, ‘Oh, we respect your right to free speech, but on the other hand, keep in mind that your hate speech might harm others,’” Yildirim said.

In the paper, Yildirim and his co-authors note that their test accounts had only around 100 followers each and weren’t associated with any authoritative entity. If the same kinds of warnings came from Twitter itself, or from an NGO or another organization, the messages could be even more effective. “What we learned from this experiment is that the real mechanism at play could be that we let these people know that there is some account, some entity, that is watching and monitoring their behavior,” Yildirim said. “The fact that their use of hate speech is seen by someone else could be the most important factor that led these people to decrease their hate speech.”
