Racists are locked in a linguistic arms race with software designed to trawl the web and pick out hateful keywords.
When Jigsaw, an Alphabet subsidiary, released an algorithm last year to flag known slurs and other hateful messages online, racists responded with a campaign that swapped racial epithets for the names of Google products. During Operation Google, as it was dubbed, Jews became “skypes,” and black folk became “googles.” By replacing slurs with the company’s own product names, they thought they’d outsmarted the software.
Now, researchers at the University of Rochester have developed an algorithm that can identify hateful tweets using these same codes. In practice, this means that it can tell a tweet saying something like “gas the skypes” is hateful—whereas “I hate Skype” isn’t—with roughly 80 percent accuracy. The researchers also hope that this algorithm could, with some work, track the evolving set of codes that racists use to harass people online.
The trick, according to a paper that will be presented in May at the International Conference on Web and Social Media in Montreal, is to use AI to “learn” the words that tend to appear alongside these hateful codes, for example “#MAGA,” “gas,” or “white.”
“We essentially gathered hateful tweets and used language processing to find the other terms that were associated with such messages,” Jiebo Luo, a professor of computer science and co-author of the paper, told me in an interview. “We learned these terms and used them as the bridge to new terms—as long as we have those words, we have a link to anything they can come up with. That’s the key.”
This defeats attempts to conceal racist slurs using codes by targeting the language that makes up the cultural matrix from which the hate emerges, instead of just seeking out keywords. Even if the specific slurs used by racists change in order to escape automated comment moderation, the other terms they use to identify themselves and their communities likely won’t.
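To make the bridging idea concrete, here is a minimal sketch of how that kind of approach could work: score words by how strongly they are associated with a seed set of hateful tweets, then flag new tweets that pair a known code word, or an unfamiliar term, with several of those context words. The tiny corpora, the log-odds scoring, and the thresholds below are illustrative assumptions, not the Rochester team’s actual model.

```python
# Illustrative sketch of the "bridge terms" idea: learn words associated with
# a seed set of hateful tweets, then use them to flag coded messages.
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a tweet and split it into word and hashtag tokens."""
    return re.findall(r"#?\w+", text.lower())

def context_scores(hateful_tweets, background_tweets):
    """Score each word by how much more often it appears in the hateful set
    than in ordinary tweets (smoothed log-odds)."""
    hate_counts = Counter(t for tw in hateful_tweets for t in tokenize(tw))
    back_counts = Counter(t for tw in background_tweets for t in tokenize(tw))
    hate_total = sum(hate_counts.values())
    back_total = sum(back_counts.values())
    scores = {}
    for word in set(hate_counts) | set(back_counts):
        p_hate = (hate_counts[word] + 1) / (hate_total + 1)
        p_back = (back_counts[word] + 1) / (back_total + 1)
        scores[word] = math.log(p_hate / p_back)
    return scores

def looks_coded(tweet, scores, known_codes, threshold=1.0):
    """Flag a tweet if it uses a known code word, or if it pairs its terms
    with enough strongly hate-associated context words."""
    tokens = tokenize(tweet)
    if any(t in known_codes for t in tokens):
        return True
    context_hits = sum(1 for t in tokens if scores.get(t, 0.0) > threshold)
    return context_hits >= 2

# Tiny illustrative corpora; real training data would be thousands of tweets.
hateful = ["gas the skypes #MAGA", "send the googles back", "white power now"]
background = ["I hate Skype, the call dropped again", "google maps saved my trip"]

scores = context_scores(hateful, background)
print(looks_coded("gas the skypes", scores, {"skypes", "googles"}))  # True
print(looks_coded("I hate Skype", scores, {"skypes", "googles"}))    # False
```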
This idea has potential, especially considering the state of AI for identifying hate speech today. For example, when you type something wildly hateful, such as “gas the jews,” into Jigsaw’s web app to test its AI, it’s marked as 81 percent likely to be viewed as toxic. But if you type in “gas the skypes,” Jigsaw’s AI simply tells you it needs more data to evaluate the statement. Luo and his colleagues’ algorithm, on the other hand, may have caught this coded hateful message.
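For comparison, the Jigsaw test described above can be run programmatically. The sketch below assumes the publicly documented Perspective API endpoint (commentanalyzer.googleapis.com) and its TOXICITY attribute; the API key is a placeholder, and the exact fields may differ from the version of the tool the article tested.

```python
# Hedged sketch: ask Jigsaw's Perspective API for a toxicity score.
import requests

API_KEY = "PERSPECTIVE_API_KEY"  # placeholder: supply your own key
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def toxicity_score(text):
    """Return the API's summary TOXICITY score for a piece of text (0.0-1.0)."""
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=body)
    response.raise_for_status()
    data = response.json()
    return data["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# Coded phrasings like "gas the skypes" tend to score far lower than the
# uncoded versions, which is the gap the Rochester work aims to close.
print(toxicity_score("gas the skypes"))
```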
As for what comes next, Luo said the team will work on making the algorithm even more accurate, which will require a lot more Twitter data for training.
“Our goal is to use data science for social good, so we certainly hope that companies like Facebook and Twitter can pick up our research,” Luo said.