Predicting when online conversations turn toxic

By Melanie Lefkowitz

July 12, 2018

The internet offers the potential for constructive dialogue and cooperation, but online conversations too often degenerate into personal attacks. In hopes that those attacks can be averted, Cornell researchers have created a model to predict which civil conversations might take a turn and derail.

After analyzing hundreds of exchanges between Wikipedia editors, the researchers developed a computer program that scans for warning signs in the language used by participants at the start of a conversation – such as repeated, direct questioning or use of the word “you” – to predict which initially civil conversations would go awry.

Early exchanges that included greetings, expressions of gratitude, hedges such as “it seems,” and the words “I” and “we” were more likely to remain civil, the study found.
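
As a rough illustration of how a program might scan an opening comment for such cues, here is a minimal Python sketch. The cue names, word lists and the `cue_counts` function are hypothetical stand-ins, not the researchers' actual features, and counting question marks is only a crude proxy for direct questioning.

```python
import re

# Hypothetical cue patterns for illustration only; the study's real
# feature set is not described in this article.
CUES = {
    "greeting": re.compile(r"\b(hi|hello|hey)\b", re.I),
    "gratitude": re.compile(r"\b(thanks|thank you)\b", re.I),
    "hedge": re.compile(r"\b(it seems|perhaps|maybe)\b", re.I),
    "first_person": re.compile(r"\b(i|we)\b", re.I),
    "second_person": re.compile(r"\byou\b", re.I),
    "direct_question": re.compile(r"\?"),  # crude proxy: one hit per "?"
}


def cue_counts(comment: str) -> dict:
    """Count how often each cue pattern appears in a single comment."""
    return {name: len(pattern.findall(comment)) for name, pattern in CUES.items()}


if __name__ == "__main__":
    print(cue_counts("Hi, thanks for the edit. It seems we disagree."))
    print(cue_counts("Why did you revert this? Did you even read it?"))
```

In a sketch like this, the counts would only be inputs to a predictive model; on their own they do not say whether a conversation will derail.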

“There are millions of such discussions taking place every day, and you can’t possibly monitor all of them live. A system based on this finding might help human moderators better direct their attention,” said Cristian Danescu-Niculescu-Mizil, assistant professor of information science and co-author of the paper “Conversations Gone Awry: Detecting Early Signs of Conversational Failure.”

“We, as humans, have an intuition of whether a conversation is about to go awry, but it’s often just a suspicion. We can’t do it 100 percent of the time. We wonder if we can build systems to replicate or even go beyond this intuition,” Danescu-Niculescu-Mizil said.

The computer model, which also considered Google’s Perspective, a machine-learning tool for evaluating “toxicity,” was correct around 65 percent of the time. Humans guessed correctly 72 percent of the time.

People can test their own ability to guess which conversations will derail by taking an online quiz.

The study analyzed 1,270 conversations that began civilly but degenerated into personal attacks, culled from 50 million conversations across 16 million Wikipedia “talk” pages, where editors discuss articles or other issues. The researchers examined exchanges in pairs, comparing each conversation that ended badly with one that succeeded on the same topic, so the results weren’t skewed by sensitive subject matter such as politics.
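
The pairing idea can be sketched in a few lines of Python. This is an assumption-laden toy, not the study's procedure: it treats "same talk page" as a stand-in for "same topic," and the record keys `id`, `page` and `derailed`, as well as the `pair_by_page` function, are hypothetical.

```python
from collections import defaultdict


def pair_by_page(conversations):
    """Pair each derailed conversation with a civil one from the same page.

    `conversations` is a list of dicts with hypothetical keys
    'id', 'page' and 'derailed'. Derailed conversations with no civil
    counterpart on the same page are left out.
    """
    civil = defaultdict(list)
    for conv in conversations:
        if not conv["derailed"]:
            civil[conv["page"]].append(conv)

    pairs = []
    for conv in conversations:
        if conv["derailed"] and civil[conv["page"]]:
            match = civil[conv["page"]].pop(0)  # use each civil conversation once
            pairs.append((conv["id"], match["id"]))
    return pairs


if __name__ == "__main__":
    sample = [
        {"id": 1, "page": "A", "derailed": True},
        {"id": 2, "page": "A", "derailed": False},
        {"id": 3, "page": "B", "derailed": True},
        {"id": 4, "page": "A", "derailed": False},
    ]
    print(pair_by_page(sample))
```

Because both members of a pair come from the same page, a difference between them is less likely to reflect a contentious subject alone.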

The paper, co-written with Cornell Ph.D. information science student Justine Zhang; Ph.D. computer science students Jonathan P. Chang and Yiqing Hua; Lucas Dixon and Nithum Thain of Jigsaw; and Dario Taraborelli of the Wikimedia Foundation, will be presented at the Association for Computational Linguistics’ annual meeting, from July 15 to 20 in Melbourne, Australia.

The researchers hope this model can be used to rescue at-risk conversations and improve online dialogue, rather than for banning specific users or censoring certain topics. Some online posters, such as nonnative English speakers, may not realize they could be perceived as aggressive, and nudges from such a system could help them self-adjust.

“If I have tools that find personal attacks, it’s already too late, because the attack has already happened and people have already seen it,” Chang said. “But if you understand this conversation is going in a bad direction and take action then, that might make the place a little more welcoming.”
