In 'Six Degrees of Reputation' Cornell authors track plagiarism and abuse of online reviewing systems

Just how many well-researched, magisterial, luminous, lucid, engaging, eponymous new books and CDs can there really be anyway? Judging by user postings in online review systems like Amazon.com, thousands. Positive bias in online consumer reviews has become almost standard industry practice, but plagiarizing user reviews and passing them off as authentic is another animal altogether, says a new Cornell University study that has been tracking that other animal.

"We were interested in the ways that the users abuse a technical system and the way that the system affords, allows, encourages, makes harder or prevents such abuse," said Shay David, a Cornell graduate student in the Department of Science and Technology Studies (S&TS). David, who co-authored the study with S&TS Professor Trevor Pinch, also built software to investigate user practices in online product reviews at several leading commercial sites -- primarily Amazon.com. The study, "Six Degrees of Reputation: The Use and Abuse of Online Recommendation Systems," is available at the Social Science Research Network (SSRN) Web site: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=857505. SSRN is a non-peer-reviewed repository.

The study explores several cases in which book and CD reviews were copied in part or in whole and documents "hundreds, possibly thousands of cases of copying amongst Amazon.com's book and CD reviews," said Pinch.

He added that he and David "do not see this as a paper attacking Amazon so much as hopefully improving the online reviewing system. We suspect this copying occurs on most other product review sites. We just happened to start with Amazon."

On Dec. 12, Amazon's online media service did not reply to a request from the Cornell News Service for comments on the study.

The authors say they hope to gain a better understanding of the system as a whole, "not only where it 'fails' but also to get a sense of its potential when it functions properly."

Online review systems have increasingly influenced cultural commerce in an area where traditional experts and critics once held sway as objective guardians of quality and artistic merit. The result, however, is not a democratic exchange or critical dialogue, but the possible creation of a "cultural Lake Woebegon where 'all books are above average,'" the authors state.

It all started when Pinch discovered several "positive" online user reviews had been copied from one of his own co-authored books, "Analog Days: The Invention and Impact of the Moog Synthesizer" (Pinch and Trocco, 2002). Pinch had found the reviews posted with different names and e-mail addresses for a similar book by another author, Ben Kettlewell, "Electronic Music Pioneers."

He was dismayed that such "blatant copying by possibly nonexistent readers" was allowed, the study states. Reviewers gain reputation on Amazon through the sheer volume of their reviews and through the rankings readers give to indicate whether or not a review was helpful. Eventually an Amazon reviewer with many high-ranked reviews will be designated an Amazon top 100 reviewer, Pinch explained.

But the basic question driving "Six Degrees of Reputation" is: Can these user reviews be counted on at all?

"Clearly the electronic media allow perfect copies of both primary and secondary information goods," the study states. "A preliminary literature search … suggested that many reviews are not authentic, that users employ various techniques to game the system and that the phenomena [is widespread]."

Indeed, that seems to be what they found -- and it's way beyond the silly business of dust jacket accolades written by friends of the author. How they discovered the depth of reuse abuse is an elegant instance of technological detective work.

David said he worked with Pinch to design software that detects text reuse and evaluates the prevalence of review copying, plagiarism and abuse.

"The first task for the software was to identify which books or CDs might have reused reviews," said David, "and then compare items that are somewhat similar to one another."

For similarity criteria, the authors used a public application programming interface (API) available from Amazon. The API draws on customers' past purchasing behavior to deduce which of the items Amazon sells are similar to one another, a measure Amazon uses to project customers' interests. For example, the method found Pinch and Trocco's "Analog Days" and Kettlewell's "Electronic Music Pioneers" to be similar.

The researchers were then able to build virtual graphs representing how similar user reviews were to an original book or CD review.
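As a rough illustration of how similarity links can narrow the search, the sketch below turns a table of "similar item" lists into the set of item pairs whose reviews are worth comparing. It is not the authors' software and does not call Amazon's API; the `similar` dictionary, the function name `candidate_pairs` and the third title are hypothetical stand-ins.

```python
def candidate_pairs(similar):
    """Return the unordered item pairs joined by a similarity link.

    `similar` maps an item title to the titles reported as similar to it
    (a hypothetical stand-in for what a similarity API might return).
    """
    pairs = set()
    for item, neighbors in similar.items():
        for other in neighbors:
            if other != item:
                # Sort each pair so (A, B) and (B, A) count only once.
                pairs.add(tuple(sorted((item, other))))
    return sorted(pairs)


# Illustrative data only; "Some Other Title" is made up.
similar = {
    "Analog Days": ["Electronic Music Pioneers"],
    "Electronic Music Pioneers": ["Analog Days", "Some Other Title"],
}
pairs = candidate_pairs(similar)
```

Only the reviews of items in the same pair then need to be compared, which keeps the number of comparisons far below what checking every review against every other would require.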

The next step was to compare the reviews to one another. For that, the authors made use of a modified version of plagiarism tracking software written by Daria Sorokina of Cornell's Department of Computer Science. Sorokina's algorithm looks for text reuse at the sentence level and produces lists of reused texts ranked by the amount of similar text and the probability of copying. "Importantly, the algorithm is able to detect reuse of text even if the reuse order within the paragraph is different," said David.
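The idea of sentence-level, order-independent reuse detection can be sketched in a few lines. The example below is a simplified stand-in, not Sorokina's algorithm: it treats each review as a set of normalized sentences and scores a pair by the share of sentences they have in common, so rearranging sentences does not hide the copying. The function names, the four-word minimum and the scoring formula are all assumptions made for illustration, and the sketch does not estimate a probability of copying.

```python
import re


def sentences(text):
    """Split text into a set of normalized sentences (lowercase words only)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    normalized = set()
    for part in parts:
        words = re.findall(r"[a-z0-9']+", part.lower())
        # Skip very short sentences, which match by chance (assumed cutoff).
        if len(words) >= 4:
            normalized.add(" ".join(words))
    return normalized


def reuse_score(review_a, review_b):
    """Fraction of the shorter review's sentences that also appear in the other."""
    a, b = sentences(review_a), sentences(review_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def rank_pairs(reviews):
    """Return (score, id_a, id_b) for overlapping review pairs, highest score first."""
    ids = sorted(reviews)
    scored = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            score = reuse_score(reviews[first], reviews[second])
            if score > 0:
                scored.append((score, first, second))
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


# Made-up reviews: r2 repeats r1's sentences in a different order.
reviews = {
    "r1": "This book is a wonderful history of synthesizers. I could not put it down at all.",
    "r2": "I could not put it down at all! This book is a wonderful history of synthesizers.",
    "r3": "A completely different opinion about another record entirely.",
}
ranked = rank_pairs(reviews)
```

Because sentences are compared as a set, the reordered copy in `r2` still scores as a full match against `r1`, while the unrelated `r3` drops out of the ranking.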

"Six Degrees of Reputation" eventually will be published in a collection with MIT Press that Pinch is co-editing with Richard Swedberg, Cornell professor of sociology. But Pinch decided to release an online version earlier because of its topicality. He does not anticipate too many positive reviews -- at least not from the online information industry.

Pinch and David say they seek two outcomes as a result of their work, one immediate and another in the future:

  • In the short term, they hope that Internet sellers that solicit reviews -- especially the world's largest, like Amazon -- take the issue of plagiarism seriously.
  • In the long term, they hope to gain a better understanding of how the world of book reviewing is changing with user reviews via the Internet.
