Have bad graphs and faulty analysis led to evidence that Amazon has fake reviewers? Read on…

In my first post about Nick Bilton’s flawed analysis of Amazon’s Kindle I left a few questions unanswered. One of those questions had to do with the ratings of the reviewers themselves. Since Amazon allows each review to be rated by anyone, it might be interesting to see whether the number of people who found a review useful varied with the number of stars the reviewer gave the Kindle. So I ran an analysis examining Kindle 2 reviews.

So here are 4 plots*. The first, on the upper left, shows all reviews. Along the horizontal axis is the number of people reported to have found the review useful. Along the vertical axis is the star rating of the review. The plot on the upper right shows the same distribution, but for non-verified purchasers of the Kindle2 only. The plot on the lower left shows the same distribution, but this time for reviewers who Amazon said actually purchased a Kindle2. The plot on the lower right brings the Amazon-verified and non-verified purchasers together. Each red + sign is an Amazon-verified purchaser and each blue circle is a non-verified purchaser.

Four scatterplots

Evidence of fake reviews?

These four charts tell an interesting story. Each point on a chart represents a review, so in each chart (except the one on the bottom right**) you’re seeing 9,212 points. The two charts on top are roughly the same. That’s because the first chart shows all reviews and the second shows just the reviews submitted by non-verified Kindle2 purchasers. You may recall that 75% of the reviews of the Kindle2 were submitted by people who Amazon said didn’t buy a Kindle2, so those dots dominate the charts. But take a look at the chart on the bottom left. You’ll notice that the cluster of reviews at the bottom of the top two charts, the ones between 1 and 2 stars stretching all the way to the end of the x-axis, is gone. We knew that the non-verified purchasers were four times more likely to give a one-star review compared to a verified purchaser, but we didn’t know that the 1-star non-verified reviewers were getting lots of people finding their reviews useful.
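The verified/non-verified split described above boils down to a simple tally. Here is a minimal sketch of that comparison; the records and field names are invented purely for illustration and are not the actual Kindle2 dataset:

```python
# Hedged sketch: compare the share of 1-star reviews between verified
# and non-verified reviewer groups. The sample records are invented.
reviews = [
    {"stars": 1, "verified": False},
    {"stars": 1, "verified": False},
    {"stars": 2, "verified": False},
    {"stars": 5, "verified": True},
    {"stars": 4, "verified": True},
]

def one_star_rate(reviews, verified):
    """Fraction of 1-star reviews within one purchaser group."""
    group = [r for r in reviews if r["verified"] == verified]
    return sum(r["stars"] == 1 for r in group) / len(group) if group else 0.0

print(one_star_rate(reviews, verified=False))  # 2 of 3 non-verified reviews
print(one_star_rate(reviews, verified=True))   # 0 of 2 verified reviews
```

On the real data, the same ratio computed for each group is what produces the “four times more likely” comparison.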

This dynamic really pops in the bottom right-hand chart, the one with the red and blue lines in it. The blue line is made up of non-verified purchasers. As the number of people who said they found the review useful increases (starting around 8), the line dives down toward the 1-2 star ratings. The downward slope of the curve for the verified purchasers is much, much gentler.
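One way to build curves like those is to average the star rating at each helpful-vote count. This is only a sketch of the idea, with invented field names and toy data, not the method actually used for the chart:

```python
from collections import defaultdict

def mean_stars_by_helpful(reviews):
    """Average star rating at each helpful-vote count."""
    buckets = defaultdict(list)
    for r in reviews:
        buckets[r["helpful"]].append(r["stars"])
    return {votes: sum(s) / len(s) for votes, s in sorted(buckets.items())}

# Invented toy data: ratings fall as helpful votes grow.
sample = [
    {"helpful": 0, "stars": 5},
    {"helpful": 0, "stars": 4},
    {"helpful": 10, "stars": 2},
    {"helpful": 10, "stars": 1},
]
print(mean_stars_by_helpful(sample))  # {0: 4.5, 10: 1.5}
```

Computed separately for the verified and non-verified groups, a steeper drop in one dictionary's values is exactly the “dives down” pattern described above.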

This is a bit of a head-scratcher. I’ve heard people say that Amazon is full of fake reviews. These people aren’t saying that Amazon is the one doing the faking, but rather people who have some product that competes against the product being reviewed, or just people with an axe to grind. Is this an example of that? Do the fakers get their friends to say that their reviews are helpful? Or maybe the Kindle2 verified purchasers post reviews that people just don’t find helpful. Right now, I don’t know what the correct answer is. But I have a feeling that some intelligent text-mining of the data will help flesh out an answer. Be on the lookout for a post about just that topic, by Marc Harfeld, coming soon, right here.

*To make the graphs easier to decipher, I’ve excluded any review with more than 50 people finding the review useful. Taking the horizontal axis beyond 50 makes the plots very difficult to read. In all, this amounts to excluding 92 reviews out of the 9,304 I have gathered on the Kindle2. Because the star ratings are integers between 1 and 5, I needed to introduce a random jitter to the points (one 1-star rating becomes 1.221, another becomes 1.1321) so that they wouldn’t completely overlap each other on the scatterplot. I did the same to the values of how many people found each review helpful.
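The jitter step can be sketched like this; the spread value and function name are assumptions for illustration, not the exact jitter used for the plots:

```python
import random

def jitter(values, spread=0.3, seed=None):
    """Nudge each integer value by a small random amount so that
    identical points don't stack on top of each other in a scatterplot."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [v + rng.uniform(0.0, spread) for v in values]

stars = [1, 1, 5, 3]
for raw, shifted in zip(stars, jitter(stars, seed=7)):
    print(raw, "->", round(shifted, 4))
```

Each jittered value stays within `spread` of the original integer, so the points spread out visually without changing which star band they belong to.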
**Please note: to make an apples-to-apples comparison for the chart on the bottom right, I had to reduce the number of non-verified reviewers down to the same number as the Amazon-verified reviewers. The sampling was a simple random sample, so it did not distort the distribution.
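That equal-size comparison amounts to a simple random sample without replacement; a minimal sketch of the idea, not the exact code behind the chart:

```python
import random

def downsample(non_verified, target_size, seed=0):
    """Draw a simple random sample (without replacement) so the
    non-verified group matches the verified group's size."""
    rng = random.Random(seed)  # seeded so the chart is reproducible
    return rng.sample(non_verified, target_size)

# Invented toy data: 8 non-verified review IDs cut down to 3.
sample = downsample(list(range(8)), 3)
print(len(sample))  # 3
```

Because every review has an equal chance of being drawn, the sampled subset preserves the shape of the original distribution in expectation.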