Online star ratings, such as those found on Amazon, eBay and YouTube, seem like a great way to harness the wisdom of crowds to gauge the relative quality of a product or item on the web. After all, who wants to rely on a handful of gnarled critics dispensing their opinions from on high when you can see what thousands of users think?
It’s a great idea in principle, but unfortunately, it doesn’t seem to work. And no, not because of all the know-nothing trolls out there who saturate these ratings channels with their narcissistic critiques. The problem is that, when it comes to rating products online, the crowd is too nice.
Last week’s edition of WNYC’s excellent On The Media podcast featured an interesting interview with Wall Street Journal reporter Geoffrey Fowler discussing an article he recently co-authored, “On the Internet, Everyone’s a Critic But They’re Not Very Critical” (subscription only).
In the interview, Fowler said:
“We tend to think of the Internet as a place filled with mean blogs about celebrities and email flame wars. But there is, in fact, this growing corner of the Web where people have a problem that they tend to be just too positive. When you give people the option between 1 star and 5 stars, most things average out to about 4.3, and some particular categories of products tend to average even higher. Like if you look at the average ratings for dog food, that’s like a 4.8.”
I’m not much of an online rater, but after hearing this it occurred to me that whenever I do rate something — the odd video on YouTube — I always give a five-star rating. Or, to look at it another way, I only bother to rate that which I deem exceptional, as though the fifth star is a thumbs-up rating and the other four stars don’t exist. And obviously I’m not alone.
Fowler also cites research from the University of Toronto that found people give negative reviews at roughly the same rate regardless of expertise, but those who regard themselves as experts on a given topic were far more inclined to give positive reviews.
So it’s those experts again who are skewing the ratings and polluting the stream in an obsessive effort to demonstrate their expertise. Guess I’m off the hook.
Whatever the cause, this is a problem for the sites hosting the star ratings and for users who rely on them (or at least used to until they realised the ratings were inflated and largely useless).
In 2007, eBay introduced “Detailed Seller Ratings” (DSR), which used average star ratings to determine a seller’s overall rating. In 2008, eBay acknowledged the star-rating inflation in its midst and raised the minimum average rating required for a seller to be classified as acceptable to 4.3. This year, it gave up trying to use average ratings altogether and started focusing on the number of one- and two-star ratings a seller receives. In other words, positive ratings are useless. Only negative experiences are of value.
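To see why that switch matters, here’s a minimal, hypothetical sketch (the seller names, ratings and formula are invented for illustration, not eBay’s actual method): when ratings are inflated, almost everyone clears a 4.3 average, but counting the share of one- and two-star ratings still separates sellers who disappoint buyers from those who don’t.

```python
from statistics import mean

# Hypothetical ratings for two sellers (1-5 stars). Not real eBay data.
sellers = {
    "seller_a": [5, 5, 5, 4, 5, 5, 5, 5, 4, 5],  # rarely disappoints
    "seller_b": [5, 5, 5, 5, 1, 5, 5, 2, 5, 5],  # occasional bad experience
}

THRESHOLD = 4.3  # eBay's 2008 minimum average for an "acceptable" seller

for name, ratings in sellers.items():
    avg = mean(ratings)
    # Share of ratings that were one or two stars -- the signal eBay moved to.
    low_share = sum(1 for r in ratings if r <= 2) / len(ratings)
    verdict = "passes" if avg >= THRESHOLD else "fails"
    print(f"{name}: average {avg:.2f} ({verdict} the 4.3 bar), "
          f"low ratings {low_share:.0%}")
```

In this toy example both sellers clear the 4.3 average, yet one of them leaves a bad taste in a fifth of transactions, which is roughly the logic behind eBay’s change of heart.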
Similarly, YouTube published a blog post a few weeks ago admitting that its star ratings were largely redundant, due to the overwhelming proportion of five star ratings, and asking for feedback on alternatives.
[Chart: YouTube’s high-five star-rating breakdown]
Suggested replacements include tracking more revealing behavioural metrics such as how much of a video people actually watch, how viral it goes on social media sites like Twitter and Facebook, or even switching to a simple Digg-like thumbs up or down system or an even simpler single-option approval system (since that’s effectively how YouTube’s current star-rating system works anyway).
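As a rough illustration of what a behavioural metric might look like (the field names, weights and numbers below are invented for the example, not anything YouTube has announced), a ranking signal could blend how far viewers get through a video with a simple like ratio:

```python
# Hypothetical per-video signals; names, weights and values are illustrative only.
videos = [
    {"id": "vid1", "avg_watched": 0.85, "likes": 900, "dislikes": 40},
    {"id": "vid2", "avg_watched": 0.30, "likes": 950, "dislikes": 10},
]

def engagement_score(video, watch_weight=0.6, like_weight=0.4):
    """Blend average watch-through with a thumbs-up ratio into one score."""
    like_ratio = video["likes"] / max(video["likes"] + video["dislikes"], 1)
    return watch_weight * video["avg_watched"] + like_weight * like_ratio

for video in sorted(videos, key=engagement_score, reverse=True):
    print(video["id"], round(engagement_score(video), 2))
```

Both of those videos would almost certainly average close to five stars under the current system; how much of each one people actually sit through is what tells them apart.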
So if the trusty online star-rating system is broken, where does this leave users and proprietors?
As you can imagine, with so many livelihoods at stake, eBay sellers have been quick to voice their displeasure with the company’s less-than-perfect ratings system. Amazon sellers are similarly vocal. However, when there are no sellers involved and the only thing at stake for users is an improved surfing experience they have never enjoyed and therefore can’t miss, ratings systems can slip down the priority list of user-generated content sites like YouTube.
Then again, how users’ content is rated — whether by other users or a fancy algorithm — is not something to be trifled with. Many Flickr users were indignant when, in 2005, the photo-sharing site introduced a mysterious “interestingness” rating to identify the “most interesting” photos each day. It was Flickr’s way of elevating itself above several lowest common denominators, such as “most viewed” and “most favourited”, that dumb down the popular lists of other sites.
No lives were ruined as a result, you might assume. But many of the site’s users were up in arms about interestingness because, of course, the whole concept of what is “interesting” is subjective (especially in artistic circles).
Also, the Flickr photos with the most “interestingness” juice are featured on the site’s popular Explore page. And featuring on Explore boosts a photographer’s exposure, which boosts her brand, which can obviously deliver sales and work. So perhaps the amount of interest users show in a rating system correlates directly with their hip-pocket stake in the outcome.
If that’s the case, the perfect ratings solution to replace the fluffy five-star-fest we currently have is likely to come from a service that has both users and sellers joined at the hip.
Paul Ryan is Editor of Anthill magazine.