Movie Recommendation: Should I trust IMDb?

recommendation | data science | data | Jean-Baptiste Rouquier

Nice comments on the post about the top 12 secret shortcuts of Dataiku DSS convinced me to write another one. This time it's about plugins, APIs, movies, and how to choose the best movie to watch with friends.

Choose A Good Movie

A friend of mine has a list of all her DVDs (and takes note of who borrowed what ;). How do I choose the top 5 movies I'd like to borrow?

I'm not a movie maniac and I don't know most of the titles in her collection. And when I do know one, it's most often because I have already seen it (and I would prefer to watch a new one). So I need a source other than my own movie knowledge to choose a movie.

There is a way: I have noticed that I often agree with IMDb and Metacritic ratings. So the task becomes this: get ratings from various sites for all those DVDs, then filter them to keep only the highest-rated ones.

Once the question is well defined, the first step is getting the data. I found two nice free APIs: OMDb and TMDb.

  • OMDb provides IMDb, Metacritic and Rotten Tomatoes scores.
  • TMDb is better at matching approximate and foreign titles.
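To make the enrichment step concrete, here is a minimal sketch of how one OMDb lookup could be turned into a row of ratings. It is not the plugin's actual code. The base URL, the `t` and `apikey` parameters, and the `imdbRating`, `Metascore` and `Runtime` field names reflect my understanding of the OMDb API and should be checked against its documentation; the sample response is hand-written with illustrative values, not real API output.

```python
import json
from urllib.parse import urlencode

OMDB_URL = "http://www.omdbapi.com/"  # assumed base URL, check the OMDb docs


def build_query(title, api_key):
    """Return the request URL for a lookup by title (assumed parameter names)."""
    return OMDB_URL + "?" + urlencode({"t": title, "apikey": api_key})


def to_number(value):
    """Convert strings like '8.1', '74' or '148 min' to float; None if missing."""
    if value in (None, "", "N/A"):
        return None
    try:
        return float(str(value).split()[0])
    except (ValueError, IndexError):
        return None


def parse_movie(payload):
    """Flatten a JSON response into the few columns used in this post."""
    data = json.loads(payload)
    return {
        "title": data.get("Title"),
        "imdb": to_number(data.get("imdbRating")),
        "metascore": to_number(data.get("Metascore")),
        "runtime_min": to_number(data.get("Runtime")),
    }


# Hand-written sample payload (illustrative values only).
sample = '{"Title": "Some Movie", "imdbRating": "8.1", "Metascore": "74", "Runtime": "148 min"}'
record = parse_movie(sample)
```

Missing scores are mapped to `None` rather than 0, so that an unrated movie is not mistaken for a badly rated one in later steps.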

I coded a quick plugin to request those APIs and got this flow:

[Figure: movie recommendation flow]

The red recipes are the ones created thanks to the plugin: they request OMDb/TMDb and enrich DVDs_list with movie ratings. The yellow recipe joins all info into the final dataset.

Now I just need to open the resulting dataset, use the DSS filter to keep only movies with high enough ratings, and voilà: a list of the first movies I'd like to watch with my friend.
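Outside DSS, the join-then-filter logic could be sketched in a few lines of plain Python. This is only an illustration: the titles, the ratings, the thresholds (7.5 on IMDb, 70 on Metacritic) and the function names `enrich` and `top_movies` are all made up for the example.

```python
def enrich(dvds, ratings_by_title):
    """Left-join the DVD list with ratings, matching on lower-cased title."""
    rows = []
    for title in dvds:
        info = ratings_by_title.get(title.lower(), {})
        rows.append({"title": title, **info})
    return rows


def top_movies(rows, min_imdb=7.5, min_metascore=70, limit=5):
    """Keep movies rated above both thresholds, best IMDb rating first."""
    kept = [
        r for r in rows
        if r.get("imdb") is not None
        and r.get("metascore") is not None
        and r["imdb"] >= min_imdb
        and r["metascore"] >= min_metascore
    ]
    return sorted(kept, key=lambda r: r["imdb"], reverse=True)[:limit]


# Made-up sample data standing in for DVDs_list and the API results.
dvds = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"]
ratings = {
    "movie a": {"imdb": 8.1, "metascore": 82},
    "movie b": {"imdb": 6.2, "metascore": 55},
    "movie c": {"imdb": 7.9, "metascore": None},  # no Metacritic score
    "movie d": {"imdb": 8.4, "metascore": 75},
    # "Movie E" was not found by the API
}

shortlist = top_movies(enrich(dvds, ratings))
print([r["title"] for r in shortlist])
```

Movies with a missing score are dropped here; a more lenient variant could keep them and filter on whichever ratings are available.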

As a bonus, a column contains the runtime, to help us choose a short or lengthy movie. I also like the “tomato consensus”, which tells us where the movie shines (deep characters, gorgeous scenery, great jokes...) without any spoilers.

Choose A Recommendation Site

Quick question: which site has the best ratings? Or, more precisely, where are the ratings most useful to me?

With the above tools, it would be a shame not to run some cool stats. Let's see if we can gain some insights from those numbers.

I have rated most movies I've seen on Senscritique (their offer is to predict how you would rate a movie, based on ratings by people like you). To get my data back, I used a plugin (not detailed in this post). Then I ran the same flow: I augmented this list of films with info from TMDb and OMDb. This yields a dataset of movies with several rating sources: IMDb, Metacritic, Rotten Tomatoes, TMDb users... and me. ;)

Can we find some correlations?


  • As could be expected, the highest correlation is between tomatoMeter and tomatoRating: the meter is the percentage of positive reviews, while the rating is the average score across all reviews. So they contain very similar information.
  • Metascore and the Tomato scores are highly correlated too, and they indeed have the same kind of sources: both aggregate lots of professional reviews.
  • The next highest correlation is between tomatoUser and IMDb: both aggregate scores from lots of users (not professionals).
  • All of these correlate much less with Senscritique users. I guess the French taste differs from the mainstream taste on American sites. ;)
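For readers who want to reproduce this kind of comparison, here is a minimal sketch of a pairwise correlation between rating columns. It assumes Pearson correlation, which the post does not actually specify, and it skips movies where either source has no score. The column names and numbers in the usage example are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each (source_a, source_b) pair to its correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Invented scores for five movies; None marks a missing rating.
columns = {
    "imdb": [7.1, 8.0, 6.5, 7.8, 7.4],
    "metascore": [60, 85, 48, None, 70],
    "me": [6, 9, 5, 8, 7],
}
matrix = correlation_matrix(columns)
```

Because the sources use different scales (0 to 10, 0 to 100, percentages), a scale-free measure such as a correlation coefficient is a convenient way to compare them without rescaling first.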

In the following, more detailed plot, we see that IMDb ratings never fall below five, while Senscritique is more demanding and uses most of the possible ratings (TMDb essentially uses 3 different ratings).
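The same observation can be checked numerically, without a plot, by looking at the range and the number of distinct values each source actually uses. The sketch below is an illustration only: the `rating_usage` helper and the scores are invented for the example and are not the data behind the plot.

```python
from collections import Counter


def rating_usage(scores):
    """Summarize how much of its scale a rating source actually uses."""
    values = [s for s in scores if s is not None]
    if not values:
        return None
    counts = Counter(values)
    return {
        "min": min(values),
        "max": max(values),
        "distinct": len(counts),
    }


# Invented scores; None marks a movie the source did not rate.
scores = {
    "imdb": [7.1, 8.0, 6.5, 7.1, None],
    "senscritique": [3, 8, 5, 9, 6],
    "tmdb": [6, 7, 6, 8, 7],
}
usage = {source: rating_usage(values) for source, values in scores.items()}
for source, summary in usage.items():
    print(source, summary)
```

A source with a high minimum or few distinct values compresses its ratings into a narrow band, which makes small differences between two movies harder to interpret.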

[Figure: Pandas visualization]

For quite some time now, I have been checking IMDb ratings before watching a movie, and I'm probably unconsciously influenced by them: if I'm told a movie is good, I will tend to like it more. So the correlation between my scores and IMDb ratings is probably slightly biased.

On the other hand, I take great care not to read the user scores on Senscritique before giving my own score, to avoid any bias. It's interesting to see that my taste correlates most with Senscritique.

Finally, my personal takeaway is that Rotten Tomatoes might be closer to my taste than Metacritic. I will pay more attention to it from now on.
