Get the RottenTomatoes rating of a movie

March 26, 2011


In the previous post we saw how to extract the IMDB rating of a movie. Now let’s see the same thing with the RottenTomatoes website. Their rating looks like this:


Source code:

#!/usr/bin/env python

# RottenTomatoesRating
# Laszlo Szathmary, 2011 (

from BeautifulSoup import BeautifulSoup
import sys
import re
import urllib
import urlparse

class MyOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.15) Gecko/20110303 Firefox/3.6.15'

class RottenTomatoesRating:
    # title of the movie
    title = None
    # RT URL of the movie
    url = None
    # RT tomatometer rating of the movie
    tomatometer = None
    # RT audience rating of the movie
    audience = None
    # Did we find a result?
    found = False

    # for fetching webpages
    myopener = MyOpener()
    # Should we search and take the first hit?
    search = True

    # constant
    BASE_URL = ''
    SEARCH_URL = '%s/search/full_search.php?search=' % BASE_URL

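    def __init__(self, title, search=True):
        self.title = title
        self.search = search
        # fetch and parse right away, so the attributes are filled in after construction
        self._process()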

    def _search_movie(self):
        movie_url = ""

        url = self.SEARCH_URL + self.title
        page = self.myopener.open(url)
        # an unambiguous title redirects straight to the movie page (/m/...)
        result = re.search(r'(/m/.*)', page.geturl())
        if result:
            # if we are redirected
            movie_url = result.group(1)
        else:
            # if we get a search list
            soup = BeautifulSoup(page.read())
            ul = soup.find('ul', {'id' : 'movie_results_ul'})
            if ul:
                div = ul.find('div', {'class' : 'media_block_content'})
                if div:
                    # take the first hit of the result list
                    movie_url = div.find('a', href=True)['href']

        return urlparse.urljoin( self.BASE_URL, movie_url )

    def _process(self):
        if not self.search:
            # try the movie page directly, e.g. "up in the air" -> /m/up_in_the_air
            movie = '_'.join(self.title.split())

            url = "%s/m/%s" % (self.BASE_URL, movie)
            soup = BeautifulSoup(self.myopener.open(url).read())
            if soup.find('title').contents[0] == "Page Not Found":
                # direct access failed, fall back to search
                url = self._search_movie()
        else:
            url = self._search_movie()

        try:
            self.url = url
            soup = BeautifulSoup( self.myopener.open(url).read() )
            self.title = soup.find('meta', {'property' : 'og:title'})['content']
            if self.title: self.found = True

            self.tomatometer = soup.find('span', {'id' : 'all-critics-meter'}).contents[0]
            self.audience = soup.find('span', {'class' : 'meter popcorn numeric '}).contents[0]

            if self.tomatometer.isdigit():
                self.tomatometer += "%"
            if self.audience.isdigit():
                self.audience += "%"
        except Exception:
            # the page did not have the expected layout; keep whatever was found so far
            pass

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print "Usage: %s 'Movie title'" % (sys.argv[0])
    else:
        rt = RottenTomatoesRating(sys.argv[1])
        if rt.found:
            print rt.url
            print rt.title
            print rt.tomatometer
            print rt.audience


The constructor has an optional parameter, which is True by default (search=True). It means that first we use the search function of the RT website and then we try to follow the first link. If search=False, the script tries to access the movie page directly. If it fails, then it falls back to the first case, i.e. it will try to find the movie via search.
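
To make the lookup order easier to follow, here is a minimal sketch of just the decision logic, without any network access. The function resolve_movie_url and the two stand-in callables (page_exists, first_search_hit) are my own names for illustration and are not part of the script; the paths in the fake data are made up.

```python
def resolve_movie_url(title, search, page_exists, first_search_hit):
    """Return a movie path, following the same order as the script.

    page_exists and first_search_hit are hypothetical stand-ins for the
    real HTTP requests made by the script.
    """
    if not search:
        # direct access: "star wars" -> /m/star_wars
        direct = "/m/" + "_".join(title.split())
        if page_exists(direct):
            return direct
        # direct access failed, so fall through to the search case
    return first_search_hit(title)


# Fake site data, for illustration only
known_pages = {"/m/star_wars"}

def page_exists(path):
    return path in known_pages

def first_search_hit(title):
    return "/m/search_hit_for_" + "_".join(title.split())


print(resolve_movie_url("star wars", False, page_exists, first_search_hit))
print(resolve_movie_url("no such film", False, page_exists, first_search_hit))
print(resolve_movie_url("star wars", True, page_exists, first_search_hit))
```

The second call shows the fallback: the direct page does not exist in the fake data, so the result comes from the search stand-in even though search=False was requested.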

Which version is better? It depends :) If there are several movies with the same title, then with search=True you will get the latest movie. If search=False, then you will usually get the oldest movie with that title.

For instance, for me “Star Wars” means episode 4, thus with the title “star wars”, search=False will return the relevant hit. But for “up in the air”, I would like to get the movie from 2009, not from 1940, thus in this case search=True would be better.

If you are in doubt, use the default case, i.e. search=True.
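
If you look up many titles, one possible way to apply this rule of thumb is to keep a small list of exceptions and use the default everywhere else. This is only a sketch: PREFER_DIRECT and choose_search_flag are hypothetical names, not something the script provides.

```python
# Hypothetical per-title overrides: titles for which the direct /m/<title>
# page is the one we want, so search=False is the better choice.
PREFER_DIRECT = {"star wars"}

def choose_search_flag(title):
    """Return the value to pass as search= for this title."""
    # normalize case and whitespace before comparing
    normalized = " ".join(title.lower().split())
    # default to search=True unless the title is a known exception
    return normalized not in PREFER_DIRECT

for t in ("Star Wars", "up in the air"):
    print("%s -> search=%s" % (t, choose_search_flag(t)))
```

The result could then be passed along as RottenTomatoesRating(title, search=choose_search_flag(title)).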

Update (20110329):

You will find the latest version of the script at
