check if URL exists


You want to check if a URL exists without actually downloading the given file.


Update (20120124): My previous solution didn’t work correctly. Here is my revised version.

import httplib
import urlparse

def get_server_status_code(url):
    """
    Download just the header of a URL and
    return the server's status code.
    """
    # http://stackoverflow.com/questions/1140661
    host, path = urlparse.urlparse(url)[1:3]    # elems [1] and [2]
    try:
        conn = httplib.HTTPConnection(host)
        conn.request('HEAD', path)
        return conn.getresponse().status
    except StandardError:
        return None

def check_url(url):
    """
    Check if a URL exists without downloading the whole file.
    We only check the URL header.
    """
    # see also http://stackoverflow.com/questions/2924422
    good_codes = [httplib.OK, httplib.FOUND, httplib.MOVED_PERMANENTLY]
    return get_server_status_code(url) in good_codes


assert check_url('http://www.google.com')    # exists
assert not check_url('http://simile.mit.edu/crowbar/nothing_here.html')    # doesn't exist

We request only the headers of the given URL (an HTTP HEAD request) and check the status code that the web server returns.
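
The listing above targets Python 2 (httplib, urlparse and StandardError no longer exist under those names in Python 3). Here is a rough sketch of how the same idea might look in Python 3 with http.client and urllib.parse. The timeout parameter, the https handling and the query-string handling are my own assumptions and are not part of the original script.

```python
import http.client
from urllib.parse import urlsplit


def get_server_status_code(url, timeout=10):
    """
    Send a HEAD request and return the server's status code,
    or None if the request could not be completed.
    """
    parts = urlsplit(url)
    # Assumption: choose the connection class from the URL scheme.
    if parts.scheme == 'https':
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    # An empty path means the site root; keep the query string if present.
    path = parts.path or '/'
    if parts.query:
        path += '?' + parts.query
    try:
        conn = conn_cls(parts.netloc, timeout=timeout)
        try:
            conn.request('HEAD', path)
            return conn.getresponse().status
        finally:
            conn.close()
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, malformed response, ...
        return None


def check_url(url):
    """Return True if the status code is 200, 301 or 302."""
    good_codes = (http.client.OK, http.client.FOUND,
                  http.client.MOVED_PERMANENTLY)
    return get_server_status_code(url) in good_codes
```

As in the original, redirects (301 and 302) count as "exists" and are not followed, and any connection problem is reported as None, so check_url() returns False for it.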

Update (20121202)

With requests:

>>> import requests
>>> url = 'http://hup.hu'
>>> r = requests.head(url)
>>> r.status_code
200    # requests.codes.OK
>>> url = 'http://www.google.com'
>>> r = requests.head(url)
>>> r.status_code
302    # requests.codes.FOUND
>>> url = 'http://simile.mit.edu/crowbar/nothing_here.html'
>>> r = requests.head(url)
>>> r.status_code
404    # requests.codes.NOT_FOUND
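
The session above only covers hosts that answer. If the host cannot be reached at all (unknown domain, refused connection, timeout), requests.head() raises an exception instead of returning a response. A minimal sketch of a wrapper that treats such failures as "does not exist" could look like this. The function name url_exists and the timeout value are my own choices for illustration.

```python
import requests


def url_exists(url, timeout=10):
    """Return True if a HEAD request to url yields 200, 301 or 302."""
    try:
        # requests.head() does not follow redirects by default,
        # so 301 and 302 are reported as they are.
        r = requests.head(url, timeout=timeout)
    except requests.exceptions.RequestException:
        # Base class of the exceptions requests raises:
        # connection errors, timeouts, invalid URLs, ...
        return False
    good_codes = (requests.codes.ok, requests.codes.found,
                  requests.codes.moved_permanently)
    return r.status_code in good_codes
```

Whether a redirect should count as "exists" is a design choice. I kept the same set of good codes as in the httplib version.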
  1. Ice Walker
    April 11, 2012 at 11:03

    might add “import urlparse” at the beginning of your script to remove some pre-compiling errors. NICE script which is working, thank you.

  2. November 7, 2012 at 15:15

    thanks! one of the few that actually work!

  3. March 27, 2013 at 12:33

    And this is what I was looking for!
    The beauty of Python: the script runs perfectly today. It works with just a copy & paste. Thank you

  4. Script hunter
    March 17, 2018 at 17:07

    But when the website does not exist, your r = requests.head(url) throws an error rather than returning a status code

