check if URL exists

Problem

You want to check if a URL exists without actually downloading the given file.

Solution

Update (20120124): My previous solution didn’t work correctly. Here is the revised version.

import httplib
import urlparse

def get_server_status_code(url):
    """
    Download just the header of a URL and
    return the server's status code.
    """
    # http://stackoverflow.com/questions/1140661
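    parts = urlparse.urlsplit(url)
    host = parts.netloc
    # keep the query string; an empty path means the site root
    path = parts.path or '/'
    if parts.query:
        path += '?' + parts.query
    try:
        conn = httplib.HTTPConnection(host)
        conn.request('HEAD', path)
        return conn.getresponse().status
    except (StandardError, httplib.HTTPException):
        # httplib.HTTPException is not a StandardError, so name it explicitly
        return None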

def check_url(url):
    """
    Check if a URL exists without downloading the whole file.
    We only check the URL header.
    """
    # see also http://stackoverflow.com/questions/2924422
    good_codes = [httplib.OK, httplib.FOUND, httplib.MOVED_PERMANENTLY]
    return get_server_status_code(url) in good_codes

Tests:

assert check_url('http://www.google.com')    # exists
assert not check_url('http://simile.mit.edu/crowbar/nothing_here.html')    # doesn't exist

A HEAD request asks the web server for the response headers only, so we can read the status code without downloading the body of the page.
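The listing above is Python 2: httplib, urlparse and StandardError no longer exist under those names in Python 3. Here is a rough sketch of how the same idea might look in Python 3 with http.client and urllib.parse. The timeout parameter, the GOOD_CODES name and the HTTPS handling are my own assumptions for illustration, not part of the original script.

```python
import http.client
from urllib.parse import urlsplit

# Status codes treated as "the URL exists" (same three as the Python 2 version)
GOOD_CODES = (http.client.OK, http.client.FOUND, http.client.MOVED_PERMANENTLY)


def get_server_status_code(url, timeout=5.0):
    """Send a HEAD request and return the status code, or None on failure."""
    parts = urlsplit(url)
    if not parts.netloc:
        return None  # nothing to connect to
    # Assumption: use TLS for https URLs, plain HTTP for everything else
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        conn_class = http.client.HTTPConnection
    # Keep the query string; an empty path means the site root
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn = conn_class(parts.netloc, timeout=timeout)
        try:
            conn.request("HEAD", target)
            return conn.getresponse().status
        finally:
            conn.close()
    except (OSError, http.client.HTTPException):
        return None


def check_url(url):
    """Return True if the server answers the HEAD request with a good code."""
    return get_server_status_code(url) in GOOD_CODES
```

Usage is the same as before: check_url('http://www.google.com') should be true as long as the site is reachable, and any network error simply counts as "does not exist".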

Update (20121202)

With requests:

>>> import requests
>>>
>>> url = 'http://hup.hu'
>>> r = requests.head(url)
>>> r.status_code
200    # requests.codes.OK
>>> url = 'http://www.google.com'
>>> r = requests.head(url)
>>> r.status_code
302    # requests.codes.FOUND
>>> url = 'http://simile.mit.edu/crowbar/nothing_here.html'
>>> r = requests.head(url)
>>> r.status_code
404    # requests.codes.NOT_FOUND
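
The session above can be wrapped into a function that mirrors check_url. This is only a sketch: the function name and the timeout value are made up for illustration. requests.head does not follow redirects by default, which is why www.google.com answered 302 rather than 200 above, so the redirect codes stay in the list of good codes.

```python
import requests

# Same "good" codes as the httplib version: 200, 302 and 301
GOOD_CODES = (
    requests.codes.ok,
    requests.codes.found,
    requests.codes.moved_permanently,
)


def check_url_with_requests(url, timeout=5.0):
    """Return True if a HEAD request to url yields one of GOOD_CODES."""
    try:
        # HEAD only: the body is never downloaded
        r = requests.head(url, timeout=timeout)
    except requests.RequestException:
        # DNS failure, refused connection, timeout, malformed URL, ...
        return False
    return r.status_code in GOOD_CODES
```

Whether a redirect should count as "exists" is a judgment call. Passing allow_redirects=True to requests.head would report the status of the final target instead.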
  1. Ice Walker
    April 11, 2012 at 11:03 | #1

    might add “import urlparse” at the beginning of your script to remove some pre-compiling errors. NICE script which is working, thank you.

  2. November 7, 2012 at 15:15 | #3

    thanks! one of the few that actually work!

  3. March 27, 2013 at 12:33 | #4

    And this is what I was looking for!
    The beauty of Python: the script runs perfectly today. It works with just a copy & paste. Thank you
