Archive

Posts Tagged ‘requests’

force requests to use IPv4

June 28, 2019

Problem
I have a script that periodically calls the API of a server to fetch some data. It worked well under Manjaro and Ubuntu. However, after a system update, the script stopped working on Ubuntu.

Diagnosis
It turned out that requests.get() couldn’t connect to the server. When I pinged the host under Ubuntu, ping resolved an IPv6 address, and that address was unreachable. (You can force ping to use IPv4 with the “-4” switch, e.g. “ping -4 example.com“.) Under Manjaro, ping resolved an IPv4 address by default, so the Python script worked fine. Under Ubuntu, however, requests.get() tried to use IPv6, and for some reason the host was not reachable over that protocol.

Solution
In my Python code I used the following patch to force the use of IPv4. requests relies on a lower-level library, urllib3, so it is the urllib3 part that has to be patched:

import socket
import requests.packages.urllib3.util.connection as urllib3_cn

def allowed_gai_family():
    """Tell urllib3 to resolve IPv4 addresses only."""
    family = socket.AF_INET    # force IPv4
    return family

# monkey-patch urllib3's address-family lookup
urllib3_cn.allowed_gai_family = allowed_gai_family

It solved the issue under Ubuntu. This tip is from here.
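
For completeness, here is a minimal sketch of the patch in use (the API URL is a made-up placeholder); the patch has to run once, before the first request:

import socket
import requests
import requests.packages.urllib3.util.connection as urllib3_cn

def allowed_gai_family():
    return socket.AF_INET    # force IPv4

urllib3_cn.allowed_gai_family = allowed_gai_family

# every request from here on resolves IPv4 addresses only
r = requests.get('https://example.com/api/data')    # placeholder URL
print(r.status_code)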


requests-transition

January 7, 2013

With the requests-transition package you can use requests 0.x and the shiny new requests 1.x too. requests 1.x changed its API in several places, so if you don’t want to update your code yet, requests-transition can be useful.

After installation, you can simply select which version of requests to use:

  • import requests0 as requests (0.x)
  • import requests1 as requests (1.x)
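
A minimal sketch, assuming the package provides the requests0/requests1 module names listed above: existing 0.x code keeps working if you change only the import line.

import requests0 as requests    # pin the old 0.x API

# the rest of the code stays untouched
r = requests.get('http://example.com')
print r.status_code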

Where does a page redirect to?

December 21, 2010

Question

We have a page that redirects to another page. How can we figure out where the redirection points?

Answer

import urllib

s = "https://pythonadventures.wordpress.com?random"    # returns a random post
page = urllib.urlopen(s)
print page.geturl()    # e.g. http://pythonadventures.wordpress.com/2010/10/08/python-challenge-1/

Credits

I found it in this thread.

Update (20121202)

With requests:

>>> import requests
>>> r = requests.get('https://pythonadventures.wordpress.com?random')
>>> r.url
u'https://pythonadventures.wordpress.com/2010/09/30/create-import-module/'
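
requests follows redirects automatically, so r.url holds the final address. If you also need the intermediate hops, the r.history attribute contains the redirect responses:

import requests

r = requests.get('https://pythonadventures.wordpress.com?random')
for resp in r.history:         # each response in the redirect chain
    print resp.status_code, resp.url
print r.url                    # the final destination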

Get URL info (file size, Content-Type, etc.)

October 18, 2010

Problem

You have a URL and you want to get some info about it. For instance, you want to figure out the content type (text/html, image/jpeg, etc.) of the URL, or the file size without actually downloading the given page.

Solution

Let’s see an example with an image. Consider the URL http://www.geos.ed.ac.uk/homes/s0094539/remarkable_forest.preview.jpg.

#!/usr/bin/env python

import urllib

def get_url_info(url):
    d = urllib.urlopen(url)
    return d.info()

url = 'http://www.geos.ed.ac.uk/homes/s0094539/remarkable_forest.preview.jpg'
print get_url_info(url)

Output:

Date: Mon, 18 Oct 2010 18:58:07 GMT
Server: Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_fastcgi/2.4.6
X-Powered-By: Zope (www.zope.org), Python (www.python.org)
Last-Modified: Thu, 08 Nov 2007 09:56:19 GMT
Content-Length: 103984
Accept-Ranges: bytes
Connection: close
Content-Type: image/jpeg

That is, the size of the image is 103,984 bytes and its content type is indeed image/jpeg.

In the code, d.info() returns a message object that can be used like a dictionary, so the extraction of a specific field is very easy:

#!/usr/bin/env python

import urllib

def get_content_type(url):
    d = urllib.urlopen(url)
    return d.info()['Content-Type']

url = 'http://www.geos.ed.ac.uk/homes/s0094539/remarkable_forest.preview.jpg'
print get_content_type(url)    # image/jpeg

This post is based on this thread.

Update (20121202)

With requests:

>>> import requests
>>> from pprint import pprint
>>> url = 'http://www.geos.ed.ac.uk/homes/s0094539/remarkable_forest.preview.jpg'
>>> r = requests.head(url)
>>> pprint(r.headers)
{'accept-ranges': 'none',
 'connection': 'close',
 'content-length': '103984',
 'content-type': 'image/jpeg',
 'date': 'Sun, 02 Dec 2012 21:05:57 GMT',
 'etag': 'ts94515779.19',
 'last-modified': 'Thu, 08 Nov 2007 09:56:19 GMT',
 'server': 'Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_fastcgi/2.4.6',
 'x-powered-by': 'Zope (www.zope.org), Python (www.python.org)'}
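
r.headers is a case-insensitive dictionary, so extracting a single field works the same way as with urllib; note that content-length arrives as a string, so convert it before doing arithmetic:

>>> r.headers['Content-Type']    # lookup is case-insensitive
'image/jpeg'
>>> int(r.headers['content-length'])
103984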

check if URL exists

October 17, 2010

Problem

You want to check if a URL exists without actually downloading the given file.

Solution

Update (20120124): There was something wrong with my previous solution; it didn’t work correctly. Here is my revised version.

import httplib
import urlparse

def get_server_status_code(url):
    """
    Download just the header of a URL and
    return the server's status code.
    """
    # http://stackoverflow.com/questions/1140661
    host, path = urlparse.urlparse(url)[1:3]    # elems [1] and [2]
    try:
        conn = httplib.HTTPConnection(host)
        conn.request('HEAD', path)
        return conn.getresponse().status
    except StandardError:
        return None

def check_url(url):
    """
    Check if a URL exists without downloading the whole file.
    We only check the URL header.
    """
    # see also http://stackoverflow.com/questions/2924422
    good_codes = [httplib.OK, httplib.FOUND, httplib.MOVED_PERMANENTLY]
    return get_server_status_code(url) in good_codes

Tests:

assert check_url('http://www.google.com')    # exists
assert not check_url('http://simile.mit.edu/crowbar/nothing_here.html')    # doesn't exist

We fetch only the header of the given URL and check the web server’s response code.

Update (20121202)

With requests:

>>> import requests
>>>
>>> url = 'http://hup.hu'
>>> r = requests.head(url)
>>> r.status_code
200    # requests.codes.OK
>>> url = 'http://www.google.com'
>>> r = requests.head(url)
>>> r.status_code
302    # requests.codes.FOUND
>>> url = 'http://simile.mit.edu/crowbar/nothing_here.html'
>>> r = requests.head(url)
>>> r.status_code
404    # requests.codes.NOT_FOUND
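
The same check_url() idea translates directly to requests; a minimal sketch:

import requests

def check_url(url):
    """Check if a URL exists by fetching only its header."""
    good_codes = (requests.codes.OK,
                  requests.codes.FOUND,
                  requests.codes.MOVED_PERMANENTLY)
    try:
        r = requests.head(url)
    except requests.RequestException:
        return False    # connection problem, treat as non-existent
    return r.status_code in good_codes

assert check_url('http://www.google.com')    # exists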