Posts Tagged ‘requests’

force requests to use IPv4

June 28, 2019

I have a script that periodically calls the API of a server to fetch some data. It worked well under Manjaro and Ubuntu. However, after a system update, the script stopped working on Ubuntu.

It turned out that requests.get() couldn’t connect to the server. When I pinged the host under Ubuntu, ping resolved an IPv6 address, which was unreachable. (You can force ping to use IPv4 with the “-4” switch, ex.: “ping -4“.) Under Manjaro, ping resolved an IPv4 address by default, so the Python script worked well there. Under Ubuntu, however, requests.get() tried to use IPv6, and for some reason the given host was not reachable over that protocol.

In my Python code I used the following patch to force the usage of IPv4. requests relies on a lower level library, urllib3, thus the urllib3 part had to be patched:

import socket
import requests.packages.urllib3.util.connection as urllib3_cn

def allowed_gai_family():
    family = socket.AF_INET    # force IPv4
    return family

urllib3_cn.allowed_gai_family = allowed_gai_family

It solved the issue under Ubuntu. This tip is from here.
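To see what the patch changes in isolation: urllib3 passes the family returned by allowed_gai_family() to socket.getaddrinfo(), so pinning it to AF_INET filters name resolution down to IPv4. A minimal sketch of that effect, using localhost so it runs without network access:

```python
import socket

# urllib3 resolves hostnames roughly like this, with the family argument
# coming from allowed_gai_family(); AF_INET restricts results to IPv4.
infos = socket.getaddrinfo('localhost', 80, socket.AF_INET, socket.SOCK_STREAM)
for family, socktype, proto, canonname, sockaddr in infos:
    print(family == socket.AF_INET, sockaddr)    # IPv4 (address, port) tuples only
```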

Categories: python


requests-transition

January 7, 2013

With the package requests-transition you can use both requests 0.x and the shiny new requests 1.x. Requests 1.x changed parts of the API, so if you don’t want to update your code yet, requests-transition can be useful.

After installation, you can simply select which version of requests to use:

  • import requests0 as requests
  • import requests1 as requests
Categories: python

Where does a page redirect to?

December 21, 2010


We have a page that redirects to another page. How can we figure out where the redirect points?


import urllib    # Python 2

s = ""    # returns a random post
page = urllib.urlopen(s)    # urlopen follows the redirect
print page.geturl()    # the final URL, e.g. http://


I found it in this thread.

Update (20121202)

With requests:

>>> import requests
>>> r = requests.get('')
>>> r.url
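Both snippets above need a live URL. The same behaviour can be sketched self-containedly with a throwaway local server that issues a 302 (the /old and /new paths are made up for the demo), resolving the final URL with urllib.request, the Python 3 successor of urllib.urlopen:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)              # temporary redirect
            self.send_header('Location', '/new')
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'target page')

    def log_message(self, *args):                # silence per-request logging
        pass

server = HTTPServer(('127.0.0.1', 0), RedirectHandler)   # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

base = 'http://127.0.0.1:%d' % server.server_port
with urllib.request.urlopen(base + '/old') as page:      # the redirect is followed
    final_url = page.geturl()                            # URL after redirection
server.shutdown()
print(final_url)    # ends with /new
```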
Categories: python

Get URL info (file size, Content-Type, etc.)

October 18, 2010


You have a URL and you want to get some info about it. For instance, you want to figure out the content type (text/html, image/jpeg, etc.) of the URL, or the file size without actually downloading the given page.


Let’s see an example with an image. Consider the URL .

#!/usr/bin/env python

import urllib    # Python 2

def get_url_info(url):
    d = urllib.urlopen(url)
    return d.info()    # the response headers

url = 'http://'+'www'+''+'/homes/s0094539/remarkable_forest.preview.jpg'
print get_url_info(url)


Date: Mon, 18 Oct 2010 18:58:07 GMT
Server: Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_fastcgi/2.4.6
X-Powered-By: Zope (, Python (
Last-Modified: Thu, 08 Nov 2007 09:56:19 GMT
Content-Length: 103984
Accept-Ranges: bytes
Connection: close
Content-Type: image/jpeg

That is, the size of the image is 103,984 bytes and its content type is indeed image/jpeg.

In the code, the object returned by info() behaves like a dictionary, so the extraction of a specific field is very easy:

#!/usr/bin/env python

import urllib    # Python 2

def get_content_type(url):
    d = urllib.urlopen(url)
    return d.info()['Content-Type']    # dict-style access to a header field

url = 'http://'+'www'+''+'/homes/s0094539/remarkable_forest.preview.jpg'
print get_content_type(url)    # image/jpeg

This post is based on this thread.

Update (20121202)

With requests:

>>> import requests
>>> from pprint import pprint
>>> url = ''
>>> r = requests.head(url)
>>> pprint(r.headers)
{'accept-ranges': 'none',
 'connection': 'close',
 'content-length': '103984',
 'content-type': 'image/jpeg',
 'date': 'Sun, 02 Dec 2012 21:05:57 GMT',
 'etag': 'ts94515779.19',
 'last-modified': 'Thu, 08 Nov 2007 09:56:19 GMT',
 'server': 'Apache/2.0.63 (Unix) mod_ssl/2.0.63 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_fastcgi/2.4.6',
 'x-powered-by': 'Zope (, Python ('}
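The same header inspection can be sketched without a live URL: serve a known header set from a throwaway local server and read it back with a HEAD request via the stdlib’s http.client (Python 3), mirroring what requests.head() does above. The path and header values here are made up for the demo:

```python
import threading
import http.client
from http.server import BaseHTTPRequestHandler, HTTPServer

class HeaderHandler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        self.send_response(200)
        self.send_header('Content-Type', 'image/jpeg')
        self.send_header('Content-Length', '103984')
        self.end_headers()

    def log_message(self, *args):    # silence per-request logging
        pass

server = HTTPServer(('127.0.0.1', 0), HeaderHandler)    # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection('127.0.0.1', server.server_port)
conn.request('HEAD', '/remarkable_forest.preview.jpg')  # header only, no body
resp = conn.getresponse()
content_type = resp.getheader('Content-Type')
size = int(resp.getheader('Content-Length'))
conn.close()
server.shutdown()
print(content_type, size)    # image/jpeg 103984
```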

check if URL exists

October 17, 2010


You want to check if a URL exists without actually downloading the given file.


Update (20120124): There was something wrong with my previous solution, it didn’t work correctly. Here is my revised version.

import httplib    # Python 2
import urlparse

def get_server_status_code(url):
    """Download just the header of a URL and
    return the server's status code."""
    host, path = urlparse.urlparse(url)[1:3]    # elems [1] and [2]
    try:
        conn = httplib.HTTPConnection(host)
        conn.request('HEAD', path)
        return conn.getresponse().status
    except StandardError:
        return None

def check_url(url):
    """Check if a URL exists without downloading the whole file.
    We only check the URL header."""
    # see also
    good_codes = [httplib.OK, httplib.FOUND, httplib.MOVED_PERMANENTLY]
    return get_server_status_code(url) in good_codes


assert check_url('')    # exists
assert not check_url('')    # doesn't exist

We fetch only the header of the given URL and check the web server’s response code.
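For Python 3, a hedged port of the snippet above (httplib became http.client, urlparse became urllib.parse, and StandardError is gone, so a plain Exception is caught); the demo call uses the reserved .invalid TLD, which never resolves:

```python
import http.client
import urllib.parse

def get_server_status_code(url):
    """Request just the header of a URL and return the server's status code."""
    host, path = urllib.parse.urlparse(url)[1:3]    # netloc and path
    try:
        conn = http.client.HTTPConnection(host)
        conn.request('HEAD', path or '/')
        return conn.getresponse().status
    except Exception:    # StandardError no longer exists in Python 3
        return None

def check_url(url):
    """Check if a URL exists without downloading the whole file."""
    good_codes = [http.client.OK, http.client.FOUND,
                  http.client.MOVED_PERMANENTLY]
    return get_server_status_code(url) in good_codes

print(check_url('http://nonexistent.invalid/'))    # False: the host cannot resolve
```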

Update (20121202)

With requests:

>>> import requests
>>> url = ''
>>> r = requests.head(url)
>>> r.status_code
200    #
>>> url = ''
>>> r = requests.head(url)
>>> r.status_code
302    #
>>> url = ''
>>> r = requests.head(url)
>>> r.status_code
404    #
Categories: python