Should I use Python 2 or Python 3?

February 4, 2011

This is a very common question among people who want to learn Python. The Python wiki has a nice article on the topic: http://wiki.python.org/moin/Python2orPython3.

(Thanks Jaume for the link.)

Update (20110404)

If you are ready to dive into Python 3, here are some tutorials:

  • The official Python 3 tutorial (HTML, PDF)
  • Further Python 3 docs (c-api.pdf, distutils.pdf, documenting.pdf, extending.pdf, faq.pdf, howto-advocacy.pdf, howto-cporting.pdf, howto-curses.pdf, howto-descriptor.pdf, howto-doanddont.pdf, howto-functional.pdf, howto-logging-cookbook.pdf, howto-logging.pdf, howto-pyporting.pdf, howto-regex.pdf, howto-sockets.pdf, howto-sorting.pdf, howto-unicode.pdf, howto-urllib2.pdf, howto-webservers.pdf, install.pdf, library.pdf, reference.pdf, tutorial.pdf, using.pdf, whatsnew.pdf)
  • Dive Into Python 3 (HTML and PDF)

Update (20110526)

I follow a simple guideline: I use the version of Python that ships with Ubuntu by default. In Ubuntu 10.10 it was Python 2.6; in Ubuntu 11.04 it’s Python 2.7. When Ubuntu switches to Python 3.x, I will switch too.

Where does a page redirect to?

December 21, 2010

Question

We have a page that redirects to another page. How can we figure out where the redirection points to?

Answer

import urllib

s = "https://pythonadventures.wordpress.com?random"    # returns a random post
page = urllib.urlopen(s)
print page.geturl()    # e.g. http://pythonadventures.wordpress.com/2010/10/08/python-challenge-1/

Credits

I found it in this thread.

Update (20121202)

With requests:

>>> import requests
>>> r = requests.get('https://pythonadventures.wordpress.com?random')
>>> r.url
u'https://pythonadventures.wordpress.com/2010/09/30/create-import-module/'
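
The urllib snippet in the answer is Python 2. On Python 3 the same call lives in urllib.request. The sketch below is only an illustration of that: to stay runnable offline it starts a toy local server, and the handler class, the /random and /post/1 paths, and the helper names final_url and demo are all invented for this example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Toy handler: /random redirects to /post/1, anything else returns 200."""

    def do_GET(self):
        if self.path == "/random":
            self.send_response(302)
            self.send_header("Location", "/post/1")
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"hello")

    def log_message(self, *args):
        pass  # keep the demo quiet


def final_url(url):
    """Follow redirects and return the URL that was finally served."""
    # An empty ProxyHandler ignores proxy settings from the environment
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as page:
        return page.geturl()


def demo():
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        return base, final_url(base + "/random")
    finally:
        server.shutdown()
        server.server_close()


base, url = demo()
print(url)   # ends with /post/1
```

Against a real site, only final_url(...) is needed; the local server exists solely so the example does not depend on the network.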
Categories: python

unicode to ascii

December 17, 2010

Problem

I had the following unicode string: “Kellemes Ünnepeket!” that I wanted to simplify to this: “Kellemes Unnepeket!”, that is, strip “Ü” to “U”. Furthermore, most of the strings were plain ASCII; only some of them were unicode.

Solution

import unicodedata

title = ...   # get the string somehow
try:
    # if the title is a unicode string, normalize it
    title = unicodedata.normalize('NFKD', title).encode('ascii','ignore')
except TypeError:
    # if it was not a unicode string => OK, do nothing
    pass
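
On Python 3 every str is already unicode, so the try/except is not needed there. A minimal sketch of the same normalization, assuming Python 3 (the helper name to_ascii is made up for this illustration):

```python
import unicodedata


def to_ascii(text):
    """Strip accents by decomposing characters and dropping the non-ASCII parts."""
    # NFKD splits 'Ü' into 'U' plus a combining diaeresis (U+0308)
    decomposed = unicodedata.normalize('NFKD', text)
    # 'ignore' drops every code point that cannot be encoded as ASCII
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('Kellemes Ünnepeket!'))   # Kellemes Unnepeket!
print(to_ascii('plain ascii'))           # plain ascii
```

Note that characters without a decomposition, such as 'ß' or 'ø', are dropped entirely rather than replaced by a look-alike.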

Credits

I used the following resources:

Categories: python

Using MySQL from Python

December 14, 2010

Problem

You want to interact with a MySQL database from your Python script.

Solution

First of all, you need to install the following package:

sudo apt-get install python-mysqldb

Then try the following basic script to check if everything is OK:

#!/usr/bin/env python

import MySQLdb

conn = MySQLdb.connect (host = "localhost",
                        user = "testuser",
                        passwd = "testpass",
                        db = "test")
cursor = conn.cursor ()
cursor.execute ("SELECT VERSION()")
row = cursor.fetchone ()
print "server version: ", row[0]
cursor.close ()
conn.close ()

Example:

We have a .csv file with two columns: symbol and name. Iterate through the lines and insert each line in a database table as a record.

#!/usr/bin/env python

import MySQLdb

f1 = open('./NYSE.csv',  'r')
# A line looks like this:
# ZLC;    Zale Corporation

conn = MySQLdb.connect(host = "localhost",
                       user = "user",
                       passwd = "passwd",
                       db = "table")
cursor = conn.cursor()

for line in f1:
    pieces = map(str.strip, line.split(';'))
    #print "'%s' => '%s'" % (pieces[0], pieces[1])
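    # pass the values separately so that the driver quotes them
    # (names containing quotes would break a hand-formatted SQL string)
    query = "INSERT INTO symbol_name (symbol, name) VALUES (%s, %s)"
    #print query
    cursor.execute(query, (pieces[0], pieces[1]))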

f1.close()

conn.commit()
cursor.close ()
conn.close ()
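
To try the insert loop without a MySQL server, the same pattern can be sketched with the sqlite3 module from the standard library. This is a stand-in, not MySQLdb: sqlite3 uses ? placeholders where MySQLdb uses %s. The sketch assumes Python 3, and the second sample line is invented data.

```python
import sqlite3

# Stand-in for the lines of NYSE.csv; the second entry is made up for the demo
lines = [
    "ZLC;    Zale Corporation\n",
    "XMPL;   O'Example \"Quoted\" Corp\n",
]

conn = sqlite3.connect(":memory:")          # throwaway in-memory database
conn.execute("CREATE TABLE symbol_name (symbol TEXT, name TEXT)")

# split each line on ';' and trim the whitespace around both fields
rows = [tuple(part.strip() for part in line.split(";")) for line in lines]

# the driver fills in the placeholders, so quotes in the data are harmless
conn.executemany("INSERT INTO symbol_name (symbol, name) VALUES (?, ?)", rows)
conn.commit()

stored = conn.execute(
    "SELECT symbol, name FROM symbol_name ORDER BY symbol").fetchall()
print(stored)
conn.close()
```

Passing the values as parameters is what keeps a name with an apostrophe or a double quote from breaking the statement.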

Links

There are lots of Python-MySQL tutorials on the net. Let’s see some of them:

Categories: python

Rename multiple files

November 2, 2010

Problem

I scanned in 66 pages that are numbered from 11 to 76. However, the scanning software saved the files under the names Scan10032.JPG, Scan10033.JPG, …, Scan10097.JPG. I want to rename them to reflect the real numbering of the pages, i.e. 11.jpg, 12.jpg, …, 76.jpg.

Solution

#!/usr/bin/env python

import glob
import re
import os

files = glob.glob('*.JPG')  # get *.JPG in a list (not sorted!)
files.sort()                # sort the list _in place_
cnt = 11                    # start new names with 11.jpg

for f in files:
    original = f                                    # save the original file name
    result = re.search(r'Scan(\d+)\.JPG', f)        # pattern to match
    if result:                                      # Is there a match?
        new_name = str(cnt) + '.jpg'                # create the new name
        print "%s => %s" % (original, new_name)     # verify if it's OK
        # os.rename(original, new_name)             # then uncomment to rename
        cnt += 1                                    # increment the counter

Comments are inside the source code.
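
The dry-run idea (print first, rename later) can also be sketched as a function that only computes the old and new names. This is a Python 3 illustration; the name plan_renames and the sample file list are assumptions, not part of the original script.

```python
import re


def plan_renames(names, start):
    """Return (old, new) pairs for ScanNNNNN.JPG files, numbering from `start`."""
    plan = []
    counter = start
    for name in sorted(names):                      # same order as files.sort()
        if re.fullmatch(r'Scan(\d+)\.JPG', name):   # skip anything else
            plan.append((name, '%d.jpg' % counter))
            counter += 1
    return plan


names = ['Scan10034.JPG', 'notes.txt', 'Scan10032.JPG', 'Scan10033.JPG']
for old, new in plan_renames(names, 11):
    print('%s => %s' % (old, new))
    # os.rename(old, new) would go here once the plan looks right
```

Plain string sorting is enough here because all the scan numbers have the same number of digits.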

If you need a simpler rename (like removing a part of the file names), you can also use the rename command. In this post I give an example for that.

Categories: python

Fluffy is gone

October 29, 2010

We are sad to inform you that Fluffy, the world’s longest snake living in captivity, has died. The 18-year-old, 300-pound Fluffy held the Guinness World Records title of longest snake and was a hit attraction at Columbus Zoo.

Find more info here.

Categories: python

Levenshtein distance

October 19, 2010

The Levenshtein distance (or edit distance) between two strings is the minimal number of “edit operations” required to change one string into the other. The two strings can have different lengths. There are three kinds of “edit operations”: deletion, insertion, or alteration of a character in either string.

Example: the Levenshtein distance of “ag-tcc” and “cgctca” is 3.

#!/usr/bin/env python

def LD(s,t):
    s = ' ' + s
    t = ' ' + t
    d = {}
    S = len(s)
    T = len(t)
    for i in range(S):
        d[i, 0] = i
    for j in range (T):
        d[0, j] = j
    for j in range(1,T):
        for i in range(1,S):
            if s[i] == t[j]:
                d[i, j] = d[i-1, j-1]
            else:
                d[i, j] = min(d[i-1, j] + 1, d[i, j-1] + 1, d[i-1, j-1] + 1)
    return d[S-1, T-1]

a = 'ag-tcc'
b = 'cgctca'

print LD(a, b)   # 3

The implementation is from here.
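
The function above stores the whole table in a dictionary. Since each cell depends only on the current and the previous row, two lists are enough. The following is a Python 3 sketch of that variant (the function name levenshtein is chosen for this example); it is meant to compute the same distance, not to replace the credited implementation.

```python
def levenshtein(s, t):
    """Edit distance using two rows instead of a full table."""
    previous = list(range(len(t) + 1))       # distances from '' to t[:j]
    for i, cs in enumerate(s, start=1):
        current = [i]                        # distance from s[:i] to ''
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # alteration or match
        previous = current
    return previous[-1]


print(levenshtein('ag-tcc', 'cgctca'))   # 3
```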

Categories: python

Hamming distance

October 19, 2010

The Hamming distance is defined between two strings of equal length. It measures the number of positions with mismatching characters.

Example: the Hamming distance between “toned” and “roses” is 3.

#!/usr/bin/env python

def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

if __name__=="__main__":
    a = 'toned'
    b = 'roses'
    print hamming_distance(a, b)   # 3

If you need the number of matching character positions:

#!/usr/bin/env python

def similarity(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 == ch2 for ch1, ch2 in zip(s1, s2))

if __name__=="__main__":
    a = 'toned'
    b = 'roses'
    print similarity(a, b)    # 2

Actually this is equal to len(s1) - hamming_distance(s1, s2). Remember, len(s1) == len(s2).
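
That identity can be checked directly. The sketch below assumes Python 3 and makes one deliberate change: it raises ValueError instead of using assert, because asserts are skipped when Python runs with -O and zip() would then silently truncate the longer string.

```python
def hamming_distance(s1, s2):
    """Number of positions at which the two strings differ."""
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))


def similarity(s1, s2):
    """Number of matching positions, derived from the Hamming distance."""
    return len(s1) - hamming_distance(s1, s2)


print(hamming_distance('toned', 'roses'))   # 3
print(similarity('toned', 'roses'))         # 2
```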

More info on zip() here.

Categories: python

Permutations of a list

October 19, 2010

Update (20120321): The methods presented here can generate all the permutations. However, the permutations are not ordered lexicographically. If you need the permutations in lexicographical order, refer to this post.

Problem

You need all the permutations of a list.

Solution

With generators:

#!/usr/bin/env python

def perms01(li):
    if len(li) <= 1:
        yield li
    else:
        for perm in perms01(li[1:]):
            for i in range(len(perm)+1):
                yield perm[:i] + li[0:1] + perm[i:]

for p in perms01(['a','b','c']):
    print p

Output:

['a', 'b', 'c']
['b', 'a', 'c']
['b', 'c', 'a']
['a', 'c', 'b']
['c', 'a', 'b']
['c', 'b', 'a']

This tip is from here.

Without generators:

def perms02(l):
    sz = len(l)
    if sz <= 1:
        return [l]
    return [p[:i]+[l[0]]+p[i:] for i in xrange(sz) for p in perms02(l[1:])]

for p in perms02(['a','b','c']):
    print p

Output:

['a', 'b', 'c']
['a', 'c', 'b']
['b', 'a', 'c']
['c', 'a', 'b']
['b', 'c', 'a']
['c', 'b', 'a']

This tip is from here.

The two outputs contain the same elements in a different order.

Notes

If S is a finite set of n elements, then there are n! permutations of S. For instance, if we have 4 letters (say a, b, c, and d), then we can arrange them in 4! = 4 * 3 * 2 * 1 = 24 different ways.
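
The n! count is easy to verify with itertools.permutations from the standard library. This is a Python 3 sketch and a different technique from the two recursive functions above; it yields tuples rather than lists, and for a sorted input it produces the permutations in lexicographic order.

```python
from itertools import permutations
from math import factorial

letters = ['a', 'b', 'c', 'd']
perms = list(permutations(letters))              # tuples, not lists

print(len(perms) == factorial(len(letters)))     # True: 4! = 24
print(perms == sorted(perms))                    # True: lexicographic order
print(perms[0], perms[-1])
```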

Categories: python