Archive

Archive for October, 2010

Fluffy is gone

October 29, 2010 Leave a comment

We are sad to inform you that Fluffy, the world’s longest snake living in captivity, has died. 18-years-old and weighing 300-pounds Fluffy held the title of longest snake by Guinness World Records and was a hit attraction at Columbus Zoo.

Find more info here.

Categories: python Tags: ,

Levenshtein distance

October 19, 2010 Leave a comment

The Levenshtein distance (or edit distance) between two strings is the minimal number of “edit operations” required to change one string into the other. The two strings can have different lengths. There are three kinds of “edit operations”: deletion, insertion, or alteration of a character in either string.

Example: the Levenshtein distance of “ag-tcc” and “cgctca” is 3.

#!/usr/bin/env python

def LD(s,t):
    s = ' ' + s
    t = ' ' + t
    d = {}
    S = len(s)
    T = len(t)
    for i in range(S):
        d[i, 0] = i
    for j in range (T):
        d[0, j] = j
    for j in range(1,T):
        for i in range(1,S):
            if s[i] == t[j]:
                d[i, j] = d[i-1, j-1]
            else:
                d[i, j] = min(d[i-1, j] + 1, d[i, j-1] + 1, d[i-1, j-1] + 1)
    return d[S-1, T-1]

a = 'ag-tcc'
b = 'cgctca'

print LD(a, b)   # 3

The implementation is from here.

Categories: python Tags: ,

Hamming distance

October 19, 2010 Leave a comment

The Hamming distance is defined between two strings of equal length. It measures the number of positions with mismatching characters.

Example: the Hamming distance between “toned” and “roses” is 3.

#!/usr/bin/env python

def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

if __name__=="__main__":
    a = 'toned'
    b = 'roses'
    print hamming_distance(a, b)   # 3

If you need the number of matching character positions:

#!/usr/bin/env python

def similarity(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 == ch2 for ch1, ch2 in zip(s1, s2))

if __name__=="__main__":
    a = 'toned'
    b = 'roses'
    print similarity(a, b)    # 2

Actually this is equal to len(s1) - hamming_distance(s1, s2). Remember, len(s1) == len(s2).

More info on zip() here.

Categories: python Tags: , , ,

Permutations of a list

October 19, 2010 Leave a comment

Update (20120321): The methods presented here can generate all the permutations. However, the permutations are not ordered lexicographically. If you need the permutations in lexicographical order, refer to this post.

Problem

You need all the permutations of a list.

Solution

With generators:

#!/usr/bin/env python

def perms01(li):
    if len(li)         yield li
    else:
        for perm in perms01(li[1:]):
            for i in range(len(perm)+1):
                yield perm[:i] + li[0:1] + perm[i:]

for p in perms01(['a','b','c']):
    print p

Output:

['a', 'b', 'c']
['b', 'a', 'c']
['b', 'c', 'a']
['a', 'c', 'b']
['c', 'a', 'b']
['c', 'b', 'a']

This tip is from here.

Without generators:

def perms02(l):
    sz = len(l)
    if sz         return [l]
    return [p[:i]+[l[0]]+p[i:] for i in xrange(sz) for p in perms02(l[1:])]

for p in perms02(['a','b','c']):
    print p

Output:

['a', 'b', 'c']
['a', 'c', 'b']
['b', 'a', 'c']
['c', 'a', 'b']
['b', 'c', 'a']
['c', 'b', 'a']

This tip is from here.

The two outputs contain the same elements in a different order.

Notes

If S is a finite set of n elements, then there are n! permutations of S. For instance, if we have 4 letters (say a, b, c, and d), then we can arrange them in 4! = 4 * 3 * 2 * 1 = 24 different ways.

Categories: python Tags: ,

Generators

October 19, 2010 Leave a comment

Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called, the generator resumes where it left-off (it remembers all the data values and which statement was last executed).”

Let’s rewrite our Fibonacci function using generators. In the previous approach, we specified how many Fibonacci numbers we want to get. The function calculated all of them and returned a list containing all the elements. With generators, we can calculate the numbers one by one. The new function will calculate a number, return it, and suspend its execution. When we call it again, it will resume where it left off and it runs until it computes another number, etc.

First let’s see a Fibonacci function that calculates the numbers in an infinite loop:

#!/usr/bin/env python

def fib():
    a, b = 0, 1
    while True:
        print a    # the current number is here
        a, b = b, a+b

fib()

In order to rewrite it in the form of a generator, we need to locate the part where the current value is calculated. This is the line with print a. We only need to replace this with yield a. It means that the function will return this value and suspend its execution until called again.

So, with generators it will look like this:

#!/usr/bin/env python

def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a+b

f = fib()
for i in range(10):    # print the first ten Fibonacci numbers
    print f.next(),    # 0 1 1 2 3 5 8 13 21 34

It is also possible to get a slice from the values of a generator. For instance, we want the 5th, 6th, and 7th Fibonacci numbers:

#!/usr/bin/env python

from itertools import islice

def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a+b

for i in islice(fib(), 5, 8):
    print i    # 5 8 13

More info on islice is here. For this post I used tips from here.

Update (20110406)

Here is a presentation in PDF entitled “Generator Tricks For Systems Programmers” by David Beazley (presented at PyCon 2008). (Reddit thread is here.)

PyCon 2010, EuroPython 2010

October 18, 2010 Leave a comment

PyCon is the largest annual gathering for the community using and developing the open-source Python programming language. Several videos are available too.

EuroPython is the European Python conference. It is aimed at everyone in the Python community, of all skill levels, both users and programmers. A lucky blogger was there, read his impressions here.

Update (20110509)

PyCon Video Archive

“This is a complete list of all recorded PyCon talks since 2009 with direct links to the video download. The official archive can be found at pycon.blip.tv.”

Categories: python

The News Television Project (HírTV)

October 18, 2010 Leave a comment

In this post I describe how to watch news on a Hungarian site. Although the video that we want to play is in Hungarian, you might get some ideas that you can use in a different project.

Project description

Currently I live abroad and sometimes I want to watch news in my mother tongue. So, the Hungarian News Television (HírTV) collects its news programs at http://www.hirtv.hu/view/videoview/hirado . Here, a video has the following URL: http://www.hirtv.net/filmek/hirado21/hiradoYYYYMMDD.wmv , where YYYYMMDD is the date (for instance http://www.hirtv.net/filmek/hirado21/hirado20101018.wmv). Instead of starting a web browser, visiting this page and clicking on a link, I want to launch the news video with a Python script.

Difficulty

When the script is executed, it may be possible that the news of the current day is not yet uploaded. So we need to verify if the URL exists. However, if we want to get a WMV file that doesn’t exist, the web server of HirTv will return an HTML page instead of indicating that the given URL is missing. So we will have to verify the Content-Type of the URLs. If it’s text/html => error, if it’s video/x-ms-wmv => OK.

Solution

#!/usr/bin/env python

import datetime
import urllib
import os

WMV  = 'video/x-ms-wmv'

base = 'http://www.hirtv.net/filmek/hirado21/hirado'
ext = '.wmv'

def get_content_type(url):
    d = urllib.urlopen(url)
    return d.info()['Content-Type']

def date_to_str(d):
    return "%d%02d%02d" % d

def prettify(d):
    return "%d-%02d-%02d" % d

def play_video(video_url):
    print "> " + video_url
    command = 'mplayer %s 1>/dev/null 2>&1' % video_url
    #command = 'vlc %s 1>/dev/null 2>&1' % video_url    # if you prefer VLC
    os.system(command)

today = datetime.date.today().timetuple()[:3]
video_today = base + date_to_str(today) + ext
if get_content_type(video_today) == WMV:
    play_video(video_today)
else:
    yesterday = (datetime.date.today() - datetime.timedelta(days = 1)).timetuple()[:3]
    video_yesterday = base + date_to_str(yesterday) + ext

    print "The video for today (%s) is not available." % prettify(today)
    val = raw_input( "Do you want to watch the video of yesterday (%s) [y/n]? " % prettify(yesterday) )
    if val == "y":
        if get_content_type(video_yesterday) == WMV:
            play_video(video_yesterday)
        else:
            print "Sorry. The video of yesterday (%s) is not available either." % prettify(yesterday)

First we determine the today’s date and using this information we create a URL for the video file. If it really exists (i.e. the Content-Type is correct), then we play it calling mplayer. If the Content-Type is incorrect, then the video of today was not yet uploaded. In this case we offer the user to play the video of yesterday.

Update (20101107): A bug in date_to_str() and prettify() was corrected. Months and days must be padded with 0s, i.e. 6 must become 06 for instance. VLC support is also added, it’s put in comment.

Categories: python Tags: , , , , ,