Archive

Posts Tagged ‘Levenshtein’

Levenshtein distance

October 19, 2010 Leave a comment

The Levenshtein distance (or edit distance) between two strings is the minimal number of “edit operations” required to change one string into the other. The two strings can have different lengths. There are three kinds of “edit operations”: deletion, insertion, or alteration of a character in either string.

Example: the Levenshtein distance of “ag-tcc” and “cgctca” is 3.

#!/usr/bin/env python

def LD(s,t):
    s = ' ' + s
    t = ' ' + t
    d = {}
    S = len(s)
    T = len(t)
    for i in range(S):
        d[i, 0] = i
    for j in range (T):
        d[0, j] = j
    for j in range(1,T):
        for i in range(1,S):
            if s[i] == t[j]:
                d[i, j] = d[i-1, j-1]
            else:
                d[i, j] = min(d[i-1, j] + 1, d[i, j-1] + 1, d[i-1, j-1] + 1)
    return d[S-1, T-1]

a = 'ag-tcc'
b = 'cgctca'

print LD(a, b)   # 3

The implementation is from here.

Categories: python Tags: ,