Posts Tagged ‘similarity’

Fuzzy string matching

November 2, 2011

Hamming distance

October 19, 2010

The Hamming distance is defined between two strings of equal length. It measures the number of positions with mismatching characters.

Example: the Hamming distance between “toned” and “roses” is 3.

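#!/usr/bin/env python

def hamming_distance(s1, s2):
    """Return the number of positions at which s1 and s2 differ."""
    # Raise instead of assert: asserts are skipped when Python runs with -O
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

if __name__ == "__main__":
    a = 'toned'
    b = 'roses'
    print(hamming_distance(a, b))   # 3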
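To see which positions make up that distance of 3, a small variant can return the mismatching indices instead of only counting them. This is a sketch. `mismatch_positions` is a hypothetical helper name and is not from the original post.

```python
def mismatch_positions(s1, s2):
    """Hypothetical helper: indices where s1 and s2 hold different characters."""
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    # enumerate(zip(...)) yields (index, (ch1, ch2)) for each aligned position
    return [i for i, (ch1, ch2) in enumerate(zip(s1, s2)) if ch1 != ch2]

if __name__ == "__main__":
    positions = mismatch_positions('toned', 'roses')
    print(positions)        # [0, 2, 4]  (t/r, n/s, d/s)
    print(len(positions))   # 3, the Hamming distance
```

The length of the returned list is the Hamming distance. The list also shows that positions 1 and 3 ('o' and 'e') are the matches.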

If you need the number of matching character positions:

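#!/usr/bin/env python

def similarity(s1, s2):
    """Return the number of positions at which s1 and s2 hold the same character."""
    # Raise instead of assert: asserts are skipped when Python runs with -O
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    return sum(ch1 == ch2 for ch1, ch2 in zip(s1, s2))

if __name__ == "__main__":
    a = 'toned'
    b = 'roses'
    print(similarity(a, b))    # 2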

This value equals len(s1) - hamming_distance(s1, s2), because both strings have the same length and every position is either a match or a mismatch.
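The identity can be checked directly. The sketch below uses a hypothetical helper, `similarity_from_hamming`, that derives the match count from the Hamming distance. It then compares the result with a direct count on a few sample pairs, which were chosen arbitrarily for illustration.

```python
def hamming_distance(s1, s2):
    # Count positions where the characters differ
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

def similarity(s1, s2):
    # Count positions where the characters match
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    return sum(ch1 == ch2 for ch1, ch2 in zip(s1, s2))

def similarity_from_hamming(s1, s2):
    # Hypothetical helper: matches = total positions - mismatches
    return len(s1) - hamming_distance(s1, s2)

if __name__ == "__main__":
    for s1, s2 in [('toned', 'roses'), ('karolin', 'kathrin'), ('abc', 'abc')]:
        # Both ways of counting matches must agree for equal-length strings
        assert similarity(s1, s2) == similarity_from_hamming(s1, s2)
        print(s1, s2, similarity(s1, s2))
```

Deriving one function from the other avoids a second pass over the strings when both numbers are needed.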

More info on zip() here.

Categories: python