Inaccurate hamming distance of two strings in binary

I'm trying to calculate the Hamming distance between two strings, compared bit by bit. However, I'm not getting the expected output, which is 37; instead I get 33. Can someone explain the mistake I'm making?

Here's my code:

def to_bin(s):
    return ''.join(format(ord(x), 'b') for x in s)

s1 = to_bin('this is a test')
s2 = to_bin('wokka wokka!!!')

def hamming_distance_bin(x,y):
    z = []
    for i,j in zip(x,y):
        z.append(ord(i)^ord(j))
    return z.count(1)

print hamming_distance_bin(s1,s2)

I'm collecting the results in a list so that I can print the XORed output, count the 1s by hand, and see where I'm going wrong, but I still can't spot the problem.

def to_bin(s):
    return ''.join(format(ord(x), 'b') for x in s)

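returns a variable-length binary string for each character: a space (code 32) becomes the six digits 100000, while a lowercase letter becomes seven digits. Once the pieces are joined, the bits of the two strings no longer line up position by position, so the count comes out wrong. You want a constant width per character: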

def to_bin(s):
    return ''.join(format(ord(x), '08b') for x in s)
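
To make the misalignment concrete, here is a small sketch that compares the two format specs on the strings from the question. The helper names to_bin_variable, to_bin_fixed and count_differing are made up for this illustration.

```python
def to_bin_variable(s):
    # 'b' emits only as many digits as each value needs
    return ''.join(format(ord(x), 'b') for x in s)

def to_bin_fixed(s):
    # '08b' left-pads every character to exactly 8 digits
    return ''.join(format(ord(x), '08b') for x in s)

def count_differing(x, y):
    # number of positions at which the two bit strings differ
    return sum(a != b for a, b in zip(x, y))

a, b = 'this is a test', 'wokka wokka!!!'

print(len(to_bin_variable(a)))  # 95
print(len(to_bin_variable(b)))  # 94
print(len(to_bin_fixed(a)))     # 112
print(len(to_bin_fixed(b)))     # 112
print(count_differing(to_bin_fixed(a), to_bin_fixed(b)))  # 37
```

With the 'b' spec the two bit strings do not even have the same length, so zip compares bits that belong to different characters.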

FWIW, I'd do:

s1 = bytearray(b'this is a test')
s2 = bytearray(b'wokka wokka!!!')

def hamming_distance_bin(x,y):
    return sum(bin(i^j).count("1") for i,j in zip(x,y))

hamming_distance_bin(s1,s2)

because bytearray is neater than calling ord all the time.
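
One caveat: zip stops at the shorter of its inputs, so a length mismatch is silently ignored. A hedged variant that rejects unequal inputs might look like the sketch below; the name hamming_distance_checked is hypothetical and is not part of the answer above.

```python
def hamming_distance_checked(x, y):
    # x and y are assumed to be bytearray objects (iterating them yields ints)
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    # XOR each byte pair and count the set bits in the result
    return sum(bin(i ^ j).count("1") for i, j in zip(x, y))

s1 = bytearray(b'this is a test')
s2 = bytearray(b'wokka wokka!!!')
print(hamming_distance_checked(s1, s2))  # 37
```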

You could also zero-pad each character's bits with zfill(8), for example bin(ord(x))[2:].zfill(8), where the [2:] strips the '0b' prefix that bin() adds. That fixes the problem you are having.
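
A minimal sketch of the zfill approach, written as a drop-in replacement for the question's to_bin; the name to_bin_zfill is an illustrative choice.

```python
def to_bin_zfill(s):
    # bin(116) == '0b1110100'; [2:] drops '0b', zfill(8) pads to 8 digits
    return ''.join(bin(ord(x))[2:].zfill(8) for x in s)

print(to_bin_zfill('t '))  # 0111010000100000
```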

The answer above is good code; another way just to get the point across would be:

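def hamming(str1, str2):
    score = 0
    # zip pairs up the characters of the two strings
    for a, b in zip(str1, str2):
        # 8-digit, zero-padded bit string for each character
        a_bits = bin(ord(a))[2:].zfill(8)
        b_bits = bin(ord(b))[2:].zfill(8)
        # count the bit positions that differ
        score += sum(bx != by for bx, by in zip(a_bits, b_bits))
    return score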

The benefit of this, I suppose, would be that the string to binary conversion is built into the function itself.
