
How can I efficiently compute the md5 sum of an iterable of bits in Python?

Consider the code below. It converts an image to line art and then computes the md5 sum of the bits. I don't know a better way to do this than with a generator expression producing individual bits. But then how can I feed the result to md5 in an efficient way?

The code below does it with a bitarray object, but I get non-deterministic results handing bitarray instances (which seem to use fancy C stuff under the hood) to md5. So what is the "right" way to do this?

import os, hashlib
from PIL import Image
from bitarray import bitarray

def image_pixel_hash_code(image):
    pixels = list(image.getdata())
    avg = sum(pixels) / len(pixels)
    bits = bitarray(pixel < avg for pixel in pixels)
    return hashlib.md5(bits).hexdigest()


im = Image.open(os.path.expanduser("~/Downloads/test.jpg")).convert("L")
print(image_pixel_hash_code(im))

PS I can reproduce the bitarray non-determinism, but I assume it's just a consequence of using two things together that aren't supposed to work together.

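The hash includes random bits at the end of bits whenever the length of bits is not a multiple of 8. md5 reads the bitarray's underlying byte buffer, so the unused bits in the last byte end up in the digest.

You can see this by looking at memoryview(bits).

You can fix this by padding bits with 0s: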

    bits = bitarray(1 if pixel < avg else 0 for pixel in pixels)
    bits.fill()
    return hashlib.md5(bits).hexdigest()
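
To make the padding idea concrete, here is a minimal sketch that uses only the standard library instead of bitarray. The helper names pack_bits and bits_md5 are hypothetical, and the most-significant-bit-first order is an assumption made for illustration. It packs the bits into whole bytes, zero-fills the unused tail of the last byte, and only then hashes, so the digest depends only on the bits themselves.

```python
import hashlib


def pack_bits(bits):
    """Pack an iterable of truthy/falsy values into bytes.

    Bits are packed most significant bit first (an assumption for this
    sketch); the unused low bits of the final byte are set to zero.
    """
    out = bytearray()
    acc = 0      # bits collected for the current byte
    count = 0    # number of bits currently held in acc
    for bit in bits:
        acc = (acc << 1) | (1 if bit else 0)
        count += 1
        if count == 8:
            out.append(acc)
            acc = 0
            count = 0
    if count:
        # Shift the partial byte left so the padding bits are zeros.
        out.append(acc << (8 - count))
    return bytes(out)


def bits_md5(bits):
    """Return the md5 hex digest of the zero-padded, packed bits."""
    return hashlib.md5(pack_bits(bits)).hexdigest()
```

With the question's code, this could be called as bits_md5(pixel < avg for pixel in pixels). A pure-Python loop will likely be slower than bitarray on large images, so this is meant to show why explicit zero padding makes the result deterministic rather than to replace the bits.fill() fix.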
