Using hashlib to compute the md5 digest of a file in Python 3

Question

With Python 2.7, the following code computes the MD5 hexdigest of the content of a file.

(EDIT: well, not really, as the answers have shown; I just thought so.)

import hashlib

def md5sum(filename):
    f = open(filename, mode='rb')
    d = hashlib.md5()
    for buf in f.read(128):
        d.update(buf)
    return d.hexdigest()

Now if I run that code using Python 3, it raises a TypeError exception:

    d.update(buf)
TypeError: object supporting the buffer API required

I figured out that I could make that code run with both Python 2 and Python 3 by changing it to:

def md5sum(filename):
    f = open(filename, mode='r')
    d = hashlib.md5()
    for buf in f.read(128):
        d.update(buf.encode())
    return d.hexdigest()

Now I still wonder why the original code stopped working. It seems that when opening a file using the binary mode modifier, it returns integers instead of strings encoded as bytes (I say that because type(buf) returns int). Is this behavior explained somewhere?

Answer 1

I think you wanted the for-loop to make successive calls to f.read(128). That can be done using iter() and functools.partial():

import hashlib
from functools import partial

def md5sum(filename):
    with open(filename, mode='rb') as f:
        d = hashlib.md5()
        for buf in iter(partial(f.read, 128), b''):
            d.update(buf)
    return d.hexdigest()

print(md5sum('utils.py'))
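To see why this loop terminates, it may help to look at the two-argument form of iter() in isolation: iter(callable, sentinel) calls the callable repeatedly and stops as soon as the return value equals the sentinel. The sketch below uses an in-memory io.BytesIO stream in place of a real file; the sample data and the 4-byte chunk size are made up for illustration.

```python
import io
from functools import partial

# An in-memory stand-in for a file opened in binary mode (illustrative data).
stream = io.BytesIO(b"abcdefghij")

# partial(stream.read, 4) is a zero-argument callable; iter() keeps calling it
# until it returns the sentinel b"", which read() returns at end of file.
chunks = list(iter(partial(stream.read, 4), b""))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

The last chunk is simply shorter than the requested size, and the empty bytes object that follows it is never passed on to the loop body.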

Answer 2

for buf in f.read(128):
  d.update(buf)

... updates the hash sequentially with each of the first 128 byte values of the file. Since iterating over a bytes object produces int objects in Python 3, you get the following calls, which cause the error you encountered:

d.update(97)
d.update(98)
d.update(99)
d.update(100)

That is not what you want.
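The int-versus-bytes behaviour can be checked directly. The following is a minimal sketch for Python 3 only; the sample data and variable names are arbitrary and not part of the original code.

```python
import hashlib

data = b"abcd"  # arbitrary sample bytes

# In Python 3, iterating over or indexing a bytes object yields ints,
# while slicing still yields bytes.
items = list(data)        # [97, 98, 99, 100]
first = data[0]           # 97, an int
first_slice = data[0:1]   # b'a', a bytes object of length 1

d = hashlib.md5()
try:
    d.update(first)       # an int does not support the buffer API
    error = None
except TypeError as exc:
    error = exc

d.update(first_slice)     # a bytes object is accepted
print(type(first).__name__, type(first_slice).__name__, type(error).__name__)
```

In Python 2, iterating over a str read from a binary file yielded one-character str objects, which is why the original loop ran without raising, even though it only ever looked at the first 128 bytes.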

Instead, you want:

def md5sum(filename):
  with open(filename, mode='rb') as f:
    d = hashlib.md5()
    while True:
      buf = f.read(4096) # 128 is smaller than the typical filesystem block
      if not buf:
        break
      d.update(buf)
    return d.hexdigest()
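The chunk size only affects how the file is read, not the resulting digest. As a rough check, the sketch below compares the chunked loop against a one-shot hashlib.md5(data) call for several chunk sizes; md5sum_chunked is a hypothetical variant of the function above with the size made a parameter, and the temporary file and its contents are invented for the test.

```python
import hashlib
import os
import tempfile

def md5sum_chunked(filename, chunksize):
    # Same loop as above, with the chunk size as a parameter (hypothetical helper).
    d = hashlib.md5()
    with open(filename, mode='rb') as f:
        while True:
            buf = f.read(chunksize)
            if not buf:
                break
            d.update(buf)
    return d.hexdigest()

# Write some throwaway sample data to a temporary file.
data = b"hello world\n" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    expected = hashlib.md5(data).hexdigest()  # one-shot digest of the whole content
    results = [md5sum_chunked(path, size) for size in (1, 128, 4096, 65536)]
finally:
    os.remove(path)

print(all(r == expected for r in results))  # True
```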

Answer 3

I finally changed my code to the version below (which I find easy to understand) after asking the question. But I will probably change it to the version suggested by Raymond Hettinger using functools.partial.

import hashlib

def chunks(filename, chunksize):
    # Yield successive blocks of up to chunksize bytes; read() returns b'' at EOF.
    with open(filename, mode='rb') as f:
        buf = f.read(chunksize)
        while buf:
            yield buf
            buf = f.read(chunksize)

def md5sum(filename):
    d = hashlib.md5()
    for buf in chunks(filename, 128):
        d.update(buf)
    return d.hexdigest()

3 answers

Answer 1: score 31, accepted, 2011-10-19 23:59:49
Answer 2: score 10, 2011-10-19 23:42:45
Answer 3: score 1, 2011-10-20 00:20:32
