Using hashlib to compute md5 digest of a file in Python 3
With Python 2.7, the following code computes the MD5 hexdigest of the content of a file.
(EDIT: well, not really, as the answers have shown; I just thought so.)
import hashlib

def md5sum(filename):
    f = open(filename, mode='rb')
    d = hashlib.md5()
    for buf in f.read(128):
        d.update(buf)
    return d.hexdigest()
Now if I run that code using Python 3, it raises a TypeError exception:
d.update(buf)
TypeError: object supporting the buffer API required
I figured out that I could make that code run with both Python 2 and Python 3 by changing it to:
def md5sum(filename):
    f = open(filename, mode='r')
    d = hashlib.md5()
    for buf in f.read(128):
        d.update(buf.encode())
    return d.hexdigest()
Now I still wonder why the original code stopped working. It seems that when a file is opened using the binary mode modifier, it returns integers instead of strings encoded as bytes (I say that because type(buf) returns int). Is this behavior explained somewhere?
I think you wanted the for-loop to make successive calls to f.read(128). That can be done using iter() and functools.partial():
import hashlib
from functools import partial

def md5sum(filename):
    with open(filename, mode='rb') as f:
        d = hashlib.md5()
        for buf in iter(partial(f.read, 128), b''):
            d.update(buf)
    return d.hexdigest()

print(md5sum('utils.py'))
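As a rough illustration of how the two-argument form of iter() behaves here, the sketch below uses an in-memory io.BytesIO stream as a stand-in for a real file; the sample data and the chunk size of 4 are assumptions chosen only to keep the output short.

```python
import io
from functools import partial

# In-memory stand-in for a file opened in binary mode (hypothetical data).
stream = io.BytesIO(b"abcdefghij")

# partial(stream.read, 4) is a zero-argument callable. iter() keeps calling it
# and stops as soon as it returns the sentinel b"", which read() returns at
# end of file.
chunks = list(iter(partial(stream.read, 4), b""))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

Each item is a bytes object, so it can be passed straight to d.update().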
for buf in f.read(128):
    d.update(buf)
... updates the hash sequentially with each of the first 128 byte values of the file. Since iterating over a bytes object produces int objects, you get the following calls, which cause the error you encountered in Python 3:
d.update(97)
d.update(98)
d.update(99)
d.update(100)
which is not what you want.
Instead, you want:
def md5sum(filename):
    with open(filename, mode='rb') as f:
        d = hashlib.md5()
        while True:
            buf = f.read(4096)  # 128 is smaller than the typical filesystem block
            if not buf:
                break
            d.update(buf)
        return d.hexdigest()
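The int-versus-bytes behaviour described in this answer can be checked without a file. The following is a minimal sketch; the four-byte sample and the chunk size of 3 are assumptions chosen so that the integer values match the d.update(97) ... d.update(100) calls listed above.

```python
import hashlib

data = b"abcd"  # hypothetical sample standing in for file contents

# Iterating over a bytes object yields ints in Python 3.
values = list(data)
print(values)  # [97, 98, 99, 100]

# Passing one of those ints to update() raises TypeError.
d = hashlib.md5()
try:
    d.update(data[0])
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# Feeding bytes chunks gives the same digest as hashing everything at once.
chunked = hashlib.md5()
for start in range(0, len(data), 3):
    chunked.update(data[start:start + 3])
same = chunked.hexdigest() == hashlib.md5(data).hexdigest()
print(same)  # True
```

The last check is why the chunk size affects only speed, not the resulting digest.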
I finally changed my code to the version below (which I find easy to understand) after asking the question. But I will probably change it to the version suggested by Raymond Hettinger using functools.partial.
import hashlib

def chunks(filename, chunksize):
    f = open(filename, mode='rb')
    buf = "Let's go"
    while len(buf):
        buf = f.read(chunksize)
        yield buf

def md5sum(filename):
    d = hashlib.md5()
    for buf in chunks(filename, 128):
        d.update(buf)
    return d.hexdigest()
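One possible variation on the generator above, offered only as a sketch: it closes the file through a with block and stops before yielding the final empty read. The chunksize default of 4096 and the temporary sample file are assumptions added for the self-check and are not part of the original code.

```python
import hashlib
import os
import tempfile

def chunks(filename, chunksize):
    # The with-block closes the file once the generator is exhausted.
    with open(filename, mode='rb') as f:
        while True:
            buf = f.read(chunksize)
            if not buf:  # empty bytes marks end of file
                return
            yield buf

def md5sum(filename, chunksize=4096):
    d = hashlib.md5()
    for buf in chunks(filename, chunksize):
        d.update(buf)
    return d.hexdigest()

# Hypothetical sample data written to a temporary file for the check.
payload = b"hello world\n" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
try:
    result = md5sum(tmp.name)
    expected = hashlib.md5(payload).hexdigest()
    print(result == expected)
finally:
    os.remove(tmp.name)
```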