Strange Python hashlib.md5 behavior: a different hash each time

I've run into some really strange behavior while trying to calculate the MD5 hash of a string. The returned hash is always wrong (and different each time) if I pass a string that was the result of concatenation. The only way I've found to get the real hash is to pass a string that wasn't modified in any way after creation.

Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> m = hashlib.md5() 
>>> a1 = "stack"
>>> a2 = "overflow"
>>> a3 = a1 + a2
>>> a4 = str(a1 + a2)
>>> m.update("stackoverflow")
>>> m.hexdigest()
'73868cb1848a216984dca1b6b0ee37bc'  # actual hash
>>> m.update(a1 + a2)
>>> m.hexdigest()
'458b7358b9e0c3f561957b96e543c5a8'
>>> m.update(a3)
>>> m.hexdigest()
'65b0e62d4ff2d91e111ecc8f27f0e8f5'
>>> m.update(a4)
>>> m.hexdigest()
'60c3ae3dd9a2095340b2e024194bad3c'
>>> m.update(a1 + a2)
>>> m.hexdigest()
'acd4e14145d34dcb10af785badf8e73e'
>>> m.update(a1 + a2)
>>> m.hexdigest()
'03c06ca09faa26166f1096db02272b11'
>>> a1 + a2 == a1 + a2
True
>>> a1 + a2 == a3
True
>>> a3 == a4
True

Am I missing something?

What you are missing is that hash.update() doesn't replace the hashed data. You are continually updating the same hash object, so you are getting the hash of the concatenated strings. From the hashlib.hash.update() documentation:

Update the hash object with the string arg. Repeated calls are equivalent to a single call with the concatenation of all the arguments: m.update(a); m.update(b) is equivalent to m.update(a+b).

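The part that matters here is the second sentence of that quote: repeated calls are equivalent to a single call on the concatenated data.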

So you are not getting the hash of a single 'stackoverflow' string. You are getting the hash of 'stackoverflow' first, then of 'stackoverflowstackoverflow', then of 'stackoverflowstackoverflowstackoverflow', and so on, because each call appends another 'stackoverflow' and makes the hashed data longer. None of those longer strings is equal to the original short string, so their hashes are not likely to be equal either.

Create a new hash object for each new string instead:

>>> import hashlib
>>> m = hashlib.md5()
>>> m.update('stack' + 'overflow')
>>> m.hexdigest()
'73868cb1848a216984dca1b6b0ee37bc'
>>> m = hashlib.md5()   # new hash object
>>> m.update('stackoverflow')
>>> m.hexdigest()
'73868cb1848a216984dca1b6b0ee37bc'
>>> m = hashlib.md5()     # new object again
>>> m.update('stack')     # add the string in pieces, part 1
>>> m.update('overflow')  # and part 2
>>> m.hexdigest()
'73868cb1848a216984dca1b6b0ee37bc'

You can readily reproduce your 'wrong' hashes by sending in the concatenated data:

>>> m = hashlib.md5()
>>> m.update('stackoverflowstackoverflow')
>>> m.hexdigest()
'458b7358b9e0c3f561957b96e543c5a8'
>>> m = hashlib.md5()
>>> m.update('stackoverflowstackoverflowstackoverflow')
>>> m.hexdigest()
'65b0e62d4ff2d91e111ecc8f27f0e8f5'
>>> m = hashlib.md5()
>>> m.update('stackoverflow' * 4)
>>> m.hexdigest()
'60c3ae3dd9a2095340b2e024194bad3c'
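
The same sequence can be reproduced in a loop. The following is only an illustrative sketch: the helper name running_digests is made up for this example, and bytes literals (b'...') are used so that it also runs on Python 3, where hashlib accepts only bytes.

```python
import hashlib


def running_digests(piece, count):
    """Feed `piece` into ONE md5 object `count` times, recording each hexdigest."""
    m = hashlib.md5()
    digests = []
    for _ in range(count):
        m.update(piece)              # appends to the data already hashed
        digests.append(m.hexdigest())
    return digests


piece = b"stackoverflow"
reused = running_digests(piece, 4)

# Fresh hash objects over the piece repeated 1, 2, 3 and 4 times.
fresh = [hashlib.md5(piece * n).hexdigest() for n in range(1, 5)]

print(reused == fresh)  # True
```

Calling hexdigest() does not reset the object, which is why every later update() keeps extending the same data.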

Note that you can also pass the first string directly to the md5() constructor:

>>> hashlib.md5('stackoverflow').hexdigest()
'73868cb1848a216984dca1b6b0ee37bc'
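
If you hash many independent strings, a small wrapper that builds a fresh hash object on every call avoids the accumulation problem altogether. This is a sketch, not part of hashlib: md5_hex is a hypothetical helper name, and the UTF-8 encoding step is an assumption added so that the code also works on Python 3, where md5() accepts only bytes.

```python
import hashlib


def md5_hex(text, encoding="utf-8"):
    """Return the MD5 hex digest of `text`, using a new hash object each call."""
    if not isinstance(text, bytes):
        # Python 3 str (or Python 2 unicode) must be encoded to bytes first.
        text = text.encode(encoding)
    return hashlib.md5(text).hexdigest()


a1, a2 = "stack", "overflow"
print(md5_hex(a1 + a2) == md5_hex("stackoverflow"))  # True
print(md5_hex(a1 + a2) == md5_hex(a1 + a2))          # True, no state carries over
```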

You normally use the hash.update() method only if you are processing data in chunks (such as reading a file line by line or reading blocks of data from a socket) and don't want to hold all of that data in memory at once.
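
As a rough illustration of that chunked use, the sketch below hashes a file block by block. The function name md5_of_file and the 8192-byte block size are arbitrary choices for this example, and the temporary file exists only to make the snippet self-contained.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=8192):
    """Hash a file block by block so it never has to fit in memory at once."""
    m = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:            # empty result means end of file
                break
            m.update(chunk)          # accumulation is exactly what we want here
    return m.hexdigest()


# Self-contained check against hashing the same data in one call.
data = b"stackoverflow" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)

try:
    print(md5_of_file(tmp.name) == hashlib.md5(data).hexdigest())  # True
finally:
    os.remove(tmp.name)
```

Here a single hash object is reused on purpose, because all the chunks belong to the same piece of data.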
