Python + JSON serialization for MD5 hash - how can I guarantee that two equivalent objects will serialize to exactly the same string?

I need to take an MD5 hash of the contents of a dict or list, and I want to ensure that two equivalent structures give me the same hash result.

My approach thus far has been to carefully define the order of the structures and to sort the various lists and dictionaries they contain before running them through json.dumps().

As my structures get more complex, however, this is becoming laborious and error-prone, and in any case I was never sure whether it worked 100% of the time or just 98% of the time.

Just curious if anyone has a quick solution for this? Is there an option I can set in the json module to sort objects completely? Or some other trick I can use to compare all of the information in two structures and return a hash that is guaranteed to be the same for equivalent structures?

I only need the strings (and then the MD5) to come out the same when I serialize the objects; I'm not concerned about deserializing for this use case.

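By default, json.dumps() writes dict keys in whatever order the dict iterates them, so two equal dicts are not guaranteed to produce the same string. On older Python versions that order could even change between runs, because the results of __hash__ are salted for str (the usual key type for JSON objects) to prevent a DoS vector (see the notes in the documentation). Since Python 3.7 dicts keep insertion order, so two equal dicts built in a different order still serialize differently. The unsorted output in the session below comes from such an older interpreter. For this reason you need to call json.dumps with sort_keys set to True.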

>>> import json
>>> d = {'this': 'This word', 'that': 'That other word', 'other': 'foo'}
>>> json.dumps(d)
'{"this": "This word", "other": "foo", "that": "That other word"}'
>>> json.dumps(d, sort_keys=True)
'{"other": "foo", "that": "That other word", "this": "This word"}'
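To go from the sorted string to the hash the question asks for, something along these lines should work. This is only a sketch: stable_md5 is a hypothetical helper name, and the fixed separators and ensure_ascii settings are an extra assumption on my part to keep whitespace and escaping stable, not something the json module requires.

```python
import hashlib
import json


def stable_md5(obj):
    """Return the MD5 hex digest of a key-sorted JSON encoding of obj.

    Hypothetical helper, not part of the json or hashlib modules.
    """
    # sort_keys=True orders dict keys at every nesting level.
    # Fixed separators and ensure_ascii=True pin down whitespace and escaping.
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Same content, different insertion order.
a = {"this": "This word", "that": "That other word", "other": "foo"}
b = {"other": "foo", "this": "This word", "that": "That other word"}
print(stable_md5(a) == stable_md5(b))
```

Note that list order still affects the result, so stable_md5([1, 2]) and stable_md5([2, 1]) differ.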

For objects that end up serialized as a JSON array (i.e. list, tuple), you will need to make sure the ordering is what you expect yourself. sort_keys does not touch them: the elements are written in the order in which the program placed or modified them, not in any sorted order, so two lists holding the same elements in a different order produce different strings.
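If some of your lists really are unordered collections, one option is to normalize them before dumping. The sketch below assumes that every list and tuple in the structure is order-insensitive, which will not be true for all data; normalize is a hypothetical helper name. It sorts elements by their JSON text so that mixed or nested element types can still be ordered.

```python
import json


def normalize(obj):
    """Recursively sort every list/tuple so element order no longer matters.

    Hypothetical helper; assumes ALL sequences in obj are unordered.
    """
    if isinstance(obj, dict):
        return {key: normalize(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        items = [normalize(item) for item in obj]
        # Sort by each element's key-sorted JSON text, so dicts and
        # other non-comparable elements still get a deterministic order.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return obj


x = {"tags": ["b", "a"], "rows": [{"id": 2}, {"id": 1}]}
y = {"rows": [{"id": 1}, {"id": 2}], "tags": ["a", "b"]}
same = json.dumps(normalize(x), sort_keys=True) == json.dumps(
    normalize(y), sort_keys=True
)
print(same)
```

The sort is by JSON text, so numbers end up in lexicographic rather than numeric order; that is fine for hashing because only determinism matters here. Where order does carry meaning, leave that list out of the normalization.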
