
How to find all differences between two dictionaries efficiently in Python

I have two dictionaries. I need to find the keys that are missing from each one and, for the keys they share, check whether the values are the same or different.

dict1 = {..}
dict2 = {..}
# keys that are missing from each dictionary
missing_in_dict1_but_in_dict2 = []
missing_in_dict2_but_in_dict1 = []
# keys present in both dictionaries whose values differ
mismatch = []

What's the most efficient way to do this?

You can use dictionary view objects, which act as sets. Use subtraction to get the difference:

missing_in_dict1_but_in_dict2 = dict2.keys() - dict1
missing_in_dict2_but_in_dict1 = dict1.keys() - dict2

For the keys that are the same, use the intersection, with the & operator:

mismatch = {key for key in dict1.keys() & dict2 if dict1[key] != dict2[key]}
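The placeholders in the question are lists, while the expressions above produce sets. If lists are needed, a minimal sketch could look like this. The function name `diff_keys` is hypothetical, and sorting assumes the keys are mutually comparable (use `list(...)` instead of `sorted(...)` if they are not):

```python
def diff_keys(dict1, dict2):
    """Return three sorted lists: keys only in dict2, keys only in dict1,
    and shared keys whose values differ."""
    missing_in_dict1_but_in_dict2 = sorted(dict2.keys() - dict1)
    missing_in_dict2_but_in_dict1 = sorted(dict1.keys() - dict2)
    # Only keys present in both dictionaries are compared by value.
    mismatch = sorted(k for k in dict1.keys() & dict2 if dict1[k] != dict2[k])
    return missing_in_dict1_but_in_dict2, missing_in_dict2_but_in_dict1, mismatch

only_in_2, only_in_1, mismatch = diff_keys(
    {'foo': 42, 'bar': 81},
    {'bar': 117, 'spam': 'ham'},
)
print(only_in_2, only_in_1, mismatch)  # ['spam'] ['foo'] ['bar']
```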

If you are still using Python 2, use dict.viewkeys().

Using dictionary views to produce intersections and differences is very efficient: the view objects themselves are very lightweight, and the algorithms that create the new sets from the set operations can make direct use of the O(1) lookup behaviour of the underlying dictionaries.
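To illustrate what "lightweight" means here: a keys view does not copy the keys, it reflects the dictionary's current contents, and a new set is only built when a set operation is applied. A small sketch with made-up data (the variable names are only for illustration):

```python
inventory = {'foo': 42, 'bar': 81}
keys = inventory.keys()      # a view; no keys are copied here

inventory['spam'] = 'ham'    # change the dictionary after creating the view
print('spam' in keys)        # True: the view tracks the dictionary
print(len(keys))             # 3

extra = keys - {'foo'}       # a set operation returns a new, independent set
inventory['eggs'] = 0        # later changes do not affect that set
print(sorted(extra))         # ['bar', 'spam']
print(type(extra).__name__)  # set
```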

Demo:

>>> dict1 = {'foo': 42, 'bar': 81}
>>> dict2 = {'bar': 117, 'spam': 'ham'}
>>> dict2.keys() - dict1
{'spam'}
>>> dict1.keys() - dict2
{'foo'}
>>> {key for key in dict1.keys() & dict2 if dict1[key] != dict2[key]}
{'bar'}

And a performance comparison with creating separate set() objects:

>>> import timeit
>>> import random
>>> def difference_views(d1, d2):
...     missing1 = d2.keys() - d1
...     missing2 = d1.keys() - d2
...     mismatch = {k for k in d1.keys() & d2 if d1[k] != d2[k]}
...     return missing1, missing2, mismatch
...
>>> def difference_sets(d1, d2):
...     missing1 = set(d2) - set(d1)
...     missing2 = set(d1) - set(d2)
...     mismatch = {k for k in set(d1) & set(d2) if d1[k] != d2[k]}
...     return missing1, missing2, mismatch
...
>>> testd1 = {random.randrange(1000000): random.randrange(1000000) for _ in range(10000)}
>>> testd2 = {random.randrange(1000000): random.randrange(1000000) for _ in range(10000)}
>>> timeit.timeit('d(d1, d2)', 'from __main__ import testd1 as d1, testd2 as d2, difference_views as d', number=1000)
1.8643521590274759
>>> timeit.timeit('d(d1, d2)', 'from __main__ import testd1 as d1, testd2 as d2, difference_sets as d', number=1000)
2.811345119960606

Using set() objects is slower, especially when your input dictionaries get larger.
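To check how the gap behaves on your own machine, a sketch along these lines times both approaches at a few dictionary sizes. The helper name `make_dict`, the sizes, and the repeat count are arbitrary choices for illustration. Absolute timings depend on the machine and the Python version, so no expected numbers are shown:

```python
import random
import timeit

def difference_views(d1, d2):
    missing1 = d2.keys() - d1
    missing2 = d1.keys() - d2
    mismatch = {k for k in d1.keys() & d2 if d1[k] != d2[k]}
    return missing1, missing2, mismatch

def difference_sets(d1, d2):
    missing1 = set(d2) - set(d1)
    missing2 = set(d1) - set(d2)
    mismatch = {k for k in set(d1) & set(d2) if d1[k] != d2[k]}
    return missing1, missing2, mismatch

def make_dict(size):
    # Random integer keys and values; duplicate keys may leave the
    # dictionary slightly smaller than `size`.
    return {random.randrange(size * 100): random.randrange(size * 100)
            for _ in range(size)}

for size in (100, 1000, 10000):
    d1, d2 = make_dict(size), make_dict(size)
    # Both approaches must agree before their timings are worth comparing.
    assert difference_views(d1, d2) == difference_sets(d1, d2)
    t_views = timeit.timeit(lambda: difference_views(d1, d2), number=100)
    t_sets = timeit.timeit(lambda: difference_sets(d1, d2), number=100)
    print(f"{size:>6} keys: views {t_views:.4f}s, sets {t_sets:.4f}s")
```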

One easy way is to create sets from the dict keys and subtract them:

>>> dict1 = { 'a': 1, 'b': 1 }
>>> dict2 = { 'b': 1, 'c': 1 }
>>> missing_in_dict1_but_in_dict2 = set(dict2) - set(dict1)
>>> missing_in_dict1_but_in_dict2
set(['c'])
>>> missing_in_dict2_but_in_dict1 = set(dict1) - set(dict2)
>>> missing_in_dict2_but_in_dict1
set(['a'])

Or you can avoid casting the second dict to a set by using .difference():

>>> set(dict1).difference(dict2)
set(['a'])
>>> set(dict2).difference(dict1)
set(['c'])
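The same style extends to the third list in the question, the keys present in both dictionaries whose values differ. A minimal sketch using set.intersection(), which like .difference() accepts any iterable. The changed value for 'b' is made up here so that there is something to report:

```python
dict1 = {'a': 1, 'b': 1}
dict2 = {'b': 2, 'c': 1}

# Keys present in both dictionaries (iterating a dict yields its keys).
common = set(dict1).intersection(dict2)

# Keep only the shared keys whose values differ.
mismatch = [key for key in common if dict1[key] != dict2[key]]
print(mismatch)  # ['b']
```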


 