Compare a defaultdict key-value with another defaultdict

I have two defaultdicts:

defaultdict(<type 'list'>, {'a': ['OS', 'sys', 'procs'], 'b': ['OS', 'sys']})

defaultdict(<type 'list'>, {'a': ['OS', 'sys'], 'b': ['OS']})

How do I compare these two to get the count of values missing from each one? For example, I should find that two values are missing from the second defaultdict for key 'a' and one is missing for 'b'.

You should be able to use set differences to find (and count) the missing elements efficiently. If you're careful, you can even do this without adding items to the defaultdict (and without assuming that the function's inputs are defaultdicts).

From there, it is just a matter of accumulating those results in a dictionary.

def compare_dict_of_list(d1, d2):
    d = {}
    for key, value in d1.items():
        diff_count = len(set(value).difference(d2.get(key, [])))
        d[key] = diff_count
    return d
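
As a quick check, the function above can be run on the two dictionaries from the question. This is only a sketch: the variable names `first`, `second`, `missing_from_second` and `missing_from_first` are illustrative, and the function is repeated so the snippet runs on its own.

```python
from collections import defaultdict


def compare_dict_of_list(d1, d2):
    # Same logic as the function above, repeated so this snippet is self-contained.
    d = {}
    for key, value in d1.items():
        # d2.get() reads without inserting missing keys into a defaultdict.
        d[key] = len(set(value).difference(d2.get(key, [])))
    return d


# Illustrative names; the data is taken from the question.
first = defaultdict(list, {'a': ['OS', 'sys', 'procs'], 'b': ['OS', 'sys']})
second = defaultdict(list, {'a': ['OS', 'sys'], 'b': ['OS']})

missing_from_second = compare_dict_of_list(first, second)
missing_from_first = compare_dict_of_list(second, first)
print(missing_from_second)  # {'a': 1, 'b': 1}
print(missing_from_first)   # {'a': 0, 'b': 0}
```

Swapping the arguments gives the counts in the other direction, so calling the function twice covers "missing from each one".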

If you just want the total number of values missing from the second defaultdict, you can iterate over the first dict and use the set difference for each key to count the values that are in A but not in B.

If you define the dicts like this:

from collections import defaultdict

a = defaultdict(list, {'a': ['OS', 'sys', 'procs'], 'b': ['OS', 'sys']})
b = defaultdict(list, {'a': ['OS', 'sys'], 'b': ['OS']})

This will tell you how many are missing from dict B:

total_missing_inB = 0
for i in a:
    diff = set(a[i]) - set(b[i])
    total_missing_inB += len(diff)

And this will tell you how many are missing from dict A:

total_missing_inA = 0
for i in b:
    diff = set(b[i]) - set(a[i])
    total_missing_inA += len(diff)
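
One caveat with the loops above: indexing a defaultdict with a key it does not have inserts that key with an empty list. A minimal sketch that avoids this side effect is shown here. The helper name `total_missing` is hypothetical, and the extra key 'c' is an assumption added only to show the behavior; neither is part of the original answer.

```python
from collections import defaultdict


def total_missing(source, other):
    """Count values listed in `source` that are absent from `other`, key by key."""
    total = 0
    for key, values in source.items():
        # .get() reads without creating the key in a defaultdict.
        total += len(set(values) - set(other.get(key, [])))
    return total


# 'c' is an extra key added for illustration; it exists only in `a`.
a = defaultdict(list, {'a': ['OS', 'sys', 'procs'], 'b': ['OS', 'sys'], 'c': ['net']})
b = defaultdict(list, {'a': ['OS', 'sys'], 'b': ['OS']})

missing_in_b = total_missing(a, b)
missing_in_a = total_missing(b, a)
print(missing_in_b)  # 3
print(missing_in_a)  # 0
print('c' in b)      # False: b was not modified by the comparison
```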

Here is an alternative solution that uses collections.Counter to track values and considers some edge cases involving uncommon keys and values.

Code

import collections as ct


def compare_missing(d1, d2, verbose=False):
    """Return the count of missing values from dict 2 compared to dict 1."""
    record = {}
    for k in d1.keys() & d2.keys():
        a, b = ct.Counter(d1[k]), ct.Counter(d2[k])
        record[k] = a - b
    if verbose:
        print(record)
    return sum(v for c in record.values() for v in c.values())

Demo

dd0 = ct.defaultdict(list, {"a": ["OS", "sys", "procs"], "b": ["OS", "sys"]})
dd1 = ct.defaultdict(list, {"a": ["OS", "sys"], "b": ["OS"]})

compare_missing(dd0, dd1, True)
# {'a': Counter({'procs': 1}), 'b': Counter({'sys': 1})}
# 2

compare_missing(dd1, dd0, True)
# {'a': Counter(), 'b': Counter()}
# 0

Details

If the keys are not identical in both dictionaries, compare_missing only iterates over the common keys. In the next example, even though dd2 adds a new key ("c") to the contents of dd1, we get the same results as above:

dd2 = ct.defaultdict(list, {"a": ["OS", "sys"], "b": ["OS"], "c": ["OS"]})
compare_missing(dd0, dd2)
# 2

compare_missing(dd2, dd0)
# 0

If uncommon values or replicates are found ("admin" and "OS" in dd3["b"], respectively), these occurrences are counted as well:

dd3 = ct.defaultdict(list, {"a": ["OS", "sys"], "b": ["OS", "admin", "OS"]})
compare_missing(dd3, dd0, True)
# {'a': Counter(), 'b': Counter({'OS': 1, 'admin': 1})}
# 2
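
To make the difference from the set-based answers concrete, here is a small sketch comparing the two approaches on the "b" lists from the demo. The variable names are illustrative. A set difference ignores duplicates, while Counter subtraction keeps one count per surplus occurrence.

```python
from collections import Counter

d1_values = ["OS", "admin", "OS"]  # dd3["b"] from the demo
d2_values = ["OS", "sys"]          # dd0["b"] from the demo

# Set difference: each distinct value is counted at most once.
set_diff = set(d1_values) - set(d2_values)

# Counter subtraction: keeps only positive counts, so the second "OS" survives.
counter_diff = Counter(d1_values) - Counter(d2_values)

print(set_diff)                    # {'admin'}
print(len(set_diff))               # 1
print(sum(counter_diff.values()))  # 2
```

Which result is the right one depends on whether repeated values in a list should count as separate missing items.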
