Python - Compare lists of dictionaries and return the non-matches for one of the keys

I want to compare two lists of dictionaries and get the values from the dictionaries that don't match.

So I have something like this:

list1 = [{'text': 'dog', 'number': 10},{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}] 

list2 = [{'text': 'dog'}] 

And I want to get the texts that are not in both lists. The text is the only criterion; it doesn't matter whether the numbers are the same.

The desired result would look like this:

list_notmatch = [{'text': 'cat'},{'text': 'horse'}]

If it's easier or faster, this would be OK too:

list_notmatch = [{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}]

I've seen a similar question (Compare two lists of dictionaries in Python. Return non match), but its output is not exactly what I need, and I don't know whether it's the best solution for my case.

The real lists are quite long (there could be more than 10,000 dictionaries in list1), so I guess I need a performant solution (or at least one that is not very slow). Order is not important.

Thanks!

The first form of output:

Collect the 'text' values of each list into a set, then use the symmetric_difference method or the xor operator (^) on the two sets:

>>> {d['text'] for d in list1} ^ {d['text'] for d in list2}
{'horse', 'cat'}
>>> {d['text'] for d in list1}.symmetric_difference({d['text'] for d in list2})
{'horse', 'cat'}
>>> [{'text': v} for v in _]
[{'text': 'horse'}, {'text': 'cat'}]

Both approaches can be tuned slightly. With the operator, placing the shorter set on the left was marginally faster in this measurement:

>>> from timeit import timeit
>>> timeit(lambda: {d['text'] for d in list1} ^ {d['text'] for d in list2})
0.59890600000017
>>> timeit(lambda: {d['text'] for d in list2} ^ {d['text'] for d in list1})
0.5732289999996283

With the symmetric_difference method, you can pass a generator expression or a map object to avoid explicitly creating a second set:

>>> from operator import itemgetter
>>> timeit(lambda: {d['text'] for d in list1}.symmetric_difference({d['text'] for d in list2}))
0.6045051000000967
>>> timeit(lambda: {d['text'] for d in list1}.symmetric_difference(map(itemgetter('text'), list2)))
0.579385199999706
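The timings above come from the three-element example lists. As a rough sketch of how one might repeat the comparison at the size mentioned in the question, the following builds hypothetical data (the `word{i}` texts and the function names are made up for illustration) and times both variants; the actual numbers will vary by machine and Python version.

```python
from operator import itemgetter
from timeit import timeit

# Hypothetical data sized like the question: 10,000 entries vs. a short list.
list1 = [{'text': f'word{i}', 'number': i} for i in range(10_000)]
list2 = [{'text': 'word0'}]

def with_operator():
    # Shorter set on the left, as suggested above.
    return {d['text'] for d in list2} ^ {d['text'] for d in list1}

def with_method():
    # symmetric_difference accepts any iterable, so no second set is built explicitly.
    return {d['text'] for d in list1}.symmetric_difference(map(itemgetter('text'), list2))

# Both variants must produce the same set of texts.
assert with_operator() == with_method()

for fn in (with_operator, with_method):
    print(fn.__name__, timeit(fn, number=100))
```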

The second form of output:

A simple way to get the original dictionaries from the lists is:

  1. Create a dictionary for each list, where the key is the 'text' of each dictionary and the value is the dictionary itself.
  2. dict.keys() returns a set-like view that supports set operators in Python 3 (no conversion to set is needed), so subtract twice to get the two difference sets, then look up the original dictionaries in the two large dictionaries according to the results.
>>> dict1 = {d['text']: d for d in list1}
>>> dict2 = {d['text']: d for d in list2}
>>> dict1_keys = dict1.keys()    # keys views are set-like in Python 3, so no set() conversion is needed
>>> dict2_keys = dict2.keys()    # ditto
>>> [dict1[k] for k in dict1_keys - dict2_keys] + [dict2[k] for k in dict2_keys - dict1_keys]
[{'text': 'horse', 'number': 40}, {'text': 'cat', 'number': 40}]

Note that using the xor operator to get the symmetric difference directly may not be ideal here, because you still have to look up each result in the right dictionary. If you want to use the xor operator anyway, you can merge the two dictionaries and take the values from the merged one (the dict | operator requires Python 3.9+):

>>> list(map((dict1 | dict2).__getitem__, dict1_keys ^ dict2_keys))
[{'text': 'horse', 'number': 40}, {'text': 'cat', 'number': 40}]
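The two-subtraction approach could be wrapped in a small helper, sketched below. The function name `unmatched_dicts` and its `key` parameter are hypothetical, and the extra 'bird' entry in list2 is added only to show the list2-only direction. One caveat of indexing by text: if a list contains several dictionaries with the same text, only the last one survives.

```python
def unmatched_dicts(list1, list2, key='text'):
    """Return the dicts whose `key` value appears in only one of the lists."""
    # Index each list by the key; later duplicates overwrite earlier ones.
    by_key1 = {d[key]: d for d in list1}
    by_key2 = {d[key]: d for d in list2}
    # Keys views are set-like, so they can be subtracted directly.
    only_in_1 = by_key1.keys() - by_key2.keys()
    only_in_2 = by_key2.keys() - by_key1.keys()
    return [by_key1[k] for k in only_in_1] + [by_key2[k] for k in only_in_2]

list1 = [{'text': 'dog', 'number': 10}, {'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
list2 = [{'text': 'dog'}, {'text': 'bird'}]  # 'bird' added for illustration

result = unmatched_dicts(list1, list2)
# Set order is arbitrary, so sort before printing.
print(sorted(result, key=lambda d: d['text']))
# [{'text': 'bird'}, {'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
```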

I would use set arithmetic in the following way:

list1 = [{'text': 'dog', 'number': 10},{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}] 
list2 = [{'text': 'dog'}]
texts1 = set(i['text'] for i in list1) 
texts2 = set(i['text'] for i in list2)
texts = texts1.symmetric_difference(texts2)
list_notmatch1 = [{"text":i} for i in texts]
list_notmatch2 = [i for i in list1+list2 if i['text'] in texts]
print(list_notmatch1)
print(list_notmatch2)

Output:

[{'text': 'horse'}, {'text': 'cat'}]
[{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]

Explanation: I create a set from the texts of each list, then use symmetric_difference, which does the following:

Return the symmetric difference of two sets as a new set.

(i.e. all elements that are in exactly one of the sets.)

Then texts can be used to create the first format, or to filter the concatenation of list1 and list2 to get the second format.
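To make "exactly one of the sets" concrete, here is a small sketch with made-up sets `a` and `b` showing that the method, the `^` operator, and the two equivalent set formulas all agree:

```python
a = {'dog', 'cat', 'horse'}
b = {'dog', 'bird'}

sym = a.symmetric_difference(b)

# The operator form and two equivalent formulations give the same set.
assert sym == a ^ b
assert sym == (a - b) | (b - a)   # in a only, or in b only
assert sym == (a | b) - (a & b)   # in either, minus those in both

print(sorted(sym))  # ['bird', 'cat', 'horse']
```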

In O(N+M), you can do it this way (this collects the entries of list1 whose text does not appear in list2):

list1 = [{'text': 'dog', 'number': 10},{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}]

list2 = [{'text': 'dog'}]

matched = {}
no_match =[]

for i in list2:
    matched[i['text']] = []

for i in list1:
    if i['text'] in matched:
        matched[i['text']].append(i)
    else:
        no_match.append(i)
matched = matched.values()

print(matched, no_match)

Output:

dict_values([[{'text': 'dog', 'number': 10}]]) [{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
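The loop above only reports entries of list1 that are missing from list2. If entries that appear only in list2 should be reported as well, one possible extension (a sketch, not part of the original answer) is to make a second pass over list2 and keep the entries that never collected a match. The 'bird' entry is added here purely for illustration.

```python
list1 = [{'text': 'dog', 'number': 10}, {'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
list2 = [{'text': 'dog'}, {'text': 'bird'}]  # 'bird' added for illustration

# One bucket per text in list2; a bucket collects the matching list1 entries.
matched = {d['text']: [] for d in list2}
no_match = []

for d in list1:
    if d['text'] in matched:
        matched[d['text']].append(d)
    else:
        no_match.append(d)

# Second pass: list2 entries whose bucket stayed empty matched nothing in list1.
no_match.extend(d for d in list2 if not matched[d['text']])

print(no_match)
# [{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}, {'text': 'bird'}]
```

Both passes are linear, so the whole thing stays O(N+M).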

You can try this:

list1 = [{'text': 'dog', 'number': 10},{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}] 

list2 = [{'text': 'dog'}]

result = []
for d1 in list1:
    if not any(d2['text'] == d1['text'] for d2 in list2):
        result.append(d1)
print(result)

Output:

[{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
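Note that the loop above rescans list2 for every entry of list1, so it performs on the order of N*M comparisons, which may matter for the list sizes mentioned in the question. A rough sketch of the same filter with a precomputed set of texts follows; the data and function names are hypothetical, and timings will differ by machine.

```python
from timeit import timeit

# Hypothetical data: 1,000 entries each, overlapping on word500..word999.
list1 = [{'text': f'word{i}', 'number': i} for i in range(1_000)]
list2 = [{'text': f'word{i}'} for i in range(500, 1_500)]

def with_any():
    # Same logic as the loop above: scan list2 for each entry of list1.
    return [d1 for d1 in list1 if not any(d2['text'] == d1['text'] for d2 in list2)]

def with_set():
    # Build the set of list2 texts once; each membership test is then O(1) on average.
    texts2 = {d['text'] for d in list2}
    return [d for d in list1 if d['text'] not in texts2]

# Both variants return the same dictionaries in the same order.
assert with_any() == with_set()

print('any():', timeit(with_any, number=5))
print('set  :', timeit(with_set, number=5))
```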
