Python - comparing lists of dictionaries using tuples - unexpected behaviour?

I've been attempting to compare two lists of dictionaries and to find the userids of new people in list2 that aren't in list1. For example, the first list:

list1 = [{"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"}, {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"}]

and the second list:

list2 = [{"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"}, {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"}, {"userid": "34892", "name": "daniel", "age": "64", "occupation": "chef"}]

The desired output:

newpeople = ['34892']

This is what I've managed to put together:

list1tuple = ((d["userid"]) for d in list1)
list2tuple = ((d["userid"]) for d in list2)

newpeople = [t for t in list2tuple if t not in list1tuple]

This actually seems to be pretty efficient, especially considering the lists I am using might contain over 50,000 dictionaries. However, here's the issue:

If it finds a userid in list2 that indeed isn't in list1, it adds it to newpeople (as desired), but then it also adds every userid that comes after it in list2 to newpeople.

So, say list2 contains 600 userids and the 500th userid in list2 isn't found anywhere in list1: the first item in newpeople will be the 500th userid (again, as desired), but it is then followed by the other 100 userids that come after the new one.

This is pretty perplexing to me. I'd greatly appreciate anyone helping me get to the bottom of why this is happening.
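The behaviour can be reproduced at a small scale. This sketch uses hypothetical userids ("a" to "e") in place of the real data, with list2's ids containing one extra id in the middle and otherwise in the same order as list1's:

```python
# Hypothetical userids: the "new" list has one extra id ("c") in the middle.
old_ids = ["a", "b", "d", "e"]
new_ids = ["a", "b", "c", "d", "e"]

# Same pattern as the question: two generator expressions.
gen1 = (u for u in old_ids)
gen2 = (u for u in new_ids)

newpeople = [t for t in gen2 if t not in gen1]
print(newpeople)  # ['c', 'd', 'e'] rather than the expected ['c']
```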

Currently you have set list1tuple and list2tuple as:

list1tuple = ((d["userid"]) for d in list1)
list2tuple = ((d["userid"]) for d in list2)

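These are generators, not lists (or tuples), which means they can only be iterated over once, and that is what causes your problem. Each check of the form "t not in list1tuple" pulls userids out of list1tuple until it finds t, and the userids it pulls are gone for every later check. As long as list2 holds the same userids in the same order as list1, each check stops at the matching userid and the result looks right. The first userid that isn't in list1, however, makes the check drain the rest of the generator; from then on list1tuple is empty, so every later userid counts as "not in" and is added to newpeople. (If the two lists are not in the same order, the results are wrong in less predictable ways.)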
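A short illustration of the underlying rule: a membership test ("in") on a generator advances it until a match is found, and any items it passes over cannot be seen again. The values here are arbitrary:

```python
gen = (x for x in ["a", "b", "c", "d"])

print("b" in gen)   # True: consumes "a" and "b"
print(next(gen))    # c: the generator resumes right after "b"
print("a" in gen)   # False: "a" was already consumed; this test also consumes "d"
print(list(gen))    # []: nothing is left
```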

You could change them to be lists: 您可以将它们更改为列表:

list1tuple = [d["userid"] for d in list1]
list2tuple = [d["userid"] for d in list2]

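which would allow you to iterate over them as many times as you like. But a better solution would be to simply make them sets, because a membership check on a set takes constant time on average, whereas a list is scanned element by element on every check, which gets slow with 50,000+ entries: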

list1tuple = set(d["userid"] for d in list1)
list2tuple = set(d["userid"] for d in list2)

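And then take the set difference (the result is a set, {'34892'}, rather than a list, and it does not keep the order of list2; wrap it in list() or sorted() if you need a list):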

newpeople = list2tuple - list1tuple
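Put together, the set-based approach might look like the sketch below. The function name find_new_userids is hypothetical, and it returns a sorted list because a set difference is unordered while the question asked for a list:

```python
def find_new_userids(old_people, new_people):
    """Return the userids in new_people that are missing from old_people."""
    old_ids = {d["userid"] for d in old_people}
    new_ids = {d["userid"] for d in new_people}
    # Set difference; sorted() turns the unordered result into a list.
    return sorted(new_ids - old_ids)


# The question's sample data (with the missing commas restored).
list1 = [
    {"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"},
    {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"},
]
list2 = list1 + [
    {"userid": "34892", "name": "daniel", "age": "64", "occupation": "chef"},
]

print(find_new_userids(list1, list2))  # ['34892']
```

Note that sorted() orders the userids as strings, not by their position in list2; if that position matters, filtering list2 with a list comprehension against a set of list1's userids keeps the original order.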

As can be seen from a Python console, list1tuple and list2tuple are generators:

>>> ((d["userid"]) for d in list1)
<generator object <genexpr> at 0x10a9936e0>

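Although the second one can remain a generator (it is only iterated over once, so there is no need to expand it into a list), the first one should be converted to a list, set or tuple, because it is searched again for every userid in list2. A set is the best fit, since its membership checks are fast, e.g.: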

list1set = {d['userid'] for d in list1}
list2generator = (d['userid'] for d in list2)

You can now check for membership in the set:

>>> [t for t in list2generator if t not in list1set]
['34892']
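If you need the new people's full records rather than just their userids, the same set lookup can filter list2 directly while preserving its order. A minimal sketch, assuming list1 and list2 are shaped as in the question; the name new_records is illustrative:

```python
list1 = [
    {"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"},
    {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"},
]
list2 = list1 + [
    {"userid": "34892", "name": "daniel", "age": "64", "occupation": "chef"},
]

# Build the lookup set once; each check below is then an average O(1) lookup.
list1set = {d["userid"] for d in list1}

# Keep whole dictionaries from list2, in list2's order.
new_records = [d for d in list2 if d["userid"] not in list1set]
print(new_records)
# [{'userid': '34892', 'name': 'daniel', 'age': '64', 'occupation': 'chef'}]
```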

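Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.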

 
© 2020-2024 STACKOOM.COM