
Easy & efficient way to get intersection of the 2nd elements for several lists of tuples in Python?

I'm new to Python (working in 2.7), and I am finding SO to be a very valuable resource!

Let's say I'm working with several lists of 2-element tuples, generally of the form (ID, value), e.g.:

list1 = [(111, 222), (111, 333), (111, 444)]
list2 = [(555, 333), (555, 444), (555, 777)]
list3 = [(123, 444), (123, 888), (123, 999)]

What I really want to do is find an easy (and computationally efficient) way to get the intersection of the 2nd elements of these tuples. I've looked in the Python docs and found that sets might do what I want... and this post has helped me understand how to get the intersection of two lists.

I understand that I could make three whole new "values-only" lists by looping through the tuples like this:

newList1 = []
for tuple in list1:
   newList1.append(tuple[1])
newList2 = []
for tuple in list2:
   newList2.append(tuple[1])
newList3 = []
for tuple in list3:
   newList3.append(tuple[1])

and then get the intersection of each pair like this:

i_of_1and2 = set(newList1).intersection(newList2)
i_of_1and3 = set(newList1).intersection(newList3)
i_of_2and3 = set(newList2).intersection(newList3)

But my lists are a bit large: hundreds of thousands (sometimes tens of millions) of tuples. Is this really the best way to go about getting the intersection of the 2nd elements in these three lists of tuples? It seems... inelegant... to me.

Thanks for any help!

Your code shows a larger problem to begin with: names like variable1 are generally a bad sign. If you want to have multiple values, use a data structure, not lots of variables with numbered names. This stops you repeating your code over and over, and helps prevent bugs.

Let's use a list of lists instead:

values = [
    [(111, 222), (111, 333), (111, 444)],
    [(555, 333), (555, 444), (555, 777)],
    [(123, 444), (123, 888), (123, 999)]
]

Now we want to get only the second element of each tuple in the sublists. This is easy enough to compute using a list comprehension:

>>> new_values = [[item[1] for item in sublist] for sublist in values]
>>> new_values
[[222, 333, 444], [333, 444, 777], [444, 888, 999]]

And then, since we want the intersection for each pair of lists, we use itertools.combinations() to get every possible pair:

>>> import itertools
>>> for first, second in itertools.combinations(new_values, 2):
...     set(first).intersection(second)
... 
{444, 333}
{444}
{444}

So, if we put this together:

import itertools

values = [
    [(111, 222), (111, 333), (111, 444)],
    [(555, 333), (555, 444), (555, 777)],
    [(123, 444), (123, 888), (123, 999)]
]

# Each set holds the second element (index 1) of every tuple in a sublist.
sets_of_second_items = ({item[1] for item in sublist} for sublist in values)
for first, second in itertools.combinations(sets_of_second_items, 2):
    print(first.intersection(second))

Which gives us:

{444, 333}
{444}
{444}

The changes I made here were to turn the inner list comprehension into a set comprehension, to avoid creating a list just to turn it into a set, and to use a generator expression rather than the outer list comprehension, as it's lazily evaluated.

As a final note, if you want the indices of the lists used to generate each intersection, that's simple to do with the enumerate() builtin:

sets_of_second_items = ({item[1] for item in sublist} for sublist in values)
for (first_number, first_values), (second_number, second_values) in itertools.combinations(enumerate(sets_of_second_items), 2):
    print("Intersection of {0} and {1}: {2}".format(first_number, second_number, first_values.intersection(second_values)))

Which gives us:

Intersection of 0 and 1: {444, 333}
Intersection of 0 and 2: {444}
Intersection of 1 and 2: {444}
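
The loop above can also be wrapped in a small generator function so the results can be reused instead of only printed. This is just a sketch: the name pairwise_intersections is made up for illustration, and it assumes every tuple has at least two elements. Keep in mind that itertools.combinations() reads its whole input before it yields the first pair, so all of the sets are still held in memory at the same time.

```python
import itertools


def pairwise_intersections(groups):
    """Yield (i, j, common) for every pair of lists in `groups`.

    `groups` is a sequence of lists of (ID, value) tuples; `common` is the
    set of second elements shared by lists i and j.
    """
    # Build each set once, straight from the tuples (no intermediate list).
    value_sets = ({item[1] for item in group} for group in groups)
    for (i, first), (j, second) in itertools.combinations(enumerate(value_sets), 2):
        yield i, j, first & second


values = [
    [(111, 222), (111, 333), (111, 444)],
    [(555, 333), (555, 444), (555, 777)],
    [(123, 444), (123, 888), (123, 999)],
]

for i, j, common in pairwise_intersections(values):
    # sorted() gives a stable, readable order for the printed result.
    print("Intersection of {0} and {1}: {2}".format(i, j, sorted(common)))
```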

Edit:

As noted by tonyl7126, this is also an issue that could be greatly helped by using a better data structure. The best option here is to use a dict that maps each user ID to a set of product IDs. There is no reason to store your data as a list when you only need a set and are going to convert it to a set later, and the dict is a much better fit for the type of data you are trying to store.

See the following example:

import itertools

values = {
    "111": {222, 333, 444},
    "555": {333, 444, 777},
    "123": {444, 888, 999}
}

for (first_user, first_values), (second_user, second_values) in itertools.combinations(values.items(), 2):
    print("Intersection of {0} and {1}: {2}".format(first_user, second_user, first_values.intersection(second_values)))

Giving us:

Intersection of 555 and 123: {444}
Intersection of 555 and 111: {444, 333}
Intersection of 123 and 111: {444}
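
If the data starts out as the lists of tuples from the question, a dict like this can be built in one pass. The following is a minimal sketch, assuming the first element of each tuple is the ID to group by; it keeps the integer IDs rather than the string keys used above, and the name values_by_id is only illustrative.

```python
from collections import defaultdict
import itertools

list1 = [(111, 222), (111, 333), (111, 444)]
list2 = [(555, 333), (555, 444), (555, 777)]
list3 = [(123, 444), (123, 888), (123, 999)]

# Group the second elements by the ID stored in the first element.
values_by_id = defaultdict(set)
for pair_list in (list1, list2, list3):
    for user_id, value in pair_list:
        values_by_id[user_id].add(value)

# Sort the keys so the pairs come out in a predictable order.
for first_id, second_id in itertools.combinations(sorted(values_by_id), 2):
    common = values_by_id[first_id] & values_by_id[second_id]
    print("Intersection of {0} and {1}: {2}".format(first_id, second_id, sorted(common)))
```

Sorting the keys is optional; it is only there so the output does not depend on the dict's iteration order.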

I'm not sure if you've read about dictionaries in Python yet, but they seem like a better fit for what you are trying to do, in combination with lists. Dictionaries are made up of keys and values, just like what you seem to be emulating with your 2-element tuples.

So for example, list1, list2, and list3 could be represented as a dictionary that would look like this (assuming 111 is the id): your_dict = {"111": [222, 333, 444], "555": [333, 444, 777], "123": [444, 888, 999]}

So, if you wanted to get all of the values for a specific id, like "111", you would write: your_dict.get("111") and that would return the list. Here is a link to some documentation on dictionaries as well: http://docs.python.org/library/stdtypes.html#typesmapping
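
As a rough illustration of how that dictionary could be used for the intersection itself (a sketch only; the key "999" is a made-up example of an ID that is not present):

```python
your_dict = {"111": [222, 333, 444], "555": [333, 444, 777], "123": [444, 888, 999]}

# .get() returns None for a missing key unless a default is supplied.
values_111 = your_dict.get("111")
missing = your_dict.get("999", [])

# The values are lists here, so convert one side to a set before intersecting.
common = set(values_111).intersection(your_dict.get("555", []))
print(sorted(common))  # [333, 444]
```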

You could take advantage of the fact that the set.intersection(...) method takes 2 or more sets and finds their intersection. Also, you can use list comprehensions to reduce the code bloat. And lastly, you can use argument list unpacking to make it a one-liner. For example:

>>> list1 = [(111, 222), (111, 333), (111, 444)]
>>> list2 = [(555, 333), (555, 444), (555, 777)]
>>> list3 = [(123, 444), (123, 888), (123, 999)]
>>>
>>> set.intersection(*[set(t[1] for t in l) for l in (list1, list2, list3)])
set([444])

To help you understand what's going on, the call to set.intersection(...) is equivalent to this Python code:

>>> allsets = []
>>> for l in (list1, list2, list3):
...   n = set()
...   for t in l:
...     n.add(t[1])
...   allsets.append(n)
... 
>>> allsets
[set([444, 333, 222]), set([777, 444, 333]), set([888, 444, 999])]
>>> allsets[0].intersection(allsets[1]).intersection(allsets[2])
set([444])
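
The session above is from Python 2, which is why the sets print as set([444]). The same idea could be wrapped in a function along these lines. This is a sketch assuming Python 3; the function name common_second_elements is hypothetical, and the empty-input guard is an addition, since set.intersection() raises a TypeError when called with no arguments at all.

```python
def common_second_elements(*lists):
    """Return the set of second elements present in every list of tuples."""
    sets = [set(t[1] for t in lst) for lst in lists]
    if not sets:
        # set.intersection() needs at least one set to work on.
        return set()
    return set.intersection(*sets)


list1 = [(111, 222), (111, 333), (111, 444)]
list2 = [(555, 333), (555, 444), (555, 777)]
list3 = [(123, 444), (123, 888), (123, 999)]

print(common_second_elements(list1, list2, list3))  # {444}
```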

Here is an easy way of doing it.

>>> list1 = [(111, 222), (111, 333), (111, 444)]
>>> list2 = [(555, 333), (555, 444), (555, 777)]
>>> list3 = [(123, 444), (123, 888), (123, 999)]
>>> lists = [list1, list2, list3]
>>> set.intersection(*(set(zip(*lst)[1]) for lst in lists))
set([444])
  1. The zip(*...) trick is used to unzip the tuples and get the sets of 2nd elements.
  2. set.intersection(*...) is used to intersect them all together.
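
Note that indexing the result of zip() as shown only works in Python 2, where zip() returns a list. A version that should run on both Python 2 and 3 might look like the sketch below. It assumes every list is non-empty and every tuple has exactly two elements; the helper name second_elements is made up for illustration.

```python
list1 = [(111, 222), (111, 333), (111, 444)]
list2 = [(555, 333), (555, 444), (555, 777)]
list3 = [(123, 444), (123, 888), (123, 999)]
lists = [list1, list2, list3]


def second_elements(lst):
    # zip(*lst) transposes the pairs: first all IDs, then all values.
    # Unpacking the two resulting tuples avoids indexing the zip object,
    # which is an iterator in Python 3.
    ids, values = zip(*lst)
    return set(values)


common = set.intersection(*(second_elements(lst) for lst in lists))
print(common)
```

Because zip(*lst) passes every tuple as a separate argument and builds full intermediate tuples, a plain set comprehension is likely the lighter choice for lists with millions of entries.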

With regard to efficiency, I would try the easy way first and see if that is fast enough before trying to optimize.
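
One way to check whether the easy way is fast enough is to time the candidates on data of a realistic size. The following is only a sketch, assuming Python 3: the test data is randomly generated and hypothetical, the sizes are arbitrary, and the function names are made up. Actual timings will depend on the machine and on the real data.

```python
import random
import timeit

random.seed(0)  # fixed seed so repeated runs use the same data

# Hypothetical test data: three lists of 50,000 (ID, value) tuples.
lists = [
    [(list_id, random.randrange(100000)) for _ in range(50000)]
    for list_id in (111, 555, 123)
]


def with_comprehension():
    # Build each set directly from the tuples.
    return set.intersection(*[{t[1] for t in lst} for lst in lists])


def with_zip():
    # Transpose with zip(*...) first, then build the set from the values.
    return set.intersection(*[set(list(zip(*lst))[1]) for lst in lists])


assert with_comprehension() == with_zip()  # both approaches must agree

for func in (with_comprehension, with_zip):
    seconds = timeit.timeit(func, number=5)
    print("{0}: {1:.3f} s for 5 runs".format(func.__name__, seconds))
```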
