Union of multiple sets in Python

Question

[[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]

I have a list of lists. My aim is to check whether any one sublist has anything in common with the other sublists (ignoring the first element, at index 0, when comparing). If it does, those sublists should be merged into one.

For this example, my final answer should be something like:

[[1, '34', '44', '40', '30', '41', '42', '43']]

I understand that I should convert the sublists to sets and then use the union() and intersection() operations. What I'm stuck on is how to compare each set/sublist. I can't simply loop over the list and compare the sublists one by one, because the list's contents would be modified during the loop, which would lead to an error.

What I want to know is: is there an efficient way to compare all the sublists (converted to sets) and get their union?
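Most of the answers below flatten every sublist into a single set. That matches the requested output here only because every sublist is linked to the rest through some shared element. If only sublists that actually overlap should be merged (ignoring index 0, as described), one option is to treat it as grouping connected sublists. The sketch below is a minimal illustration under that assumption; `merge_overlapping` is a hypothetical helper, it keeps the index-0 value of the first group it merges into, and it sorts by `str()` only to make the output reproducible. It builds a new list of groups instead of modifying the input while looping over it.

```python
def merge_overlapping(sublists):
    """Merge sublists that share an element, ignoring index 0 of each.

    Hypothetical helper: each result group is an index-0 value followed by
    the union of the other elements of every sublist merged into it.
    """
    groups = []  # pairwise-disjoint (head, set_of_elements) pairs
    for sub in sublists:
        head, elems = sub[0], set(sub[1:])
        # Split the groups built so far into those that overlap this sublist
        # and those that do not; a fresh list is built each time, so nothing
        # is mutated while it is being iterated.
        touching = [g for g in groups if g[1] & elems]
        groups = [g for g in groups if not (g[1] & elems)]
        for _, other in touching:
            elems |= other  # fold every overlapping group into this one
        if touching:
            head = touching[0][0]  # keep the head of the first overlapping group
        groups.append((head, elems))
    # Sort by str() only for reproducible output (int and str are not
    # orderable against each other in Python 3).
    return [[head] + sorted(elems, key=str) for head, elems in groups]


data = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'],
        [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]
print(merge_overlapping(data))
# [[1, '30', '34', '40', '41', '42', '43', '44']]
```

With disjoint input the groups stay separate, e.g. `merge_overlapping([[1, 'a'], [1, 'b']])` returns two groups, because the shared index-0 value is not compared.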

Answer 1

The itertools module makes short work of this problem:

>>> from itertools import chain
>>> list(set(chain.from_iterable(d)))
[1, '41', '42', '43', '40', '34', '30', '44']

Another way to do it is to unpack the list into separate arguments for union():

>>> list(set().union(*d))
[1, '41', '42', '43', '40', '34', '30', '44']

The latter way also eliminates all duplicates, accepts the sublists directly (union() takes any iterables, so they need not be converted to sets first), and doesn't need an import.
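A self-contained Python 3 version of both approaches, assuming the question's list is bound to `d`. Because sets are unordered, the list order shown above can differ between runs; if a stable order is wanted, one option (an assumption, not part of the answer) is `sorted(..., key=str)`, since plain `sorted()` cannot compare the `int` and `str` items mixed in this data.

```python
from itertools import chain

d = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'],
     [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]

# Flatten all sublists lazily, then deduplicate with set().
via_chain = set(chain.from_iterable(d))

# set().union() accepts any iterables, so the sublists need no conversion.
via_union = set().union(*d)

# Sets are unordered; sort by str() to get a reproducible list.
ordered = sorted(via_union, key=str)
print(ordered)
# [1, '30', '34', '40', '41', '42', '43', '44']
```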

Answer 2

Using the unpacking operator *:

>>> list(set().union(*a))
[1, '44', '30', '42', '43', '40', '41', '34']

(Thanks to Raymond Hettinger and ShadowRanger for the comments!)

(Note that set.union(*tup) unpacks to set.union(tup[0], tup[1], ..., tup[n - 1]).)
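The difference between `set().union(*a)` and `set.union(*a)` matters with this data. In `set().union(*a)` an empty set is the receiver and every sublist becomes an argument. In `set.union(*a)` the first sublist becomes the receiver, so it must already be a set; with a list there, CPython raises `TypeError` (as it also does for an empty `a`). A short sketch, assuming the question's list is bound to `a`:

```python
a = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'],
     [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]

# Works: the empty set is the receiver, each sublist is an argument.
works = set().union(*a)

# Fails: a[0] is a list, but set.union needs a set as its first argument.
try:
    set.union(*a)
    failed = False
except TypeError:
    failed = True

# Works: convert every sublist to a set first.
also_works = set.union(*map(set, a))

print(works == also_works, failed)  # True True
```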

Answer 3

In [20]: s
Out[20]: 
[[1, '34', '44'],
 [1, '40', '30', '41'],
 [1, '41', '40', '42'],
 [1, '42', '41', '43'],
 [1, '43', '42', '44'],
 [1, '44', '34', '43']]
In [31]: list({x for _list in s for x in _list})
Out[31]: [1, '44', '30', '42', '43', '40', '41', '34']

Update:

Thanks for the comments.

Answer 4

You can use itertools to do this. Assume your list is stored in a variable named A:

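import itertools

A = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]  # the list from the question

# chain(*A) concatenates all the sublists into one flat sequence
single_list_with_all_values = list(itertools.chain(*A))

# set() removes the duplicates; sorting first is unnecessary because a set is
# unordered (and in Python 3 sorting this list raises TypeError: int vs str)
print(set(single_list_with_all_values))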

Answer 5

>>> from functools import reduce  # builtin in Python 2, must be imported in Python 3
>>> big = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]
>>> set(reduce(lambda l, a: l + a, big))
{1, '44', '30', '42', '43', '40', '41', '34'}

And if you really want a list of lists as the final result:

>>> [list(set(reduce(lambda l, a: l + a, big)))]
[[1, '44', '30', '42', '43', '40', '41', '34']]

And if you'd rather not write a lambda just for list addition:

>>> [list(set(reduce(list.__add__, big)))]
[[1, '44', '30', '42', '43', '40', '41', '34']]

EDIT: Following your recommendation to use itertools.chain instead of list.__add__, I ran timeit on both with the original poster's variable.

timeit measured list.__add__ at around 2.8 s and itertools.chain at around 3.5 s.

I looked into it, and yes, you were right: itertools.chain has a from_iterable method that gives a huge performance boost. Below are the timings for list.__add__, itertools.chain, and itertools.chain.from_iterable:

>>> timeit.timeit("[list(set(reduce ( list.__add__, big)))]", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
16.051744650801993
>>> timeit.timeit("[list(set(reduce ( itertools.chain, big)))]", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
54.721315866467194
>>> timeit.timeit("list(set(itertools.chain.from_iterable(big)))", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
0.040056066849501804
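A Python 3 sketch of a similar comparison, using hypothetical, smaller data (2000 sublists instead of 10000) so it finishes quickly; it prints its own timings and makes no claim about the numbers above. Reducing with list concatenation copies the accumulated list at every step, so its cost grows quadratically with the number of sublists, while chain.from_iterable and set().union(*big) visit each element once. Passing callables to timeit.timeit sidesteps the fact that string statements run in a fresh namespace and need their imports in setup (or a globals= argument). The nested reduce(itertools.chain, big) variant is left out.

```python
import operator
import timeit
from functools import reduce
from itertools import chain

# Hypothetical benchmark data, smaller than the answer's 10000 sublists
# so the quadratic list-concatenation approach finishes quickly.
big = [[10, 20, 30, 40] for _ in range(2000)]

candidates = {
    "reduce(operator.add)": lambda: set(reduce(operator.add, big)),
    "chain.from_iterable": lambda: set(chain.from_iterable(big)),
    "set().union(*big)": lambda: set().union(*big),
}

def run(number=5):
    """Return the total seconds each candidate takes for `number` calls."""
    return {name: timeit.timeit(fn, number=number) for name, fn in candidates.items()}

if __name__ == "__main__":
    for name, seconds in run().items():
        print(f"{name:25s} {seconds:.4f} s")
```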

Thank you very much for your advice :)

Answer 6

Tested with Python 2 only: I personally like the readability of reduce, paired with a simple conditional function, something like:

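from functools import reduce  # reduce is a builtin only in Python 2

somelists = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'],
             [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]  # your original lists
somesets = map(set, somelists)  # your lists as sets

def condition(s1, s2):  # condition to apply recursively to the sets
    # Implicitly returns None when s1 and s2 share nothing, which would break
    # the next step; here every set contains 1, so that never happens.
    if s1.intersection(s2):
        return s1.union(s2)

reduce(condition, somesets)
# {1, '30', '34', '40', '41', '42', '43', '44'}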

Of course, you can cast this result to a 2D list if you want: list([reduce(...

Note that this is roughly 3x slower than the chain.from_iterable answer.

Answer 7

from functools import reduce

out = list(reduce(set.union, iterable))

as long as at least the first element of iterable is a set. Otherwise:

out = list(reduce(set.union, iterable[1:], set(iterable[0])))
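An alternative to slicing, sketched here as a hypothetical helper `union_all`: pass `set()` as reduce's initializer so the first element no longer has to be a set. Unlike the `iterable[1:]` form, this also works for non-sliceable iterables such as generators and for an empty input. Each step still builds a new set, so for a list already in memory `set().union(*iterable)` does the same work in a single call.

```python
from functools import reduce

def union_all(iterables):
    """Union of any number of iterables, starting reduce from an empty set.

    set.union(set(), x) accepts any iterable x, so the inputs can be lists,
    tuples or generators, and an empty input simply yields set().
    """
    return reduce(set.union, iterables, set())

lists = [[1, '34', '44'], [1, '40', '30', '41'], [1, '44', '34', '43']]
print(sorted(union_all(lists), key=str))
# [1, '30', '34', '40', '41', '43', '44']
```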

7 answers

Answer 1: score 68, 2015-06-11 07:40:05

Answer 2: score 41, 2015-06-11 07:12:11

Answer 3: score 1, 2015-06-11 07:13:01

Answer 4: score 1, 2015-06-11 07:14:43

Answer 5: score 1, 2015-06-11 07:19:38

Answer 6: score 0, 2015-07-30 22:33:40

Answer 7: score 0, 2020-04-28 23:28:16
