
Union sets of lists/sets in Python

I'm writing a small function that finds all the subsets of a list of numbers S; the output is a list of lists.

def subsets(S):
    if S is None or len(S) == 0:
        return [[]]

    output_list = []
    sub = [[[], [S[0]]]]
    for i in range(1, len(S)):  # xrange() in the original Python 2 version
        without_ith = sub[i - 1]
        with_ith = [element + [S[i]] for element in without_ith]

        # convert to set of tuples, for set union
        without_set = set(tuple(element) for element in without_ith)
        with_set = set(tuple(element) for element in with_ith)
        new_set = without_set | with_set

        # convert back to list of lists
        new = list(list(element) for element in new_set)
        sub.append(new)

    # sort each sublist into non-descending order
    # output_list = [sorted(element) for element in sub[-1]]
    for element in sub[-1]:
        output_list.append(sorted(element))
    return output_list

The algorithm is described in the accepted answer of this post: Finding all the subsets of a set

What annoys me is the conversion from a list of lists to a set of tuples, the union of the two sets of tuples, and then the conversion back to a list of lists, all of which happens in every iteration. The reason is that in Python, set elements must be hashable, and Python's mutable built-in types (lists, sets, dicts) are not. So tuples or frozensets are required as the elements of such sets. In my code, I build the new element lists first, convert them to tuples for the union, and then convert them back to lists. Is there a workaround? It doesn't look very clean or efficient.
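
A short sketch of the hashability rule behind the round trip: hash() (which a set calls on every element) rejects lists and sets but accepts tuples and frozensets. The helper name is_hashable and the sample values are hypothetical, chosen only for illustration.

```python
def is_hashable(obj):
    """Return True if obj can be stored in a set, i.e. hash() accepts it."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable([1, 2]))             # False: lists are mutable
print(is_hashable({1, 2}))             # False: sets are mutable too
print(is_hashable((1, 2)))             # True
print(is_hashable(frozenset({1, 2})))  # True

# Hence the question's round trip: lists must become tuples before a union.
without_set = {tuple(e) for e in [[], [1]]}
with_set = {tuple(e) for e in [[2], [1, 2]]}
print(sorted(without_set | with_set))  # [(), (1,), (1, 2), (2,)]
```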

(A minor question concerns the list comprehension I commented out, # output_list = [sorted(element) for element in sub[-1]]. I'm using PyCharm, and it suggests replacing the list comprehension with a for loop. Is there a reason? I thought list comprehensions were always better.)

I like a "counting" approach to tasks such as "returning all subsets". Assuming S is a list of numbers without duplicates:

def subsets(S):   # S is a list of `whatever`
    result = []
    S = sorted(S)  # only needed if S can't be assumed to be sorted to start
    # S = sorted(set(S))  # if duplicates are possible and must be pruned
    for i in range(2**len(S)):
        # bit j of i says whether S[j] belongs to subset number i
        asubset = []
        for j, x in enumerate(S):
            if i & (1 << j):
                asubset.append(x)
        result.append(asubset)
    return result

Essentially, this exploits the one-to-one correspondence between the subsets of N things and the binary forms of the integers from 0 to 2**N - 1.
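
To make the correspondence concrete: bit j of the counter i decides whether S[j] is in subset number i. A minimal sketch of the same idea using itertools.compress, which keeps the items whose matching selector is truthy; the function name subsets_by_bits is hypothetical.

```python
from itertools import compress

def subsets_by_bits(S):
    """Return all subsets of S, one per integer i in range(2**len(S))."""
    n = len(S)
    result = []
    for i in range(2 ** n):
        # Selector j is 1 exactly when bit j of i is set.
        bits = [(i >> j) & 1 for j in range(n)]
        result.append(list(compress(S, bits)))
    return result

for i, subset in enumerate(subsets_by_bits(['a', 'b', 'c'])):
    print(format(i, '03b'), subset)  # e.g. the line for i = 5 is: 101 ['a', 'c']
```

In the printed binary strings the rightmost digit is bit 0, so 101 selects 'a' (bit 0) and 'c' (bit 2).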

As long as S has no repeated values, there are no duplicate items between your without_ith and with_ith lists, since the lists in the former never contain S[i] and those in the latter always do. This means there is no need to use set objects when you combine them: just concatenate one list onto the other and you'll be fine. Or you could use a single list variable and extend it with a list comprehension:

def subsets(S):
    results = [[]]
    for x in S:
        results.extend([item + [x] for item in results])
    return results
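
Applied to the question's own loop, the first suggestion means each new level is simply without_ith + with_ith. A minimal sketch, assuming S has no repeated values; the name subsets_concat is hypothetical, chosen to keep it apart from the other functions.

```python
def subsets_concat(S):
    """The question's algorithm with the set round trip replaced by concatenation.

    Assumes S contains no repeated values; otherwise some subsets appear twice.
    """
    if not S:  # covers both None and an empty list
        return [[]]
    sub = [[[], [S[0]]]]
    for i in range(1, len(S)):
        without_ith = sub[i - 1]
        with_ith = [element + [S[i]] for element in without_ith]
        # The two lists are disjoint (only with_ith contains S[i]),
        # so no de-duplication is needed: just concatenate them.
        sub.append(without_ith + with_ith)
    return [sorted(element) for element in sub[-1]]
```

For example, subsets_concat([1, 2, 3]) returns [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]], the same order the extend version produces.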

If your input list is sorted, all of the subsets will be too. If the input is not always in order and you need the output to be, loop over sorted(S) instead of S directly. The items in each subset always appear in the same order they were iterated over.
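
A quick illustration of the ordering point, using the extend approach from the answer: each subset lists its items in iteration order, so looping over sorted(S) yields sorted subsets. The name subsets_ordered and its sort_input flag are hypothetical, added only to show both cases side by side.

```python
def subsets_ordered(S, sort_input=True):
    """Build all subsets by repeated extension.

    When sort_input is True, loop over sorted(S) so every subset comes
    out in ascending order even if S itself is unsorted.
    """
    results = [[]]
    for x in (sorted(S) if sort_input else S):
        results.extend([item + [x] for item in results])
    return results

print(subsets_ordered([3, 1, 2], sort_input=False))
# [[], [3], [1], [3, 1], [2], [3, 2], [1, 2], [3, 1, 2]]
print(subsets_ordered([3, 1, 2]))
# [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]
```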

Note that it is important to use a list comprehension in the extend call rather than a generator expression. A generator would keep iterating over the newly added items, resulting in an infinite loop (until your system runs out of memory to expand the list).
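
The difference is that a list comprehension is fully built before extend sees it, whereas a generator over results keeps reading the items extend is appending. A minimal sketch of two safe variants; the second, a generator over a snapshot copy results[:], is an alternative not mentioned in the answer, and both function names are hypothetical.

```python
def subsets_listcomp(S):
    results = [[]]
    for x in S:
        # The comprehension is evaluated completely first, so extend
        # receives a finished list of len(results) new subsets.
        results.extend([item + [x] for item in results])
    return results

def subsets_snapshot(S):
    results = [[]]
    for x in S:
        # A generator is safe here only because it walks a copy (results[:]);
        # iterating over results itself would never terminate.
        results.extend(item + [x] for item in results[:])
    return results

print(subsets_listcomp([1, 2]))  # [[], [1], [2], [1, 2]]
print(subsets_snapshot([1, 2]))  # [[], [1], [2], [1, 2]]
```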

It looks like a list comprehension is faster than appending items in a loop, because the comprehension adds each item with a dedicated bytecode instruction instead of looking up and calling the list's append method on every iteration. Check this great article for an in-depth comparison of list comprehensions and append.

So for your particular problem, the list comprehension is probably faster.
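
A rough way to check this on your own machine is the timeit module. This sketch compares the two forms from the question on made-up data; the function names and the data are hypothetical, and timings vary by Python version and hardware, so no numbers are shown.

```python
import timeit

def sort_with_loop(sublists):
    output_list = []
    for element in sublists:
        # Looks up output_list.append and calls it on every iteration.
        output_list.append(sorted(element))
    return output_list

def sort_with_comprehension(sublists):
    return [sorted(element) for element in sublists]

# Hypothetical input: 5000 descending lists of length 0 to 49.
data = [list(range(n, 0, -1)) for n in range(50)] * 100

if __name__ == "__main__":
    for func in (sort_with_loop, sort_with_comprehension):
        seconds = timeit.timeit(lambda: func(data), number=20)
        print(f"{func.__name__}: {seconds:.3f} s")
```

Both functions return the same result; only the way each item is appended differs.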

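Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.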

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM