How to quickly get all intersections of a collection of sets in Python

Question

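I want to compute, in Python, all (distinct) intersections of a finite collection of finite sets of integers (implemented here as a list of lists). To avoid confusion, a formal definition is given at the end of the question: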

> A = [[0,1,2,3],[0,1,4],[1,2,4],[2,3,4],[0,3,4]]
> all_intersections(A) # desired output
[[], [0], [1], [2], [3], [4], [0, 1], [0, 3], [0, 4], [1, 2], [1, 4], [2, 3], [2, 4], [3, 4], [0, 1, 4], [0, 3, 4], [1, 2, 4], [2, 3, 4], [0, 1, 2, 3]]

I have an algorithm that does this iteratively, but it is rather slow (should I post it?). A test case is

[[0, 1, 2, 3, 4, 9], [0, 1, 4, 5, 6, 10], [0, 2, 4, 5, 7, 11], [1, 3, 4, 6, 8, 12], [2, 3, 4, 7, 8, 13], [4, 5, 6, 7, 8, 14], [0, 1, 9, 10, 15, 16], [0, 2, 9, 11, 15, 17], [1, 3, 9, 12, 16, 18], [2, 3, 9, 13, 17, 18], [9, 15, 16, 17, 18, 19], [0, 5, 10, 11, 15, 20], [1, 6, 10, 12, 16, 21], [10, 15, 16, 19, 20, 21], [5, 6, 10, 14, 20, 21], [11, 15, 17, 19, 20, 22], [5, 7, 11, 14, 20, 22], [2, 7, 11, 13, 17, 22], [7, 8, 13, 14, 22, 23], [3, 8, 12, 13, 18, 23], [13, 17, 18, 19, 22, 23], [14, 19, 20, 21, 22, 23], [6, 8, 12, 14, 21, 23], [12, 16, 18, 19, 21, 23]]

This takes me about 2.5 seconds to compute.

Any ideas how to do this fast?

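Formal definition (actually hard to write without LaTeX mode): let A = {A1, ..., An} be a finite set of finite sets Ai of non-negative integers. The output should then be the set { intersection of the sets in B : B a subset of A }.

So the formal algorithm would take the union of all intersections of all subsets of A. But that clearly takes forever.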
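For reference, the brute-force definition above can be written down directly. This is only a sketch for checking faster solutions on small inputs: the name all_intersections_bruteforce is made up for illustration, it considers only non-empty sub-collections (which matches the desired output shown earlier), and it visits all 2**n - 1 of them, so it is not usable on the 24-set test case.

```python
from itertools import combinations

def all_intersections_bruteforce(lists):
    """Intersect every non-empty sub-collection; exponential in len(lists)."""
    sets = [frozenset(s) for s in lists]
    found = set()
    for r in range(1, len(sets) + 1):
        for combo in combinations(sets, r):
            found.add(frozenset.intersection(*combo))
    # Sort by size, then by contents, so results are easy to compare.
    return sorted((sorted(s) for s in found), key=lambda x: (len(x), x))

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]
print(all_intersections_bruteforce(A))
```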

Thanks a lot!

Answer 1

Here is a recursive solution. It is almost instantaneous on your test example:

def allIntersections(frozenSets):
    if len(frozenSets) == 0:
        return []
    else:
        head = frozenSets[0]
        tail = frozenSets[1:]
        tailIntersections = allIntersections(tail)
        newIntersections = [head]
        newIntersections.extend(tailIntersections)
        newIntersections.extend(head & s for s in tailIntersections)
        return list(set(newIntersections))

def all_intersections(lists):
    sets = allIntersections([frozenset(s) for s in lists])
    return [list(s) for s in sets]

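On edit: here is a cleaner, non-recursive implementation of the same idea.

The problem is easiest if you define the intersection of an empty collection of sets to be the universal set, and an adequate universal set can be obtained by taking the union of all the elements. This is a standard move in lattice theory, and it is dual to taking the union of an empty collection of sets to be the empty set. You can always throw away this universal set if you don't want it: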

def allIntersections(frozenSets):
    universalSet = frozenset.union(*frozenSets)
    intersections = set([universalSet])
    for s in frozenSets:
        moreIntersections = set(s & t for t in intersections)
        intersections.update(moreIntersections)
    return intersections

def all_intersections(lists):
    sets = allIntersections([frozenset(s) for s in lists])
    return [list(s) for s in sets]
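If the universal set is not wanted, one way to throw it away is sketched below. The helper names are hypothetical, and the loop is restated (with frozenset().union so that an empty input does not raise) only to keep the example self-contained. The check relies on the observation that the universal set can be an intersection of a non-empty sub-collection only when it is itself one of the input sets.

```python
def loop_intersections(frozen_sets):
    """Restatement of the loop above; also returns the universal set."""
    universal = frozenset().union(*frozen_sets)
    intersections = {universal}
    for s in frozen_sets:
        intersections.update({s & t for t in intersections})
    return intersections, universal

def all_intersections_without_universal(lists):
    """Keep only intersections of non-empty sub-collections."""
    frozen_sets = [frozenset(s) for s in lists]
    intersections, universal = loop_intersections(frozen_sets)
    # Drop the universal set unless it is genuinely one of the inputs.
    if universal not in frozen_sets:
        intersections.discard(universal)
    return sorted((sorted(s) for s in intersections), key=lambda x: (len(x), x))

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]
print(all_intersections_without_universal(A))
```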

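The reason this is so fast on your test example is that, even though your collection has 24 sets and hence 2**24 (16.8 million) potential intersections, there are in fact only 242 distinct intersections (or 241 if you don't count the empty intersection). Thus the number of intersections on each pass through the loop is in the low hundreds at most.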
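To see how the working set grows, the loop can be instrumented to record the number of distinct intersections after each pass. This is an illustrative sketch only: intersection_counts is a made-up name, and the count includes the universal set. It is shown on the small example from the question; the 24-set test case can be passed in the same way.

```python
def intersection_counts(lists):
    """Return the number of distinct intersections after each pass of the loop."""
    frozen_sets = [frozenset(s) for s in lists]
    intersections = {frozenset().union(*frozen_sets)}  # start from the universal set
    counts = []
    for s in frozen_sets:
        intersections.update({s & t for t in intersections})
        counts.append(len(intersections))
    return counts

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]
print(intersection_counts(A))
```

Each pass can at most double the count, so the cost of a pass is proportional to the number of distinct intersections found so far rather than to the number of sub-collections.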

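It is possible to pick 24 sets so that all 2**24 possible intersections are actually distinct, so it is easy to see that the worst-case behavior is exponential. But if, as in your test example, the number of intersections is small, this approach will let you compute them rapidly.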
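One family that exhibits the worst case, given here as an illustration rather than as the construction the answer had in mind, is the n sets obtained by removing a single element from {0, ..., n-1}. Intersecting a sub-collection removes exactly the corresponding elements, so every sub-collection yields a different set. The function names below are hypothetical.

```python
def worst_case_family(n):
    """n sets, each equal to {0, ..., n-1} with one element removed."""
    universe = frozenset(range(n))
    return [universe - {i} for i in range(n)]

def count_intersections(frozen_sets):
    """Number of distinct intersections, counting the universal set."""
    intersections = {frozenset().union(*frozen_sets)}
    for s in frozen_sets:
        intersections.update({s & t for t in intersections})
    return len(intersections)

for n in (4, 8, 12):
    print(n, count_intersections(worst_case_family(n)), 2 ** n)
```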

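A potential optimization might be to sort the sets by increasing size before looping over them. Processing the smaller sets up front might make empty intersections appear earlier, thus keeping the total number of distinct intersections smaller until towards the end of the loop.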
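A minimal sketch of that idea, assuming the loop-based version with a hypothetical sort_by_size flag added for illustration. Whether it actually helps depends on the input and would need to be measured; the result itself does not depend on the processing order.

```python
def all_intersections_sorted(lists, sort_by_size=True):
    """Loop-based version that optionally processes smaller sets first."""
    frozen_sets = [frozenset(s) for s in lists]
    if sort_by_size:
        # Smaller sets first, in the hope that small or empty
        # intersections show up early and limit growth.
        frozen_sets.sort(key=len)
    intersections = {frozenset().union(*frozen_sets)}
    for s in frozen_sets:
        intersections.update({s & t for t in intersections})
    return intersections

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]
print(all_intersections_sorted(A) == all_intersections_sorted(A, sort_by_size=False))
```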

Answer 2

An iterative solution that takes about 3.5 ms on my machine for your large test input:

from itertools import starmap, product
from operator import and_

def all_intersections(sets):
    # Convert to set of frozensets for uniquification/type correctness
    last = new = sets = set(map(frozenset, sets))
    # Keep going until further intersections add nothing to results
    while new:
        # Compute intersection of old values with newly found values
        new = set(starmap(and_, product(last, new)))
        last = sets.copy()  # Save off prior state
        new -= last         # Determine truly newly added values
        sets |= new         # Accumulate newly added values in complete set
    # No more intersections being generated, convert results to canonical
    # form, list of lists, where each sublist is displayed in order, and
    # the top level list is ordered first by size of sublist, then by contents
    return sorted(map(sorted, sets), key=lambda x: (len(x), x))

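Basically, it just keeps taking pairwise intersections between the old result set and the newly found intersections until a round of intersections changes nothing, and then it's done.

Note: this is not actually the best solution. The recursive approach is algorithmically better by enough to win on the test data: John Coleman's solution, after adding sorting to the outer wrapper so that it matches the output format, takes about 0.94 ms versus 3.5 ms for mine. I'm mainly providing it as an example of solving the problem in another way.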
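A comparison like this can be reproduced with timeit. The harness below is a sketch: the function names are hypothetical, the recursive version is condensed from the first answer and wrapped with the same sorting step, and the small example from the question stands in for the 24-set test case. Timings will vary by machine, so none are quoted here.

```python
from itertools import product, starmap
from operator import and_
from timeit import timeit

def canonical(sets):
    """Sort each set, then order by size and contents."""
    return sorted(map(sorted, sets), key=lambda x: (len(x), x))

def recursive_intersections(frozen_sets):
    if not frozen_sets:
        return []
    head, tail = frozen_sets[0], frozen_sets[1:]
    tail_result = recursive_intersections(tail)
    # head itself, everything found in the tail, and head met with each of those
    return list({head, *tail_result, *(head & s for s in tail_result)})

def all_intersections_recursive(lists):
    return canonical(recursive_intersections([frozenset(s) for s in lists]))

def all_intersections_iterative(lists):
    last = new = sets = set(map(frozenset, lists))
    while new:
        new = set(starmap(and_, product(last, new)))
        last = sets.copy()
        new -= last
        sets |= new
    return canonical(sets)

A = [[0, 1, 2, 3], [0, 1, 4], [1, 2, 4], [2, 3, 4], [0, 3, 4]]
for f in (all_intersections_recursive, all_intersections_iterative):
    seconds = timeit(lambda: f(A), number=1000)
    print(f.__name__, "%.3f ms per call" % seconds)  # total s over 1000 runs = ms per run
```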

Answer 1
5 2016-06-03 20:31:23

Answer 2
2 2016-06-03 21:22:27
