Python - Get all unique combinations with replacement from lists of unequal length

Question

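Note: this is not the duplicate that the title might suggest.

I have a list of lists, and I need to get all combinations with replacement from it.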

import itertools

l = [[1,2,3] ,[1,2,3],  [1,2,3]]
n = []
for i in itertools.product(*l):
    if sorted(i) not in n:
        n.append(sorted(i))
for i in n:
    print(i)

[1, 1, 1]
[1, 1, 2]
[1, 1, 3]
[1, 2, 2]
[1, 2, 3]
[1, 3, 3]
[2, 2, 2]
[2, 2, 3]
[2, 3, 3]
[3, 3, 3]

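Thanks to @RoadRunner and @Idlehands.

The code above works, apart from 2 problems:

1. For large lists, itertools.product throws a MemoryError. When l has 18 sublists of length 3, that gives about 400 million combinations (3^18 = 387,420,489).
2. Order matters, so sorted does not work for my problem. This may be confusing, so here is an example to explain it.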
l = [[1,2,3], [1], [1,2,3]]

Here I have 2 unique groups:

Group 1: elements 0 and 2, which have the same value [1,2,3]

Group 2: element 1, which has the value [1]

So the solution I need is:

[1,1,1]
[1,1,2]
[1,1,3]
[2,1,2]
[2,1,3]
[3,1,3]

So position 1 is fixed at 1.

Hope this example helps.

Answer 1

How about using collections.defaultdict to group sequences that contain the same elements in a different order, and then picking the first element from each key:

from itertools import product
from collections import defaultdict

l = [[1] ,[1,2,3],  [1,2,3]]

d = defaultdict(list)
for x in product(*l):
    d[tuple(sorted(x))].append(x)

print([x[0] for x in d.values()])

Which gives:

[(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]

Alternatively, this can be done by keeping a set of what has already been added:

from itertools import product

l = [[1] ,[1,2,3],  [1,2,3]]

seen = set()
combs = []

for x in product(*l):
    curr = tuple(sorted(x))
    if curr not in seen:
        combs.append(x)
        seen.add(curr)

print(combs)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]

If you don't want to sort, consider using a frozenset with collections.Counter():

from collections import Counter
from itertools import product

l = [[1] ,[1,2,3],  [1,2,3]]

seen = set()
combs = []

for x in product(*l):
    curr = frozenset(Counter(x).items())

    if curr not in seen:
        seen.add(curr)
        combs.append(x)

print(combs)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]

Note: if you don't want to use defaultdict(), you can also use setdefault() for the first approach.
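For reference, here is a minimal sketch of what the setdefault() variant might look like. It is only an illustration of the note above, not code from the answer; the name first_seen is made up for this example, and the result order relies on dicts preserving insertion order (Python 3.7+).

```python
from itertools import product

l = [[1], [1, 2, 3], [1, 2, 3]]

# Plain dict: setdefault() creates the empty list the first time a key is seen.
d = {}
for x in product(*l):
    d.setdefault(tuple(sorted(x)), []).append(x)

# Keep the first tuple recorded for each sorted key (hypothetical name).
first_seen = [group[0] for group in d.values()]
print(first_seen)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]
```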

Answer 2

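Edited answer:

Based on the new information, to cope with the number of combinations overloading itertools.product(), we can try pulling the tuples in small batches: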

from itertools import product

l = [list(range(3))] * 18
prods = product(*l)   # lazy iterator over all 3**18 tuples
uniques = set()       # sorted tuples seen so far
results = []          # first tuple seen for each sorted form
totals = 0

def run_batch(n=1000000):
    global totals
    for i in range(n):
        try:
            result = next(prods)
        except StopIteration:
            break
        unique = tuple(sorted(result))
        if unique not in uniques:
            uniques.add(unique)
            results.append(result)
    # i is the last loop index, so a full batch adds n - 1 rather than n
    totals += i

run_batch()
print('Total iteration this batch: {0}'.format(totals))
print('Number of unique tuples: {0}'.format(len(uniques)))
print('Number of wanted combos: {0}'.format(len(results)))
print('First 10 results:')
for r in results[:10]:
    print(r)

Output:

Total iteration this batch: 999999
Number of unique tuples: 103
Number of wanted combos: 103
First 10 results:
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2)

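Here we can control the batch size by calling next(prods) over a range of your choice, and continue as you see fit. uniques is a set of sorted tuples used as a reference point, and results holds the tuples in the proper order you wanted. Both should be the same size, and they were surprisingly small when I ran this with the 3^18 list. I'm not well acquainted with memory allocation, but this way the program shouldn't store all the unwanted results in memory, so you should have more wiggle room. Otherwise, you can always export results to a file to make room. Obviously this sample mostly shows the length of the list, but you can easily display or save the list for your own purposes.

I can't say this is the best or the most optimized approach, but it seems to work for me. Maybe it will work for you? Running the batch 5 times took about 10 seconds (roughly 2 seconds per batch on average). The entire set of prods took me 15 minutes to run: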

Total iteration: 387420102
Number of unique tuples: 190
Number of wanted combos: 190
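As a side note, the same batching idea could also be sketched with itertools.islice instead of a manual next() loop. This is only a rough sketch under my own assumptions: unique_in_batches and its parameters are hypothetical names, and the demo uses a much smaller input (4 sublists rather than 18) so that it finishes instantly.

```python
from itertools import islice, product

def unique_in_batches(lists, batch_size):
    """Yield (new_tuples, total_seen) after each batch of the lazy product."""
    prods = product(*lists)
    seen = set()
    total = 0
    while True:
        # Pull at most batch_size tuples from the iterator.
        batch = list(islice(prods, batch_size))
        if not batch:
            break
        total += len(batch)
        fresh = []
        for result in batch:
            key = tuple(sorted(result))
            if key not in seen:
                seen.add(key)
                fresh.append(result)
        yield fresh, total

# Small demo input: 3**4 = 81 tuples, processed 10 at a time.
collected = []
last_total = 0
for fresh, last_total in unique_in_batches([list(range(3))] * 4, 10):
    collected.extend(fresh)  # could be written to a file here instead

print(last_total, len(collected))
```

Because each batch hands back only the newly seen tuples, the caller can write them out and discard them, keeping just the set of sorted keys in memory.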

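Original answer:

@RoadRunner had a neat solution using sort() and defaultdict, but I feel the latter isn't needed. I took his sort() suggestion and implemented a modified version here.

From this answer: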

import itertools

l = [[1], [1, 2, 3], [1, 2, 3]]
n = []
for i in itertools.product(*l):
    # keep each sorted form only once
    if sorted(i) not in n:
        n.append(sorted(i))
for i in n:
    print(i)

Output:

[1, 1, 1]
[1, 1, 2]
[1, 1, 3]
[1, 2, 2]
[1, 2, 3]
[1, 3, 3]

Answer 3

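For short input sequences, this can be done by filtering the output of itertools.product down to unique values. One unoptimized way is set(tuple(sorted(t)) for t in itertools.product(*l)), which you can convert to a list if you like.

If you have enough Cartesian product fan-out that this is too inefficient, and if, as your input example suggests, you can rely on the sublists being sorted, you can borrow a note from the docs' discussion of permutations and filter out non-sorted values:

The code for permutations() can also be expressed as a subsequence of product(), filtered to exclude entries with repeated elements (those from the same position in the input pool)

So you want a quick test for whether a value is sorted, such as this answer: https://stackoverflow.com/a/3755410/2337736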

Then list(t for t in itertools.product(*l) if is_sorted(t))
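To make that concrete, here is a minimal sketch. The is_sorted helper below is my own assumption of what such a test could look like (the linked answer may be written differently), and filtered is just an illustrative name.

```python
from itertools import product

def is_sorted(t):
    # True when every element is <= the one that follows it.
    return all(a <= b for a, b in zip(t, t[1:]))

l = [[1], [1, 2, 3], [1, 2, 3]]

# Keep only the product tuples that are already in non-decreasing order.
filtered = [t for t in product(*l) if is_sorted(t)]
print(filtered)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]
```

Note that this depends on how the sublists are arranged: with l = [[1,2,3], [1], [1,2,3]] the filter would keep only (1, 1, 1), (1, 1, 2) and (1, 1, 3), because the fixed 1 in the middle forces the first element to be 1.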

Beyond that, I think you'd have to get into recursion or a fixed length of l.
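One possible way to avoid walking the full product, sketched here under assumptions that are mine rather than anything stated in the answers: group the positions that hold identical sublists, take itertools.combinations_with_replacement within each group, and combine the groups with product. The function name grouped_combinations is hypothetical. The sketch assumes the elements within a sublist are distinct, and that only sublists that are exactly equal belong to the same group.

```python
from collections import defaultdict
from itertools import combinations_with_replacement, product

def grouped_combinations(lists):
    # Map each distinct sublist (as a tuple) to the positions where it occurs.
    positions = defaultdict(list)
    for index, sub in enumerate(lists):
        positions[tuple(sub)].append(index)
    groups = list(positions.items())

    # For each group, pick one multiset of values, one value per position.
    per_group = [
        combinations_with_replacement(values, len(idxs))
        for values, idxs in groups
    ]
    for picks in product(*per_group):
        row = [None] * len(lists)
        for (_, idxs), pick in zip(groups, picks):
            for i, value in zip(idxs, pick):
                row[i] = value
        yield row

for row in grouped_combinations([[1, 2, 3], [1], [1, 2, 3]]):
    print(row)
# [1, 1, 1]
# [1, 1, 2]
# [1, 1, 3]
# [2, 1, 2]
# [2, 1, 3]
# [3, 1, 3]
```

For 18 copies of a 3-element sublist this generates 190 rows directly, which matches the count of unique tuples reported in Answer 2, without iterating over all 3^18 tuples.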

Answer 1: 5, 2018-01-22 06:04:39
Answer 2: 4, accepted, 2018-01-22 05:57:00
Answer 3: 4, 2018-01-22 05:58:56