Python - Get all unique combinations with replacement from lists of lists with unequal length

Note: This is not a duplicate question, even though the title might suggest it is.

If I have a list of lists, I need to get all combinations from it with replacement.

import itertools

l = [[1,2,3] ,[1,2,3],  [1,2,3]]
n = []
for i in itertools.product(*l):
    if sorted(i) not in n:
        n.append(sorted(i))
for i in n:
    print(i)

[1, 1, 1]
[1, 1, 2]
[1, 1, 3]
[1, 2, 2]
[1, 2, 3]
[1, 3, 3]
[2, 2, 2]
[2, 2, 3]
[2, 3, 3]
[3, 3, 3]

Thanks to @RoadRunner and @Idlehands.

The code above works, but it has two problems:

  1. For large lists, itertools.product throws a MemoryError. For example, when l has 18 sublists of length 3, there are about 400 million combinations (3^18 = 387,420,489).

  2. Order matters, so sorted does not work for my problem. This could be confusing, so let me explain with the example below.

    l = [[1,2,3], [1], [1,2,3]]

Here I have two unique groups:

Group 1: elements 0 and 2, which have the same value [1,2,3]

Group 2: element 1, which has the value [1]

Thus, the solution I need is:

[1,1,1]
[1,1,2]
[1,1,3]
[2,1,2]
[2,1,3]
[3,1,3]

Thus, position 1 is fixed to 1.

Hope this example helps.

What about grouping sequences that contain the same elements in a different order with a collections.defaultdict, then picking the first element from each key:

from itertools import product
from collections import defaultdict

l = [[1] ,[1,2,3],  [1,2,3]]

d = defaultdict(list)
for x in product(*l):
    d[tuple(sorted(x))].append(x)

print([x[0] for x in d.values()])

Which gives:

[(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]

Alternatively, this can also be done by keeping a set of what has already been added:

from itertools import product

l = [[1] ,[1,2,3],  [1,2,3]]

seen = set()
combs = []

for x in product(*l):
    curr = tuple(sorted(x))
    if curr not in seen:
        combs.append(x)
        seen.add(curr)

print(combs)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]

If you don't want to sort, consider using a frozenset of collections.Counter() items:

from collections import Counter
from itertools import product

l = [[1] ,[1,2,3],  [1,2,3]]

seen = set()
combs = []

for x in product(*l):
    curr = frozenset(Counter(x).items())

    if curr not in seen:
        seen.add(curr)
        combs.append(x)

print(combs)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]
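One practical reason to prefer the Counter-based key: it only requires the elements to be hashable, whereas sorted() requires them to be mutually comparable. A small sketch (the helper name multiset_key is hypothetical) showing that reorderings map to the same key, even for a mix of int and str where sorted() fails in Python 3:

```python
from collections import Counter

def multiset_key(combo):
    """Order-insensitive key: a frozenset of (value, count) pairs."""
    return frozenset(Counter(combo).items())

# Reorderings of the same multiset produce equal keys.
print(multiset_key((1, 'a', 1)) == multiset_key(('a', 1, 1)))  # True
# Different multiplicities produce different keys.
print(multiset_key((1, 'a', 1)) == multiset_key((1, 'a', 'a')))  # False

# sorted() has to compare int with str, which raises TypeError in Python 3.
try:
    sorted((1, 'a', 1))
except TypeError as exc:
    print('sorted() failed:', exc)
```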

Note: You can also use setdefault() for the first approach if you don't want to use a defaultdict().
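A minimal sketch of that setdefault() variant, using the same input as the defaultdict version. It relies on dicts preserving insertion order (guaranteed since Python 3.7), so the first tuple of each group is the first one product() produced:

```python
from itertools import product

l = [[1], [1, 2, 3], [1, 2, 3]]

d = {}
for x in product(*l):
    # setdefault inserts an empty list the first time a sorted key appears.
    d.setdefault(tuple(sorted(x)), []).append(x)

# Keys are in first-seen order, so this keeps product() order.
combs = [group[0] for group in d.values()]
print(combs)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]
```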

Edited Answer:

Based on the new information, to cope with the huge number of combinations that itertools.product() generates, we can try to pull the results in small batches:

from itertools import islice, product

l = [list(range(3))] * 18   # 18 sublists of [0, 1, 2] -> 3**18 tuples
prods = product(*l)          # lazy iterator shared across batches
uniques = set()              # sorted tuples seen so far (order-insensitive key)
results = []                 # first occurrence of each combination, in product order
totals = 0                   # number of tuples consumed so far

def run_batch(n=1000000):
    """Consume up to n tuples from prods, keeping only new combinations."""
    global totals
    count = 0
    for result in islice(prods, n):
        count += 1
        unique = tuple(sorted(result))
        if unique not in uniques:
            uniques.add(unique)
            results.append(result)
    totals += count

run_batch()
print('Total iteration this batch: {0}'.format(totals))
print('Number of unique tuples: {0}'.format(len(uniques)))
print('Number of wanted combos: {0}'.format(len(results)))
print('First 10 results:')
for r in results[:10]:
    print(r)

Output:

Total iteration this batch: 1000000
Number of unique tuples: 103
Number of wanted combos: 103
First 10 results:
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2)

Here we can control the batch size by passing n to run_batch(), and continue with further batches as you see fit. uniques is a set of sorted tuples used as a reference point, and results holds the combinations in the order you wanted. Both sizes should be the same, and they turned out surprisingly small when I ran it on the 3^18 product. I'm not well acquainted with memory allocation, but this way the program doesn't keep all the unwanted tuples in memory, so you should have more wiggle room. Otherwise, you can always export results to a file to make room. This sample only prints the sizes and the first few results, but you can easily display or save them for your own purposes.

I can't claim this is the best or most optimized approach, but it seems to work for me. Maybe it'll work for you? Running this batch 5 times took about 10 s in total (about 2 s per batch). Exhausting the entire prods iterator took me about 15 minutes:

Total iteration: 387420489
Number of unique tuples: 190
Number of wanted combos: 190
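Because itertools.product() is already lazy, the same batching can also be written as a self-contained generator, without globals or module-level state. A sketch under that assumption (the generator name unique_in_batches is hypothetical), run here on a smaller 3**10 input so it finishes quickly:

```python
from itertools import islice, product

def unique_in_batches(lists, batch_size=1000000):
    """Yield (consumed, new_combos) for each batch pulled from product(*lists).

    A combination is new if its sorted form has not been seen yet; only the
    set of sorted keys and the wanted combinations are kept in memory.
    """
    prods = product(*lists)   # lazy: tuples are produced on demand
    seen = set()
    while True:
        consumed = 0
        new_combos = []
        # islice pulls at most batch_size tuples from the shared iterator.
        for combo in islice(prods, batch_size):
            consumed += 1
            key = tuple(sorted(combo))
            if key not in seen:
                seen.add(key)
                new_combos.append(combo)
        if consumed == 0:     # iterator exhausted
            return
        yield consumed, new_combos

counts = []
results = []
for consumed, new_combos in unique_in_batches([[0, 1, 2]] * 10, batch_size=10000):
    counts.append(consumed)
    results.extend(new_combos)
print(sum(counts), len(results))  # 59049 66
```

The caller decides what to do between batches (print progress, write new_combos to a file), which keeps the dedup logic separate from I/O.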

Original Answer:

@RoadRunner had a neat solution with sorted() and defaultdict, but I feel the latter was not needed. I leveraged his sorted() suggestion and implemented a modified version here.

From this answer:

import itertools

l = [[1], [1, 2, 3], [1, 2, 3]]
n = []
for i in itertools.product(*l):
    # Keep each combination once, stored in its sorted form.
    if sorted(i) not in n:
        n.append(sorted(i))
for i in n:
    print(i)

Output:

[1, 1, 1]
[1, 1, 2]
[1, 1, 3]
[1, 2, 2]
[1, 2, 3]
[1, 3, 3]

For short input sequences, this can be done by filtering the output of itertools.product down to just the unique values. One unoptimized way is set(tuple(sorted(t)) for t in itertools.product(*l)), converting the result to a list if you like.
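A runnable version of that one-liner, using the same three-sublist input as above. Since a set has no defined order, the result is sorted here only to make the display stable; note that every tuple comes out in sorted form, not in its original positional order:

```python
import itertools

l = [[1], [1, 2, 3], [1, 2, 3]]

# Each product tuple is normalized to sorted order, so reorderings collapse.
unique = set(tuple(sorted(t)) for t in itertools.product(*l))

# Sort the set for a deterministic listing.
print(sorted(unique))
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]
```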

If the Cartesian product fans out so much that this is too inefficient, and if you can rely on the sublists being sorted as in your input example, you could borrow a note from the docs' discussion of permutations and filter out non-sorted values:

The code for permutations() can be also expressed as a subsequence of product(), filtered to exclude entries with repeated elements (those from the same position in the input pool)
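To make "a filtered subsequence of product()" concrete, here is a sketch of that idea (the function name permutations_via_product is hypothetical): iterate over index tuples from product() and keep only those that never reuse a position. The same docs page describes combinations_with_replacement() the same way, as product() filtered to entries whose elements are in sorted order by position, which is the kind of filter suggested here.

```python
from itertools import permutations, product

def permutations_via_product(iterable, r=None):
    """Permutations as a filtered subsequence of product() (illustrative)."""
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    for indices in product(range(n), repeat=r):
        # Drop index tuples that use the same input position twice.
        if len(set(indices)) == r:
            yield tuple(pool[i] for i in indices)

# Same tuples, in the same order, as the built-in.
print(list(permutations_via_product('ABC', 2)) == list(permutations('ABC', 2)))  # True
```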

So you'd want a quick test for whether a value is sorted, something like this answer: https://stackoverflow.com/a/3755410/2337736

And then list(t for t in itertools.product(*l) if is_sorted(t)).
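A sketch of that filter, assuming is_sorted means "non-decreasing" and is implemented with a pairwise comparison (one common way to write it; the linked answer may differ). It still walks the whole product but keeps no set of seen keys:

```python
import itertools

def is_sorted(t):
    """True if t is in non-decreasing order (pairwise comparison)."""
    return all(a <= b for a, b in zip(t, t[1:]))

l = [[1], [1, 2, 3], [1, 2, 3]]
combs = list(t for t in itertools.product(*l) if is_sorted(t))
print(combs)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]
```

As the answer says, this depends on the input layout: for the question's reordered example l = [[1,2,3], [1], [1,2,3]], only (1, 1, 1), (1, 1, 2) and (1, 1, 3) pass the filter, so it does not cover the "order matters" case.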

Beyond that, I think you'd have to get into recursion or a fixed length of l.
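For the question's rule that positions holding equal sublists are interchangeable, one way to avoid walking the full product is to enumerate combinations_with_replacement() within each group of equal sublists and take product() only across groups. This is a different technique from the answers above, sketched under the assumption that each sublist contains distinct, hashable values (the name grouped_combinations is hypothetical):

```python
from collections import defaultdict
from itertools import combinations_with_replacement, product

def grouped_combinations(lists):
    """Yield one tuple per combination, treating positions whose sublists
    are equal as interchangeable. Assumes distinct, hashable values in
    each sublist (illustrative helper)."""
    # Map each distinct sublist to the positions where it appears.
    groups = defaultdict(list)
    for pos, sub in enumerate(lists):
        groups[tuple(sub)].append(pos)
    keys = list(groups)  # order of first appearance
    # Per group: choose a multiset of values, one value per position.
    choices = [combinations_with_replacement(k, len(groups[k])) for k in keys]
    for picks in product(*choices):
        combo = [None] * len(lists)
        for key, values in zip(keys, picks):
            # Values come out in sublist order; assign them left to right.
            for pos, value in zip(groups[key], values):
                combo[pos] = value
        yield tuple(combo)

for c in grouped_combinations([[1, 2, 3], [1], [1, 2, 3]]):
    print(c)
```

For l = [[1,2,3], [1], [1,2,3]] this yields the six tuples listed in the question, (1, 1, 1) through (3, 1, 3), and for 18 copies of [0, 1, 2] it yields the 190 combinations directly, without generating the other tuples of the 3^18 product.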
