List all combinations of a items into b groups of size c

I'm looking for a way in Python to split an arbitrary number of items into an arbitrary number of equal-sized groups, and to obtain the list/array of all such splits.

So for example, given 12 items, there are 5775 ways of grouping them into 3 groups of 4. Calculating this number is not an issue, but I can't seem to find a way to return a list or array of these 5775 groupings. I can get the candidates for the first group using:

import itertools
list(itertools.combinations(range(12), 4))

But how can I obtain the remaining groups from this?

The desired output for a = 4, b = 2, c = 2 would be:

[[[1, 2], [3, 4]],
 [[1, 3], [2, 4]],
 [[1, 4], [2, 3]]]

And for a = 3, b = 3, c = 1:

[[[1], [2], [3]]]

You can use a recursive generator that, at each level of recursion, computes all combinations of the (remaining) items. The next level of recursion receives only those items that have not already been used in the current level (the remainder).

In order to prevent duplicates that differ only in the order of the groups, we need to truncate the output of it.combinations such that it doesn't yield a combination that has already appeared in the remainder of a previous iteration. Let n be the number of items and g the size of each group. Then the first item from it.combinations is (0, 1, ..., g-1) (in terms of the indices). The item (1, 2, ..., g) will be part of the remainder when the current item from it.combinations is (0, g+1, ..., 2*g-1) (this assumes n % g == 0 and at least two groups). Hence, we need to truncate the output of it.combinations such that the first element is fixed (0 in the above example). Because it.combinations produces the items in lexicographical order, this covers the first (n-1)! / ((n-g)! * (g-1)!) items (! denotes the factorial), i.e. the binomial coefficient C(n-1, g-1).
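As a quick sanity check of the truncation argument, the sketch below confirms that the combinations starting with index 0 are exactly the first C(n-1, g-1) items of it.combinations. The values n = 6 and g = 3 are arbitrary illustration values, and math.comb assumes Python 3.8 or newer.

```python
import itertools as it
from math import comb

n, g = 6, 3  # arbitrary illustration values with n % g == 0

combos = list(it.combinations(range(n), g))
count = comb(n - 1, g - 1)  # same as (n-1)! / ((n-g)! * (g-1)!)

head, tail = combos[:count], combos[count:]

# Every combination in the truncated prefix starts with index 0 ...
print(all(c[0] == 0 for c in head))
# ... and none of the discarded combinations does.
print(all(c[0] != 0 for c in tail))
```

Both lines should print True, which is why slicing off the first C(n-1, g-1) combinations is equivalent to fixing the first element.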

The following is an example implementation:

import itertools as it
from math import factorial
from typing import Iterator, Sequence, Tuple, TypeVar


T = TypeVar('T')


def group_items(items: Sequence[T], group_size: int) -> Iterator[Tuple[Tuple[T, ...], ...]]:
    if len(items) % group_size != 0:
        raise ValueError(
            f'Number of items is not a multiple of the group size '
            f'({len(items)} and {group_size})'
        )
    elif len(items) == group_size:
        yield (tuple(items),)
    elif items:
        count, _r = divmod(
            factorial(len(items) - 1),
            factorial(len(items) - group_size) * factorial(group_size - 1)
        )
        assert _r == 0
        for group in it.islice(it.combinations(items, group_size), count):
            remainder = [x for x in items if x not in group]  # maintain order
            yield from (
                (group, *others)
                for others in group_items(remainder, group_size)
            )


result = list(group_items(range(12), 4))
print(len(result))

from pprint import pprint
pprint(result[:3])
pprint(result[-3:])

Note that the above example uses remainder = [x for x in items if x not in group] to compute which items should go to the next level of recursion. This might be inefficient if your group size is large, since every membership test scans the group tuple. Instead you could test membership against a set (if your items are hashable). Also, if equality comparison (==) between your items is expensive, it would be better to work with indices rather than with the items themselves and compute the group and remainder from these indices. I didn't include these aspects in the above code snippet in order to keep it simple, but if you are interested in the details, I can expand my answer.
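For illustration, here is one way the index-based variant could look. This is only a sketch, not the code from the answer above: the name group_items_by_index and the inner helper are made up for this example, and instead of slicing it.combinations it fixes the first remaining index directly, which should give the same groupings in the same order. Membership tests run against a set of integer indices, so the items themselves are never compared or hashed.

```python
import itertools as it
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar('T')


def group_items_by_index(
    items: Sequence[T], group_size: int
) -> Iterator[Tuple[Tuple[T, ...], ...]]:
    """Hypothetical index-based variant of the grouping generator."""
    if group_size <= 0 or len(items) % group_size != 0:
        raise ValueError(
            f'Number of items is not a multiple of the group size '
            f'({len(items)} and {group_size})'
        )
    if not items:
        return

    def _recurse(indices: List[int]) -> Iterator[Tuple[Tuple[int, ...], ...]]:
        if not indices:
            yield ()
            return
        # The first remaining index always opens the next group.
        first, rest = indices[0], indices[1:]
        for tail in it.combinations(rest, group_size - 1):
            used = set(tail)  # set of ints: cheap membership tests
            remainder = [i for i in rest if i not in used]  # maintain order
            for others in _recurse(remainder):
                yield ((first, *tail), *others)

    for grouping in _recurse(list(range(len(items)))):
        # Map indices back to the actual items only when yielding.
        yield tuple(tuple(items[i] for i in group) for group in grouping)


print(list(group_items_by_index([1, 2, 3, 4], 2)))
print(len(list(group_items_by_index(range(12), 4))))
```

Because only indices are compared, this also works for unhashable items such as lists.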

Not sure if there's a smarter or more concise way, but you can create a recursive function to pick combinations for the first list, then pick combinations from the items not yet used. Also, since the order of both the items in the sublists and the sublists themselves does not seem to matter, the first sublist always starts with the smallest element (otherwise it would not be the first sublist), the second starts with the smallest of the remaining items, etc. This should cut down on the number of combinations and prevent any duplicate results from appearing.

from itertools import combinations

def split(items, b, c):
    assert len(items) == b * c
    def _inner(remaining, groups):
        if len(groups) == b:
            yield groups
        else:
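            # drop the items used by the most recent group; the first item
            # still remaining always opens the next group
            first, *rest = (x for x in remaining if not groups or x not in groups[-1])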
            for comb in combinations(rest, c-1):
                yield from _inner(rest, groups + [{first, *comb}])
    return _inner(items, [])

for x in split(list(range(6)), 2, 3):
    print(x)

Sample output (using lists of sets, but you may convert the sublists to lists before yielding):

[{0, 1, 2}, {3, 4, 5}]
[{0, 1, 3}, {2, 4, 5}]
[{0, 1, 4}, {2, 3, 5}]
[{0, 1, 5}, {2, 3, 4}]
[{0, 2, 3}, {1, 4, 5}]
[{0, 2, 4}, {1, 3, 5}]
[{0, 2, 5}, {1, 3, 4}]
[{0, 3, 4}, {1, 2, 5}]
[{0, 3, 5}, {1, 2, 4}]
[{0, 4, 5}, {1, 2, 3}]

For (a, b, c) = (12, 3, 4) it yields 5775 elements, as expected. For longer lists, this will still take a lot of time, though.
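The expected count can be derived from the same "first item is forced" reasoning: each group takes the first remaining item plus c - 1 of the others. The helper below is a small sketch of that calculation (count_groupings is a name made up for this example, and math.comb assumes Python 3.8 or newer).

```python
from math import comb


def count_groupings(a: int, b: int, c: int) -> int:
    """Number of ways to split a = b * c items into b unordered groups of size c."""
    if c <= 0 or a != b * c:
        raise ValueError(f'expected a == b * c with c > 0, got {a}, {b}, {c}')
    total = 1
    remaining = a
    for _ in range(b):
        # The first remaining item is forced into the group;
        # choose its c - 1 companions from the other remaining items.
        total *= comb(remaining - 1, c - 1)
        remaining -= c
    return total


print(count_groupings(12, 3, 4))  # 5775
print(count_groupings(6, 2, 3))   # 10
```

The second value matches the ten lines of sample output above.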

Use the set_partitions() function from the more-itertools package:

# pip install more-itertools
from more_itertools import set_partitions
a, b, c = 12, 3, 4

results = []
for part in set_partitions(range(a), b):
    if all(len(p) == c for p in part):
        results.append(part)

print(len(results))  # 5775

Part of the 5775 results:

...
[[2, 4, 5, 9], [0, 1, 7, 10], [3, 6, 8, 11]]
[[1, 4, 5, 9], [0, 2, 7, 10], [3, 6, 8, 11]]
[[0, 4, 5, 9], [1, 2, 7, 10], [3, 6, 8, 11]]
[[2, 3, 5, 9], [1, 4, 7, 10], [0, 6, 8, 11]]
[[2, 3, 5, 9], [0, 4, 7, 10], [1, 6, 8, 11]]
...

In case you want to know what it does: set_partitions(range(4), 2) yields the set partitions of [0, 1, 2, 3] into 2 parts:

[[0], [1, 2, 3]], 
[[0, 1], [2, 3]], 
[[1], [0, 2, 3]], 
[[0, 1, 2], [3]], 
[[1, 2], [0, 3]], 
[[0, 2], [1, 3]], 
[[2], [0, 1, 3]]
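Keep in mind that the size filter runs over every partition into b parts, so most candidates are thrown away. The number of partitions of n items into k non-empty parts is the Stirling number of the second kind; the stdlib-only sketch below computes it with the standard recurrence (stirling2 is a helper name made up for this example, not part of more-itertools).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of ways to partition n items into k non-empty unordered parts."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # Item n either joins one of the k existing parts or forms a new part.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


print(stirling2(4, 2))   # 7
print(stirling2(12, 3))  # 86526
```

So for the small example all 7 partitions are generated and 3 of them pass the filter, while for (a, b, c) = (12, 3, 4) the loop walks through 86526 partitions to keep 5775.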
