
Efficient combinations of N colored elements with restriction in the number of colors

Given a set of N elements colored with C colors, how can I find every possible combination of length L that contains no more than a maximum of M colors?

I tried an algorithm that uses itertools.combinations to generate all the possible combinations, and then filters out those that do not satisfy the maximum-colors condition.

from itertools import combinations as cb

def allowed_combinations(elements, combination_size=4, max_colors=3):
    # Generate every combination, then filter out those spanning
    # more than 'max_colors' distinct colors.
    for combination in cb(elements, combination_size):
        colors = set(elements[element] for element in combination)
        if len(colors) > max_colors:
            continue
        yield combination


elements = dict()
elements['A'] = 'red'
elements['B'] = 'red'
elements['C'] = 'blue'
elements['D'] = 'blue'
elements['E'] = 'green'
elements['F'] = 'green'
elements['G'] = 'green'
elements['H'] = 'yellow'
elements['I'] = 'white'
elements['J'] = 'white'
elements['K'] = 'black'

combinations = allowed_combinations(elements)

for c in combinations:
    for element in c:
        print("%s-%s" % (element, elements[element]))
    print()

The output looks like:

A-red
C-blue
B-red
E-green


A-red
C-blue
B-red
D-blue


A-red
C-blue
B-red
G-green


A-red
C-blue
B-red
F-green

...

The problem is that generating all possible combinations can be computationally very expensive. In my case, for instance, L is often 6 and the number of elements N is around 50, which gives Bin(50, 6) = 15890700 possible combinations. If the maximum number of colors allowed in a combination is small, most combinations are "useless" and are discarded in the filter step. My intuition is that I should move the filtering step inside/before the combinatory step, to avoid the explosion of combinations, but I don't see how.
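
As a sanity check on those numbers, the raw count can be computed with the standard library (math.comb needs Python 3.8 or later; on older versions the factorial formula would do):

```python
from math import comb

# Number of unfiltered combinations of L = 6 elements out of N = 50,
# i.e. what itertools.combinations has to enumerate before filtering.
print(comb(50, 6))  # 15890700
```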

Here's an implementation that's a bit simpler than the other answers posted so far. The basic approach is to:

  1. Pick a value ("colour" in your terminology) that has not been picked so far;
  2. Loop over i, the number of keys ("elements") associated with that value that will be included in the output;
  3. Loop over c, the combinations of those keys of length i;
  4. Recurse to pick the next value.
from collections import defaultdict, deque
from itertools import combinations

def constrained_combinations(elements, r, s):
    """Generate distinct combinations of 'r' keys from the dictionary
    'elements' using at most 's' different values. The values must be
    hashable.

        >>> from collections import OrderedDict
        >>> elements = OrderedDict(enumerate('aabbc'))
        >>> cc = constrained_combinations
        >>> list(cc(elements, 2, 1))
        [(0, 1), (2, 3)]
        >>> list(cc(elements, 3, 2))
        [(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 3), (1, 2, 3), (2, 3, 4)]
        >>> list(cc(elements, 3, 3)) == list(combinations(range(5), 3))
        True
        >>> sum(1 for _ in cc(OrderedDict(enumerate('aabbcccdeef')), 4, 3))
        188

    """
    # 'value_keys' is a map from value to a list of keys associated
    # with that value; 'values' is a list of values in reverse order of
    # first appearance.
    value_keys = defaultdict(list)
    values = deque()
    for k, v in elements.items():
        if v not in value_keys:
            values.appendleft(v)
        value_keys[v].append(k)

    def helper(current, r, s):
        if r == 0:
            yield current
            return
        if s == 0 or not values:
            return
        value = values.pop()
        keys = value_keys[value]
        for i in range(min(r, len(keys)), -1, -1):
            for c in combinations(keys, i):
                for result in helper(current + c, r - i, s - min(i, 1)):
                    yield result
        values.append(value)

    return helper((), r, s)

Notes

  1. In Python 3.3 or later, you could use the yield from statement to simplify the recursive call:

     yield from helper(current + c, r - i, s - min(i, 1)) 
  2. If you're wondering why the doctests use collections.OrderedDict, it's so that the combinations are returned in a predictable order, which is necessary for the tests to work.

  3. The code reverses the list values, and iterates downwards over i, so that if the caller passes in an OrderedDict, the combinations are returned in a sensible order (with values that appear early in the input also appearing early in the output).

  4. Given the slight awkwardness in getting predictable output from this function, it would, I think, be worth considering changing the interface so that instead of taking a dictionary mapping keys to values, it would take an iterable of (key, value) pairs.
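
As a rough sketch of what that pair-based interface could look like (the function name is my own, and the body is a deliberately naive generate-and-filter version, just to pin down the contract; a real implementation would reuse the recursive helper above):

```python
from itertools import combinations

def constrained_combinations_pairs(pairs, r, s):
    """Hypothetical pair-based variant: same contract as
    constrained_combinations, but takes an iterable of (key, value)
    pairs, so input order (and hence output order) is explicit.
    Naive generate-and-filter body, for illustration only."""
    pairs = list(pairs)
    value_of = dict(pairs)
    keys = [k for k, _ in pairs]
    for combo in combinations(keys, r):
        if len({value_of[k] for k in combo}) <= s:
            yield combo

print(list(constrained_combinations_pairs(enumerate('aabbc'), 2, 1)))
# [(0, 1), (2, 3)] - same as the OrderedDict doctest above
```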

Performance

This is broadly similar in speed to Tim Peters's combs2:

>>> from timeit import timeit
>>> elements = dict(enumerate('abcde' * 10))
>>> test = lambda f:timeit(lambda:sum(1 for _ in f(elements, 6, 3)), number=1)
>>> test(combs2)
11.403807007009163
>>> test(constrained_combinations)
11.38378801709041

Combinatorial problems are notorious for being easy to state but possibly difficult to solve. For this one, I wouldn't use itertools at all, but instead write a custom generator. For example:

def combs(elt2color, combination_size=4, max_colors=3):

    def inner(needed, index):
        if needed == 0:
            yield result
            return
        if n - index < needed:
            # not enough elements remain to reach
            # combination_size
            return
        # first all results that don't contain elts[index]
        for _ in inner(needed, index + 1):
            yield result
        # and then all results that do contain elts[index]
        needed -= 1
        elt = elts[index]
        color = elt2color[elt]
        color_added = color not in colors_seen
        colors_seen.add(color)
        if len(colors_seen) <= max_colors:
            result[needed] = elt
            for _ in inner(needed, index + 1):
                yield result
        if color_added:
            colors_seen.remove(color)

    elts = tuple(elt2color)
    n = len(elts)
    colors_seen = set()
    result = [None] * combination_size
    for _ in inner(combination_size, 0):
        yield tuple(result)

Then:

elt2color = dict([('A', 'red'), ('B', 'red'), ('C', 'blue'),
                  ('D', 'blue'), ('E', 'green'), ('F', 'green'),
                  ('G', 'green'), ('H', 'yellow'), ('I', 'white'),
                  ('J', 'white'), ('K', 'black')])
for c in combs(elt2color):
    for element in c:
        print("%s-%s" % (element, elt2color[element]))
    print()

produces the same 188 combinations as your post-processing code, but internally abandons a partial combination as soon as it would span more than max_colors colors. There's no way to change what itertools functions do internally, so when you want control over that, you need to roll your own.

Using itertools

Here's another approach, generating first all solutions with exactly 1 color, then exactly 2 colors, and so on. itertools can be used directly for much of this, but at the lowest level it still needs a custom generator. I find this harder to understand than a fully custom generator, but it may be clearer to you:

def combs2(elt2color, combination_size=4, max_colors=3):
    from collections import defaultdict
    from itertools import combinations
    color2elts = defaultdict(list)
    for elt, color in elt2color.items():
        color2elts[color].append(elt)

    def at_least_one_from_each(iterables, n):
        if n < len(iterables):
            return # impossible
        if not n or not iterables:
            if not n and not iterables:
                yield ()
            return
        # Must have n - num_from_first >= len(iterables) - 1,
        # so num_from_first <= n - len(iterables) + 1
        for num_from_first in range(1, min(len(iterables[0]) + 1,
                                           n - len(iterables) + 2)):
            for from_first in combinations(iterables[0],
                                           num_from_first):
                for rest in at_least_one_from_each(iterables[1:],
                                             n - num_from_first):
                    yield from_first + rest

    for numcolors in range(1, max_colors + 1):
        for colors in combinations(color2elts, numcolors):
            # Now this gets tricky.  We need to pick
            # combination_size elements across all the colors, but
            # must pick at least one from each color.
            for elements in at_least_one_from_each(
                    [color2elts[color] for color in colors],
                    combination_size):
                yield elements

I haven't timed these, because I don't care ;-) The fully custom generator's single result list is reused for building each output, which slashes the rate of dynamic memory turnover. The second way creates a lot of memory churn by pasting together multiple levels of from_first and rest tuples - and that's mostly unavoidable, because it uses itertools to generate the from_first tuples at each level.

Internally, itertools functions almost always work in a way more similar to the first code sample, and for the same reasons: reusing an internal buffer as much as possible.

AND ONE MORE

This is more to illustrate some subtleties. I thought about what I'd do if I were to implement this functionality in C as an itertools function. All the itertools functions were first prototyped in Python, but in a semi-low-level way, reduced to working with vectors of little integers (no "inner loop" usage of sets, dicts, sequence slicing, or pasting together partial result sequences - sticking as far as possible to O(1) worst-case time operations on dirt-simple native C types after initialization).

At a higher level, an itertools function for this would accept any iterable as its primary argument, and almost certainly guarantee to return combinations from that in lexicographic index order. So here's code that does all that. In addition to the iterable argument, it also requires an elt2ec mapping, which maps each element from the iterable to its equivalence class (for you, those are strings naming colors, but any objects usable as dict keys could be used as equivalence classes):

def combs3(iterable, elt2ec, k, maxec):
    # Generate all k-combinations from `iterable` spanning no
    # more than `maxec` equivalence classes.
    elts = tuple(iterable)
    n = len(elts)
    ec = [None] * n  # ec[i] is equiv class ordinal of elts[i]
    ec2j = {} # map equiv class to its ordinal
    for i, elt in enumerate(elts):
        thisec = elt2ec[elt]
        j = ec2j.get(thisec)
        if j is None:
            j = len(ec2j)
            ec2j[thisec] = j
        ec[i] = j
    countec = [0] * len(ec2j)
    del ec2j

    def inner(i, j, totalec):
        if i == k:
            yield result
            return
        for j in range(j, jbound[i]):
            thisec = ec[j]
            thiscount = countec[thisec]
            newtotalec = totalec + (thiscount == 0)
            if newtotalec <= maxec:
                countec[thisec] = thiscount + 1
                result[i] = j
                yield from inner(i+1, j+1, newtotalec)
                countec[thisec] = thiscount

    jbound = list(range(n-k+1, n+1))
    result = [None] * k
    for _ in inner(0, 0, 0):
        yield tuple(elts[i] for i in result)

(Note that this is Python 3 code.) As advertised, nothing in inner() is fancier than indexing a vector with a little integer. The only thing remaining to make it directly translatable to C is removing the recursive generation. That's tedious, and since it wouldn't illustrate anything particularly interesting here, I'm going to ignore it.

Anyway, the interesting thing is timing it. As noted in a comment, timing results are strongly influenced by the test cases you use. combs3() here is sometimes fastest, but not often! It's almost always faster than my original combs(), but usually slower than my combs2() or @GarethRees's lovely constrained_combinations().

So how can that be, when combs3() has been optimized "almost all the way down to mindless ;-) C-level operations"? Easy! It's still written in Python. combs2() and constrained_combinations() use the C-coded itertools.combinations() to do much of their work, and that makes a world of difference. combs3() would run circles around them if it were coded in C.

Of course any of these can run unboundedly faster than the allowed_combinations() in the original post - but that one can be fastest too (for example, pick just about any inputs where max_colors is so large that no combinations are excluded - then allowed_combinations() wastes little effort, while all the others add substantial extra overhead to "optimize" pruning that never occurs).

Rough outline.

You have in total C different colors. For each k, 1 <= k <= M, choose k colors in Bin(C,k) ways. (I'm using your notation here, assuming Bin means binomial coefficient.)

For each of the above choices, collect all the elements with the chosen colors. Let's say this gives P distinct elements. Then choose L from these P elements in Bin(P, L) different ways.

All of the above is subject to obvious checks: M <= C, L <= P, etc.

The advantage of this approach is that it generates only valid combinations and every valid combination is generated exactly once. (Edit: as pointed out in a comment, this is not true; duplicate combinations can be generated.)
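
To make the duplication concrete: a combination that spans exactly 2 colors is produced once for the 2-color choice and once more for every 3-color choice containing those 2 colors. A small check of my own (using the example elements from the question) counts the naive scheme's output with and without deduplication:

```python
from itertools import combinations

elts = {'A': 'red', 'B': 'red', 'C': 'blue', 'D': 'blue',
        'E': 'green', 'F': 'green', 'G': 'green', 'H': 'yellow',
        'I': 'white', 'J': 'white', 'K': 'black'}

def naive(elts, size=4, max_colors=3):
    # Choose k colors, pool their elements, then choose 'size' elements
    # WITHOUT checking that all k chosen colors actually appear.
    colors = sorted(set(elts.values()))
    for k in range(1, max_colors + 1):
        for chosen in combinations(colors, k):
            pool = sorted(e for e, c in elts.items() if c in chosen)
            for combo in combinations(pool, size):
                yield combo

out = list(naive(elts))
print(len(out), len(set(out)))  # 268 188: each 2-color combination appears 5 times
```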

PS. Here's an implementation of the above algorithm, with a fix for the duplicated combinations:

from itertools import combinations


elts  = { 'A' : 'red', 'B' : 'red', 'C' : 'blue', 'D' : 'blue',
          'E': 'green', 'F' : 'green', 'G' : 'green', 'H' : 'yellow',
          'I' : 'white', 'J' : 'white', 'K' : 'black' }

def combs (elts, size = 4, max_colors = 3):
    # Count different colors
    colors = {}
    for e in elts.values():
        colors [e] = 1
    ncolors = len(colors)

    # for each different number of colors between 1 and 'max_colors' 
    for k in range (1, max_colors + 1):
        # Choose 'k' different colors
        for selected_colors in combinations (colors, k):
            # Select all the elements with these colors
            selected_elts = []
            for e, c in elts.items():
                if c in selected_colors:
                    selected_elts.append (e)
            # Choose 'size' of these elements
            for chosen_elts in combinations (selected_elts, size):
                # Check the chosen elements are of exactly 'k' different colors
                t = {}
                for e in chosen_elts:
                    t[elts[e]] = 1
                if len(t) == k:
                    yield chosen_elts


#for e in combs (elts):
#    print (e)

print (len (list (combs (elts))))

PS. I also timed Tim's combs2, my own combs, and Gareth's constrained_combinations with the program here, with these results:

combs2 =  5.214529
constr combs = 5.290079
combs = 4.952063
combs2 = 5165700
constr combs = 5165700
combs = 5165700
