Generate all possible outcomes of k balls in n bins (sum of multinomial / categorical outcomes)

Suppose we throw k balls into n bins. What is a fast way (i.e. using numpy/scipy rather than pure Python loops) to generate all possible outcomes as a matrix?

For example, if n = 4 and k = 3, we'd want the following numpy.array:

3 0 0 0
2 1 0 0
2 0 1 0
2 0 0 1
1 2 0 0
1 1 1 0
1 1 0 1
1 0 2 0
1 0 1 1
1 0 0 2
0 3 0 0
0 2 1 0
0 2 0 1
0 1 2 0
0 1 1 1
0 1 0 2
0 0 3 0
0 0 2 1
0 0 1 2
0 0 0 3

Apologies if I missed any outcome, but this is the general idea. The generated outcomes don't have to be in any particular order; the ordering above was simply convenient for enumerating them by hand.

Better yet, is there a way to map every integer from 1 to the multiset number (the cardinality of this list) directly to a given outcome?
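One possible reading of this last request, as a minimal sketch rather than a definitive method: walk the bins left to right and, for each candidate count in the current bin, skip over the block of outcomes that start with it. The function name `unrank` is hypothetical, the index is 0-based (subtract 1 from a 1-based index), and the ordering is assumed to be the one used in the table above. It relies on `math.comb`, which needs Python 3.8 or later.

```python
from math import comb

def unrank(r, n, k):
    """Return the r-th outcome (0-based) of k balls in n bins,
    ordered as in the table above (largest first-bin count first)."""
    row = []
    remaining = k
    # bins_left = number of bins to the right of the current one
    for bins_left in range(n - 1, 0, -1):
        for v in range(remaining, -1, -1):
            # number of ways to spread the other balls over the bins to the right
            block = comb(remaining - v + bins_left - 1, bins_left - 1)
            if r < block:
                row.append(v)
                remaining -= v
                break
            r -= block
    row.append(remaining)  # the last bin takes whatever is left
    return row

print(unrank(0, 4, 3))   # [3, 0, 0, 0]
print(unrank(5, 4, 3))   # [1, 1, 1, 0]
print(unrank(19, 4, 3))  # [0, 0, 0, 3]
```

Each call costs roughly n times k binomial evaluations, so single rows can be produced without materialising the whole list.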

This question is related to the following ones, which are implemented in R with very different facilities:

Also related references:

Here's a generator solution using itertools.combinations_with_replacement; I don't know if it will suit your needs.

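import itertools
import numpy

def partitions(n, b):
    # n balls, b bins; each identity row means "one ball in that bin"
    masks = numpy.identity(b, dtype=int)
    for c in itertools.combinations_with_replacement(masks, n):
        yield sum(c)  # summing the n chosen rows gives one outcome

output = numpy.array(list(partitions(3, 4)))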
# [[3 0 0 0]
#  [2 1 0 0]
#  ...
#  [0 0 1 2]
#  [0 0 0 3]]

The cost of this function grows very quickly with n and b (it produces C(n+b-1, b-1) rows), so there is a sharp boundary between what is feasible and what is not.
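To get a feel for where that boundary lies, the row count can be computed before generating anything. This is a small sketch; `num_outcomes` is a hypothetical helper name, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def num_outcomes(n_balls, n_bins):
    # Multiset number: ways to place n_balls indistinguishable balls in n_bins bins
    return comb(n_balls + n_bins - 1, n_bins - 1)

for n_balls, n_bins in [(3, 4), (10, 10), (20, 20)]:
    print(n_balls, n_bins, num_outcomes(n_balls, n_bins))
```

For 3 balls in 4 bins this gives 20 rows, matching the example in the question; for 10 balls in 10 bins it is already 92378.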

Note that while numpy arrays need to know their size at construction, this is easy to arrange because the multiset number is simple to compute. The version below might be better; I have not timed it.

import numpy
from math import factorial as fact
from itertools import combinations_with_replacement as cwr

# Integer division keeps the result an int, which numpy.empty needs for a shape.
def nCr(n, r):
    return fact(n) // (fact(n - r) * fact(r))

def partitions(n, b):
    # Preallocate one row per outcome: C(n+b-1, b-1) rows, b columns.
    partition_array = numpy.empty((nCr(n+b-1, b-1), b), dtype=int)
    masks = numpy.identity(b, dtype=int)
    for i, c in enumerate(cwr(masks, n)):
        partition_array[i,:] = sum(c)
    return partition_array

For reference purposes, the following code uses Ehrlich's algorithm to iterate through all possible combinations of a multiset in C++, JavaScript, and Python:

https://github.com/ekg/multichoose

This can be converted to the above format using this method. Specifically,

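for s in multichoose(k, items):
    # s is one multiset of k elements of `items`; bincount counts how often
    # each integer value occurs in it
    row = np.bincount(s, minlength=len(items) + 1)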

This still isn't pure numpy, but it can fill a preallocated numpy.array fairly quickly.
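As a rough, self-contained illustration of that fill loop, the sketch below swaps in itertools.combinations_with_replacement for multichoose (an assumption made so the example runs without the external package) and uses 0-based bin labels, so minlength is just the number of bins. The name `outcomes_via_bincount` is hypothetical.

```python
import itertools
from math import comb

import numpy as np

def outcomes_via_bincount(k, n):
    """All outcomes of k balls in n bins, one row per outcome."""
    out = np.empty((comb(k + n - 1, n - 1), n), dtype=int)
    # Each s is a sorted tuple of k bin labels, e.g. (0, 0, 2)
    for i, s in enumerate(itertools.combinations_with_replacement(range(n), k)):
        # Count how many balls landed in each bin
        out[i] = np.bincount(np.asarray(s, dtype=int), minlength=n)
    return out

table = outcomes_via_bincount(3, 4)
print(table.shape)  # (20, 4)
```

The loop itself is still Python; only the per-row counting and the storage are handled by numpy.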

Here is a naive implementation with list comprehensions; I'm not sure how its performance compares to numpy.

def gen(n, k):
    # n balls, k bins
    if k == 1:
        return [[n]]
    if n == 0:
        return [[0] * k]
    # Put n-x balls in the last bin and spread the other x over k-1 bins.
    return [u + [n - x] for x in range(n + 1) for u in gen(x, k - 1)]

> gen(3,4)
[[0, 0, 0, 3],
 [0, 0, 1, 2],
 [0, 1, 0, 2],
 [1, 0, 0, 2],
 [0, 0, 2, 1],
 [0, 1, 1, 1],
 [1, 0, 1, 1],
 [0, 2, 0, 1],
 [1, 1, 0, 1],
 [2, 0, 0, 1],
 [0, 0, 3, 0],
 [0, 1, 2, 0],
 [1, 0, 2, 0],
 [0, 2, 1, 0],
 [1, 1, 1, 0],
 [2, 0, 1, 0],
 [0, 3, 0, 0],
 [1, 2, 0, 0],
 [2, 1, 0, 0],
 [3, 0, 0, 0]]

Here's the solution I came up with for this.

import itertools
import numpy

def multinomial_combinations(n, k, max_power=None):
    """Return a 2D numpy array of all length-k sequences of
    non-negative integers n1, ..., nk such that n1 + ... + nk = n."""
    # Stars and bars: choose k-1 bar positions among n+k-1 slots.
    bar_placements = itertools.combinations(range(1, n+k), k-1)
    tmp = [(0,) + x + (n+k,) for x in bar_placements]
    # The gaps between consecutive bars, minus the bar itself, are the counts.
    sequences = numpy.diff(tmp) - 1
    if max_power is not None:
        # Keep only rows whose entries are all <= max_power.
        return sequences[(sequences <= max_power).all(axis=1)][::-1]
    return sequences[::-1]

Note 1: The [::-1] at the end just reverses the order to match your example output.

Note 2: Finding these sequences is equivalent to finding all ways to arrange n stars and k-1 bars in n+k-1 spots (see stars and bars, theorem 2).
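To make the stars-and-bars correspondence concrete, here is a small sketch that converts a single bar placement into bin counts. The helper name `bars_to_counts` is hypothetical; it uses the same sentinel trick (a virtual bar at position 0 and at position n+k) as the function above.

```python
import numpy as np

def bars_to_counts(bars, n, k):
    """bars: k-1 increasing positions in 1..n+k-1; returns k counts summing to n."""
    edges = (0,) + tuple(bars) + (n + k,)
    # Stars between two neighbouring bars = distance between them minus one
    return np.diff(edges) - 1

# n = 3 stars, k = 4 bins, bars at slots 2, 3 and 6:  * | | * * |
print(bars_to_counts((2, 3, 6), n=3, k=4))  # [1 0 2 0]
```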

Note 3: The max_power argument gives you the option to return only sequences in which every integer is at most max_power.
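The cap in Note 3 boils down to a row-wise boolean mask. A minimal sketch on a few hand-picked rows (the array `rows` is illustrative data, not output of the function above):

```python
import numpy as np

rows = np.array([[3, 0, 0, 0],
                 [2, 1, 0, 0],
                 [1, 1, 1, 0],
                 [0, 1, 0, 2]])
max_power = 1

# A row survives only if every entry is <= max_power
capped = rows[(rows <= max_power).all(axis=1)]
print(capped)  # [[1 1 1 0]]
```

Note that the comparison is inclusive, so an entry equal to max_power is kept.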
