
How to create a list of random integer vectors whose sum is x

Creating a random vector whose sum is X (e.g. X=1000) is fairly straightforward:

import random
def RunFloat():
    Scalar = 1000
    VectorSize = 30
    RandomVector = [random.random() for i in range(VectorSize)]
    RandomVectorSum = sum(RandomVector)
    RandomVector = [Scalar*i/RandomVectorSum for i in RandomVector]
    return RandomVector
RunFloat()

The code above creates a vector whose values are floats and whose sum is 1000.

I'm having difficulty writing a simple function that creates a vector whose values are integers and whose sum is X (e.g. X=1000*30):

import random
def RunInt():
    LowerBound = 600
    UpperBound = 1200
    VectorSize = 30
    RandomVector = [random.randint(LowerBound,UpperBound) for i in range(VectorSize)]
    RandomVectorSum = 1000*30
    #Sanity check that our RandomVectorSum is sensible/feasible
    if LowerBound*VectorSize <= RandomVectorSum and RandomVectorSum <= UpperBound*VectorSize:
        if sum(RandomVector) == RandomVectorSum:
            return RandomVector
        else:
            return RunInt()

Does anyone have any suggestions to improve on this idea? My code might never finish or run into recursion depth problems.

Edit (July 9, 2012)

Thanks to Oliver, mgilson, and Dougal for their inputs. My solution is shown below.

  1. Oliver was very creative with the multinomial distribution idea.
  2. Put simply, (1) is much more likely to output certain solutions than others. Dougal demonstrated, with a simple test/counterexample based on the Law of Large Numbers, that the distribution of the multinomial approach over the solution space is neither uniform nor normal. Dougal also suggested using numpy's multinomial function, which saved me a lot of trouble, pain, and headaches.
  3. To overcome (2)'s output issue, I use RunFloat() to give what appears to be a more uniform distribution (I haven't tested this, so it's just a superficial appearance). How much of a difference does this make compared to (1)? I don't really know offhand. It's good enough for my use, though.
  4. Thanks again to mgilson for the alternative method that does not use numpy.

Here is the code that I have made for this edit:

Edit #2 (July 11, 2012)

I realized that the normal distribution was not correctly implemented; I have since modified it to the following:

import random
def RandFloats(Size):
    Scalar = 1.0
    VectorSize = Size
    RandomVector = [random.random() for i in range(VectorSize)]
    RandomVectorSum = sum(RandomVector)
    RandomVector = [Scalar*i/RandomVectorSum for i in RandomVector]
    return RandomVector

from numpy.random import multinomial
import math
def RandIntVec(ListSize, ListSumValue, Distribution='Normal'):
    """
    Inputs:
    ListSize = the size of the list to return
    ListSumValue = The sum of list values
    Distribution = 'uniform' for a uniform distribution, 'normal' (default) for a
        normal distribution ~ N(0,1) over +/- 3 sigma, or a list of size 'ListSize'
        or 'ListSize - 1' giving an empirical (arbitrary) distribution, i.e. the
        probabilities of the different outcomes. These should sum to 1 (however,
        the last element is always assumed to account for the remaining
        probability, as long as sum(pvals[:-1]) <= 1).
    Output:
    A list of random integers of length 'ListSize' whose sum is 'ListSumValue'.
    """
    if type(Distribution) == list:
        DistributionSize = len(Distribution)
        if ListSize == DistributionSize or (ListSize-1) == DistributionSize:
            Values = multinomial(ListSumValue,Distribution,size=1)
            OutputValue = Values[0]
        else:
            #The list length does not match ListSize, so no vector can be built
            raise ValueError ('Cannot create desired vector')
    elif Distribution.lower() == 'uniform': #I do not recommend this! It is not as random (at least on my computer) as I had hoped
        UniformDistro = [1.0/ListSize for i in range(ListSize)] #1.0 forces float division
        Values = multinomial(ListSumValue,UniformDistro,size=1)
        OutputValue = Values[0]
    elif Distribution.lower() == 'normal':
        """
        Normal Distribution Construction....It's very flexible and hideous
        Assume a +-3 sigma range.  Warning, this may or may not be a suitable range for your implementation!
        If one wishes to explore a different range, then changes the LowSigma and HighSigma values
        """
        LowSigma    = -3#-3 sigma
        HighSigma   = 3#+3 sigma
        StepSize    = 1/(float(ListSize) - 1)
        ZValues     = [(LowSigma * (1-i*StepSize) +(i*StepSize)*HighSigma) for i in range(int(ListSize))]
        #Construction parameters for N(Mean,Variance) - Default is N(0,1)
        Mean        = 0
        Var         = 1
        #NormalDistro= [self.NormalDistributionFunction(Mean, Var, x) for x in ZValues]
        NormalDistro= list()
        for i in range(len(ZValues)):
            if i==0:
                ERFCVAL = 0.5 * math.erfc(-ZValues[i]/math.sqrt(2))
                NormalDistro.append(ERFCVAL)
            elif i ==  len(ZValues) - 1:
                ERFCVAL = NormalDistro[0]
                NormalDistro.append(ERFCVAL)
            else:
                ERFCVAL1 = 0.5 * math.erfc(-ZValues[i]/math.sqrt(2))
                ERFCVAL2 = 0.5 * math.erfc(-ZValues[i-1]/math.sqrt(2))
                ERFCVAL = ERFCVAL1 - ERFCVAL2
                NormalDistro.append(ERFCVAL)  
        #print("Normal Distribution sum = %f" % sum(NormalDistro))
        Values = multinomial(ListSumValue,NormalDistro,size=1)
        OutputValue = Values[0]
    else:
        raise ValueError ('Cannot create desired vector')
    return OutputValue
#Some Examples
ListSize = 4
ListSumValue = 12
for i in range(100):
    print(RandIntVec(ListSize, ListSumValue,Distribution=RandFloats(ListSize)))

The code above can be found on github. It is part of a class I built for school. user1149913 also posted a nice explanation of the problem.

I would suggest not doing this recursively:

When you sample recursively, the value from the first index has a much greater possible range, whereas values in subsequent indices will be constrained by the first value. This will yield something resembling an exponential distribution.

Instead, what I'd recommend is sampling from the multinomial distribution. This will treat each index equally, constrain the sum, force all values to be integers, and sample uniformly from all possible configurations that follow these rules (note: configurations that can happen multiple ways will be weighted by the number of ways that they can occur).

To help merge your question with the multinomial notation: the total sum is n (an integer), and so each of the k values (one for each index, also integers) must be between 0 and n. Then follow the recipe here.

(Or use numpy.random.multinomial, as @Dougal helpfully suggested.)
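
As a rough illustration of the equal-probability multinomial idea, here is a minimal standard-library sketch. The function name multinomial_int_vec and its rng parameter are made up for this example; numpy.random.multinomial with equal probabilities draws from the same distribution.

```python
import random

def multinomial_int_vec(total, length, rng=random):
    """Drop `total` unit increments into `length` bins, each bin equally likely."""
    vec = [0] * length
    for _ in range(total):
        # every increment picks its bin independently and uniformly
        vec[rng.randrange(length)] += 1
    return vec

vec = multinomial_int_vec(1000, 30)
print(vec, sum(vec))
```

Every result is a list of non-negative integers with the requested sum, but balanced vectors come up far more often than lopsided ones.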

I just ran both @Oliver's multinomial approach and @mgilson's code a million times each, for a length-3 vector summing to 10, and looked at the number of times each possible outcome came up. Both are extremely nonuniform:

(I'm about to show the indexing approach.)

Does this matter? Depends on whether you want "an arbitrary vector with this property that's usually different each time" vs each valid vector being equally likely.

In the multinomial approach, of course 3 3 4 is going to be much more likely than 0 0 10 (4200 times more likely, as it turns out). mgilson's biases are less obvious to me, but 0 0 10 and its permutations were the least likely by far (only ~750 times each out of a million); the most common were 1 4 5 and its permutations, followed by 1 3 6. I'm not sure why. It'll typically start with a sum that's too high in this configuration (expectation 15), though I'm not sure why the reduction works out that way....
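
The factor of 4200 can be checked by counting the ways each outcome can arise under the equal-probability multinomial: 10!/(3! 3! 4!) for 3 3 4 against 10!/(0! 0! 10!) for 0 0 10. A small sketch (multinomial_ways is a name invented for this example):

```python
from math import factorial

def multinomial_ways(counts):
    """Number of distinct sequences of unit increments that produce `counts`."""
    ways = factorial(sum(counts))
    for c in counts:
        ways //= factorial(c)  # each step divides exactly
    return ways

print(multinomial_ways((3, 3, 4)))   # 4200
print(multinomial_ways((0, 0, 10)))  # 1
```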

One way to get a uniform output over the possible vectors would be a rejection scheme. To get a vector of length K with sum N, you'd:

  1. Sample a vector of length K with integer elements uniformly and independently between 0 and N.
  2. Repeat until the sum of the vector is N.

Obviously this is going to be extremely slow for non-tiny K and N.
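
The two steps above can be sketched as follows. This is only an illustration (rejection_int_vec and rng are names chosen here), and as noted it is only practical for very small K and N:

```python
import random

def rejection_int_vec(total, length, rng=random):
    """Uniform over all length-`length` vectors of ints in [0, total] summing to `total`."""
    while True:
        # step 1: each element independently uniform on 0..total
        vec = [rng.randint(0, total) for _ in range(length)]
        # step 2: keep the draw only if the sum matches, otherwise try again
        if sum(vec) == total:
            return vec

print(rejection_int_vec(10, 3))
```

Because every candidate vector is equally likely before the check, the accepted ones are equally likely too.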

Another approach would be to assign a numbering to all the possible vectors; there are (N + K - 1) choose (K - 1) such vectors, so just choose a random integer in that range to decide which one you want. One reasonable way to number them is lexicographic ordering: (0, 0, 10), (0, 1, 9), (0, 2, 8), (0, 3, 7), ... .

Note that the last (Kth) element of the vector is uniquely determined by the sum of the first K-1.

I'm sure there's a nice way to immediately jump to whatever index in this list, but I can't think of it right now.... Enumerating the possible outcomes and walking over them will work, but will probably be slower than necessary. Here's some code for that (though we actually use reverse lexicographic ordering here...).

from itertools import islice, combinations_with_replacement
from functools import reduce
from math import factorial
from operator import mul
import random

def _enum_cands(total, length):
    # get all possible ways of choosing `total` of our indices
    # for example, with total=10 the first one is 0000000000,
    # meaning we picked index 0 ten times, for [10, 0, 0]
    for t in combinations_with_replacement(range(length), total):
        cand = [0] * length
        for i in t:
            cand[i] += 1
        yield tuple(cand)

def int_vec_with_sum(total, length):
    # number of outcomes is (total + length - 1) choose (length - 1);
    # note that // is integer division
    num_outcomes = reduce(mul, range(total + 1, total + length), 1) // factorial(length - 1)
    idx = random.randrange(num_outcomes)
    return next(islice(_enum_cands(total, length), idx, None))

As shown in the histogram above, this is actually uniform over possible outcomes. It's also easily adaptable to upper/lower bounds on any individual element; just add the condition to _enum_cands.

This is slower than either of the other answers: for sum 10 length 3, I get

  • 14.7 µs using np.random.multinomial,
  • 33.9 µs using mgilson's,
  • 88.1 µs with this approach

I'd expect that the difference would get worse as the number of possible outcomes increases.

If someone comes up with a nifty formula for indexing into these vectors somehow, it'd be much better....
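
One possible way to index directly, sketched here under the assumption of plain lexicographic ordering and Python 3.8+ (for math.comb): for each position, skip whole blocks of vectors that share a smaller leading value. The names count_vecs, unrank_vec and uniform_int_vec are invented for this sketch and are not part of the code above.

```python
import random
from math import comb  # Python 3.8+

def count_vecs(total, length):
    """Number of length-`length` vectors of non-negative ints summing to `total`."""
    return comb(total + length - 1, length - 1)

def unrank_vec(idx, total, length):
    """Return the idx-th vector (0-based) in lexicographic order."""
    if not 0 <= idx < count_vecs(total, length):
        raise IndexError("index out of range")
    vec, remaining = [], total
    for slots_left in range(length - 1, 0, -1):
        for value in range(remaining + 1):
            # vectors that put `value` here and fill the remaining slots freely
            block = count_vecs(remaining - value, slots_left)
            if idx < block:
                break
            idx -= block
        vec.append(value)
        remaining -= value
    vec.append(remaining)  # the last element is forced by the sum
    return vec

def uniform_int_vec(total, length, rng=random):
    return unrank_vec(rng.randrange(count_vecs(total, length)), total, length)

print(unrank_vec(0, 10, 3), unrank_vec(1, 10, 3), uniform_int_vec(10, 3))
```

This avoids walking the whole enumeration: the work is on the order of length * total binomial evaluations per sample.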

Here's a pretty straightforward implementation.

import random
import math

def randvec(vecsum, N, maxval, minval):
    if N*minval > vecsum or N*maxval < vecsum:
        raise ValueError ('Cannot create desired vector')

    indices = list(range(N))
    vec = [random.randint(minval,maxval) for i in indices]
    diff = sum(vec) - vecsum # we were off by this amount.

    #Iterate through, incrementing/decrementing a random index 
    #by 1 for each value we were off.
    while diff != 0:  
        addthis = 1 if diff > 0 else -1 # +/- 1 depending on if we were above or below target.
        diff -= addthis

        ### IMPLEMENTATION 1 ###
        idx = random.choice(indices) # Pick a random index to modify, check if it's OK to modify
        while not (minval <= (vec[idx] - addthis) <= maxval):  #operator chaining; the bounds are inclusive, matching randint above.
            idx = random.choice(indices) #Not OK to modify.  Pick another.

        vec[idx] -= addthis #Update that index.

        ### IMPLEMENTATION 2 ###
        # random.shuffle(indices)
        # for idx in indices:
        #    if minval <= (vec[idx] - addthis) <= maxval:
        #        vec[idx]-=addthis
        #        break
        #
        # in situations where (based on choices of N, minval, maxval and vecsum)
        # many of the values in vec MUST BE minval or maxval, Implementation 2
        # may be superior.

    return vec

a = randvec(1000,20,100,1)
print(sum(a))

The most efficient way to sample uniformly from the set of partitions of N elements into K bins is to use a dynamic programming algorithm, which is O(KN). There are a multichoose (http://mathworld.wolfram.com/Multichoose.html) number of possibilities, so enumerating every one will be very slow. Rejection sampling and other Monte Carlo methods will also likely be very slow.

Other methods people propose, like sampling from a multinomial, do not draw samples from a uniform distribution.

Let T(n,k) be the number of partitions of n elements into k bins; then we can compute the recurrence

T(n,1) = 1 \quad \forall n \ge 0
T(n,k) = \sum_{m=0}^{n} T(n-m,k-1)

To sample K elements that sum to N, sample one element at a time from multinomial distributions, going "backward" in the recurrence:

Edit: The T's in the multinomials below should be normalized to sum to one before drawing each sample.

n1 = multinomial([T(N,K-1),T(N-1,K-1),...,T(0,K-1)])
n2 = multinomial([T(N-n1,K-2),T(N-n1-1,K-2),...,T(0,K-2)])
...
n{K-1} = multinomial([T(N-sum([n1,...,n{K-2}]),1),T(N-sum([n1,...,n{K-2}])-1,1),...,T(0,1)])
nK = N-sum([n1,...,n{K-1}])

Note: I am allowing 0's to be sampled.

This procedure is similar to sampling a set of hidden states from a segmental semi-Markov model (http://www.gatsby.ucl.ac.uk/%7Echuwei/paper/icml103.pdf).
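
A minimal sketch of how the dynamic programming scheme could look in Python 3.6+, using only the standard library. The names count_table and dp_uniform_int_vec are invented here, and the base case is placed at k = 0 (an assumption of this sketch) so that T(n,1) = 1 falls out of the recurrence. Element i is drawn with weight equal to the number of ways to complete the rest of the vector.

```python
import random

def count_table(total, length):
    """T[k][n] = number of ways to write n as an ordered sum of k non-negative ints."""
    T = [[0] * (total + 1) for _ in range(length + 1)]
    T[0][0] = 1  # assumed base case: only the empty sum equals 0
    for k in range(1, length + 1):
        running = 0
        for n in range(total + 1):
            running += T[k - 1][n]  # prefix sum equals sum over m of T(n-m, k-1)
            T[k][n] = running
    return T

def dp_uniform_int_vec(total, length, rng=random):
    T = count_table(total, length)
    vec, remaining = [], total
    for k in range(length, 1, -1):
        # weight of value m: completions of the remaining k-1 slots
        weights = [T[k - 1][remaining - m] for m in range(remaining + 1)]
        m = rng.choices(range(remaining + 1), weights=weights)[0]
        vec.append(m)
        remaining -= m
    vec.append(remaining)  # the last element is forced by the sum
    return vec

print(dp_uniform_int_vec(10, 3))
```

random.choices normalizes the weights itself, which takes care of the normalization mentioned in the edit above.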

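This version gives a roughly uniform distribution (the cut points are drawn independently, so vectors that require two cut points to coincide are less likely than the others):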

from random import randint

def RunInt(VectorSize, Sum):
   x = [randint(0, Sum) for _ in range(1, VectorSize)]
   x.extend([0, Sum])
   x.sort()
   return [x[i+1] - x[i] for i in range(VectorSize)]
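
Drawing the cut points independently lets two cuts land on the same value, which weights some vectors differently from others. A hedged variant that is exactly uniform is the stars-and-bars construction: pick VectorSize - 1 distinct bar positions among Sum + VectorSize - 1 slots and count the stars between them. The function name stars_and_bars_int_vec is made up for this sketch.

```python
import random

def stars_and_bars_int_vec(total, length, rng=random):
    """Uniform over all length-`length` vectors of non-negative ints summing to `total`."""
    slots = total + length - 1
    # choose distinct bar positions; every subset is equally likely
    bars = sorted(rng.sample(range(slots), length - 1))
    edges = [-1] + bars + [slots]
    # each element is the number of stars strictly between two neighbouring bars
    return [edges[i + 1] - edges[i] - 1 for i in range(length)]

print(stars_and_bars_int_vec(10, 3))
```

Each choice of bar positions corresponds to exactly one vector, so all (Sum + VectorSize - 1) choose (VectorSize - 1) vectors are equally likely.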

Just to give you another approach: implement a partition_function(X), randomly choose a number between 0 and the length of partition_function(1000), and there you have it. Now you just need to find an efficient way to calculate a partition function. These links might help:

http://code.activestate.com/recipes/218332-generator-for-integer-partitions/

http://oeis.org/A000041

EDIT: Here is some simple code:

import itertools
import random
all_partitions = {0:set([(0,)]),1:set([(1,)])}

def partition_merge(a,b):
    c = set()
    for t in itertools.product(a,b):
        c.add(tuple(sorted(list(t[0]+t[1]))))
    return c

def my_partition(n):
    if n in all_partitions:
        return all_partitions[n]
    a = set([(n,)])
    for i in range(1,n//2+1):  # // keeps the bound an integer
        a = partition_merge(my_partition(i),my_partition(n-i)).union(a)
    all_partitions[n] = a
    return a

if __name__ == '__main__':
    n = 30
    # if you have a few years to wait uncomment the next line
    # n = 1000
    a = my_partition(n)
    i = random.randint(0,len(a)-1)
    print(list(a)[i])

What about:

import numpy as np
def RunInt(VectorSize, Sum):
    a = np.array([np.random.rand(VectorSize)])
    b = np.floor(a/np.sum(a)*Sum) 
    for i in range(int(Sum-np.sum(b))):
        b[0][np.random.randint(len(b[0]))] += 1
    return b[0]
