
Numpy random choice with probabilities to produce a 2D-array with unique rows

Similar to "Numpy random choice to produce a 2D-array with all unique values", I am looking for an efficient way of generating:

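import numpy as np

n = 1000
k = 10
number_of_combinations = 1000000

p = np.random.rand(n)
p /= np.sum(p)

# Desired result, for illustration only: as written this call raises ValueError,
# because replace=False applies to the whole sample (larger than n), not to each row.
my_combinations = np.random.choice(n, size=(number_of_combinations, k), replace=False, p=p)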

As discussed in the previous question, I want this matrix to have only unique rows. Unfortunately, the solutions provided there do not work once specific probabilities p are added.

My current solution is as follows:

my_combinations = set()

while len(my_combinations) < number_of_combinations:
    new_combination = np.random.choice(n, size=k, replace=False, p=p)
    my_combinations.add(frozenset(new_combination))

print(my_combinations)

However, I think there should be a more efficient numpy approach that solves this faster.

For these parameter values, the probability of encountering a duplicate row is astronomically small (unless p is very skewed, perhaps to an extent that float precision cannot accommodate). I would just use

my_combinations = np.random.choice(n, size=(number_of_combinations, k), replace=True, p=p)

You can check for duplicates in O(N log N), where N = number_of_combinations.
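One way to run that check, as a minimal sketch rather than the answerer's own code: sort each row so that order within a row is ignored, then let `np.unique(..., axis=0)` sort the rows and collapse equal ones. The helper name `count_duplicate_rows` is made up for illustration.

```python
import numpy as np

def count_duplicate_rows(rows):
    """Return how many rows repeat an earlier row, ignoring order within a row."""
    canonical = np.sort(rows, axis=1)           # [2, 1] and [1, 2] become the same row
    unique_rows = np.unique(canonical, axis=0)  # sorting-based, so O(N log N) row comparisons
    return len(canonical) - len(unique_rows)

# Tiny hand-made example: the second row repeats the first in a different order.
example = np.array([[1, 2], [2, 1], [3, 4]])
n_duplicates = count_duplicate_rows(example)
```

Calling `count_duplicate_rows(my_combinations)` on the array generated above should then report how many rows would need to be replaced, if any.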

Conservatively, you could generate

my_combinations = np.random.choice(n, size=(2 * number_of_combinations, k), replace=True, p=p)

then drop duplicates and take the first number_of_combinations rows.
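The oversample-then-deduplicate idea could look roughly like the sketch below. The function name `sample_unique_rows`, the `oversample` parameter, the use of `np.random.default_rng`, and the small parameter values are all assumptions made for illustration. It follows the answer in drawing with `replace=True`, so an index may still appear more than once inside a single row; the per-row `replace=False` of the question is not enforced here. Duplicates are dropped in draw order, because `np.unique` alone returns rows sorted, and slicing that would favour rows with small indices.

```python
import numpy as np

def sample_unique_rows(n, k, n_rows, p, rng, oversample=2):
    """Draw oversample * n_rows rows, drop duplicate rows, keep the first n_rows."""
    rows = rng.choice(n, size=(oversample * n_rows, k), replace=True, p=p)
    canonical = np.sort(rows, axis=1)  # treat rows as unordered combinations
    # return_index gives the position of the first occurrence of each unique row
    _, first_idx = np.unique(canonical, axis=0, return_index=True)
    keep = np.sort(first_idx)[:n_rows]  # restore draw order before truncating
    return rows[keep]                   # may hold fewer than n_rows if duplicates were common

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable
n, k, n_rows = 50, 5, 200       # much smaller than the question's values, to run quickly
p = rng.random(n)
p /= p.sum()
result = sample_unique_rows(n, k, n_rows, p, rng)
```

If the returned array has fewer than `n_rows` rows, raising `oversample` or topping up with another call would be the natural next step.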
