Numpy random choice with probabilities to produce a 2D array with unique rows

Question

Similar to "Numpy random choice to produce a 2D-array with all unique values", I am looking for an efficient way of generating:

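import numpy as np

n = 1000
k = 10
number_of_combinations = 1000000

# Selection probabilities, normalized to sum to 1
p = np.random.rand(n)
p /= np.sum(p)

# Desired result. As written this call raises ValueError: replace=False applies
# to the whole array, and number_of_combinations * k exceeds n.
my_combinations = np.random.choice(n, size=(number_of_combinations, k), replace=False, p=p)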

As discussed in the previous question, I want this matrix to have only unique rows. Unfortunately, the provided solutions do not work for the additional extension of using specific probabilities p.

My current solution is as follows:

my_combinations = set()

while len(my_combinations) < number_of_combinations:
    new_combination = np.random.choice(n, size=k, replace=False, p=p)
    my_combinations.add(frozenset(new_combination))

print(my_combinations)

However, I think there should be a more efficient numpy approach that solves this faster.

Answer 1

For these parameter values, the probability of encountering a duplicate row is astronomically small (unless p is very skewed, perhaps to an extent that float precision cannot accommodate). I would just use

my_combinations = np.random.choice(n, size=(number_of_combinations, k), replace=True, p=p)
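To put a rough number on "astronomically small", here is a minimal sketch of a union bound. It assumes the simplest case, where every k-subset is equally likely (uniform p, no repeats within a row); a skewed p makes collisions more likely, so treat the result as a best-case estimate only. The variable names are illustrative.

```python
from math import comb

n, k, N = 1000, 10, 1_000_000

# Number of distinct k-subsets of n items.
total = comb(n, k)

# Number of row pairs that could collide.
pairs = N * (N - 1) // 2

# Union bound: P(any two rows coincide) <= pairs * P(one given pair coincides),
# and under the uniform assumption a given pair coincides with probability 1/total.
upper_bound = pairs / total
print(f"P(duplicate) <= {upper_bound:.2e}")
```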

You can check for duplicates in O(N log N), where N = number_of_combinations.
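One way to do that check, as a sketch: sort within each row so that order does not matter (the question stores rows as frozensets), then let np.unique with axis=0 sort the rows and collapse equal ones. The helper name count_duplicate_rows is made up for this example, and a smaller N is used so it runs quickly.

```python
import numpy as np

def count_duplicate_rows(rows):
    """Count rows that repeat another row, treating each row as unordered."""
    # Sort within each row so [3, 1, 2] and [1, 2, 3] compare equal.
    canonical = np.sort(rows, axis=1)
    # np.unique with axis=0 sorts the rows and drops repeats: O(N log N) row comparisons.
    unique_rows = np.unique(canonical, axis=0)
    return len(canonical) - len(unique_rows)

rng = np.random.default_rng(0)
n, k, number_of_combinations = 1000, 10, 10_000
p = rng.random(n)
p /= p.sum()

sample = rng.choice(n, size=(number_of_combinations, k), replace=True, p=p)
print("duplicate rows:", count_duplicate_rows(sample))
```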

To be conservative, you could generate

my_combinations = np.random.choice(n, size=(2 * number_of_combinations, k), replace=True, p=p)

then drop duplicates and take the first number_of_combinations rows.
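The whole procedure might look like the sketch below. Several things here are my assumptions rather than part of the answer: the helper name sample_unique_combinations, the retry loop, and the extra step that discards rows containing the same index twice (with replace=True a single row can repeat an index, which the question's replace=False would not allow). Note also that filtering with-replacement draws this way does not give exactly the same distribution as sequential sampling with replace=False.

```python
import numpy as np

def sample_unique_combinations(n, k, count, p, rng, oversample=2):
    """Hypothetical helper: `count` distinct rows, each holding k distinct indices.

    Assumes `count` is well below the number of possible combinations;
    otherwise the loop may never finish.
    """
    kept = np.empty((0, k), dtype=np.int64)
    while len(kept) < count:
        batch = rng.choice(n, size=(oversample * count, k), replace=True, p=p)
        # Canonical order within each row, so rows compare as unordered sets.
        batch = np.sort(batch, axis=1)
        # After sorting, a repeated index shows up as a zero difference.
        no_repeats = (np.diff(batch, axis=1) != 0).all(axis=1)
        candidates = np.concatenate([kept, batch[no_repeats]])
        # np.unique returns rows in sorted order; use the first-seen indices
        # to keep generation order instead, so slicing is not biased.
        _, first_seen = np.unique(candidates, axis=0, return_index=True)
        kept = candidates[np.sort(first_seen)]
    return kept[:count]

rng = np.random.default_rng(0)
n, k = 1000, 10
p = rng.random(n)
p /= p.sum()

my_combinations = sample_unique_combinations(n, k, 10_000, p, rng)
print(my_combinations.shape)
```

Keeping the first-seen order matters because np.unique sorts its output; taking the first rows of the sorted result would favor combinations that start with small indices.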

Solution 1
2 2019-08-08 09:34:44