NumPy random choice with probabilities to produce a 2D array with unique rows
Similar to "Numpy random choice to produce a 2D-array with all unique values", I am looking for an efficient way of generating:
import numpy as np
n = 1000
k = 10
number_of_combinations = 1000000
p = np.random.rand(n)
p /= np.sum(p)
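# Desired result: each row holds k distinct indices drawn with probabilities p.
# As written this raises ValueError, because replace=False applies to the whole
# (number_of_combinations, k) array rather than to each row separately.
my_combinations = np.random.choice(n, size=(number_of_combinations, k), replace=False, p=p)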
As discussed in the previous question, I want this matrix to have only unique rows. Unfortunately, the solutions provided there do not work for the additional requirement of using specific probabilities p.
My current solution is as follows:
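my_combinations = set()
while len(my_combinations) < number_of_combinations:
    # Draw k distinct indices with probabilities p; frozenset makes the row
    # hashable and order-independent, so the set keeps only unique rows.
    new_combination = np.random.choice(n, size=k, replace=False, p=p)
    my_combinations.add(frozenset(new_combination))
print(my_combinations)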
However, I think there should be a more efficient NumPy approach that solves this faster.
For these parameter values, the probability of encountering a duplicate row is astronomically small (unless p is very skewed, perhaps to an extent that float precision cannot accommodate). I would just use
# Note: with replace=True, the same index can also appear more than once within a row.
my_combinations = np.random.choice(n, size=(number_of_combinations, k), replace=True, p=p)
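To put a rough number on "astronomically small", here is a birthday-bound estimate. It assumes, for simplicity, that p is uniform and that each row is a k-element subset of range(n). The question's random p is not uniform, so the true collision probability is somewhat higher, and this is only an illustration of the order of magnitude.

```python
import math

n = 1000
k = 10
number_of_combinations = 1_000_000

# Number of distinct k-element subsets of {0, ..., n-1} (assumes rows are sets).
num_subsets = math.comb(n, k)

# Birthday (union) bound for N rows drawn uniformly from M possible subsets:
# P(any duplicate) <= N * (N - 1) / (2 * M).
N = number_of_combinations
dup_bound = N * (N - 1) / (2 * num_subsets)

print(f"{num_subsets:.3e} possible rows, P(duplicate) <= {dup_bound:.1e}")
```

With roughly 2.6e23 possible rows and only 1e6 draws, the bound comes out on the order of 1e-12.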
You can check for duplicates in O(N log N), where N = number_of_combinations.
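One way to do that check in NumPy is sketched below. It compares rows as sets, like the question's frozenset (an assumption; drop the per-row sort if order within a row matters). Sorting each row gives a canonical form, and np.unique(..., axis=0) then sorts the N rows lexicographically, so the cost is dominated by the O(N log N) row sort. The function name has_duplicate_rows is hypothetical.

```python
import numpy as np


def has_duplicate_rows(rows: np.ndarray) -> bool:
    """Return True if any two rows contain the same values (compared as sets).

    Hypothetical helper: rows are sorted individually to get a canonical form,
    then np.unique(axis=0) sorts and collapses identical rows.
    """
    canonical = np.sort(rows, axis=1)
    unique_rows = np.unique(canonical, axis=0)
    return unique_rows.shape[0] < canonical.shape[0]


rng = np.random.default_rng(0)
n, k = 1000, 10
p = rng.random(n)
p /= p.sum()

# Smaller N than in the question, just to keep the example quick.
rows = rng.choice(n, size=(100_000, k), replace=True, p=p)
print(has_duplicate_rows(rows))
```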
Conservatively, you could generate
my_combinations = np.random.choice(n, size=(2 * number_of_combinations, k), replace=True, p=p)
then drop duplicates and take the first number_of_combinations rows.
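The oversample-and-drop idea can be sketched as follows, with one substitution: because replace=True can repeat an index within a row, this sketch instead draws each row of k distinct indices with the Gumbel-top-k trick (add Gumbel noise to log p and keep the k largest keys). That technique is swapped in here, not taken from the answer. Its top-k set has the same distribution as one call to choice(n, size=k, replace=False, p=p), with the order inside a row arbitrary. All function names and the chunk_size parameter are hypothetical. Chunking is an assumption added to bound memory, since a full (2N, n) noise matrix would need about 16 GB at the question's sizes.

```python
import numpy as np


def sample_rows_without_replacement(rng, p, num_rows, k):
    """Hypothetical helper: num_rows rows of k distinct indices, each row a
    weighted sample without replacement (as a set) via Gumbel-top-k."""
    # Perturb log-probabilities with independent Gumbel noise, one row per sample.
    keys = np.log(p) + rng.gumbel(size=(num_rows, p.size))
    # Indices of the k largest keys per row; argpartition avoids a full sort.
    return np.argpartition(-keys, k - 1, axis=1)[:, :k]


def unique_rows_in_order(rows):
    """Drop rows that repeat an earlier row (compared as sets), keeping order."""
    canonical = np.sort(rows, axis=1)
    _, first_idx = np.unique(canonical, axis=0, return_index=True)
    return rows[np.sort(first_idx)]


def sample_unique_rows(rng, p, num_rows, k, chunk_size=10_000):
    """Oversample 2 * num_rows rows in chunks, drop duplicates, keep the first num_rows."""
    chunks = []
    remaining = 2 * num_rows
    while remaining > 0:
        m = min(chunk_size, remaining)
        chunks.append(sample_rows_without_replacement(rng, p, m, k))
        remaining -= m
    rows = unique_rows_in_order(np.concatenate(chunks))
    if rows.shape[0] < num_rows:
        raise RuntimeError("not enough unique rows; increase the oversampling")
    return rows[:num_rows]


rng = np.random.default_rng(42)
n, k = 1000, 10
p = rng.random(n)
p /= p.sum()

my_combinations = sample_unique_rows(rng, p, num_rows=5_000, k=k)
print(my_combinations.shape)  # (5000, 10)
```

The per-row work is a vectorized argpartition over n keys rather than a Python-level loop. Given how rare duplicates are, the factor of 2 oversampling is very conservative and could likely be reduced.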