
Fastest way to generate ~10^9 Poisson random numbers in python/numpy


I would like to find the fastest way to generate ~10^9 Poisson random numbers in python/numpy. For instance, say I have a mean Poisson parameter (calculated elsewhere) of shape (1000, 2000), and I need 500 independent samples. This is a bottleneck in my code, taking several minutes to complete. I have tried three methods, but am looking for something faster:

import numpy as np

# example parameters
nsamples = 500
nmeas = 2000
ninputs = 1000
lambdax = np.ones([ninputs, nmeas]) * 20

# numpy, one big array
sample0 = np.random.poisson(lam=lambdax, size=(nsamples, ninputs, nmeas))

# numpy, current version where other code happens in the loop
sample1 = np.zeros([nsamples, ninputs, nmeas])
for i in range(nsamples):
    sample1[i, :, :] = np.random.poisson(lam=lambdax)

# scipy
from scipy.stats import poisson
sample2 = poisson.rvs(lambdax, size=(nsamples, ninputs, nmeas))

Results:

sample0: 1 m 16 s
sample1: 1 m 20 s
sample2: 1 m 50 s

Not shown here: I am also parallelizing the independent samples via multiprocessing, but the calculations are still pretty expensive for such large parameters. Is there a better way?
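A minimal sketch of how the independent samples could be split across workers with properly independent random streams, assuming NumPy's `Generator` API (NumPy 1.17 or later). The helper names `parallel_poisson` and `sample_chunk`, the worker count, and the seed are hypothetical. Each worker gets its own child of a single `SeedSequence`, so the chunks come from independent streams and the result is reproducible for a given seed. Threads are used only to keep the sketch self-contained; whether they actually speed things up depends on how your NumPy build handles the GIL during sampling, and the same `spawn` pattern carries over unchanged to `multiprocessing`.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def sample_chunk(seed_seq, lam, n):
    """Draw n independent Poisson(lam) arrays from a generator seeded by seed_seq."""
    rng = np.random.default_rng(seed_seq)
    return rng.poisson(lam=lam, size=(n,) + lam.shape)


def parallel_poisson(lam, nsamples, nworkers=4, seed=12345):
    """Split nsamples across nworkers, each using its own independent stream."""
    lam = np.asarray(lam, dtype=np.float64)
    # Child SeedSequences yield statistically independent generators.
    children = np.random.SeedSequence(seed).spawn(nworkers)
    # Spread the samples as evenly as possible over the workers.
    counts = [nsamples // nworkers + (i < nsamples % nworkers) for i in range(nworkers)]
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        parts = list(pool.map(sample_chunk, children, [lam] * nworkers, counts))
    # pool.map preserves input order, so the output is reproducible for a given seed.
    return np.concatenate(parts, axis=0)
```

For the question's sizes, `parallel_poisson(np.full((1000, 2000), 20.0), 500)` would return a (500, 1000, 2000) array, roughly 8 GB as int64, so memory rather than CPU may become the limit.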

I have been in your shoes, and here are my suggestions:

  • For large mean values, the Poisson distribution behaves much like a normal distribution whose mean and variance both equal the Poisson mean. Check out this post (and probably others if you search).
  • A runtime of about 1 minute seems reasonable for generating this many random numbers. I don't think you can beat the sample0 method by much through code changes alone. What to do next depends on what you want to do with the random numbers:
    • If your issue is rerunning the program multiple times, try saving sample0 to a file and reloading it in later runs.
    • If not, I suggest generating fewer random numbers and reusing them. Depending on your mean value, many of the values in sample0 will repeat anyway, so you might create a smaller sample and randomly choose from it. For example, I would pick a random number from sample0 and reuse it, e.g., 100 times (since that number would appear in sample0 more than 100 times anyway).
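The first bullet's normal approximation can be sketched as follows: draw from N(λ, λ), round to the nearest integer, and clip at zero. This is a hedged sketch, not a drop-in replacement; `poisson_normal_approx` is a hypothetical helper, and at the question's λ = 20 the approximation is still coarse (the Poisson skewness is 1/√λ ≈ 0.22, which a normal cannot reproduce), so check the tails against your needs and benchmark it against `poisson` before switching.

```python
import numpy as np


def poisson_normal_approx(lam, size, rng=None):
    """Approximate Poisson(lam) draws with a rounded normal N(lam, lam).

    Only reasonable when lam is large; negative values are clipped to 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = np.asarray(lam, dtype=np.float64)
    # Mean and variance of Poisson(lam) are both lam, so scale = sqrt(lam).
    draws = rng.normal(loc=lam, scale=np.sqrt(lam), size=size)
    # Round to integers and clip, since Poisson counts are never negative.
    return np.clip(np.rint(draws), 0, None).astype(np.int64)
```

With the question's parameters this would be called as `poisson_normal_approx(lambdax, size=(nsamples, ninputs, nmeas))`, relying on the same broadcasting of `lam` against `size` as the original `sample0` line.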
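For the suggestion of saving sample0 and reloading it in later runs, a minimal sketch using `np.save` and `np.load(..., mmap_mode="r")`. The helper names and the `uint16` default are assumptions: 10^9 counts take about 8 GB as int64 but about 2 GB as uint16, which is only safe if every count fits below 65,536 (comfortably true for λ = 20; the helper raises instead of silently wrapping otherwise). Memory-mapping means a later run reads from disk only the slices it actually touches.

```python
import numpy as np


def cache_samples(path, samples, dtype=np.uint16):
    """Save samples to an .npy file, narrowing to a compact integer dtype."""
    samples = np.asarray(samples)
    info = np.iinfo(dtype)
    # Refuse to narrow if any value would wrap around in the smaller dtype.
    if samples.size and (samples.min() < info.min or samples.max() > info.max):
        raise ValueError("samples do not fit in the requested dtype")
    np.save(path, samples.astype(dtype, copy=False))


def load_samples(path):
    """Memory-map a cached .npy file so slices are read from disk on demand."""
    return np.load(path, mmap_mode="r")
```

A run would call `cache_samples("sample0.npy", sample0)` once, and later runs would use `sample0 = load_samples("sample0.npy")` instead of regenerating.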
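The idea of drawing fewer Poisson numbers and reusing them can be sketched as resampling from a smaller pre-drawn pool. This is one possible reading of the suggestion, assuming a single scalar λ as in the question's example (`lambdax` is all 20s); with a different λ per element you would need a pool per distinct value. `pooled_poisson` and `pool_size` are hypothetical. The Poisson draws are replaced by uniform index draws, which are likely cheaper (benchmark before relying on it), and the output follows the pool's empirical distribution, so the samples are no longer fully independent Poisson draws.

```python
import numpy as np


def pooled_poisson(lam, size, pool_size=100_000, rng=None):
    """Approximate Poisson(lam) draws by resampling a smaller pre-drawn pool.

    Assumes one scalar lam shared by every element. Outputs are copies of pool
    entries, so they follow the pool's empirical distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    pool = rng.poisson(lam=lam, size=pool_size)
    # Uniform int32 indices (half the memory of int64) pick entries from the pool.
    idx = rng.integers(0, pool_size, size=size, dtype=np.int32)
    return pool[idx]
```

A larger `pool_size` makes the pool's empirical distribution closer to the true Poisson, at the cost of more Poisson draws up front.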

If you provide more information on what you intend to do with the random numbers, we might be able to help more. Otherwise, coding-wise, I am not sure you can do much better.

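Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.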
