
Transforming a random.random() uniform sample to an exponential distribution doesn't produce the correct result

I am trying to generate a synthetic earthquake database where the number of events ($N$) with magnitude in the range $[M, M+\delta_M]$ follows:
$\log_{10}(N) = a - bM$
where $a$ and $b$ are constants.
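A worked version of the inverse-transform step, assuming the relation holds for every magnitude above a lower cutoff $M_{\min}$ with no upper limit (an assumption; $M'$ for a randomly drawn magnitude and $N_{\text{tot}}$ for the total number of events are symbols introduced here). Because $P(M' \ge M)$ evaluated at a random draw is uniform on $(0, 1]$ (the probability integral transform), solving it for $M$ gives a sampler. With $b = 1$ and $M_{\min} = 0$ it reduces to the -math.log10(random.random()) transform used in the question.

```latex
\begin{aligned}
P(M' \ge M) &= 10^{-b(M - M_{\min})}, \qquad M \ge M_{\min},\\
N(M) &= N_{\text{tot}}\left[P(M' \ge M) - P(M' \ge M + \delta_M)\right]
      = N_{\text{tot}}\left(1 - 10^{-b\delta_M}\right) 10^{-b(M - M_{\min})},\\
\log_{10} N(M) &= a - bM, \qquad a = \log_{10}\!\left[N_{\text{tot}}\left(1 - 10^{-b\delta_M}\right)\right] + bM_{\min},\\
M &= M_{\min} - \frac{\log_{10} U}{b}, \qquad U \sim \mathrm{Uniform}(0, 1].
\end{aligned}
```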

I am trying to do this in Python using the random module. I know I can (or at least I think I can, as I haven't tried it) use random.expovariate, but I thought I could use random.random with a transformation like:

-math.log10(random.random())
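A minimal sketch of that transform generalised to an arbitrary $b$, assuming a lower magnitude cutoff m_min (the function names sample_magnitude and sample_magnitude_expovariate are hypothetical). It uses 1.0 - rng.random() because random() returns values in [0.0, 1.0), so an exact 0.0 would make math.log10 raise ValueError. It also shows that random.expovariate with rate b·ln 10 gives the same distribution, since -log10(U)/b = -ln(U)/(b·ln 10).

```python
import math
import random


def sample_magnitude(rng: random.Random, b: float = 1.0, m_min: float = 0.0) -> float:
    """Draw one magnitude with P(M >= m) = 10**(-b * (m - m_min)) by inverse transform."""
    # rng.random() lies in [0.0, 1.0), so u lies in (0.0, 1.0] and log10(u) is always defined.
    u = 1.0 - rng.random()
    return m_min - math.log10(u) / b


def sample_magnitude_expovariate(rng: random.Random, b: float = 1.0, m_min: float = 0.0) -> float:
    """Same distribution via expovariate: an exponential with rate b * ln(10)."""
    return m_min + rng.expovariate(b * math.log(10))


rng = random.Random(12345)
n = 200_000
xs = [sample_magnitude(rng, b=1.0) for _ in range(n)]
ys = [sample_magnitude_expovariate(rng, b=1.0) for _ in range(n)]

# Both sample means should be close to the theoretical mean 1 / (b * ln 10).
theory = 1.0 / math.log(10)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
print(f"theory {theory:.4f}  inverse transform {mean_x:.4f}  expovariate {mean_y:.4f}")
```

Either function can feed the binning code; the expovariate version simply delegates the logarithm to the standard library.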

I ran this for 2,000,000 samples, binned them into bins of width 0.1 and plotted the counts on a log scale.
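A sketch of the binning step together with the theoretical count for each bin, assuming bins of width 0.1 starting at 0 and $b = 1$ as in the transform above (bin_counts and expected_count are hypothetical helper names, and a smaller sample is used to keep it quick). The expected count in a bin follows from the cumulative distribution: total × (10^(-b·lo) - 10^(-b·hi)).

```python
import math
import random
from collections import Counter


def bin_counts(values, width=0.1):
    """Count values per bin [k*width, (k+1)*width), keyed by the integer bin index k."""
    # floor(v / width) can misplace a value lying exactly on an edge by one bin due to
    # floating-point rounding; for continuous random data this is negligible.
    return Counter(math.floor(v / width) for v in values)


def expected_count(total, k, width=0.1, b=1.0):
    """Expected number of events in bin k when P(M >= m) = 10**(-b*m) for m >= 0."""
    lo, hi = k * width, (k + 1) * width
    return total * (10 ** (-b * lo) - 10 ** (-b * hi))


rng = random.Random(2024)
total = 200_000
mags = [-math.log10(1.0 - rng.random()) for _ in range(total)]
counts = bin_counts(mags)

for k in range(5):
    expected = expected_count(total, k)
    print(f"[{k * 0.1:.1f}, {(k + 1) * 0.1:.1f}): observed {counts[k]:6d}  expected {expected:9.1f}")
```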

[Figure: binned synthetic magnitudes plotted on a log scale]

The red line shows the theoretical distribution used to generate the synthetic samples.

I'm not worried about the variation above x=4.5; that is due to the small number of points and natural randomness. What I am asking about is the very small (at this scale) variation for the points near x=0. I plotted the deviation of the synthetic points from the theoretical values (blue dots):

[Figure: difference between observed and theoretical counts]

As x decreases, the number of events increases exponentially, so the deviation from the theoretical values should decrease, not increase. And the point at x=0 deviates in the opposite direction.

To work out where my problem lies, I wrote code that generated numbers from 0 to 1 with a very fine step and passed each number through the function above. The result (the blue dots in the figure above) is purely linear and exactly matches the theoretical values, which indicates that my transformation function and code are fine.

So the only difference between the two sets of points in the figure above is that the blue ones are generated by 2,000,000 calls to the random function (the results are then transformed into magnitudes and binned), while for the red ones I've taken 2,000,000 uniform steps between 0 and 1 (the results are then transformed into magnitudes and binned using the same code).
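A sketch of the two inputs being compared, assuming the deterministic steps are placed at the midpoints of n equal intervals (the question does not say exactly how its steps were placed) and using one hypothetical binning function for both. The grid reproduces the expected counts almost exactly because it has no sampling noise, while genuinely random draws scatter around them by roughly the square root of the count, so some difference between the two sets of points is expected even from a correct generator.

```python
import math
import random
from collections import Counter


def transform(u):
    """Map u in (0, 1] to a magnitude with P(M >= m) = 10**(-m)."""
    return -math.log10(u)


def bin_counts(values, width=0.1):
    """Count values per bin index floor(v / width)."""
    return Counter(math.floor(v / width) for v in values)


n = 200_000
# Deterministic input: midpoints of n equal steps in (0, 1), so no u is 0.0.
grid = [(i + 0.5) / n for i in range(n)]
# Random input: n draws shifted into (0, 1] to avoid log10(0.0).
rng = random.Random(7)
draws = [1.0 - rng.random() for _ in range(n)]

grid_counts = bin_counts(transform(u) for u in grid)
rand_counts = bin_counts(transform(u) for u in draws)

for k in range(5):
    expected = n * (10 ** (-0.1 * k) - 10 ** (-0.1 * (k + 1)))
    print(f"bin {k}: expected {expected:9.1f}  grid {grid_counts[k]:6d}  random {rand_counts[k]:6d}")
```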

So I'm wondering: could it somehow be something to do with the random number generator?

I would be grateful for any pointers. Thanks.

[Added] As suggested by @Arty, I changed the call from random.random to random.uniform(0,1), and the errors are now symmetrically distributed and of the expected magnitude. I have added ±1 standard deviation limits to the plot.
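A sketch of how ±1 standard deviation limits like these can be computed, assuming each of the T events falls in a given bin independently with probability p, so the bin count is binomial with standard deviation sqrt(T·p·(1-p)) (close to the square root of the expected count when p is small). It also illustrates the scaling behind the expectation about x=0: as the count grows toward x=0 the absolute spread grows, while the spread relative to the count shrinks. The helper name binomial_band is hypothetical.

```python
import math


def binomial_band(total, p):
    """Return (mean, sd) of a bin count when each of `total` independent events
    falls in the bin with probability p (binomial model)."""
    mean = total * p
    sd = math.sqrt(total * p * (1.0 - p))
    return mean, sd


total = 2_000_000   # number of synthetic events, as in the question
width = 0.1         # bin width
rows = []
for k in (0, 10, 20, 30, 40):
    lo = k * width
    # P(lo <= M < lo + width) for P(M >= m) = 10**(-m), i.e. b = 1 and m_min = 0 (assumed)
    p = 10 ** (-lo) - 10 ** (-(lo + width))
    mean, sd = binomial_band(total, p)
    rows.append((lo, mean, sd))
    print(f"[{lo:.1f}, {lo + width:.1f}): expected {mean:11.1f}  ±1 sd {sd:7.1f}  sd/expected {sd / mean:.5f}")
```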

[Figure: deviation from expected counts, with ±1 standard deviation limits]

Clearly random() and uniform(0,1) are doing something slightly different.


I cut down my code and calculated synthetic data for 2,000,000 points using random.random, random.uniform(0,1), np.random.random and np.random.uniform(0, 1).

I binned the results and plotted the difference between the observed and expected counts (below).

[Figure: observed minus expected counts for the four generators]

I also added the ±1 standard deviation limits. The differences are all symmetrically distributed and of the correct magnitude, indicating that ALL the random generators are working fine.
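This is consistent with how the standard library defines uniform: the Python documentation describes random.uniform(a, b) in terms of the expression a + (b-a) * random(), so with the same seed, uniform(0, 1) should return exactly the same values as random(). A quick check, assuming CPython's random module:

```python
import random

# Two generators seeded identically walk through identical internal states.
rng_a = random.Random(42)
rng_b = random.Random(42)

from_random = [rng_a.random() for _ in range(10_000)]
from_uniform = [rng_b.uniform(0, 1) for _ in range(10_000)]

# With a=0 and b=1, a + (b - a) * random() is 0 + 1 * random(), which is exact in
# floating point, so the two lists should be identical element by element.
print(from_random == from_uniform)
```

If the two lists match, switching between random() and uniform(0, 1) cannot by itself change a histogram built from them.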

My conclusion is that somewhere along the way, while changing and refining the code, I introduced a problem that has since disappeared. I would dearly like to find that error so I don't make it again!

I am surprised that my original, incorrect code performed well enough to generate a realistic-looking synthetic dataset with only minor anomalies that were difficult to detect.

Thanks for everyone's help, and apologies to those who said the problem did not lie with the random number generators and with whom I disagreed!

Initially I thought you might be having a numerical analysis problem. However, trying a million samples in Python, I get the following observed counts:

>>> import math, random
>>> T = int(1e6)
>>> xs = [ -math.log10(random.random()) for i in range(T)]
>>> len([x for x in xs if 0 <= x < 0.1])
205614
>>> len([x for x in xs if 0.1 <= x < 0.2])
163736
>>> len([x for x in xs if 0.2 <= x < 0.3])
129627
>>> len([x for x in xs if 0.3 <= x < 0.4])
103413
>>> len([x for x in xs if 0.4 <= x < 0.5])
81734

If X = -log_10(x) with x uniformly distributed on [0, 1), then we should have

P(M <= X < M + d) = P(-M-d < log_10(x) <= -M) = 10^(-M) - 10^(-M-d)

and the numbers above are almost perfectly in line with these probabilities, e.g.

1 - 10^(-0.1) = 0.205672

which matches up nicely with our observed 205614 out of a million trials above.
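A sketch that turns this comparison into numbers for all five bins, assuming each bin count is binomial with T = 1,000,000 trials, so its standard deviation is sqrt(T·p·(1-p)). The observed counts are the ones printed in the session above; z is the difference between observed and expected in units of that standard deviation.

```python
import math

T = 1_000_000
width = 0.1
# Observed counts for [0, 0.1), [0.1, 0.2), ..., [0.4, 0.5) from the session above.
observed = [205614, 163736, 129627, 103413, 81734]

zs = []
for k, obs in enumerate(observed):
    m = k * width
    p = 10 ** (-m) - 10 ** (-(m + width))   # P(m <= X < m + width)
    expected = T * p
    sd = math.sqrt(T * p * (1 - p))           # binomial standard deviation of the count
    z = (obs - expected) / sd
    zs.append(z)
    print(f"[{m:.1f}, {m + width:.1f}): expected {expected:9.1f}  observed {obs:6d}  z = {z:+.2f}")
```

None of the five |z| values exceeds 1.1, so every observed count lies within about one standard deviation of its expected value.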

Do you get different results than I do for the Python code above?
