
What would be considered a standard deviation boundary for java random?

Posted 2012-10-24 14:53:30 · Tags: java, random, ab-testing

I'm using Java 6 random (java.util.Random, Linux 64-bit) to randomly decide between serving one version of a page and a second one (normal A/B testing). Technically, I initialize the class once with the default empty constructor, and it's injected into a bean (Spring) as a property. Most of the time the two copies of the page are served within ±8% of each other, but from time to time I see deviations of up to 20 percent, e.g.:

I now have two copies that split 680 / 570. Is that considered normal? Is there a better/faster alternative to java.util.Random?

Thanks

3 answers

A deviation of 20% does seem rather large, but you would need to talk to a trained statistician to find out whether it is statistically anomalous.

UPDATE: the answer is that it is not necessarily anomalous. The statistics predict that you would get an outlier like this roughly 0.3% of the time.
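As a rough check of that figure, the 680 / 570 split can be treated as n = 1250 independent fair coin flips, so the count for page A has mean 625 and standard deviation sqrt(1250 × 0.25) ≈ 17.68. The sketch below (SplitCheck and zScore are made-up names, and the fair-coin model is an assumption) computes how many standard deviations 680 lies from 625:

```java
// Hypothetical helper: how far an A/B split is from 50/50, in standard deviations.
class SplitCheck {
    // Model each request as a fair coin flip (p = 0.5), so the count for
    // variant A is binomial with mean n/2 and standard deviation sqrt(n/4).
    static double zScore(long countA, long countB) {
        long n = countA + countB;
        double mean = n * 0.5;
        double sd = Math.sqrt(n * 0.5 * 0.5);
        return (countA - mean) / sd;
    }

    public static void main(String[] args) {
        // 680 vs 570: n = 1250, mean = 625, sd ≈ 17.68
        System.out.printf("z = %.2f%n", zScore(680, 570)); // prints z = 3.11
    }
}
```

A z of about 3.1 sits right at the 3-sigma level; under the normal approximation, |z| ≥ 3 occurs about 0.3% of the time, which is consistent with the estimate above.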

It is certainly plausible for a result like this to be caused by the random number generator. The Random class uses a simple "linear congruential" algorithm, and this class of algorithms is strongly auto-correlated. Depending on how you use the random numbers, this could lead to anomalies at the application level.

If this is the cause of your problem, then you could try replacing it with a crypto-strength random number generator; see the javadocs for SecureRandom. SecureRandom is more expensive than Random, but that is unlikely to make any difference in your use case.
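Because java.security.SecureRandom is a subclass of java.util.Random, it can be handed to any code that expects a Random, including a bean property of type Random. A minimal sketch, assuming the chooser bean receives its generator through the constructor (VariantChooser and serveA are hypothetical names, not from the question):

```java
import java.security.SecureRandom;
import java.util.Random;

// Hypothetical chooser bean. The question injects one Random via Spring;
// here it is passed through the constructor to keep the example standalone.
class VariantChooser {
    private final Random random;

    VariantChooser(Random random) {
        this.random = random;
    }

    // true -> serve page A, false -> serve page B
    boolean serveA() {
        return random.nextBoolean();
    }

    public static void main(String[] args) {
        // SecureRandom extends java.util.Random, so it drops in without
        // changing any code that calls nextBoolean().
        VariantChooser chooser = new VariantChooser(new SecureRandom());
        int a = 0;
        int n = 1250;
        for (int i = 0; i < n; i++) {
            if (chooser.serveA()) {
                a++;
            }
        }
        System.out.println("A: " + a + ", B: " + (n - a));
    }
}
```

In a Spring configuration, only the bean that supplies the Random would need to change.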

On the other hand, if these outliers are actually happening at roughly the rate predicted by the theory, changing the random number generator shouldn't make any difference.

If these outliers are really troublesome, then you need to take a different approach. Instead of generating N independent random choices, generate a list of false/true values with exactly the required ratio, and then shuffle the list, e.g. using Collections.shuffle.
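A sketch of the exact-ratio approach, assuming the number of requests N is known in advance (ExactSplit and assignments are illustrative names). The counts are fixed when the list is built; only the order is random:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: pre-build an assignment list with an exact split, then shuffle it.
class ExactSplit {
    // Returns n assignments (true = page A) containing exactly countA trues,
    // in random order.
    static List<Boolean> assignments(int n, int countA) {
        List<Boolean> list = new ArrayList<Boolean>(n);
        for (int i = 0; i < n; i++) {
            list.add(i < countA); // first countA entries are true, the rest false
        }
        Collections.shuffle(list); // randomizes the order, not the counts
        return list;
    }

    public static void main(String[] args) {
        List<Boolean> plan = assignments(1250, 625);
        System.out.println("A count: " + Collections.frequency(plan, true)); // prints A count: 625
    }
}
```

If you want the shuffle to use the injected generator, Collections.shuffle(List, Random) accepts one explicitly.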

I believe this is fairly normal, since it is meant to generate random sequences. If you want patterns that repeat after a certain interval, you could pass a specific seed value to the constructor and reset the generator with the same seed after that interval.

E.g. after every 100/500/n calls to Random.next.., reset the seed to the old value using the Random.setSeed(long seed) method.
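A minimal sketch of the reseeding idea; RepeatingChooser, the seed and the period are hypothetical choices for illustration. Calling setSeed with the original seed puts the generator back into the same state as a freshly constructed new Random(seed), so each block of period calls replays the same choices:

```java
import java.util.Random;

// Hypothetical chooser that replays the same sequence every `period` calls.
// Not thread-safe as written: `calls` is a plain int with no synchronization.
class RepeatingChooser {
    private final Random random;
    private final long seed;
    private final int period;
    private int calls = 0;

    RepeatingChooser(long seed, int period) {
        this.seed = seed;
        this.period = period;
        this.random = new Random(seed);
    }

    // true -> serve page A, false -> serve page B
    boolean serveA() {
        if (calls == period) {
            random.setSeed(seed); // restart the sequence from the beginning
            calls = 0;
        }
        calls++;
        return random.nextBoolean();
    }
}
```

Note that this fixes the ratio rather than balancing it: whatever split the first block produces is repeated, so if the first 100 calls give 55 A's, 1,000 calls give 550.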

Counting the true results of n calls to java.util.Random.nextBoolean() approximates a binomial distribution with p = 0.5, which has a standard deviation of sqrt(n*p*(1-p)).

So if you do 900 iterations, the standard deviation is sqrt(900*0.5*0.5) = 15, so roughly two thirds of the time (within ±1 standard deviation) the count would be in the range 435 to 465.
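To see those numbers empirically, the sketch below repeats "900 calls to nextBoolean()" many times and reports how often the number of trues lands within ±1 standard deviation (435 to 465) and ±3 standard deviations (405 to 495). BinomialCheck and its parameters are illustrative; the fixed seed is there only to make runs repeatable:

```java
import java.util.Random;

// Simulation sketch: how often does the count of trues in n flips fall in [lo, hi]?
class BinomialCheck {
    static double fractionWithin(Random random, int n, int experiments, int lo, int hi) {
        int inside = 0;
        for (int e = 0; e < experiments; e++) {
            int trues = 0;
            for (int i = 0; i < n; i++) {
                if (random.nextBoolean()) {
                    trues++;
                }
            }
            if (trues >= lo && trues <= hi) {
                inside++;
            }
        }
        return (double) inside / experiments;
    }

    public static void main(String[] args) {
        Random random = new Random(12345L); // fixed seed only so runs are repeatable
        System.out.println("435-465: " + fractionWithin(random, 900, 10000, 435, 465));
        System.out.println("405-495: " + fractionWithin(random, 900, 10000, 405, 495));
    }
}
```

Expect roughly 0.7 for the first range (slightly above 68% because the inclusive integer range is a bit wider than ±1σ) and about 0.997 for the second; exact values depend on the seed.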

However, it is pseudo-random and has a limited cycle of 2^48 states that it goes through before starting over. So with enough iterations (on the order of that cycle length), the actual deviation will be much smaller than the theoretical one. Java uses the formula seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1). You could write a different formula with smaller numbers to deliberately obtain a smaller deviation; that would make it a worse random number generator, but one better fitted to your purpose.
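The formula above is the whole story for nextBoolean(). The sketch below re-implements it following the algorithm documented in the java.util.Random Javadoc (the seed is XORed with the multiplier in setSeed, next(bits) returns the top bits of the 48-bit state, and nextBoolean() is next(1) != 0). Lcg48 is an illustrative name; use java.util.Random itself in real code:

```java
import java.util.Random;

// Illustrative re-implementation of java.util.Random's 48-bit LCG.
class Lcg48 {
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    private long seed;

    Lcg48(long seed) {
        // Random.setSeed scrambles the user-supplied seed this way
        this.seed = (seed ^ MULTIPLIER) & MASK;
    }

    // One LCG step; returns the top `bits` bits of the 48-bit state
    int next(int bits) {
        seed = (seed * MULTIPLIER + ADDEND) & MASK;
        return (int) (seed >>> (48 - bits));
    }

    boolean nextBoolean() {
        return next(1) != 0;
    }

    public static void main(String[] args) {
        Lcg48 mine = new Lcg48(42L);
        Random jdk = new Random(42L);
        boolean same = true;
        for (int i = 0; i < 1000; i++) {
            same &= mine.nextBoolean() == jdk.nextBoolean();
        }
        System.out.println("matches java.util.Random: " + same); // prints true
    }
}
```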

You could, for example, create a list with 5 trues and 5 falses in it and use Collections.shuffle to randomize the list. Then you iterate over it sequentially. After 10 iterations you re-shuffle the list and start again from the beginning. That way the two counts will never differ by more than 5.
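A sketch of that block approach: a "deck" holding equal numbers of trues and falses is shuffled, dealt out one value per request, and re-shuffled once it runs out. BlockChooser and the half parameter (5 in the answer) are illustrative names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical block-shuffling chooser. Not thread-safe as written;
// synchronize serveA() if one instance is shared across request threads.
class BlockChooser {
    private final List<Boolean> block = new ArrayList<Boolean>();
    private final Random random;
    private int position;

    BlockChooser(Random random, int half) {
        this.random = random;
        for (int i = 0; i < half; i++) {
            block.add(Boolean.TRUE);
            block.add(Boolean.FALSE);
        }
        position = block.size(); // forces a shuffle on the first call
    }

    // true -> serve page A, false -> serve page B
    boolean serveA() {
        if (position == block.size()) {
            Collections.shuffle(block, random); // new random order for the next block
            position = 0;
        }
        return block.get(position++);
    }
}
```

With half = 5, the A and B counts are exactly equal after every 10 requests, and in between they can differ by at most 5.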

See http://en.wikipedia.org/wiki/Linear_congruential_generator for the mathematics.