
Random number in range with equal probability

This might be more Math related than C#, but I need a C# solution so I'm putting it here.

My question is about the probability of random number generators, more specifically if each possible value is returned with an equal probability.

I know there is the Random.Next(int, int) method which returns a number between the first integer and last (with the last being exclusive).

Random.Next() [without overloads] will return a value between 0 and Int32.MaxValue (which is 2147483647) - 1, so 2147483646.

If I want a value between 1 and 10, I could call Random.Next(1, 11). However, does every value between 1 and 10 have an equal probability of occurring?

For example, the range is 10, so 2147483646 is not perfectly divisible by 10, so the values 1-6 have a slightly higher probability of occurring (because 2147483646 % 10 = 6 ). This is of course assuming that Random.Next() [without overloads] returns every value between 0 and 2147483646 with equal probability.

How would one ensure that every number within a range has an equal probability of occurring? Think of a lottery-type system, where it would be unfair for some people to have a higher probability of winning than others. I'm not saying I would use the C# built-in RNG for this; I was just using it as an example.

I note that no one actually answered the meaty question in your post:

For example, the range is 10, so 2147483646 is not perfectly divisible by 10, so the values 1-6 have a slightly higher probability of occurring (because 2147483646 % 10 = 6). This is of course assuming that Random.Next() [without overloads] returns every value between 0 and 2147483646 with equal probability.

How would one ensure that every number within a range has an equal probability of occurring?

Right, so you just throw out the values that cause the imbalance. For example, let's say that you had an RNG that could produce a uniform distribution over { 0, 1, 2, 3, 4 } , and you wanted to use it to produce a uniform distribution over { 0, 1 } . The naive implementation is: draw from { 0, 1, 2, 3, 4 } and then return the value % 2 ; this, however, would obviously produce a biased sample, because 0 comes up for three of the five inputs (0, 2, 4) and 1 for only two (1, 3). This happens because, as you note, 5 (the number of items) is not evenly divisible by 2. So, instead, throw out any draws that produce the value 4 . Thus, the algorithm would be

 draw from { 0, 1, 2, 3, 4 }
 if the value is 4, throw it out
 otherwise, return the value % 2

You can use this basic idea to solve the general problem.
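The general version of this idea can be sketched as follows. This is only a minimal sketch: UniformSampler and NextUniform are hypothetical names (not part of the .NET API), the code assumes that Random.Next() itself is uniform over 0 .. Int32.MaxValue - 1, and it does not handle ranges wider than Int32.MaxValue. With a source of 5 values and a range of 2, the same limit calculation gives 4, which matches the "throw out 4" rule above.

```csharp
using System;

public static class UniformSampler
{
    // Hypothetical helper: returns an integer in [minInclusive, maxExclusive)
    // by rejection sampling on top of Random.Next().
    // Assumption: Random.Next() is uniform over 0 .. Int32.MaxValue - 1.
    public static int NextUniform(Random rng, int minInclusive, int maxExclusive)
    {
        if (rng == null) throw new ArgumentNullException(nameof(rng));

        long range = (long)maxExclusive - minInclusive;
        if (range <= 0 || range > int.MaxValue)
            throw new ArgumentOutOfRangeException(nameof(maxExclusive));

        // Random.Next() has Int32.MaxValue possible results.
        long sourceCount = int.MaxValue;

        // Largest multiple of range that fits into the source;
        // draws at or above this limit would cause the imbalance.
        long limit = sourceCount - (sourceCount % range);

        int draw;
        do
        {
            draw = rng.Next();
        } while (draw >= limit); // throw the draw out and try again

        return (int)(minInclusive + (draw % range));
    }
}
```

A call such as UniformSampler.NextUniform(new Random(), 1, 11) would then stand in for Random.Next(1, 11). For a range of 10 only a handful of the roughly 2^31 source values are rejected, so the loop almost never runs more than once.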

however does every value between 1 and 10 have an equal probability of occuring?

Yes, it does. From MSDN :

Pseudo-random numbers are chosen with equal probability from a finite set of numbers .

Edit: Apparently the documentation is NOT consistent with the current implementation in .NET. The documentation states the draws are uniform, but the code suggests that it is not. However, that does NOT negate the fact that this is a soluble problem, and my approach is one way to solve it.

The C# built in RNG is, as you expect, a uniformly distributed one. Every number has an equal likelihood of occurring given the range you specify for Next(min, max) .

You can test this yourself (I have) by taking, say, 1M samples and storing how many times each number actually appears. You'll get an almost flat-line curve if you graph it.

Also note that each number having an equal likelihood doesn't mean that each number will occur the same number of times. If you're looking at random numbers from 1 to 10, in 100 iterations, it won't be an even distribution of 10 occurrences for each number. Some numbers may occur 8 times, and others 12 or 13 times. However, with more iterations, the relative frequencies tend to even out.

Also, since it's mentioned in the comments, I'll add: if you want something stronger, look up cryptographic RNGs. Separately, Mersenne Twister is not cryptographically secure, but it is a particularly good general-purpose PRNG from what I've seen (fast, cheap to compute, huge period), and it has open-source implementations in C#.
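As a rough illustration of the cryptographic route, newer .NET versions expose a static RandomNumberGenerator.GetInt32(fromInclusive, toExclusive) method in System.Security.Cryptography. It is available in .NET Core 3.0 and later, not in the .NET Framework, so treat this as a sketch for newer runtimes only. LotteryDraw and DrawOneToTen are hypothetical names used for the example.

```csharp
using System;
using System.Security.Cryptography;

public static class LotteryDraw
{
    // Hypothetical wrapper; RandomNumberGenerator.GetInt32 is the real API
    // (available in .NET Core 3.0 and later, not in .NET Framework).
    public static int DrawOneToTen()
    {
        // The upper bound is exclusive, as with Random.Next(int, int).
        return RandomNumberGenerator.GetInt32(1, 11);
    }
}
```

This is slower than System.Random and cannot be seeded for reproducible runs, which is usually an acceptable trade-off for something like a lottery draw.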

Test program:

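// Count how often each value 1..10 comes up in one million draws.
var a = new int[10];
var r = new Random();
for (int i = 0; i < 1000000; i++) a[r.Next(1, 11) - 1]++;
// Print each value next to its count.
for (int i = 0; i < a.Length; i++) Console.WriteLine("{0,2}{1,10}", i + 1, a[i]);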

Output:

 1      99924
 2     100199
 3     100568
 4     100406
 5     100114
 6      99418
 7      99759
 8      99573
 9     100121
10      99918

Conclusion:

Each value is returned with an equal probability.
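One way to put a number on how flat these counts are is Pearson's chi-square statistic, which compares each count with the count expected under a uniform distribution. The following is a minimal sketch; UniformityCheck and ChiSquare are hypothetical names, not library calls.

```csharp
using System;

public static class UniformityCheck
{
    // Pearson's chi-square statistic against a uniform expectation:
    // the sum over all bins of (observed - expected)^2 / expected.
    public static double ChiSquare(int[] counts)
    {
        if (counts == null || counts.Length == 0)
            throw new ArgumentException("counts must not be empty", nameof(counts));

        long total = 0;
        foreach (int c in counts) total += c;

        // Under uniformity every bin is expected to hold total / bins samples.
        double expected = (double)total / counts.Length;

        double statistic = 0.0;
        foreach (int c in counts)
        {
            double diff = c - expected;
            statistic += diff * diff / expected;
        }
        return statistic;
    }
}
```

For the ten counts printed above this gives about 11.46, below the 5% critical value of roughly 16.92 for 9 degrees of freedom, so the sample is consistent with a uniform distribution. A test on one million samples can only detect fairly coarse deviations, though; it cannot rule out a very small bias.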

Ashes and dtb are incorrect: You are right to suspect that some numbers would have a greater chance of occurring than others.

When you call .Next(x, y) , there are y - x possible return values. The .NET 4.0 Random class calculates a return value based on the return value of NextDouble() (this is a slightly simplified description).

Obviously, the set of possible double values is finite, and, as you note, it may not be a multiple of the size of the set of possible return values of .Next(x, y) . Therefore, assuming that the set of input values is uniformly distributed, some output values will have a slightly greater probability of occurring.

I don't know offhand how many numeric double values there are (i.e., excluding infinity and NaN values), but it is certainly larger than 2^32. In your case, if we assume 2^32 values, for the sake of argument, then we have to map 4294967296 inputs to 10 outputs. Since 4294967296 / 10 = 429496729.6, six of the outputs would receive 429496730 inputs and the other four 429496729. The favoured values would therefore be more likely by a factor of 429496730 / 429496729, or 0.00000023283064397913028110629 percent greater. In fact, since the number of input states is greater than 2^32, the difference in probability would be even smaller.

I am using C# in Microsoft Visual Studio Community 2017 Version 15.9.6, Microsoft .NET Framework Version 4.7.03056. For an analysis where I directly calculated an answer of 5.6525% (verified by another researcher), I ran a 50-billion round simulation which produced 5.4721%. My simulation required me to generate random integers in the range [1,i] and [i,38] with i=1,2,...,38, so I used Random.Next(1,i+1) and Random.Next(i,39). I spent days hunting for some error in my math to explain the discrepancy, verifying all aspects of my simulation model. Eventually, I questioned whether Random.Next() was truly generating a uniform distribution over the intervals I needed. I swapped in the Mersenne Twister for all random-number generation (for example, see: http://www.prowaretech.com/Computer/DotNet/Mersenne ), and the problem vanished. Just a cautionary tale.
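The arithmetic in the previous paragraph can be reproduced with integer division and remainder. The sketch below uses hypothetical names (MappingBias, OutputsWithExtra, RelativeExcess) and takes the 2^32 input count purely as the for-the-sake-of-argument figure used above, not as a statement about how Random is implemented.

```csharp
using System;

public static class MappingBias
{
    // When inputCount equally likely inputs are spread as evenly as possible
    // over outputCount outputs, this many outputs receive one extra input.
    public static long OutputsWithExtra(long inputCount, long outputCount)
    {
        return inputCount % outputCount;
    }

    // Relative excess probability of a favoured output over an unfavoured one:
    // (small + 1) / small - 1 = 1 / small, where small = inputCount / outputCount.
    public static double RelativeExcess(long inputCount, long outputCount)
    {
        long small = inputCount / outputCount;
        if (inputCount % outputCount == 0) return 0.0; // evenly divisible: no bias
        return 1.0 / small;
    }
}
```

With inputCount = 4294967296 and outputCount = 10, OutputsWithExtra returns 6 and RelativeExcess returns about 2.3283e-9, which is the 0.00000023283 percent quoted above.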
