
Problem with rand%100 for random number generation in C

So I have a homework assignment, and we need to generate random numbers between 1 and 100 in C. I have a working example with int i = rand()%100.

But according to the homework, that is technically incorrect, and I don't really understand why. The homework explanation is as follows:

"1.1 We use a random number generator to simulate bus arrival times. ===> the rand( ) function. The rand( ) function returns a pseudo random number 0 to RAND_MAX (2^31-1 in linux). To generate a random number, rn, between 0.0 and 1.0; rn = rand( ) / RAND_MAX. (by the way, a lot of people do below to create, say, 2 digit random numbers. r_num = rand( ) % 100; since % 100 is 0 to 99. However, this is wrong. The right way of generate 2 digit random number is: divide 0-RAND_MAX in 10 intervals and see where the random number falls. The interval time is, it = RAND_MAX / 100. Then, map it to one of 0 - 99 by the following:

  result         | 0 | 1  | 2    | 3    | ... | 99
  interval start | 0 | it | 2*it | 3*it | ... | 99*it (to RAND_MAX)

If the rand( ) returns a number is between (12*it) and (13*it), the 2 digit random number is 12.)"

I was hoping someone could take a stab at explaining what it is saying. I'm not really looking for code examples, just an understanding of the problem.

There are a couple of problems there, both having to do with how the modulo operator works. a % b effectively gives you the remainder when you divide a by b. So let's suppose that we're computing numbers modulo 4. Let's also assume that RAND_MAX = 6, because I really don't want to have 32768+ rows in my table.

  a | a % 4
------------
  0 | 0
  1 | 1
  2 | 2
  3 | 3
  4 | 0
  5 | 1
  6 | 2

So if you're using your approach to generate random numbers between 1 and 4, you have two problems. First, the simple one: you're generating numbers between 0 and 3, not 1 and 4. For a non-negative a, the result of a % b is always between 0 and b - 1, so you need to add 1 to shift the range.

The other problem is more subtle. If the number of values rand() can return (RAND_MAX + 1, counting 0) isn't a multiple of the modulus, you won't get the same probability for each number. In our example, there are 2 ways each to make 0 through 2, but only one way to make 3. So 3 will occur ~14.3% of the time, and each other number will occur ~28.6% of the time. To get a uniform distribution, you need a way to deal with the leftover values.
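The table above can be reproduced with a short tally. This is only an illustrative sketch: count_residues is a hypothetical helper name, and its max parameter stands in for the pretend RAND_MAX = 6.

```c
#include <assert.h>

/* Hypothetical helper (not from the original answer): tally how many of the
 * values 0..max land on each remainder modulo n.
 * counts must have room for n entries. */
void count_residues(unsigned max, unsigned n, unsigned counts[])
{
    assert(n > 0);

    for (unsigned i = 0; i < n; i++)
        counts[i] = 0;

    /* Walk every value rand() could return if RAND_MAX were max. */
    for (unsigned a = 0; ; a++) {
        counts[a % n]++;
        if (a == max)
            break;
    }
}
```

Calling count_residues(6, 4, counts) leaves 2, 2, 2, 1 in counts, which matches the table: the remainder 3 has only one way to occur.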

RAND_MAX is often 2^31 - 1, that is 2147483647 (for example with glibc on Linux); the C standard only guarantees that it is at least 32767.

But let's assume for simplicity that we have a very strange system, with RAND_MAX = 100 (so rand() can return 0 to 100, that's 101 numbers). And let's assume the rand() function has an ideal uniform distribution.

Now, what is the distribution of rand() % 100? The numbers 1 to 99 each have the same probability, 1/101. But 0 has probability 2/101, because rand() % 100 equals 0 both when rand() returns 0 and when it returns 100. So 0 comes up twice as often as any other number, and the distribution of 2-digit numbers from rand() % 100 is not uniform.
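The same counting can be done without a loop, which makes it practical for a realistic RAND_MAX. A minimal sketch, assuming an ideal uniform rand(); residue_count is a hypothetical name, and max is passed in as a parameter rather than read from RAND_MAX so that any value can be tried.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: how many of the values 0..max have remainder r
 * when divided by n. */
unsigned long long residue_count(unsigned long long max,
                                 unsigned long long n,
                                 unsigned long long r)
{
    assert(n > 0 && r < n && max < ULLONG_MAX);

    unsigned long long total = max + 1;    /* number of possible raw values */
    unsigned long long base  = total / n;  /* every remainder gets this many */
    unsigned long long extra = total % n;  /* remainders 0..extra-1 get one more */

    return base + (r < extra ? 1 : 0);
}
```

For max = 100 and n = 100 this gives 2 for the remainder 0 and 1 for every other remainder, as described above. For max = 2147483647 the remainders 0 to 47 each get 21474837 raw values and 48 to 99 each get 21474836, so the bias is small but not zero.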

Now, the text proposes a solution to the problem: split the range 0 to RAND_MAX into 100 equal parts, so that each part is equally likely, then call rand() and see which part the number lands in. If RAND_MAX is 2147483647 and we get, for example, 279172968, it lands in part 13 (counting from 0), between RAND_MAX / 100 * 13 = 279172868 and RAND_MAX / 100 * 14 = 300647704.

That solution is also flawed, because it is impossible to divide 0 to RAND_MAX into 100 equal parts when (RAND_MAX + 1) % 100 is not equal to 0: the range holds RAND_MAX + 1 values, which then cannot be shared out equally.
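To see where the interval idea breaks down, here is a rough sketch of the mapping. interval_index is a hypothetical name, and max stands in for RAND_MAX so the arithmetic does not depend on the platform.

```c
#include <assert.h>

/* Hypothetical helper illustrating the homework's interval mapping:
 * returns the index of the interval of width max / 100 that r falls into. */
long long interval_index(long long r, long long max)
{
    assert(max >= 100 && r >= 0 && r <= max);

    long long it = max / 100;  /* interval width, "it" in the homework text */
    return r / it;             /* index of the interval containing r */
}
```

With max = 2147483647, the value 279172968 maps to 13 as expected, but every value from 2147483600 up to 2147483647 maps to 100, which is not a 2-digit number. Those 48 leftover values have to be discarded or merged into another interval, and merging makes that interval more likely than the rest.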

I feel the only viable solution is to discard all numbers greater than or equal to RAND_MAX / 100 * 100 (using C integer arithmetic). The remaining numbers, 0 to RAND_MAX / 100 * 100 - 1, are uniformly distributed and their count is divisible by 100, so for those we can just use rand() % 100. So something like this:

#include <stdlib.h>

int get_2_digit_number(void) {
      int r;
      while (1) {
          r = rand();
          /* Reject the incomplete last block of values. The test must be >=,
             not >, or the threshold itself would give 0 an extra chance. */
          if (r >= (RAND_MAX / 100 * 100)) {
              continue;
          }
          break;
      }
      return r % 100;
}
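Where exactly the cut-off sits matters, and it is easy to check by brute force for a small range. The sketch below is a hypothetical test helper, not part of the answer: max stands in for RAND_MAX, and accept_threshold selects whether the value max / n * n itself is kept.

```c
#include <assert.h>

/* Hypothetical check: count how often each result 0..n-1 occurs among the
 * accepted raw values, and return 1 if all results are equally frequent.
 * If accept_threshold is non-zero, the raw value max / n * n is kept too. */
int cutoff_is_uniform(unsigned max, unsigned n, int accept_threshold)
{
    assert(n > 0 && n <= 256 && n <= max && max <= 1000000);

    unsigned counts[256] = {0};
    unsigned threshold = max / n * n;  /* largest multiple of n that is <= max */
    unsigned last = accept_threshold ? threshold : threshold - 1;

    for (unsigned r = 0; r <= last; r++)
        counts[r % n]++;

    for (unsigned i = 1; i < n; i++)
        if (counts[i] != counts[0])
            return 0;
    return 1;
}
```

For max = 32767 and n = 100, keeping only the raw values below 32700 gives every result exactly 327 occurrences, whereas also keeping 32700 gives the result 0 one occurrence more than the others.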

You can find relevant code on SO. For example, the rand_int() code below is based on code for integers in an answer to Is this C implementation of the Fisher-Yates shuffle correct? (and specifically the answer by Roland Illig ):

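#include <stdlib.h>

/* Return a uniformly distributed value in 0..n-1; requires 0 < n <= RAND_MAX. */
static size_t rand_int(size_t n)
{
    /* limit is a multiple of n, so 0..limit-1 splits evenly into n remainders */
    size_t limit = RAND_MAX - RAND_MAX % n;
    size_t rnd;

    /* reject raw values at or above limit and draw again */
    while ((rnd = rand()) >= limit)
        ;
    return rnd % n;
}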

The idea is to work out which large values returned by rand() would lead to biased results and ignore them. When one of those values is returned, you discard it and try the next value. This will seldom need more than two calls to rand().
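How rarely a retry happens can be estimated directly. A small sketch, assuming an ideal uniform rand() and the same limit formula as rand_int(); rejection_probability is a hypothetical name, and max stands in for RAND_MAX.

```c
#include <assert.h>

/* Hypothetical helper: probability that a single raw value in 0..max is
 * rejected when limit = max - max % n and values >= limit are discarded. */
double rejection_probability(unsigned long long max, unsigned long long n)
{
    assert(n > 0 && n <= max);

    unsigned long long limit = max - max % n;       /* same cut-off as rand_int() */
    unsigned long long rejected = max - limit + 1;  /* raw values limit..max */

    return (double)rejected / ((double)max + 1.0);
}
```

For max = 2147483647 and n = 100, only 48 of the 2147483648 raw values are rejected, so a retry is extremely rare. Under the stated precondition the rejected values never outnumber the accepted ones, so the probability is at most 1/2 (reached, for example, with max = 7 and n = 4) and the expected number of calls to rand() is at most two.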

You might find some of the external references in Shuffle array in C useful too.
