简体   繁体   中英

Using both rand() and rand_r() : is this simple example correct?

I am trying to understand the correct usage of parallel random number generation. After having consulted different resources, I wrote a simple code that seems to work, but it would be nice if someone could confirm my understanding.

For the sake of pointing out the difference and relationship between rand() and rand_r(), let's solve:

Produce a random integer N, then extract N random numbers in parallel and compute their average.

This is my proposal (checking and free omitted), small integers on purpose:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <omp.h>

int main() {
        /* Initialize and extract an integer via rand() */
        srand(time(NULL));
        int N = rand() % 100;

        /* Storage array */ 
        int *extracted = malloc(sizeof(int) * N);

        /* Initialize N seeds for rand_r, which is completely
         * independent on rand and srand().
         * (QUESTION 1: is it right?)
         * Setting the first as time(NULL), and the others
         * via successive increasing is a good idea (? QUESTION 2)*/
        unsigned int *my_seeds = malloc(sizeof(unsigned int) * N);
        my_seeds[0] = time(NULL);
        for (int i = 1; i < N; ++i) {
                my_seeds[i] = my_seeds[i - 1] + 1;
        }

        /* The seeds for rand_r are ready:
         * extract N random numbers in parallel */
        #pragma omp parallel for
        for (int i = 0; i < N; ++i) {
                extracted[i] = rand_r(my_seeds + i) % 10;
        }

        /* Compute the average: must be done sequentially, QUESTION 3,
         * because of time-sincronization in reading/writing avg */
        double avg = 0;
        for (int i = 0; i < N; ++i) {
                avg += extracted[i];
        }
        avg /= N;
        printf("%d samples, %.2f in average.\n", N, avg);
        return 0;
}

As my comments in the code try to highlight, it would be helpful to understand if:

  1. the simultaneous usage of rand and rand_r is in this case correct;

  2. the seed's initialization for rand_r, ie the variable my_seeds, is fine;

  3. the for parallelization and related variable usage is safe.

I hope to sum up various doubts in a single, simple, ready-to-use example, after having read various tutorials / sources online (this website included).

  1. There is nothing incorrect about using both, as long as rand is not called concurrently.

  2. It's unclear what you consider as "fine" or "a good idea". It's fine in the sense that you will get different random number sequences produced for each seed. It's a bit nonsensical in that you only generate a single random number from each seed (which means the generated numbers will all likely follow a very predictable pattern, as do your seeds).

  3. There are no race conditions, so it is safe. Parallelization for < 100 calls of a (presumably) simple arithmetic method is not going to be worth it from a performance perspective, but that's not what you're asking about.

All in all, this code has no formal correctness problems. Whether it fulfills whatever purpose you would like it to fulfill is a different question. Take note that rand (and rand_r ) tend to be only very superficially random 1 , so the predictability mentioned in point 2 is just more of the same. See also Why is rand()%6 biased? for yet another quality-of-randomness issue in the code. In other words, be aware that the randomness you are producing here is lacking for many applications.

1 Assuming that unsigned int has 32 bits, there are only 32 bits of state for the PRNG, so it will repeat after (at most) 2 32 calls anyway (which is trivial to brute-force).

  1. the simultaneous usage of rand and rand_r is in this case correct;

As long as:

  • rand is not used concurrently (which in your code is ok - you're only calling it once in the main thread)
  • rand_r with the same seed variable is not used concurrently (which in your code is ok - you're only calling it once for each seed variable)

there are no issues with thread safety.

  1. the seed's initialization for rand_r, ie the variable my_seeds, is fine;

You have a separate seed for every (potentially) concurrent use of rand_r . As long as the same seed variable isn't used for concurrent calls to rand_r (which in your code doesn't happen), all is good.

  1. the for parallelization and related variable usage is safe.

Each "thread" in your code has its own seed variable for rand_r and its own result variable. So there's no concurrency issue wrt. that.

Side note: rand_r has been obsoleted, and both rand and rand_r are relatively low quality prng's . Depending on your needs, it might be worth it to investigate alternative prng's.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM