
C rand(). Issue with generating random strings

I ran into an issue when generating random strings.

The example below produces repeating blocks of random strings. The number of strings in a block depends on WORD_LENGTH. With COUNT = 1M and WORD_LENGTH = 20, each block contains 262144 (2^18) strings, after which the same block repeats.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define WORD_LENGTH 20

//const char charset[62] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const char charset[16] = "0123456789abcdef";  /* no terminating '\0', so sizeof charset == 16 */

int main(int argc, char** argv) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s COUNT\n", argv[0]);
        return 1;
    }
    srand((unsigned)time(NULL));
    unsigned int count = (unsigned int)strtoul(argv[1], NULL, 10);
    char buf[WORD_LENGTH + 1];                 /* WORD_LENGTH characters + '\0' */
    for (unsigned int c = 0; c < count; c++) {
        for (int i = 0; i < WORD_LENGTH; ++i) {
            /* rand() % 16 takes the index from the low-order bits of rand() */
            buf[i] = charset[rand() % sizeof charset];
        }
        buf[WORD_LENGTH] = '\0';
        printf("%s\n", buf);
    }
    return 0;
}

Important: I could not reproduce the issue with the 62-character charset (the commented-out charset[62] line), even with COUNT up to 100M. Could someone explain why it behaves this way?

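rand() is a pseudorandom number generator, so its output is a deterministic sequence that eventually repeats. In many implementations the low-order bits of rand() repeat much sooner than the sequence as a whole.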

How about trying this instead:

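char buf[WORD_LENGTH + 1];                     /* WORD_LENGTH characters + '\0' */
for (unsigned int c = 0; c < count; c++) {
    for (int i = 0; i < WORD_LENGTH; ++i) {
        /* Scale rand() into [0, 1) in floating point (integer division would
           always give 0), so the index comes from the high-order bits. */
        buf[i] = charset[(int)(rand() / (RAND_MAX + 1.0) * sizeof charset)];
    }
    buf[WORD_LENGTH] = '\0';
    printf("%s\n", buf);
}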

The C reference notes that code like rand() % sizeof charset is biased. This answer may give some ideas.

To sum up the comments from @WhozCraig and @LightVillet:

The problem is in buf[i] = charset[rand() % sizeof charset]. More precisely, it comes from rand() % X and depends on the value of X, not on the charset array. The issue reproduces when X = 4, 8, 16, 32 or 64, but not with the values in between, i.e. only when X is a power of two, in which case rand() % X keeps just the low-order bits of rand(). I ran short tests with COUNT up to 1M.

Be careful with rand(), and use

(int)(rand() / (RAND_MAX + 1.0) * (sizeof charset))

instead, as mentioned by @Mr.Chip and @rici.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.