
Can I precompute/pregenerate pseudo-random numbers for numpy?

I don't think I'm the first one to come up with this slightly unorthodox idea, but I can't seem to get Google to show me why it is bad, nor how to do it properly.

I have a piece of code that is CPU-bound, and the second most expensive function is np.random.randint(...), which is called for single numbers (one at a time). I don't have hard requirements for "true" randomness between multiple executions of the program. Hence, I thought it could be smart to precompute/cache a whole bunch (~2 million) of random numbers, save these somewhere, and then have numpy feed me those numbers as required instead of running the rng.

Could somebody please enlighten me as to why this is a bad idea, or how to do it? Thank you!

Generate your numbers, save them to the filesystem, and load them back. This script might help:

import pickle
import numpy as np

# Numbers generation
random_numbers = list(np.random.randint(0, 100, 10000))

# Save numbers
with open('filename.pickle', 'wb') as handle:
  pickle.dump(random_numbers, handle, protocol=pickle.HIGHEST_PROTOCOL)

# Load numbers
with open('filename.pickle', 'rb') as handle:
  loaded_random_numbers = pickle.load(handle)

# Check equality
print(random_numbers == loaded_random_numbers)
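For NumPy arrays specifically, you can also skip pickle entirely: np.save/np.load store the array natively and avoid the list conversion. A minimal sketch (the file name is illustrative):

```python
import numpy as np

# Generate the numbers directly as an array
random_numbers = np.random.randint(0, 100, size=10000)

# Save and reload without pickle
np.save('random_numbers.npy', random_numbers)
loaded_random_numbers = np.load('random_numbers.npy')

# Check equality
print(np.array_equal(random_numbers, loaded_random_numbers))
```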

Whether or not this is a bad idea really depends on your application (with respect to the randomness). However, generating two million random numbers isn't really expensive:

numbers = np.random.randint(..., size=2_000_000)

This takes about 40 ms on my machine. Loading from file instead can result in even larger execution times (depending on your file system and how busy it is).

So precomputing all the random numbers and then fetching one at a time seems like a decent improvement (indeed, calling np.random.randint two million times takes about 500 times as long). For example:

numbers = iter(np.random.randint(..., size=2_000_000))
single_number = next(numbers)
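If you want to verify the speedup on your own machine, a quick timing sketch might look like this (the count is scaled down and the bounds 0–100 are placeholders):

```python
import time
import numpy as np

N = 200_000  # smaller than 2 million to keep the demo quick

# One rng call per number
start = time.perf_counter()
singles = [np.random.randint(0, 100) for _ in range(N)]
per_call = time.perf_counter() - start

# One vectorized call, then iterate over the precomputed batch
start = time.perf_counter()
batch = iter(np.random.randint(0, 100, size=N))
precomputed = [next(batch) for _ in range(N)]
vectorized = time.perf_counter() - start

print(f"per-call: {per_call:.3f}s, vectorized: {vectorized:.3f}s")
```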

If you cannot precompute all the numbers (perhaps because your boundaries change dynamically), then you can use random.randint from the standard library, which should be faster for single draws. Saving the random numbers from one run and reusing them in the next might then be worthwhile:

import random
import numpy as np

numbers = []
numbers.append(random.randint(...))  # dynamically generate the random numbers

np.save('numbers.npy', numbers)  # eventually save the numbers

Then for the next run, to ensure some level of variation, you can shuffle these numbers after loading them:

numbers = np.load('numbers.npy')
np.random.shuffle(numbers)
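Putting these pieces together, a minimal first-run/next-run sketch could look like this (the file name and the 0–100 bounds are assumptions):

```python
import os
import numpy as np

PATH = 'numbers.npy'  # illustrative file name

if os.path.exists(PATH):
    # Later runs: reuse the saved numbers, shuffled for some variation
    numbers = np.load(PATH)
    np.random.shuffle(numbers)
else:
    # First run: generate the full batch and persist it
    numbers = np.random.randint(0, 100, size=2_000_000)
    np.save(PATH, numbers)

# Fetch one number at a time
stream = iter(numbers)
single_number = next(stream)
```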

I have used this before and it works like a charm on the order of millions:

First create an array of random numbers one time:

random_arr = np.random.randint(0, 100, size=20000)  # upper bound is illustrative

And every time you need a random number, simply pick one using this:

number = np.random.choice(random_arr)

It should boost your performance significantly.
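As an aside, on NumPy 1.17+ the Generator API is the recommended interface for drawing random integers, and a fixed seed reproduces the same stream on every run, which may remove the need to save numbers to disk at all. A sketch, with the bounds as placeholders:

```python
import numpy as np

# A seeded Generator yields the same sequence on every execution
rng = np.random.default_rng(seed=42)
numbers = rng.integers(0, 100, size=2_000_000)

# Fetch one number at a time
stream = iter(numbers)
single_number = next(stream)
```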
