Given exclude_list = [3, 5, 8], n = 30, k = 5,
I'd like to pick k (here 5) random numbers between 1 and 30, but none of them may be in exclude_list.
Assume that both exclude_list and n could be large.
When there's no need for exclusion, it is easy to get k random samples:
rand_numbers = sample(range(1, n), k)
So with the exclusion list, I could do:
sample(sorted(set(range(1, n)) - set(exclude_list)), k)
I read that range keeps only one number in memory at a time, but I'm not sure how that affects the two lines above.
The first question is: does the following code put all n numbers in memory, or does it hold one number at a time?
rand_numbers = sample(range(1, n), k)
The second question is: if the code above does hold only one number in memory at a time, can I do something similar with the additional constraint of the exclusion list?
From the notes in sample's docstring:
To choose a sample in a range of integers, use range as an argument. This is especially fast and space efficient for sampling from a large population: sample(range(10000000), 60)
I can test this on my machine:
In [11]: sample(range(100000000), 3)
Out[11]: [70147105, 27647494, 41615897]
In [12]: list(range(100000000)) # crash/takes a long time
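To make the memory point concrete, here is a small sketch (the variable names are only illustrative) showing that a range object stays tiny no matter how large n is, while still supporting len() and indexing, which is all sample needs from it:

```python
import sys
from random import sample

n = 100_000_000
population = range(n)

# A range object stores only start, stop and step, so its size does not grow with n.
print(sys.getsizeof(population))

# It still behaves like a sequence: len() and indexing work without building a list.
print(len(population), population[n - 1])

# sample only needs len() and indexing, so no list of n numbers is ever created.
picked = sample(population, 3)
print(picked)
```

The exact byte count printed depends on the Python build, but it should be the same for range(10) and range(100000000).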
One way to sample with an exclude list efficiently is to use the same range trick but "hop over" the exclusions (once the exclusions are sorted, we can do this in O(k * log(len(exclude_list))) with the bisect module):
import bisect
import random

def sample_excluding(n, k, excluding):
    # if we assume excluding is unique and sorted we can avoid the set usage...
    skips = [j - i for i, j in enumerate(sorted(set(excluding)))]
    s = random.sample(range(n - len(skips)), k)
    return [i + bisect.bisect_right(skips, i) for i in s]
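To see why the hop lands on the right values, it may help to apply the same mapping to every index of the shortened range instead of a random sample. This is only an illustration of the arithmetic, using a small n and an arbitrary exclude list:

```python
import bisect

n = 10
excluding = [2, 4, 7]

# skips[i] is the i-th excluded value minus the number of excluded values before it.
skips = [j - i for i, j in enumerate(sorted(set(excluding)))]
print(skips)  # [2, 3, 5]

# Map every index of the shortened range to its value in the full range.
# bisect_right counts how many exclusions have to be hopped over for index i.
mapped = [i + bisect.bisect_right(skips, i) for i in range(n - len(skips))]
print(mapped)  # [0, 1, 3, 5, 6, 8, 9]
```

Every allowed value appears exactly once, so sampling indices uniformly from the shortened range samples the allowed values uniformly. Note that this relies on every excluded value lying inside range(n).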
We can see it working:
In [21]: sample_excluding(10, 3, [2, 4, 7])
Out[21]: [6, 3, 9]
In [22]: sample_excluding(10, 3, [1, 2, 8])
Out[22]: [0, 4, 3]
In [23]: sample_excluding(10, 6, [1, 2, 8])
Out[23]: [0, 7, 9, 6, 3, 5]
Specifically, we've done this without using O(n) memory:
In [24]: sample_excluding(10000000, 6, [1, 2, 8])
Out[24]: [1495143, 270716, 9490477, 2570599, 8450517, 8283229]
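Note that sample_excluding draws from 0 to n - 1, while the question asks for numbers from 1 to n. A possible adaptation is sketched below. The name sample_excluding_1_based is hypothetical, and the sketch assumes the upper bound n is inclusive and that exclusions outside 1..n should simply be ignored:

```python
import bisect
import random

def sample_excluding_1_based(n, k, excluding):
    # Hypothetical variant: draw from 1..n inclusive instead of 0..n-1.
    # Keep only exclusions that fall inside 1..n, shifted to 0-based.
    shifted = sorted({e - 1 for e in excluding if 1 <= e <= n})
    skips = [j - i for i, j in enumerate(shifted)]
    s = random.sample(range(n - len(skips)), k)
    # Hop over the exclusions, then shift back to 1-based.
    return [i + bisect.bisect_right(skips, i) + 1 for i in s]

exclude_list = [3, 5, 8]
result = sample_excluding_1_based(30, 5, exclude_list)
print(result)
```

As with random.sample itself, this raises ValueError if k is larger than the number of values left after exclusion.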