So I am building a list in Python, for example, let us say the first 100 integers, but I don't need all 100 integers, only a sample of, let's say, 3.
import random
def f():
    list_ = []
    for i in range(100):
        list_.append(i)
    return list_
def g(list_, k):
    return random.sample(list_, k)
print(g(f(), 3))
# example output (varies between runs): [50, 92, 6]
Now, can I avoid building the whole list in the first place and build the sample directly, perhaps by giving each element a probability of being added to the list in f()?
If I am building a huge list that holds not integers but some other objects, this approach could be costly in terms of both memory and computation.
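The probability idea from the question can work if the probability changes as you go. A minimal sketch, swapping in a technique not used in the answer: selection sampling (Knuth's Algorithm S) keeps the i-th item with probability (k - chosen)/(n - i), where chosen counts the items kept so far, which yields exactly k items in their original order. It assumes n is known up front; `selection_sample` and the generator version of `f` are hypothetical names for illustration.

```python
import random

def selection_sample(items, n, k, rng=random):
    """Pick exactly k of the n items yielded by `items`, keeping their order.

    Selection sampling (Knuth's Algorithm S): the i-th item (0-based) is kept
    with probability (k - chosen) / (n - i). Assumes `items` yields n items.
    """
    sample = []
    for i, item in enumerate(items):
        needed = k - len(sample)
        if needed == 0:
            break  # already have k items, stop consuming the iterable
        remaining = n - i  # items left, including this one
        # rng.random() is in [0, 1), so when needed == remaining this is always true
        if rng.random() * remaining < needed:
            sample.append(item)
    return sample

def f():
    # hypothetical generator version of f(): yields items instead of building a list
    for i in range(100):
        yield i

print(selection_sample(f(), 100, 3))
```

Only the k selected items are ever stored; the trade-off is that n must be known before you start.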
import random
def random_no_dups_k_of_n(k, n):
    res = list(range(k))
    for i in range(k, n):
        v = random.randint(0, i)  # this is 0-i inclusive
        if v == i:
            ir = random.randint(0, k - 1)
            res[ir] = i
    return res
What's happening here: it's a telescoping product. Each element from 0 to k-1 starts out having a k/k chance of being selected. After the 1st iteration, k has a 1/(k+1) chance of getting selected, while all the others (not just the remaining ones, but all) have a (k-1)/k * k/(k+1) = (k-1)/(k+1) chance of getting selected. After the 2nd iteration, k+1 has a 1/(k+2) chance of getting selected, while all the others have a (k-1)/(k+1) * (k+1)/(k+2) = (k-1)/(k+2) chance of getting selected. And so on. In the end, each number will have a k/n chance of getting selected.
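A quick empirical check makes these probabilities concrete. The sketch below, assuming a fixed seed, 20,000 trials and a hypothetical helper `selection_frequencies`, counts how often each number ends up in the sample. If this first version were uniform, every frequency would land near k/n = 0.3; instead 0, 1 and 2 come out well above that and 9 shows up only about 10% of the time, which is consistent with the EDIT below.

```python
import random
from collections import Counter

def random_no_dups_k_of_n(k, n):
    # first version from above: replaces a slot only when v == i
    res = list(range(k))
    for i in range(k, n):
        v = random.randint(0, i)
        if v == i:
            ir = random.randint(0, k - 1)
            res[ir] = i
    return res

def selection_frequencies(sampler, k, n, trials=20000, seed=0):
    """Estimate how often each of 0..n-1 ends up in the sample."""
    random.seed(seed)  # seeds the module-level generator the sampler uses
    counts = Counter()
    for _ in range(trials):
        counts.update(sampler(k, n))
    return [counts[x] / trials for x in range(n)]

freqs = selection_frequencies(random_no_dups_k_of_n, 3, 10)
print([round(p, 2) for p in freqs])  # compare each entry against k/n = 0.3
```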
Actually, I just saw that you can simply do random.sample(range(n), k). I had assumed it wasn't available.
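The random.sample(range(n), k) route avoids building the list because range is a lazy sequence that supports len() and indexing. A short sketch (the sizes are arbitrary) that also shows random.sample rejecting a plain generator with a TypeError, which matters when the objects come from a generator rather than from integer indices.

```python
import random

n, k = 10**12, 3
# range supports len() and indexing without materializing its elements,
# so random.sample can pick k distinct items without a list of n entries.
sample = random.sample(range(n), k)
print(sample)

# random.sample requires a sequence, so a generator is rejected.
try:
    random.sample((i for i in range(n)), k)
except TypeError as exc:
    print("TypeError:", exc)
```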
EDIT: I got the probabilities reversed above. The correct version should be:
import random
def random_no_dups_k_of_n(k, n):
    res = list(range(k))
    for i in range(k, n):
        v = random.randint(0, i)  # this is 0-i inclusive
        if v < k:  # happens with probability k/(i+1)
            ir = random.randint(0, k - 1)
            res[ir] = i
    return res
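The same idea works on any iterable, which addresses the concern about huge lists of arbitrary objects: only k items are ever held. A minimal sketch of reservoir sampling (Algorithm R), assuming the hypothetical names `reservoir_sample` and `make_objects`; it is a generalization of the corrected version above, not code from the answer.

```python
import random

def reservoir_sample(items, k, rng=None):
    """Return k items chosen uniformly without replacement from any iterable.

    Consumes items one at a time, so the total count need not be known and the
    full list is never built. If fewer than k items arrive, all are returned.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                # given j < k, j is already a uniform slot index in 0..k-1
                reservoir[j] = item
    return reservoir

def make_objects():
    # hypothetical stand-in for a generator of expensive objects
    for i in range(100):
        yield {"id": i}

print(reservoir_sample(make_objects(), 3))
```

Reusing j as the slot index is equivalent to the answer's second randint call, since j is uniform on 0..k-1 whenever j < k; it just spends one random draw instead of two.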
Each element from 0 to k-1 starts out having a k/k chance of being selected. After the 1st iteration, k has a k/(k+1) chance of getting selected, while all the others (not just the remaining ones, but all) have a k/k * ((k-1)/k * k/(k+1) + 1/(k+1)) = k/(k+1) chance of getting selected. Here (k-1)/k * k/(k+1) is the chance that v < k but ir picks a different slot, and 1/(k+1) is the chance that v >= k, so nothing is replaced. After the 2nd iteration, k+1 has a k/(k+2) chance of getting selected, while all the others have a k/(k+1) * ((k-1)/k * k/(k+2) + 2/(k+2)) = k/(k+2) chance of getting selected.
And this actually does collapse all the calculations to give each element a k/(k+m) chance after the m-th step. Since the loop runs for m = n-k steps, every element ends up with a k/n chance.
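The telescoping argument can also be checked with exact arithmetic. The sketch below uses a hypothetical helper, `survival_probabilities`, that applies the per-step rule of the corrected version: element i enters with probability k/(i+1), and an element already held is evicted with probability k/(i+1) * 1/k = 1/(i+1), so it survives with probability i/(i+1).

```python
from fractions import Fraction

def survival_probabilities(k, n):
    """Exact probability that each of 0..n-1 is in the result of the corrected version."""
    probs = [Fraction(1)] * k  # elements 0..k-1 start with probability k/k = 1
    for i in range(k, n):
        enter = Fraction(k, i + 1)      # P(v < k): element i is written into some slot
        evict = enter * Fraction(1, k)  # P(v < k and ir picks this slot) = 1/(i+1)
        probs = [p * (1 - evict) for p in probs]
        probs.append(enter)
    return probs

print(survival_probabilities(3, 10))  # every entry is Fraction(3, 10), i.e. k/n
```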