I want to create an array (say output_list) from a given NumPy array (say input_list) by resampling, such that each element of input_list appears in output_list at least once. The length of output_list will always be greater than the length of input_list.
I tried a few approaches and am looking for a faster method. Unfortunately, NumPy's random.choice doesn't guarantee that every element appears at least once.
Step 1: Generate Data
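import string
import random
import numpy as np
size = 150000
output_size = size  # assumption: the snippets below use output_size, which is not defined elsewhere
input_size = 100000  # assumed value; the post takes this from dict_data[1]['unique_len'], which is not shown
chars = string.digits + string.ascii_lowercase
# Build random 5-character strings (duplicates are possible)
input_list = [
    "".join(random.choice(chars) for _ in range(5))
    for _ in range(input_size)
]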
Option 1: Let's try NumPy's random.choice with a uniform probability distribution.
n = len(input_list)
output_list = np.random.choice(
    input_list,
    size=output_size,
    replace=True,
    p=[1 / n] * n,  # uniform probabilities (also the default when p is omitted)
)
assert len(set(input_list)) == len(set(output_list)), \
    "Output list has fewer elements than input list"
This raises an AssertionError:
Output list has fewer elements than input list
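A rough way to see why Option 1 fails: with m uniform draws with replacement from n distinct items, a given item is missed with probability (1 - 1/n)^m, so about n(1 - 1/n)^m items are expected to be absent. The sketch below evaluates this for hypothetical sizes (n = 100000 and m = 150000 are assumptions, since the input size is not given in the post); the helper name expected_missing is also just illustrative.

```python
import math

def expected_missing(n: int, m: int) -> float:
    """Expected number of the n items never drawn in m uniform draws with replacement."""
    return n * (1.0 - 1.0 / n) ** m

n, m = 100_000, 150_000  # hypothetical sizes, not taken from the post
missing = expected_missing(n, m)
approx = n * math.exp(-m / n)  # large-n approximation: n * e^(-m/n)
print(f"expected missing: {missing:.0f} of {n} (approximation: {approx:.0f})")
```

Under these assumed sizes a substantial fraction of the items would be expected to be missing, so the assertion failing is the typical outcome rather than bad luck.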
Option 2: Let's pad input_list with randomly chosen elements and then shuffle the result.
n = len(input_list)
output_list = np.concatenate(
    (
        np.array(input_list),
        np.random.choice(input_list, size=output_size - n, replace=True, p=[1 / n] * n),
    ),
    axis=None,
)
np.random.shuffle(output_list)
assert len(set(input_list)) == len(set(output_list)), \
    "Output list has fewer elements than input list"
While this doesn't raise an assertion error, I am looking for a faster solution, either algorithmically or using one of NumPy's built-in functions.
Thanks for any help.
Let lenI be the input list length and lenO be the output list length.
1) Make lenO - lenI uniform random choices from the source list.
2) Then append the whole input list to the end of the output list.
3) Then perform lenI iterations of a Fisher–Yates shuffle to distribute those last elements uniformly.
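import random
src = [1, 2, 3, 4]
lD = 10        # output (destination) length, lenO above
lS = len(src)  # input (source) length, lenI above
dst = []
# Step 1: lD - lS uniform random picks from src
for _ in range(lD - lS):
    dst.append(src[random.randint(0, lS - 1)])
# Step 2: append every source element once
dst.extend(src)
print(dst)
# Step 3: partial Fisher-Yates shuffle over the last lS positions;
# r is drawn from [0, i], as Fisher-Yates requires
for i in range(lD - 1, lD - lS - 1, -1):
    r = random.randint(0, i)
    dst[r], dst[i] = dst[i], dst[r]
print(dst)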
>>[4, 3, 1, 3, 4, 3, 1, 2, 3, 4]
>>[4, 3, 1, 3, 4, 3, 1, 3, 4, 2]
This approach has linear time complexity.
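For large inputs, a vectorized NumPy variant of the same idea may be worth trying. The sketch below is not the code from this answer: instead of appending and partially shuffling, it fills every output slot with a uniform random draw and then overwrites distinct, randomly chosen slots with one copy of each input element. The function name resample_with_coverage and its parameters are hypothetical, and whether it is actually faster on your data would need benchmarking.

```python
import numpy as np

def resample_with_coverage(items, output_size, rng=None):
    """Return output_size samples from items, each item appearing at least once."""
    rng = np.random.default_rng() if rng is None else rng
    items = np.asarray(items)
    n = items.shape[0]
    if output_size < n:
        raise ValueError("output_size must be at least len(items)")
    # Fill every output slot with a uniform random draw (with replacement).
    out = items[rng.integers(0, n, size=output_size)]
    # Overwrite n distinct, randomly chosen slots with one copy of each item.
    slots = rng.choice(output_size, size=n, replace=False)
    out[slots] = items
    return out

sample = resample_with_coverage(["a", "b", "c", "d"], 10, rng=np.random.default_rng(0))
print(sample)
```

Both the draw and the overwrite work on whole arrays, so the Python-level loop disappears while the guarantee that every input element appears at least once is kept.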