Generate random numpy array from a given list of elements with at least one repetition of each element
I want to create an array (say output_list) from a given numpy array (say input_list) by resampling, such that each element of input_list appears in output_list at least once. The length of output_list will always be greater than the length of input_list.
I tried a few approaches, and I am looking for a faster method. Unfortunately, numpy's random.choice doesn't guarantee that every element appears at least once.
Step 1: Generate Data
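import string
import random
import numpy as np

output_size = 150000  # length of the resampled output (named size in the original)
# Assumed value: the original read this from dict_data[1]['unique_len'], which is not shown.
unique_len = 100000
chars = string.digits + string.ascii_lowercase
# Build unique_len random 5-character strings
input_list = [
    "".join(random.choice(chars) for _ in range(5))
    for _ in range(unique_len)
]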
Option 1: Let's try numpy's random.choice with a uniform probability distribution.
output_list = np.random.choice(
    input_list,
    size=output_size,
    replace=True,
    p=[1 / len(input_list)] * len(input_list),  # uniform probabilities
)
assert len(set(input_list)) == len(set(output_list)), \
    "Output list has fewer elements than input list"
This raises the assertion error:
Output list has fewer elements than input list
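The failure is expected rather than bad luck. With uniform sampling with replacement, a given element is missed by all m draws with probability (1 - 1/n)^m, where n is the number of distinct inputs, so on average n(1 - 1/n)^m elements never appear. A minimal sketch of that calculation follows; the function name and the sizes passed to it are assumptions chosen for illustration, not values from the question.

```python
def expected_missing(n_unique: int, n_draws: int) -> float:
    """Expected number of distinct inputs absent from n_draws uniform draws."""
    # A single draw misses a given element with probability (1 - 1/n_unique),
    # and the draws are independent, so all n_draws miss it with that
    # probability raised to the power n_draws.
    return n_unique * (1.0 - 1.0 / n_unique) ** n_draws


# Hypothetical sizes: 100000 distinct inputs, 150000 draws
print(expected_missing(100_000, 150_000))
```

For those assumed sizes the expectation is on the order of tens of thousands of missing elements, so the assertion in Option 1 will fail essentially every time.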
Option 2: Let's pad input_list with randomly chosen elements and then shuffle the result.
output_list = np.concatenate((
    np.array(input_list),
    np.random.choice(
        input_list,
        size=output_size - len(input_list),
        replace=True,
        p=[1 / len(input_list)] * len(input_list),
    ),
), axis=None)
np.random.shuffle(output_list)
assert len(set(input_list)) == len(set(output_list)), \
    "Output list has fewer elements than input list"
While this doesn't raise the assertion, I am looking for a faster solution than this, either algorithmically or using one of numpy's built-in functions.
Thanks for any help.
Let lenI be the input list length and lenO the output list length.
1) Make lenO - lenI uniform random choices from the source list.
2) Then append the whole input list to the end of the output list.
3) Then run lenI iterations of the Fisher–Yates shuffle to distribute those last elements uniformly.
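import random

src = [1, 2, 3, 4]
lD = 10        # output length (lenO)
lS = len(src)  # input length (lenI)
dst = []
# 1) lD - lS uniform random choices from the source list
for _ in range(lD - lS):
    dst.append(src[random.randint(0, lS - 1)])
# 2) Append every source element so each one appears at least once
dst.extend(src)
print(dst)
# 3) Partial Fisher-Yates shuffle: only the last lS positions are swapped out.
#    r is drawn from [0, i], as Fisher-Yates requires, rather than [0, lD - 1].
for i in range(lD - 1, lD - lS - 1, -1):
    r = random.randint(0, i)
    dst[r], dst[i] = dst[i], dst[r]
print(dst)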
>>[4, 3, 1, 3, 4, 3, 1, 2, 3, 4]
>>[4, 3, 1, 3, 4, 3, 1, 3, 4, 2]
This approach has linear time complexity in the output length.
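Since the question asks for a numpy solution, the same three steps can also be sketched with vectorized calls. This is only a sketch under stated assumptions: the function name resample_with_coverage and its seed parameter are hypothetical, it works on integer indices rather than on the strings themselves, and it uses a full shuffle in place of the partial Fisher–Yates pass. Whether it is actually faster than Option 2 has not been measured here and should be timed on real data.

```python
import numpy as np


def resample_with_coverage(values, out_len, seed=None):
    """Return out_len samples from values in which every value appears at least once."""
    values = np.asarray(values)
    n = values.size
    if out_len < n:
        raise ValueError("out_len must be at least len(values)")
    rng = np.random.default_rng(seed)
    # Step 1: out_len - n uniform draws, taken as integer indices
    extra = rng.integers(0, n, size=out_len - n)
    # Step 2: append one index per input element so each value is guaranteed
    idx = np.concatenate((extra, np.arange(n)))
    # Step 3: shuffle the indices in place (a full shuffle, not the partial one)
    rng.shuffle(idx)
    return values[idx]


out = resample_with_coverage(["a", "b", "c", "d"], 10, seed=0)
assert set(out) == {"a", "b", "c", "d"}
```

Drawing indices uniformly needs no explicit probability vector, and the shuffle moves integers rather than strings; the values are looked up once at the end.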