
Generate a random numpy array from a given list of elements, with each element appearing at least once

I want to create an array (say output_list) from a given numpy array (say input_list) by resampling, such that every element of input_list appears in output_list at least once. The length of output_list will always be greater than the length of input_list.

I have tried a few approaches and am looking for a faster method. Unfortunately, numpy's random.choice doesn't guarantee that every element appears at least once.

Step 1: Generate Data

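import string
import random
import numpy as np

# Length of output_list (assumed to be the post's `size` value)
output_size = 150000
# Number of input strings; placeholder for dict_data[1]['unique_len'], which is not defined here
unique_len = 1000
chars = string.digits + string.ascii_lowercase
input_list = [
    "".join(random.choice(chars) for _ in range(5))
    for _ in range(unique_len)
]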

Option 1: Let's try numpy's random.choice with a uniform probability distribution.

# Draw output_size items uniformly with replacement
output_list = np.random.choice(
    input_list,
    size=output_size,
    replace=True,
    p=[1 / len(input_list)] * len(input_list)
)
# Compare the number of distinct values in input and output
assert len(set(input_list)) == len(set(output_list)), \
    "Output list has fewer elements than input list"

This raises an assertion error:

Output list has fewer elements than input list
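Why Option 1 fails: with uniform sampling, one particular value is skipped by all m independent draws with probability (1 - 1/n)^m, so the expected number of missing values is n(1 - 1/n)^m, roughly n·e^(-m/n). Covering all n values takes about n ln n draws on average (the coupon collector problem), so unless output_size is far larger than the number of unique inputs, some values will usually be missing. A small sketch of that calculation (the function name `expected_missing` is illustrative, not from the post):

```python
def expected_missing(n_unique: int, n_draws: int) -> float:
    """Expected number of the n_unique values never drawn in n_draws
    independent uniform draws with replacement."""
    # A single draw misses a given value with probability (1 - 1/n);
    # all n_draws draws miss it with probability (1 - 1/n) ** n_draws.
    return n_unique * (1 - 1 / n_unique) ** n_draws


print(expected_missing(1000, 1500))  # roughly 223 values missing on average
```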

Option 2: Let's pad input_list with random picks from itself and then shuffle the result.

# Keep one copy of every input element, then pad with uniform random picks
output_list = np.concatenate((
    np.array(input_list),
    np.random.choice(
        input_list,
        size=output_size - len(input_list),
        replace=True,
        p=[1 / len(input_list)] * len(input_list)
    )
), axis=None)

# Shuffle in place so the guaranteed copies end up at random positions
np.random.shuffle(output_list)
assert len(set(input_list)) == len(set(output_list)), \
    "Output list has fewer elements than input list"

While this doesn't raise an assertion error, I am looking for a faster solution, either algorithmically or via a built-in numpy function.
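A possible variant of Option 2, sketched under the assumption that NumPy 1.17 or later is available (for `np.random.default_rng`): build and shuffle integer indices instead of the strings themselves, then gather the output with a single fancy-indexing step. Uniform sampling is the default for `Generator.integers`, so no explicit `p` list is needed. The function name `resample_with_all` is hypothetical.

```python
import numpy as np


def resample_with_all(input_list, output_size, rng=None):
    """Return output_size items from input_list in which every element of
    input_list appears at least once (Option 2 applied to indices)."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(input_list)
    n = len(values)
    if output_size < n:
        raise ValueError("output_size must be >= len(input_list)")
    # One index per input element guarantees coverage; the rest are uniform picks
    idx = np.concatenate((np.arange(n), rng.integers(0, n, size=output_size - n)))
    # Uniform in-place shuffle of the integer index array
    rng.shuffle(idx)
    # A single gather builds the output array
    return values[idx]


out = resample_with_all(["a", "b", "c"], 10, np.random.default_rng(0))
print(out)
```

With the Step 1 data it would be called as `resample_with_all(input_list, output_size)`. Whether it is actually faster than Option 2 on the real data is something to check with `timeit`.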

Thanks for any help.

Let lenI be the input list length and lenO the output list length.

1) Make lenO - lenI uniform random choices from the source list.

2) Then append the entire input list to the end of the output list.

3) Then run the last lenI iterations of the Fisher–Yates shuffle to distribute the appended elements uniformly.

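import random

src = [1, 2, 3, 4]
lD = 10          # output length (lenO)
lS = len(src)    # input length (lenI)

# Step 1: lD - lS uniform random picks from src
dst = [random.choice(src) for _ in range(lD - lS)]
# Step 2: append every source element once
dst.extend(src)
print(dst)

# Step 3: last lS iterations of Fisher–Yates, walking down from the end
for i in range(lD - 1, lD - lS - 1, -1):
    r = random.randint(0, i)   # swap partner must be drawn from [0, i], not [0, lD - 1]
    dst[r], dst[i] = dst[i], dst[r]
print(dst)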

>>[4, 3, 1, 3, 4, 3, 1, 2, 3, 4]
>>[4, 3, 1, 3, 4, 3, 1, 3, 4, 2]

This approach has linear complexity, O(lenO).
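The same three steps can be sketched with NumPy, assuming NumPy 1.17 or later for `np.random.default_rng`; the function name `resample_partial_fy` is hypothetical. Step 1 becomes a single vectorized `rng.integers` call, and only the lenI Fisher–Yates swaps run as a Python loop. Skipping the swaps for the front part is reasonable because the first lenO - lenI entries are independent identically distributed draws, so their order is already random.

```python
import numpy as np


def resample_partial_fy(src, out_len, rng=None):
    """Resample src to out_len items so every element of src appears
    at least once, using a partial Fisher–Yates shuffle on indices."""
    rng = np.random.default_rng() if rng is None else rng
    src = np.asarray(src)
    n = len(src)
    if out_len < n:
        raise ValueError("out_len must be >= len(src)")
    # Steps 1 and 2: out_len - n uniform random indices, then one index per source element
    idx = np.concatenate((rng.integers(0, n, size=out_len - n), np.arange(n)))
    # Step 3: last n Fisher–Yates iterations; position i swaps with a uniform j in [0, i]
    for i in range(out_len - 1, out_len - n - 1, -1):
        j = rng.integers(0, i + 1)
        idx[i], idx[j] = idx[j], idx[i]
    return src[idx]


print(resample_partial_fy([1, 2, 3, 4], 10, np.random.default_rng(1)))
```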

