Generate a random numpy array from a given list of elements, with each element appearing at least once

Question

I want to create an array (say output_list) from a given numpy array (say input_list) by resampling, such that each element of input_list appears in output_list at least once. The length of output_list will always be greater than the length of input_list.

I tried a few approaches and am looking for a faster method. Unfortunately, numpy's random.choice doesn't guarantee that every element appears at least once.

Step 1: Generate Data

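import string
import random
import numpy as np

size = 150000
# Assumption: the original post used dict_data[1]['unique_len'] here and an
# undefined output_size below; size and 2 * size stand in for those values.
output_size = 2 * size
chars = string.digits + string.ascii_lowercase
input_list = [
    "".join(random.choice(chars) for _ in range(5))
    for _ in range(size)
]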

Option 1: Let's try numpy's random.choice with a uniform probability distribution.

output_list = np.random.choice(
    input_list,
    size=output_size,
    replace=True,
    # Uniform probabilities; this is also the default when p is omitted.
    p=[1 / len(input_list)] * len(input_list),
)
assert len(set(input_list)) == len(set(output_list)), \
    "Output list has fewer elements than input list"

This raises an AssertionError:

Output list has fewer elements than input list
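The failure is expected rather than bad luck. With n distinct values and k uniform draws with replacement, a given value is missed with probability (1 - 1/n)^k, so on average n * (1 - 1/n)^k values never appear. A minimal sketch of that estimate follows; the function name and the sizes used (150000 values, 300000 draws) are assumptions chosen for illustration, since the post does not state output_size.

```python
import math


def expected_missing(n_unique, n_draws):
    """Expected count of distinct values never drawn in n_draws uniform
    draws with replacement from n_unique values."""
    return n_unique * (1 - 1 / n_unique) ** n_draws


# Hypothetical sizes: 150000 distinct values, 300000 draws.
exact = expected_missing(150000, 300000)
# For large n the expression is close to n * exp(-k / n).
approx = 150000 * math.exp(-300000 / 150000)
print(round(exact), round(approx))
```

Under these assumed sizes, thousands of values are expected to be absent, so random.choice alone cannot satisfy the requirement unless k is far larger than n.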

Option 2: Let's pad input_list with random samples drawn from it and then shuffle the result.

output_list = np.concatenate(
    (
        np.array(input_list),
        np.random.choice(
            input_list,
            size=output_size - len(input_list),
            replace=True,
            p=[1 / len(input_list)] * len(input_list),
        ),
    ),
    axis=None,
)

np.random.shuffle(output_list)
assert len(set(input_list)) == len(set(output_list)), \
    "Output list has fewer elements than input list"

While this doesn't raise the assertion, I am looking for a faster solution, either algorithmically or using one of numpy's built-in functions.

Thanks for any help.

Answer 1

Let lenI be the input list length and lenO the output list length.

1) Make lenO - lenI uniform random choices from the source list.

2) Then append the whole input list to the end of the output list.

3) Then run lenI iterations of the Fisher–Yates shuffle to distribute the appended elements uniformly.

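import random

src = [1, 2, 3, 4]
lD = 10        # output length (lenO)
lS = len(src)  # input length (lenI)
dst = []
# 1) lD - lS uniform random picks from the source list
for _ in range(lD - lS):
    dst.append(src[random.randint(0, lS - 1)])
# 2) append every source element so each occurs at least once
dst.extend(src)
print(dst)
# 3) Fisher-Yates steps for the last lS positions; r is drawn from 0..i,
#    the part not yet finalized (the original drew from 0..lD - 1)
for i in range(lD - 1, lD - lS - 1, -1):
    r = random.randint(0, i)
    dst[r], dst[i] = dst[i], dst[r]
print(dst)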

>>[4, 3, 1, 3, 4, 3, 1, 2, 3, 4]
>>[4, 3, 1, 3, 4, 3, 1, 3, 4, 2]

This approach has linear complexity.
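The same pad-then-shuffle idea can also be written with numpy on integer indices instead of on the elements themselves. The following is only a sketch, not the answerer's code: the function name resample_with_coverage and its parameters are hypothetical, and it assumes a numpy version that provides np.random.default_rng. Whether it is actually faster than Option 2 for a given input should be measured.

```python
import numpy as np


def resample_with_coverage(items, out_size, rng=None):
    """Return out_size samples from items in which every item occurs
    at least once (hypothetical helper, for illustration only)."""
    rng = np.random.default_rng() if rng is None else rng
    items = np.asarray(items)
    n = items.shape[0]
    if out_size < n:
        raise ValueError("out_size must be at least len(items)")
    # One index per item guarantees coverage; the remainder is drawn
    # uniformly with replacement.
    extra = rng.integers(0, n, size=out_size - n)
    idx = np.concatenate((np.arange(n), extra))
    # Shuffle the indices in place, then gather the elements once.
    rng.shuffle(idx)
    return items[idx]


out = resample_with_coverage(["a", "b", "c", "d"], 10,
                             rng=np.random.default_rng(0))
print(out)
```

Shuffling a small integer index array and gathering once avoids shuffling the string data itself, which is the design choice this sketch is meant to illustrate.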

Answer 1 (0) 2018-12-25 06:03:34