From a set of x items, repeat each item y times such that y follows a normal distribution

Question

From a set of x unique items, I need to repeat each item y times such that y follows a normal distribution.

For example, suppose the number of items is n = 5 and y_max = 50. If we count how many times each item in my sorted list is repeated, the visual would look like this:

import numpy as np

my_set = ('a', 'b', 'c', 'd', 'e')
# My attempt: draw one repeat count per item from a normal distribution
distribution = np.random.normal(len(my_set)/2, 1, len(my_set)).round().astype(int)
np.repeat(my_set, distribution)

I expect the result to follow a trend similar to the graph, but instead the result follows either an increasing or a decreasing trend.

For readability, I'll use tuples instead of repeating each item y times.

The expected result should be something like:

[('a', 2), ('b', 4), ('c', 5), ('d', 3), ('e', 1)]

Actual result:

[('a', 5), ('b', 4), ('c', 3), ('d', 4), ('e', 3)]

Answer 1

Firstly, let us generate the desired result.

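import numpy as np

my_set = ('a', 'b', 'c', 'd', 'e')
# Draw 10000 indices from a normal distribution centred on the middle of the set
distribution = np.random.normal(len(my_set)/2, 1, 10000).round().astype(int)
# Clamp each index into the valid range and look up the corresponding letter
result = [my_set[max(min(el, len(my_set) - 1), 0)] for el in distribution]
np.unique(result, return_counts=True)
# Example output (varies between runs):
# (array(['a', 'b', 'c', 'd', 'e'], dtype='<U1'),
#  array([ 234, 1377, 3421, 3374, 1594]))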

Here we generate 10000 random values from the given distribution and map each value to the corresponding letter. The counts are then exactly what we are looking for: the number of appearances of each letter follows a normal distribution.
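The same sampling idea can be wrapped so that it returns (item, count) tuples in the format used in the question. This is only a minimal sketch: the helper name `normal_repeat_counts` and its parameters `n_samples`, `sigma`, and `seed` are hypothetical, and it centres the distribution on the middle index `(len(items) - 1) / 2` (an assumption that differs slightly from `len(my_set)/2` above) so that the peak sits on the middle item.

```python
import numpy as np

def normal_repeat_counts(items, n_samples=10000, sigma=1.0, seed=None):
    """Return [(item, count), ...] where the counts follow a bell shape.

    Hypothetical helper (not from the original answer): it draws
    n_samples indices from a normal distribution centred on the middle
    of `items`, clamps them to valid positions, and counts the hits.
    """
    rng = np.random.default_rng(seed)
    centre = (len(items) - 1) / 2  # assumption: peak on the middle item
    draws = rng.normal(centre, sigma, n_samples).round().astype(int)
    draws = np.clip(draws, 0, len(items) - 1)  # out-of-range draws go to the ends
    counts = np.bincount(draws, minlength=len(items))
    return list(zip(items, counts.tolist()))

# 50 repeats in total, spread over the five items
pairs = normal_repeat_counts(('a', 'b', 'c', 'd', 'e'), n_samples=50, seed=0)
print(pairs)
```

The resulting pairs can be expanded into the repeated list with `np.repeat`, passing the items and the counts separately. With few samples the shape is noisy; it approaches a clean bell as `n_samples` grows.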

The core problem in your code is a misunderstanding of which quantity is normally distributed. Calling np.random.normal simply generates values of a normally distributed variable. By the definition of the normal distribution, a value x appears with probability density $p(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, where $\mu$ is the mean and $\sigma$ is the standard deviation. In terms of frequencies, this means that if we generate the variable many times, roughly a fraction p of all trials will yield x. And that is just what we are looking for.
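The frequency reading can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original answer: because the draws are rounded to integers, the expected fraction for an integer k is taken as the normal probability of the interval [k - 0.5, k + 0.5], computed with a hypothetical helper `normal_cdf` built on `math.erf`. The values of `mu`, `sigma`, and `n_trials` are chosen for illustration.

```python
import math
import numpy as np

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma, n_trials = 2.5, 1.0, 100_000
rng = np.random.default_rng(0)
draws = rng.normal(mu, sigma, n_trials).round().astype(int)

fractions = {}
for k in range(1, 5):
    # Probability that a draw rounds to k
    expected = normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)
    # Fraction of trials that actually produced k
    observed = float(np.mean(draws == k))
    fractions[k] = (expected, observed)
    print(k, round(expected, 4), round(observed, 4))
```

The two printed columns should agree closely, and the agreement tightens as `n_trials` increases.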

In your code, what you create instead is a variable in which the numbers of occurrences themselves are normally distributed. This means that each letter appears n ± s times, where s is normally distributed. So every letter basically gets the same count plus a normal error, rather than counts that form a bell shape across the letters. Having read your post thoroughly, I do not think that this is what you are looking for.
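One way to see this is to repeat the question's approach many times and average the count each letter receives. The following is a small simulation sketch; `n_runs` and the fixed seed are assumptions made for illustration.

```python
import numpy as np

my_set = ('a', 'b', 'c', 'd', 'e')
rng = np.random.default_rng(0)
n_runs = 10_000

# Each row is one run of the question's approach: one normal draw per item.
counts = rng.normal(len(my_set) / 2, 1, size=(n_runs, len(my_set)))
counts = counts.round().astype(int)

# Average repeat count per letter over all runs
mean_counts = counts.mean(axis=0)
for item, mean in zip(my_set, mean_counts):
    print(item, round(float(mean), 2))
```

Every letter ends up with roughly the same average count, so no single run has a reason to peak in the middle; any rising or falling pattern in one run is just noise.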

Answer 1
1 accepted 2019-02-04 07:43:44
