
From a set of x items, repeat each item y times such that y follows a normal distribution

From a set of x unique items, I need to repeat each item y times such that y follows a normal distribution.

For example, suppose the number of items is n = 5 and y_max = 50. If we count how many times each item in my sorted list is repeated, the plot would look like this:

[image]

import numpy as np

my_set = ('a', 'b', 'c', 'd', 'e')
# One normal draw per item, used as that item's repeat count
distribution = np.random.normal(len(my_set)/2, 1, len(my_set)).round().astype(int)
np.repeat(my_set, distribution)

I expect the result to follow a trend similar to the graph, but instead the result follows either an increasing or a decreasing trend.

For readability, I'll use tuples instead of repeating each item y times.

The expected result should be something like:

[('a', 2), ('b', 4), ('c', 5), ('d', 3), ('e', 1)]

Actual result:

[('a', 5), ('b', 4), ('c', 3), ('d', 4), ('e', 3)]

First, let us generate the desired result.

import numpy as np

my_set = ('a', 'b', 'c', 'd', 'e')
distribution = np.random.normal(len(my_set)/2, 1, 10000).round().astype(int)
# Clamp each rounded draw to a valid index, then look up the letter
result = [my_set[max(min(el, len(my_set) - 1), 0)] for el in distribution]
np.unique(result, return_counts=True)
# Example output (varies between runs):
# (array(['a', 'b', 'c', 'd', 'e'], dtype='<U1'),
#  array([ 234, 1377, 3421, 3374, 1594]))

Here we generate 10000 random values from the given distribution and replace each number with the corresponding letter. The counts therefore represent exactly what we are looking for: the number of appearances of each letter is normally distributed.
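As a minimal sketch of the sampling approach above, the counts can be turned into the (item, count) tuples used in the question. The helper name sample_counts, the seed argument, and the choice to centre the distribution on the middle index (len - 1) / 2 instead of len / 2 are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np

def sample_counts(items, n_samples, scale=1.0, seed=None):
    """Draw n_samples indices from a normal distribution and count hits per item.

    Hypothetical helper: the name, the seed argument and the centring on
    (len(items) - 1) / 2 are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    center = (len(items) - 1) / 2  # middle index, so the bell is symmetric
    idx = rng.normal(center, scale, n_samples).round().astype(int)
    idx = np.clip(idx, 0, len(items) - 1)  # fold out-of-range draws into the end items
    counts = np.bincount(idx, minlength=len(items))
    return [(item, int(c)) for item, c in zip(items, counts)]

my_set = ('a', 'b', 'c', 'd', 'e')
pairs = sample_counts(my_set, 10000, seed=0)
print(pairs)

# Expand back into the flat repeated sequence if that is what is needed.
repeated = np.repeat(my_set, [c for _, c in pairs])
```

Centring on the middle index is a design choice: with len(my_set)/2 = 2.5 as the mean, the rounded draws favour 'c' and 'd' equally, whereas a mean of 2 puts the peak on 'c' alone.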

The core problem in your code is a misunderstanding of what a distribution is, that is, of which value is normally distributed. Calling np.random.normal simply generates values of a normally distributed variable. By the definition of the normal distribution, a given number x appears with a certain probability p = pdf_normal(x), the value of the normal density at x. In terms of frequencies, this means that if we generate the variable many times, a fraction p of the total number of trials will yield x. And that is just what we are looking for.
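The frequency reading above also suggests a deterministic variant: instead of sampling, compute the probability mass that falls on each item and scale it. The sketch below does this with the standard library's statistics.NormalDist. The helper name expected_counts, the bin edges at i ± 0.5, and the scaling that makes the largest count equal to y_max are assumptions for illustration, not something the original answer specifies.

```python
from statistics import NormalDist

def expected_counts(items, y_max, scale=1.0):
    """Hypothetical helper: bell-shaped repeat counts without random sampling."""
    n = len(items)
    dist = NormalDist(mu=(n - 1) / 2, sigma=scale)  # centred on the middle index
    # Probability mass of the interval [i - 0.5, i + 0.5) around each index i.
    probs = [dist.cdf(i + 0.5) - dist.cdf(i - 0.5) for i in range(n)]
    peak = max(probs)
    # Scale so the most frequent item is repeated y_max times; keep at least 1.
    return [(item, max(1, round(y_max * p / peak))) for item, p in zip(items, probs)]

my_set = ('a', 'b', 'c', 'd', 'e')
counts = expected_counts(my_set, y_max=50)
print(counts)
```

Because nothing is sampled, the result is the same on every run, which may be preferable when the repeat counts must be reproducible.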

In your code, what you generate is a variable in which the numbers of occurrences themselves are normally distributed. This means that each letter will appear n ± s times, where s is normally distributed. So it is basically a normal distribution with a normal error. Having read your post thoroughly, I do not think this is what you are looking for.
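To illustrate the point, the following sketch applies the question's construction to a much larger, hypothetical set of 1000 items. The item count and the seed are assumptions chosen only to make the effect visible and repeatable. Each count lands within a few units of n_items / 2, and the count shows no dependence on the item's position, so no bell shape appears across the items.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable
n_items = 1000                  # hypothetical, far more than the 5 items in the question

# Same construction as the question: one normal draw per item, used as its repeat count.
counts = rng.normal(n_items / 2, 1, n_items).round().astype(int)

# Every count stays within a few units of n_items / 2.
print(counts.min(), counts.max())

# Correlation between position and count is close to 0: no trend across items.
positions = np.arange(n_items)
corr = np.corrcoef(positions, counts)[0, 1]
print(corr)
```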

