
From a set of x items, repeat each item y times such that y follows a normal distribution

From a set of x unique items, I need to repeat each item y times such that y follows a normal distribution.

For example, suppose the number of items is n = 5 and y_max = 50. If we count how many times each item in my sorted list is repeated, the plot would look like this:

[image: enter image description here]

import numpy as np

my_set = ('a', 'b', 'c', 'd', 'e')
# One normal draw per item, used as that item's repeat count
distribution = np.random.normal(len(my_set)/2, 1, len(my_set)).round().astype(int)
np.repeat(my_set, distribution)

I expect the result to follow a trend similar to the graph but instead, the result follows either an increasing or decreasing trend.

For readability, I'll use tuples instead of repeating each item y times.

Expected result should be something like:

[('a', 2), ('b', 4), ('c', 5), ('d', 3), ('e', 1)]

Actual result:

[('a', 5), ('b', 4), ('c', 3), ('d', 4), ('e', 3)]

Firstly, let us generate the desired result.

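import numpy as np

my_set = ('a', 'b', 'c', 'd', 'e')
# Draw 10000 positions from the normal distribution and round them to integer indices
distribution = np.random.normal(len(my_set)/2, 1, 10000).round().astype(int)
# Clamp each index to the valid range 0..4 and look up the corresponding letter
result = [my_set[max(min(el, 4), 0)] for el in distribution]
np.unique(result, return_counts=True)
# Output from one run (counts vary between runs):
# (array(['a', 'b', 'c', 'd', 'e'], dtype='<U1'),
#  array([ 234, 1377, 3421, 3374, 1594]))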

Here we generate 10000 random values from the given distribution and take the corresponding letter for each number. The counts are therefore exactly what we are looking for: the number of appearances of each letter follows a normal distribution.
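The sample-and-count idea above can also be written without a Python loop. This is only a sketch under a few assumptions that are not in the original answer: the function name `sample_repeat_counts` and its parameters are made up for illustration, the distribution is centered on the middle index `(len(items) - 1) / 2` rather than `len(items) / 2`, and a seeded `np.random.default_rng` is used so the run is repeatable.

```python
import numpy as np

def sample_repeat_counts(items, n_draws, sd=1.0, seed=None):
    """Hypothetical helper: draw positions from a normal distribution and count hits per item."""
    rng = np.random.default_rng(seed)
    center = (len(items) - 1) / 2                 # assumption: center on the middle index
    draws = rng.normal(center, sd, n_draws).round().astype(int)
    draws = np.clip(draws, 0, len(items) - 1)     # out-of-range draws land on the end items
    return np.bincount(draws, minlength=len(items))

my_set = ('a', 'b', 'c', 'd', 'e')
counts = sample_repeat_counts(my_set, n_draws=10000, seed=0)
repeated = np.repeat(my_set, counts)              # each item repeated according to its count
print(list(zip(my_set, counts.tolist())))
```

Because the counts come from random draws, they only approximate a bell shape, and the total number of repeated items equals `n_draws`.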

The core problem in your code is a misunderstanding of what a distribution is, that is, of which value is normally distributed. Calling np.random.normal simply generates values of a normally distributed variable. By the definition of the normal distribution, a given number x appears with a certain probability p = pdf_normal(x) (approximately, since the values are rounded to integers). In terms of frequencies, this means that if we generate the variable many times, a fraction p of the total number of trials will be x. And that is exactly what we are looking for.
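If random sampling is not required, the same pdf can be evaluated directly at each item's position to get deterministic repeat counts. This is a different technique from the sampling used in the answer, shown only as a sketch: the helper name `pdf_repeat_counts`, the choice to center on the middle index, and the scaling that gives the peak item exactly `y_max` repeats (following the question's y_max = 50) are all assumptions.

```python
import numpy as np

def pdf_repeat_counts(n_items, y_max, sd=1.0):
    """Hypothetical helper: repeat counts proportional to a normal pdf, peak scaled to y_max."""
    positions = np.arange(n_items)
    center = (n_items - 1) / 2                    # assumption: center on the middle index
    # Normal pdf up to a constant factor; the factor cancels in the scaling below.
    density = np.exp(-0.5 * ((positions - center) / sd) ** 2)
    return np.round(y_max * density / density.max()).astype(int)

my_set = ('a', 'b', 'c', 'd', 'e')
counts = pdf_repeat_counts(len(my_set), y_max=50)
print(list(zip(my_set, counts.tolist())))
# [('a', 7), ('b', 30), ('c', 50), ('d', 30), ('e', 7)]
repeated = np.repeat(my_set, counts)
```

A larger `sd` flattens the curve, so the end items receive more repeats relative to the peak.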

In your code, what you do instead is make the numbers of occurrences themselves normally distributed. This means that each letter will appear n ± s times, where n is the mean (here len(my_set)/2) and s is normally distributed. So every letter gets basically the same count plus a normally distributed error. Having read your post thoroughly, I do not think that this is what you are looking for.
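A quick way to see this is to repeat the question's approach many times and average the count each position receives. The sketch below assumes a seeded `np.random.default_rng` and 10000 repetitions, neither of which appears in the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 5
# 10000 repetitions of the question's approach: one normal draw per item,
# rounded to an integer (the draws are only averaged here, not used as repeat counts).
trials = rng.normal(n_items / 2, 1, size=(10000, n_items)).round().astype(int)
mean_per_item = trials.mean(axis=0)
print(mean_per_item)  # all five averages are close to 2.5, whatever the position
```

No position is favored over another, so a single run shows no bell shape across the items, only noise around the same mean.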

