
Python: Grouping numbers that are within n SD of central value for the group

I have a list of floats that looks something like this:

mylist = [10, 10.2, 10.5, 11, 15, 15.3, 15.4, 16, 27, 27.4, 28, 28.1, 28.2]

I want to group the values that are close to each other. For example, I want to group the values from 10 to 11 and replace them with the average of those 4 values. I am having a hard time identifying the central values and then selecting the values to the left and right that fall in each group. How could I do this?

How about this, using defaultdict:

In [1]: from collections import defaultdict

In [2]: group = defaultdict(list)

In [3]: mylist = [10, 10.2, 10.5, 11, 15, 15.3, 15.4, 16, 27, 27.4, 28, 28.1, 28.2]

In [4]: for val in mylist:
   ...:     group[int(val)].append(val)
   ...:     

In [5]: group
Out[5]: 
defaultdict(list,
            {10: [10, 10.2, 10.5],
             11: [11],
             15: [15, 15.3, 15.4],
             16: [16],
             27: [27, 27.4],
             28: [28, 28.1, 28.2]})

It does not need sorted input. Also, it preserves the order of the values within each group.

This assumes I understand your requirement correctly.
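In the output above, keying on int(val) puts 11 in its own group rather than with 10.5, while the question wants 10 to 11 averaged together. A different technique, sketched here only as one possible alternative and not as the answer's method, is to sort the values and start a new group whenever the gap to the previous value exceeds a threshold. The helper name group_by_gap and the threshold max_gap=1.0 are assumptions made for this illustration.

```python
from statistics import mean


def group_by_gap(values, max_gap):
    """Split values into runs whose neighbouring sorted values differ by at most max_gap."""
    groups = []
    for val in sorted(values):
        # Open a new group for the first value, or when the jump from the
        # previous value is larger than the allowed gap.
        if not groups or val - groups[-1][-1] > max_gap:
            groups.append([val])
        else:
            groups[-1].append(val)
    return groups


mylist = [10, 10.2, 10.5, 11, 15, 15.3, 15.4, 16, 27, 27.4, 28, 28.1, 28.2]

# max_gap=1.0 is an assumed threshold; choose it to suit your data.
groups = group_by_gap(mylist, max_gap=1.0)
means = [mean(g) for g in groups]

for g, m in zip(groups, means):
    print(g, "-> mean", round(m, 3))
```

With this threshold the sample list splits into three groups (10 to 11, 15 to 16, and 27 to 28.2). Unlike the defaultdict version, this one sorts the input first, so the original order is not preserved.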

It sounds like you want a general method, probably something like:

from scipy.stats import binned_statistic

data = [10, 10.2, 10.5, 11, 15, 15.3, 15.4, 16, 27, 27.4, 28, 28.1, 28.2]
# The default statistic is the mean of the values that fall in each bin.
stats, edges, binarray = binned_statistic(data, data, bins=4)

edges    # The boundaries that split the range of the data into 4 equal-width bins.
binarray # Shows which bin (numbered from 1) each number in your original list belongs to.
         # Note that nothing belongs to bin 3 because the gap between 16 and 27 is too wide.
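To make the equal-width binning idea concrete without SciPy, here is a plain-Python sketch. It only illustrates the idea and is not SciPy's implementation; the function names equal_width_bins and bin_means are hypothetical, and it assumes the data contains at least two distinct values.

```python
from statistics import mean


def equal_width_bins(values, bins):
    """Return (edges, numbers): bin boundaries and a 1-based bin number per value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("need at least two distinct values")  # assumption of this sketch
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    numbers = []
    for val in values:
        idx = int((val - lo) / width)
        # The maximum value sits on the last edge; keep it in the last bin.
        idx = min(idx, bins - 1)
        numbers.append(idx + 1)
    return edges, numbers


def bin_means(values, numbers, bins):
    """Mean of the values in each bin, or None for an empty bin."""
    members = [[] for _ in range(bins)]
    for val, num in zip(values, numbers):
        members[num - 1].append(val)
    return [mean(m) if m else None for m in members]


data = [10, 10.2, 10.5, 11, 15, 15.3, 15.4, 16, 27, 27.4, 28, 28.1, 28.2]
edges, numbers = equal_width_bins(data, bins=4)
per_bin = bin_means(data, numbers, bins=4)

print(numbers)
print(per_bin)
```

For the sample data the third bin is empty, which is why its mean is reported as None here. The number of bins still has to be chosen up front, so this works best when you already know roughly how many clusters to expect.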

