
Sort list of floating-point numbers in groups

I have an unordered array of floating-point numbers. I know that the values always cluster around a few points, but those points are not known in advance. For illustration, this list

[10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]

has values clustered around 5 and 10, so I would like [5,10] as the answer.

I would like to find those clusters for lists with 1000+ values, where the number of clusters is probably around 10 (for some given tolerance). How can I do that efficiently?

Check python-cluster. With this library you could do something like this:

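from cluster import HierarchicalClustering

data = [10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]
# Distance between two values is their absolute difference
cl = HierarchicalClustering(data, lambda x,y: abs(x-y))
# Cut the hierarchy at distance 1.0 and average each resulting cluster
print([sum(cluster) / len(cluster) for cluster in cl.getlevel(1.0)])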

And you would get:

[5.0062, 10.003333333333332]

(This is a very simplistic example, because I don't really know what you want to do, and because this is the first time I've used this library.)

You can try the following method:

Sort the array first, then use diff() to calculate the difference between each pair of consecutive values. Any difference larger than a threshold can be treated as a split position:

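import numpy as np
x = [10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]
x = np.sort(x)
th = 0.5  # minimum gap that separates two groups
# Indices where the gap to the next value exceeds th mark the split points
splits = np.where(np.diff(x) > th)[0] + 1
print([float(group.mean()) for group in np.split(x, splits)])

The result is:

[5.0062, 10.003333333333332]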
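
If you would rather not depend on NumPy, the same sort-and-split idea can be sketched in plain Python. The function name `cluster_means` and its `threshold` parameter are made up for this illustration, and the sketch assumes the clusters are separated by gaps larger than the threshold:

```python
def cluster_means(values, threshold):
    """Sort values, start a new group at every gap > threshold, return group means."""
    ordered = sorted(values)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    # Compare each value with its predecessor in sorted order
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > threshold:
            groups.append([cur])    # gap too large: open a new group
        else:
            groups[-1].append(cur)  # close enough: extend the current group
    return [sum(g) / len(g) for g in groups]


data = [10.01, 5.001, 4.89, 5.1, 9.9, 10.1, 5.05, 4.99]
centers = cluster_means(data, 0.5)
print([round(c, 2) for c in centers])  # prints [5.01, 10.0]
```

The sort dominates the cost, so this runs in O(n log n) time, which should be fine for lists with 1000+ values. Picking the threshold is the real judgment call: it has to be larger than the spread within a cluster and smaller than the distance between neighbouring clusters.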
