Sort list of floating-point numbers in groups

Question

I have an unordered array of floating-point numbers. I know that the values always fall around a few points, which are not known in advance. For illustration, this list

[10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]

has values clustered around 5 and 10, so I would like [5,10] as the answer.

I would like to find those clusters for lists with 1000+ values, where the number of clusters is probably around 10 (for some given tolerance). How can I do that efficiently?

Answer 1

Check python-cluster. With this library you could do something like this:

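from cluster import HierarchicalClustering

data = [10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]
# Cluster with the absolute difference as the distance function
cl = HierarchicalClustering(data, lambda x,y: abs(x-y))
# Cut the hierarchy at distance 1.0 and average each cluster
print([sum(cluster) / len(cluster) for cluster in cl.getlevel(1.0)])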

And you would get:

[5.0062, 10.003333333333332]

(This is a very silly example, because I don't really know what you want to do, and because this is the first time I've used this library.)

Answer 2

You can try the following method:

Sort the array first, then use diff() to calculate the difference between each pair of consecutive values. Any difference larger than a threshold can be treated as a split position:

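import numpy as np

x = [10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]
x = np.sort(x)
th = 0.5  # gaps larger than this start a new group
# Split the sorted values at every large gap, then average each group
print([group.mean() for group in np.split(x, np.where(np.diff(x) > th)[0]+1)])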

The result is:

[5.0061999999999998, 10.003333333333332]
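For lists with 1000+ values it may be handy to wrap the sort-and-split idea in a function that takes the tolerance as a parameter. The following is only a minimal sketch: the name `cluster_centers`, its `tolerance` argument, and the choice to also return the group sizes are my own assumptions, not part of the answer above. Sorting dominates the cost, so it should run in roughly O(n log n).

```python
import numpy as np

def cluster_centers(values, tolerance):
    """Hypothetical helper: group 1-D values separated by gaps > tolerance.

    Returns (centers, sizes), where centers are the group means.
    """
    x = np.sort(np.asarray(values, dtype=float))
    if x.size == 0:
        return [], []
    # A new group starts wherever the gap to the previous value exceeds tolerance.
    breaks = np.where(np.diff(x) > tolerance)[0] + 1
    groups = np.split(x, breaks)
    centers = [float(g.mean()) for g in groups]
    sizes = [len(g) for g in groups]
    return centers, sizes

data = [10.01, 5.001, 4.89, 5.1, 9.9, 10.1, 5.05, 4.99]
centers, sizes = cluster_centers(data, tolerance=0.5)
print(centers, sizes)
```

The sizes can help spot outliers: a group with only one or two members is probably noise, not a real cluster. Note that the result depends on the tolerance being smaller than the distance between neighboring clusters and larger than the gaps inside a cluster.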

2 answers

Answer 1
16 accepted 2011-11-22 12:38:17

Answer 2
7 2011-11-23 03:18:44
