
Sort list of floating-point numbers in groups

I have an array of floating-point numbers, which is unordered. I know that the values always fall around a few points, which are not known. For illustration, this list

[10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]

has values clustered around 5 and 10, so I would like [5,10] as the answer.

I would like to find those clusters for lists with 1000+ values, where the number of clusters is probably around 10 (for some given tolerance). How can I do that efficiently?

Check python-cluster. With this library you could do something like this:

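from cluster import HierarchicalClustering

data = [10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]
# The distance between two values is their absolute difference
cl = HierarchicalClustering(data, lambda x,y: abs(x-y))
# Cut the hierarchy at distance 1.0 and average each cluster
print([sum(cluster) / len(cluster) for cluster in cl.getlevel(1.0)])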

And you would get:

[5.0062, 10.003333333333332]

(This is a very silly example, because I don't really know what you want to do, and because this is the first time I've used this library)

You can try the following method:

Sort the array first, then use diff() to calculate the difference between each pair of consecutive values. Any difference larger than a threshold can be considered a split position:

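import numpy as np
x = [10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]
x = np.sort(x)
th = 0.5  # gaps larger than this separate two groups
# Split after every index where the gap to the next value exceeds th
groups = np.split(x, np.where(np.diff(x) > th)[0]+1)
print([float(group.mean()) for group in groups])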

The result is:

[5.0061999999999998, 10.003333333333332]
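
The same sort-and-split idea can also be sketched with only the standard library, which may be handy if NumPy is not available. The function name cluster_means and its tol parameter are made up for this illustration and are not part of any library; tol plays the role of th above.

```python
from statistics import fmean  # Python 3.8+


def cluster_means(values, tol):
    """Return the mean of each group of values.

    Hypothetical helper: a new group starts wherever the gap
    between two neighbouring sorted values is larger than tol.
    """
    ordered = sorted(values)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > tol:
            groups.append([cur])    # gap too large: start a new group
        else:
            groups[-1].append(cur)  # still within the current group
    return [fmean(group) for group in groups]


data = [10.01, 5.001, 4.89, 5.1, 9.9, 10.1, 5.05, 4.99]
centers = cluster_means(data, tol=0.5)
print(centers)
```

For the sample list this gives two centers, one near 5 and one near 10. The sort dominates the cost, so the whole thing runs in O(n log n) time, which should be fine for lists with 1000+ values. The result does depend on the tolerance: if it is larger than the gap between two real clusters, they will be merged into one.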
