
Selecting closest values by Euclidean distance from the mean from a numpy array

I'm sure there's a straightforward answer to this, but I'm very much a Python novice, and trawling Stack Overflow has got me tantalisingly close without clearing the final hurdle, so apologies. I have an array of one-dimensional arrays (in reality more than 2000 arrays of about 800 values each), but for the sake of illustration:

group = [[0,1,3,4,5],[0,2,3,6,7],[0,4,3,2,5],...]

I'm trying to select the n 1-D arrays nearest to the mean (by Euclidean distance), but I'm struggling to extract them from the original list. I can compute the distances and sort them, but I can't then map them back to the arrays in the original group.

# Compute the mean
group_mean = group.mean(axis = 0)

distances = []
for x in group:
    # Compute Euclidean distance from the mean
    distances.append(np.linalg.norm(x - group_mean))

# Sort distances
distances.sort()

print(distances[0:5]) # Prints the five nearest distances

Any advice on how to select the five (or however many) arrays from group that correspond to the nearest distances would be much appreciated.

You can store each array alongside its distance, then sort on the distance to the mean:

import numpy as np
group = np.array([[0,1,3,4,5],[0,2,3,6,7],[0,4,3,2,5]])
group_mean = group.mean(axis = 0)

# Pair each distance with its array, then sort the pairs by distance
distances = [[np.linalg.norm(x - group_mean),x] for x in group]
distances.sort(key=lambda a : a[0])

print(distances[0:5]) # Prints the five nearest [distance, array] pairs

If your arrays get larger, it might be wise to save only the index instead of the whole array:

distances = [[np.linalg.norm(x - group_mean),i] for i,x in enumerate(group)]
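To make the index idea concrete, here is a minimal sketch of how the stored indices might be used to pull the nearest rows back out of group. The names n, nearest_idx and nearest are illustrative and not from the original answer, and the sketch assumes group is a NumPy array so that it can be indexed with a list of row numbers.

```python
import numpy as np

group = np.array([[0, 1, 3, 4, 5], [0, 2, 3, 6, 7], [0, 4, 3, 2, 5]])
group_mean = group.mean(axis=0)

# Pair each distance with the row index rather than the row itself
distances = [[np.linalg.norm(x - group_mean), i] for i, x in enumerate(group)]
distances.sort(key=lambda a: a[0])

n = 2  # illustrative: how many nearest rows to keep
nearest_idx = [i for _, i in distances[:n]]  # indices of the n closest rows
nearest = group[nearest_idx]                 # select those rows from the original array

print(nearest_idx)
print(nearest)
```

Keeping only the indices means the sorted list stays small even when each row holds hundreds of values, and the rows themselves are fetched once at the end.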

If you don't want to save the distances themselves, but just want to sort based on the distance, you can do this:

group_mean = np.mean(group, axis=0)  # mean over all arrays, computed once
group = list(group)
group.sort(key=lambda x: np.linalg.norm(x - group_mean))
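As an alternative to sorting a Python list, the same selection can probably be done without an explicit loop. The following is a sketch, not part of the original answer: it assumes group is a 2-D NumPy array with one 1-D array per row, and the names dists, order, n and nearest are illustrative.

```python
import numpy as np

group = np.array([[0, 1, 3, 4, 5], [0, 2, 3, 6, 7], [0, 4, 3, 2, 5]])
group_mean = group.mean(axis=0)

# Broadcasting subtracts the mean from every row; axis=1 gives one norm per row
dists = np.linalg.norm(group - group_mean, axis=1)

# argsort returns row indices ordered from nearest to farthest
order = np.argsort(dists)

n = 2  # illustrative: how many nearest rows to keep
nearest = group[order[:n]]

print(order)
print(nearest)
```

With a couple of thousand rows of roughly 800 values each, as described in the question, this keeps the distance computation inside NumPy, and order can also be reused to look up the matching distances via dists[order[:n]].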
