Measuring the distance between points and groups
I am trying to measure the distance between points inside a pandas dataframe. First, I want to measure the distance between points that are in a subregion and get the average distance for that group. Then I want to measure the distance between the subregions (measuring the distance between those two vectors). I understand how to do the measuring part (using scipy.spatial.distance.euclidean for the former and scipy.spatial.distance.cdist for the latter). The issue I am running into is figuring out how to apply the functions to the dataset. I think I should use groupby.apply() and feed in my function, but I'm having trouble conceptualizing that. The dataframe looks like this:
id, latitude, longitude, subregion, region
Currently I have:
import pandas as pd
import numpy as np
from scipy.spatial.distance import euclidean
df = pd.read_csv('targets.csv')
...
def calculate_distance(x,y):
    return x._get_numeric_data().apply(axis=0, func=euclidean[x,y]).mean()
df.groupby('subregion').apply(calculate_distance)
I know this is incorrect, since I want to apply the function to multiple columns across all the rows. My other thought is that I am using the wrong data structure for this.
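For what it's worth, here is a minimal sketch of how the groupby.apply() idea could work. It is not necessarily what the asker ended up doing. It assumes the column names from the header above (latitude, longitude, subregion), uses scipy.spatial.distance.pdist in place of repeated euclidean calls to get all pairwise distances in a group at once, and builds a small made-up dataframe in place of targets.csv. The helper name mean_pairwise_distance is hypothetical. Note that Euclidean distance on raw latitude/longitude is only a rough proxy for distance on the ground.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist


def mean_pairwise_distance(group):
    """Mean Euclidean distance over all pairs of points in one group."""
    coords = group[['latitude', 'longitude']].to_numpy()
    if len(coords) < 2:
        return np.nan  # a single point has no pairs to measure
    return pdist(coords, metric='euclidean').mean()


# Made-up sample data standing in for targets.csv (an assumption)
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'latitude': [0.0, 3.0, 0.0, 10.0, 10.0],
    'longitude': [0.0, 4.0, 8.0, 10.0, 12.0],
    'subregion': ['a', 'a', 'a', 'b', 'b'],
    'region': ['r1', 'r1', 'r1', 'r1', 'r1'],
})

# Each group is passed to the function as a DataFrame holding only the
# two coordinate columns; the result is a Series indexed by subregion.
mean_dist = (
    df.groupby('subregion')[['latitude', 'longitude']]
      .apply(mean_pairwise_distance)
)
print(mean_dist)
```

The key point is that the function given to apply() receives the whole sub-dataframe for one subregion, so it can take both coordinate columns together rather than one column at a time.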
I ended up using a different data structure, and in the end it looks like this:
import itertools
from scipy.spatial.distance import euclidean

contacts = {}
# Collect the coordinates of every row, nested by region and then subregion.
for i, row in sc_walkbook.iterrows():
    if contacts.get(row['region'], 0) == 0:
        # First time this region is seen: create it along with the subregion.
        contacts[row['region']] = {}
        contacts[row['region']][row['subregion']] = {}
        contacts[row['region']][row['subregion']]['coords'] = []
        contacts[row['region']][row['subregion']]['distances'] = []
    elif contacts[row['region']].get(row['subregion'], 0) == 0:
        # Known region, new subregion.
        contacts[row['region']][row['subregion']] = {}
        contacts[row['region']][row['subregion']]['coords'] = []
        contacts[row['region']][row['subregion']]['distances'] = []
    contacts[row['region']][row['subregion']]['coords'].append([row['T_Latitude'], row['T_Longitude']])
# Measure the distance between every pair of points within each subregion.
# (.values() replaces the Python 2 only .itervalues().)
for region in contacts.values():
    for subregion in region.values():
        for a, b in itertools.combinations(subregion['coords'], 2):
            subregion['distances'].append(euclidean(a, b))
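The code above stops at collecting the pairwise distances. As a rough sketch of the remaining steps from the question (the average per subregion, and the distance between two subregions), something like the following might work. The small contacts dictionary here is made-up sample data in the same shape as the structure built above. Taking the mean of all cross-pair distances from cdist is only one possible reading of "distance between subregions" and is an assumption, not something the original post specifies.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Made-up stand-in for the `contacts` structure built above (an assumption)
contacts = {
    'r1': {
        'a': {'coords': [[0.0, 0.0], [3.0, 4.0]], 'distances': [5.0]},
        'b': {'coords': [[0.0, 8.0], [6.0, 0.0]], 'distances': [10.0]},
    },
}

# Average within-subregion distance, keyed by (region, subregion).
# A subregion with a single point has no distances, so it gets NaN.
averages = {
    (region_name, sub_name): (
        float(np.mean(sub['distances'])) if sub['distances'] else float('nan')
    )
    for region_name, region in contacts.items()
    for sub_name, sub in region.items()
}

# Distances between every point of subregion 'a' and every point of 'b'.
a = contacts['r1']['a']['coords']
b = contacts['r1']['b']['coords']
between = cdist(a, b, metric='euclidean')  # shape (len(a), len(b))

# One possible summary of the subregion-to-subregion distance.
mean_between = between.mean()
print(averages, mean_between)
```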