itertools: Getting combinations of operations ( + - * / ) and columns
Doing simple operations with itertools combinatorics?
I have a Python dataset with the following structure:
cluster pts lon lat
0 5 45 24
1 6 47 23
2 10 45 20
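As you can see, I have one column for the cluster ID, one for the number of points in the cluster, and one each for the cluster's representative longitude and latitude. The whole dataframe contains 140 clusters.
Now I want to compute the following quantity for every pair of clusters: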
$$\mathrm{weight}(i, j) = -\frac{n_i + n_j}{\mathrm{dist}(i, j)}$$
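where i refers to one cluster, j to another cluster, and n to the number of points.
The numerator sums the points of cluster i and cluster j. The denominator is the haversine distance between the two clusters, computed from their representative coordinates.
I started writing some code with itertools, but I'm stuck on how to continue. Any ideas?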
from itertools import combinations

for c in combinations(df['cluster'], 2):
    sum_pts =
    distance =
    weight = -(sum_pts / distance)
    print(c, weight)
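As you mentioned, you can use itertools to generate the combinations.
To compute the distance, you can use geopy.distance.distance. See the documentation for details: https://geopy.readthedocs.io/en/stable/#module-geopy.distance
This should work: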
from itertools import combinations
from geopy.distance import distance

# Assumes the cluster IDs coincide with the DataFrame index labels.
for p1, p2 in combinations(df['cluster'], 2):
    sum_pts = df.loc[p1, 'pts'] + df.loc[p2, 'pts']
    # geopy expects (latitude, longitude); distance in km
    dist = distance(tuple(df.loc[p1, ['lat', 'lon']]), tuple(df.loc[p2, ['lat', 'lon']])).km
    weight = -sum_pts / dist
    print((p1, p2), weight)
Edit: for the case where the cluster IDs do not necessarily correspond to the index
for c1, c2 in combinations(df['cluster'], 2):
    # Look up the first row belonging to each cluster ID
    p1, p2 = df[df['cluster'] == c1].iloc[0], df[df['cluster'] == c2].iloc[0]
    sum_pts = p1['pts'] + p2['pts']
    dist = distance((p1['lat'], p1['lon']), (p2['lat'], p2['lon'])).km
    weight = -sum_pts / dist
    print((c1, c2), weight)
Output:
(0, 1) -0.04733881547464973
(0, 2) -0.033865977446857085
(1, 2) -0.04086856230889897
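The question asks specifically for the haversine distance, while geopy's `distance` is, as far as I know, a geodesic (ellipsoidal) distance by default. As a minimal sketch of the same loop using only the standard library: the helper name `haversine_km`, the Earth radius value, and the `rows` list are assumptions for illustration, not part of the original code. With a DataFrame, `df.to_dict('records')` yields rows of the same shape.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Sample rows taken from the table in the question.
rows = [
    {'cluster': 0, 'pts': 5, 'lon': 45, 'lat': 24},
    {'cluster': 1, 'pts': 6, 'lon': 47, 'lat': 23},
    {'cluster': 2, 'pts': 10, 'lon': 45, 'lat': 20},
]

# Iterate over pairs of rows directly, so no per-pair lookup is needed.
weights = {}
for r1, r2 in combinations(rows, 2):
    dist = haversine_km(r1['lat'], r1['lon'], r2['lat'], r2['lon'])
    weights[(r1['cluster'], r2['cluster'])] = -(r1['pts'] + r2['pts']) / dist

for pair, weight in weights.items():
    print(pair, weight)
```

Because the haversine formula treats the Earth as a sphere, the values should be close to the geodesic ones above but not identical.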
If you care about performance, you may want to use a merge and vectorized operations instead.
import numpy as np
import pandas as pd

def haversine_distance(lat1, lon1, lat2, lon2):
    R = 6372800  # Earth radius in meters, so the result is in meters
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = np.radians(lat2 - lat1)
    dlambda = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlambda / 2) ** 2
    return 2 * R * np.arctan2(np.sqrt(a), np.sqrt(1 - a))

# Sample data; lon/lat values follow the table in the question
df = pd.DataFrame({
    'cluster': [0, 1, 2],
    'pts': [5, 6, 10],
    'lon': [45, 47, 45],
    'lat': [24, 23, 20],
})
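
# Cross join every cluster with every other cluster (how="cross" needs pandas >= 1.2)
pairs = pd.merge(df, df, suffixes=('_1', '_2'), how="cross")
# Drop self-pairs; note that both (i, j) and (j, i) remain
pairs = pairs[pairs['cluster_1'] != pairs['cluster_2']].copy()
# Parenthesize the sum so the whole numerator is negated and divided
pairs["weight"] = -(pairs['pts_1'] + pairs['pts_2']) / haversine_distance(pairs['lat_1'], pairs['lon_1'], pairs['lat_2'], pairs['lon_2'])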
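
The cross merge builds every ordered pair, so each pair of clusters shows up twice. As an alternative sketch, assuming plain NumPy arrays are acceptable, `np.triu_indices` gives the index pairs with i < j so that each unordered pair is computed once, like `itertools.combinations`. The array names and the Earth radius value (in km here) are assumptions for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

# Sample data from the question's table, coordinates converted to radians.
cluster = np.array([0, 1, 2])
pts = np.array([5, 6, 10])
lon = np.radians([45.0, 47.0, 45.0])
lat = np.radians([24.0, 23.0, 20.0])

# Index arrays for every unordered pair i < j (upper triangle, no diagonal).
i, j = np.triu_indices(len(cluster), k=1)

# Haversine formula evaluated on all pairs at once.
a = (np.sin((lat[j] - lat[i]) / 2) ** 2
     + np.cos(lat[i]) * np.cos(lat[j]) * np.sin((lon[j] - lon[i]) / 2) ** 2)
dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# weight(i, j) = -(n_i + n_j) / dist(i, j)
weight = -(pts[i] + pts[j]) / dist_km

for ci, cj, w in zip(cluster[i], cluster[j], weight):
    print((int(ci), int(cj)), float(w))
```

For 140 clusters this evaluates 140 * 139 / 2 = 9730 pairs in one vectorized pass.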