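Latitude and longitude clustering in Python
I am working with a data frame of latitude and longitude data, and I need to cluster the points that are nearest to each other (within 200 meters). This is what I did in Python.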
order_lat order_long
0 19.111841 72.910729
1 19.111342 72.908387
2 19.111342 72.908387
3 19.137815 72.914085
4 19.119677 72.905081
5 19.119677 72.905081
6 19.119677 72.905081
7 19.120217 72.907121
8 19.120217 72.907121
9 19.119677 72.905081
10 19.119677 72.905081
11 19.119677 72.905081
12 19.111860 72.911346
13 19.111860 72.911346
14 19.119677 72.905081
15 19.119677 72.905081
16 19.119677 72.905081
17 19.137815 72.914085
18 19.115380 72.909144
19 19.115380 72.909144
20 19.116168 72.909573
21 19.119677 72.905081
22 19.137815 72.914085
23 19.137815 72.914085
24 19.112955 72.910102
25 19.112955 72.910102
26 19.112955 72.910102
27 19.119677 72.905081
28 19.119677 72.905081
29 19.115380 72.909144
30 19.119677 72.905081
31 19.119677 72.905081
32 19.119677 72.905081
33 19.119677 72.905081
34 19.119677 72.905081
35 19.111860 72.911346
36 19.111841 72.910729
37 19.131674 72.918510
38 19.119677 72.905081
39 19.111860 72.911346
40 19.111860 72.911346
41 19.111841 72.910729
42 19.111841 72.910729
43 19.111841 72.910729
44 19.115380 72.909144
45 19.116625 72.909185
46 19.115671 72.908985
47 19.119677 72.905081
48 19.119677 72.905081
49 19.119677 72.905081
50 19.116183 72.909646
51 19.113827 72.893833
52 19.119677 72.905081
53 19.114100 72.894985
54 19.107491 72.901760
55 19.119677 72.905081
Then I computed the distance between every pair of latitude/longitude points in the data frame.
import numpy as np
import pandas as pd

# order_data is the data frame shown above (columns order_lat, order_long)
lat_array = np.radians(np.array(order_data['order_lat']))
long_array = np.radians(np.array(order_data['order_long']))
distance = []
pairs_lat1 = []
pairs_long1 = []
pairs_lat2 = []
pairs_long2 = []
for i in range(len(lat_array)):
    for j in range(i + 1, len(lat_array)):
        dlon = long_array[j] - long_array[i]
        dlat = lat_array[j] - lat_array[i]
        # Haversine formula; the parentheses keep the expression on one logical line
        a = (np.sin(dlat / 2)**2
             + np.cos(lat_array[i]) * np.cos(lat_array[j]) * np.sin(dlon / 2)**2)
        # 6371 km is the mean Earth radius, so c is a distance in kilometers
        c = 2 * 6371 * np.arcsin(np.sqrt(a))
        pairs_lat1.append(lat_array[i])
        pairs_long1.append(long_array[i])
        pairs_lat2.append(lat_array[j])
        pairs_long2.append(long_array[j])
        distance.append(c)
# Convert the coordinates back to degrees for display
df_distance = pd.DataFrame()
df_distance['lat1'] = np.rad2deg(pairs_lat1)
df_distance['long1'] = np.rad2deg(pairs_long1)
df_distance['lat2'] = np.rad2deg(pairs_lat2)
df_distance['long2'] = np.rad2deg(pairs_long2)
df_distance['distance'] = distance
df_distance.head()
lat1 long1 lat2 long2 distance
0 19.111841 72.910729 19.111342 72.908387 2.522482e-01
1 19.111841 72.910729 19.111342 72.908387 2.522482e-01
2 19.111841 72.910729 19.137815 72.914085 2.909520e+00
3 19.111841 72.910729 19.119677 72.905081 1.054209e+00
4 19.111841 72.910729 19.119677 72.905081 1.054209e+00
5 19.111841 72.910729 19.119677 72.905081 1.054209e+00
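This gives me the distance in kilometers between the two points of each pair (lat1, long1 and lat2, long2), for example 0.252 km (252 meters) for the first pair. How do I cluster the points so that the nearest points end up together, say within a radius of 250 meters? Can I use hierarchical clustering?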
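The easiest way is to build a distance matrix containing the distance between every two points, and then use any classic clustering algorithm on it. Scikit-learn is one of the most popular libraries for clustering (among many other things). You could also try GVM, which is designed specifically for geospatial clustering.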
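As a rough illustration of the distance-matrix approach, here is a minimal sketch that builds the full haversine matrix with NumPy and cuts a hierarchical clustering at 250 m. It uses SciPy's `linkage` and `fcluster` rather than Scikit-learn, which is a choice made for this sketch and not something the answer specifies. The helper names `haversine_matrix` and `cluster_points`, the complete-linkage method, and the five sample points (taken from the question's data) are likewise assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, the same constant the question uses


def haversine_matrix(lat_deg, lon_deg):
    """Return the full pairwise great-circle distance matrix in kilometers."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting gives every (i, j) difference without an explicit loop
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip guards against rounding pushing a slightly outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def cluster_points(lat_deg, lon_deg, max_km=0.25):
    """Label points so that no two points in one cluster are more than max_km apart.

    Hypothetical helper: complete linkage is assumed here because it bounds
    the largest pairwise distance inside each cluster. Needs at least 2 points.
    """
    dist = haversine_matrix(lat_deg, lon_deg)
    # linkage expects the condensed (upper-triangle) form of the matrix
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="complete")
    # Cut the tree where merging would exceed max_km
    return fcluster(tree, t=max_km, criterion="distance")


# A few rows from the question's data frame
lats = [19.111841, 19.111860, 19.119677, 19.120217, 19.137815]
lons = [72.910729, 72.911346, 72.905081, 72.907121, 72.914085]

labels = cluster_points(lats, lons)
print(labels)
```

On these five points, the first two (about 65 m apart) should share a label, as should the third and fourth (about 220 m apart), while the last point sits alone. Single linkage (`method="single"`) would instead chain together any points connected by hops of at most 250 m, so the choice of method depends on whether "within 250 meters" is meant as a cluster diameter or as a neighbor distance.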