What is the fastest way to generate a matrix of distances between locations with lat and lon?
Matrix of distances from csv file of lat and lon
- Read the CSV input file for hotel name, lat and lon,
  placing them in a table of three columns.
- LOOP A over the hotels
    - LOOP B over the hotels, starting with next hotel after A
        - Calculate D distance between A and B
        - Store D in matrix at column A and row B
        - Store D in matrix at column B and row A
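The steps above can be sketched in plain Python. This is only a minimal sketch: the column names `name`, `lat` and `lon`, the helper names `read_hotels` and `distance_matrix`, and the idea of passing the distance function in as a callable `dist` are all assumptions for illustration, not part of the original answer.

```python
import csv


def read_hotels(csv_file):
    """Read (name, lat, lon) rows from an open CSV file with a header row.

    The column names "name", "lat" and "lon" are assumptions.
    """
    reader = csv.DictReader(csv_file)
    return [(row["name"], float(row["lat"]), float(row["lon"])) for row in reader]


def distance_matrix(hotels, dist):
    """Build a symmetric matrix, calling dist once per unordered pair."""
    n = len(hotels)
    matrix = [[0.0] * n for _ in range(n)]
    for a in range(n):                  # LOOP A over the hotels
        for b in range(a + 1, n):       # LOOP B starts with the hotel after A
            d = dist(hotels[a][1:], hotels[b][1:])  # pass (lat, lon) tuples
            matrix[a][b] = d            # store at row A, column B
            matrix[b][a] = d            # and mirrored at row B, column A
    return matrix
```

Any callable that takes two `(lat, lon)` tuples can be used as `dist`, for example `lambda p, q: geopy.distance.geodesic(p, q).km`.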
If the hotels are spread over a wide area, you will need to use the Haversine formula to calculate accurate distances.
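For reference, the Haversine formula can be written in a few lines using only the standard library. This is a sketch rather than the original author's code: the function name `haversine_km` and the mean Earth radius of 6371.0088 km are assumptions, and the formula treats the Earth as a perfect sphere.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Because of the spherical assumption, the result will usually differ by a small fraction from an ellipsoidal calculation such as `geopy.distance.geodesic`.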
Following @ravenspoint's answer, here is some simple code to calculate the distances.
>>> import numpy as np
>>> import pandas as pd
>>> import geopy.distance
>>> data = {"hotels": ["1", "2", "3", "4"], "lat": [20.55697, 21.123698, 25.35487, 19.12577], "long": [17.1, 18.45893, 16.78214, 14.75498]}
>>> df = pd.DataFrame(data)
>>> df
  hotels        lat      long
0      1  20.556970  17.10000
1      2  21.123698  18.45893
2      3  25.354870  16.78214
3      4  19.125770  14.75498
Now let's create a matrix to hold the distances between the hotels. The matrix should have size (number of hotels x number of hotels).
>>> matrix = np.ones((len(df), len(df))) * -1
>>> np.fill_diagonal(matrix, 0)
>>> matrix
array([[ 0., -1., -1., -1.],
       [-1.,  0., -1., -1.],
       [-1., -1.,  0., -1.],
       [-1., -1., -1.,  0.]])
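The -1 entries are placeholders for pairs that are not computed: since dist(1,2) = dist(2,1), each distance only needs to be calculated once.
Next, simply loop over the hotels and calculate the distances. The geopy package is used here.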
>>> for i in range(len(df)):
...     coords_i = df.loc[i, ["lat", "long"]].values
...     for j in range(i+1, len(df)):
...         coords_j = df.loc[j, ["lat", "long"]].values
...         matrix[i,j] = geopy.distance.geodesic(coords_i, coords_j).km
>>> matrix
array([[  0.        , 154.73003254, 532.33605633, 292.29813424],
       [ -1.        ,   0.        , 499.00500751, 445.97821702],
       [ -1.        ,  -1.        ,   0.        , 720.69054683],
       [ -1.        ,  -1.        ,  -1.        ,   0.        ]])
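The loop only fills the upper triangle, so the lower triangle still holds the -1 placeholders. If a full symmetric matrix is wanted, as in the pseudocode that stores D at both positions, one possible approach is to mirror the upper triangle. The small matrix below is hypothetical data in the same layout, used only to keep the sketch self-contained.

```python
import numpy as np

# Hypothetical upper-triangular result with -1 placeholders below the diagonal
matrix = np.array([[0.0, 1.5, 2.5],
                   [-1.0, 0.0, 3.5],
                   [-1.0, -1.0, 0.0]])

upper = np.triu(matrix, k=1)  # keep only the entries above the diagonal
full = upper + upper.T        # mirror them; the diagonal stays 0
```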
Note that nested loops are not the best way to do this job, and the code can be improved.
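One way to avoid the Python-level nested loop is to compute all pairs at once with NumPy broadcasting. The sketch below is an assumption-laden alternative, not the original author's method: it swaps geopy's ellipsoidal `geodesic` for the spherical Haversine formula, and the function name `haversine_matrix_km` and the default radius are illustrative choices.

```python
import numpy as np


def haversine_matrix_km(lat_deg, lon_deg, radius_km=6371.0088):
    """Pairwise great-circle distances in km for points given in degrees.

    Uses the spherical Haversine formula (an assumption; geopy's geodesic
    uses an ellipsoid), so values may differ slightly from geopy's.
    """
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting a column against a row yields every pairwise difference
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip guards against tiny floating-point excursions outside [0, 1]
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
```

With the DataFrame from the example, this could be called as `haversine_matrix_km(df["lat"], df["long"])`, which returns the full symmetric matrix directly. Memory use grows with the square of the number of hotels, so very large inputs may need to be processed in chunks.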