简体   繁体   English

Python 位置,显示与最近其他位置的距离

[英]Python location, show distance from closest other location

I am a location in a dataframe, underneath lat lon column names.我是 dataframe 中的一个位置,位于 lat lon 列名下方。 I want to show how far that is from the lat lon of the nearest train station in a separate dataframe.我想在单独的 dataframe 中显示距离最近火车站的纬度有多远。

So for example, I have a lat lon of (37.814563 144.970267), and i have a list as below of other geospatial points.因此,例如,我的纬度为 (37.814563 144.970267),并且我有如下其他地理空间点的列表。 I want to find the point that is closest and then find the distance between those points, as an extra column in the dataframe in suburbs.我想找到最近的点,然后找到这些点之间的距离,作为郊区 dataframe 中的额外列。

This is the example of the train dataset这是火车数据集的示例

<bound method NDFrame.to_clipboard of   STOP_ID                                          STOP_NAME   LATITUDE  \
0   19970             Royal Park Railway Station (Parkville) -37.781193   
1   19971  Flemington Bridge Railway Station (North Melbo... -37.788140   
2   19972         Macaulay Railway Station (North Melbourne) -37.794267   
3   19973   North Melbourne Railway Station (West Melbourne) -37.807419   
4   19974        Clifton Hill Railway Station (Clifton Hill) -37.788657   

    LONGITUDE TICKETZONE                                          ROUTEUSSP  \
0  144.952301          1                                            Upfield   
1  144.939323          1                                            Upfield   
2  144.936166          1                                            Upfield   
3  144.942570          1  Flemington,Sunbury,Upfield,Werribee,Williamsto...   
4  144.995417          1                                 Mernda,Hurstbridge   

                      geometry  
0  POINT (144.95230 -37.78119)  
1  POINT (144.93932 -37.78814)  
2  POINT (144.93617 -37.79427)  
3  POINT (144.94257 -37.80742)  
4  POINT (144.99542 -37.78866)  >

and this is an example of the suburbs这是郊区的一个例子

<bound method NDFrame.to_clipboard of       postcode              suburb state        lat         lon
4901      3000           MELBOURNE   VIC -37.814563  144.970267
4902      3002      EAST MELBOURNE   VIC -37.816640  144.987811
4903      3003      WEST MELBOURNE   VIC -37.806255  144.941123
4904      3005  WORLD TRADE CENTRE   VIC -37.822262  144.954856
4905      3006           SOUTHBANK   VIC -37.823258  144.965926>

Which I am trying to show, is the distance from the lat lon to the closet train station in a new column for the suburb list.我要显示的是从纬度到最近的火车站的距离,在郊区列表的新列中。

Using a solution get a weird output, wondering if it's correct?使用解决方案得到一个奇怪的 output,想知道它是否正确?

With both solutions shown,显示两种解决方案,

from sklearn.neighbors import NearestNeighbors
from haversine import haversine

NN = NearestNeighbors(n_neighbors=1, metric='haversine')
NN.fit(trains_shape[['LATITUDE', 'LONGITUDE']])

indices = NN.kneighbors(df_complete[['lat', 'lon']])[1]
indices = [index[0] for index in indices]
distances = NN.kneighbors(df_complete[['lat', 'lon']])[0]
df_complete['closest_station'] = trains_shape.iloc[indices]['STOP_NAME'].reset_index(drop=True)
df_complete['closest_station_distances'] = distances
print(df_complete)

The output here, output在这里,

<bound method NDFrame.to_clipboard of    postcode        suburb state        lat         lon  Venues Cluster  \
1      3040    aberfeldie   VIC -37.756690  144.896259             4.0   
2      3042  airport west   VIC -37.711698  144.887037             1.0   
4      3206   albert park   VIC -37.840705  144.955710             0.0   
5      3020        albion   VIC -37.775954  144.819395             2.0   
6      3078    alphington   VIC -37.780767  145.031160             4.0   

                     #1                    #2             #3  \
1                  Café     Electronics Store  Grocery Store   
2  Fast Food Restaurant                  Café    Supermarket   
4                  Café                   Pub    Coffee Shop   
5                  Café  Fast Food Restaurant  Grocery Store   
6                  Café                  Park            Bar   

                      #4  ...                             #6  \
1            Coffee Shop  ...                         Bakery   
2          Grocery Store  ...             Italian Restaurant   
4         Breakfast Spot  ...                   Burger Joint   
5  Vietnamese Restaurant  ...                            Pub   
6            Pizza Place  ...  Vegetarian / Vegan Restaurant   

                      #7                   #8                         #9  \
1          Shopping Mall  Japanese Restaurant          Indian Restaurant   
2  Portuguese Restaurant    Electronics Store  Middle Eastern Restaurant   
4                    Bar               Bakery                  Gastropub   
5     Chinese Restaurant                  Gym                     Bakery   
6     Italian Restaurant            Gastropub                     Bakery   

                 #10 Ancestry Cluster  ClosestStopId  \
1   Greek Restaurant              8.0          20037   
2  Convenience Store              5.0          20032   
4              Beach              6.0          22180   
5  Convenience Store              5.0          20004   
6        Coffee Shop              5.0          19931   

                                   ClosestStopName  \
1              Essendon Railway Station (Essendon)   
2                Glenroy Railway Station (Glenroy)   
4  Southern Cross Railway Station (Melbourne City)   
5          Albion Railway Station (Sunshine North)   
6          Alphington Railway Station (Alphington)   

                                   closest_station closest_station_distances  
1                Glenroy Railway Station (Glenroy)                  0.019918  
2  Southern Cross Railway Station (Melbourne City)                  0.031020  
4          Alphington Railway Station (Alphington)                  0.023165  
5                  Altona Railway Station (Altona)                  0.005559  
6                Newport Railway Station (Newport)                  0.002375  

And the second function.和第二个 function。

def ClosestStop(r):
    # Cartesin Distance: square root of (x2-x2)^2 + (y2-y1)^2
    distances = ((r['lat']-StationDf['LATITUDE'])**2 + (r['lon']-StationDf['LONGITUDE'])**2)**0.5
    
    # Stop with minimum Distance from the Suburb
    closestStationId = distances[distances == distances.min()].index.to_list()[0]
    return StationDf.loc[closestStationId, ['STOP_ID', 'STOP_NAME']]

df_complete[['ClosestStopId', 'ClosestStopName']] = df_complete.apply(ClosestStop, axis=1)

This is giving different answers oddly enough, and leads me to think that there is an issue with this code.奇怪的是,这给出了不同的答案,并让我认为这段代码存在问题。 the KM's seem wrong as well. KM 似乎也是错误的。

Completely unsure how to approach this problem - would love some guidance here, thanks!完全不确定如何解决这个问题 - 会喜欢这里的一些指导,谢谢!

A few key concepts几个关键概念

  1. do a Cartesian product between two data frames to get all combinations (joining on identical value between two data frames is approach to this foo=1 )在两个数据帧之间做一个笛卡尔积以获得所有组合(在两个数据帧之间加入相同的值是接近这个foo=1的方法)
  2. once both sets of data is together, have both sets of lat/lon to calculate distance) geopy has been used for this一旦两组数据放在一起,让两组纬度/经度来计算距离) geopy已用于此
  3. cleanup the columns, use sort_values() to find smallest distance清理列,使用sort_values()找到最小距离
  4. finally a groupby() and agg() to get first values for shortest distance最后是groupby()agg()来获得最短距离的第一个

There are two data frames for use有两个数据框可供使用

  1. dfdist contains all the combinations and distances dfdist包含所有组合和距离
  2. dfnearest which contains result dfnearest包含结果
dfstat = pd.DataFrame({'STOP_ID': ['19970', '19971', '19972', '19973', '19974'],
 'STOP_NAME': ['Royal Park Railway Station (Parkville)',
  'Flemington Bridge Railway Station (North Melbo...',
  'Macaulay Railway Station (North Melbourne)',
  'North Melbourne Railway Station (West Melbourne)',
  'Clifton Hill Railway Station (Clifton Hill)'],
 'LATITUDE': ['-37.781193',
  '-37.788140',
  '-37.794267',
  '-37.807419',
  '-37.788657'],
 'LONGITUDE': ['144.952301',
  '144.939323',
  '144.936166',
  '144.942570',
  '144.995417'],
 'TICKETZONE': ['1', '1', '1', '1', '1'],
 'ROUTEUSSP': ['Upfield',
  'Upfield',
  'Upfield',
  'Flemington,Sunbury,Upfield,Werribee,Williamsto...',
  'Mernda,Hurstbridge'],
 'geometry': ['POINT (144.95230 -37.78119)',
  'POINT (144.93932 -37.78814)',
  'POINT (144.93617 -37.79427)',
  'POINT (144.94257 -37.80742)',
  'POINT (144.99542 -37.78866)']})
dfsub = pd.DataFrame({'id': ['4901', '4902', '4903', '4904', '4905'],
 'postcode': ['3000', '3002', '3003', '3005', '3006'],
 'suburb': ['MELBOURNE',
  'EAST MELBOURNE',
  'WEST MELBOURNE',
  'WORLD TRADE CENTRE',
  'SOUTHBANK'],
 'state': ['VIC', 'VIC', 'VIC', 'VIC', 'VIC'],
 'lat': ['-37.814563', '-37.816640', '-37.806255', '-37.822262', '-37.823258'],
 'lon': ['144.970267', '144.987811', '144.941123', '144.954856', '144.965926']})

import geopy.distance
# cartesian product so we get all combinations
dfdist = (dfsub.assign(foo=1).merge(dfstat.assign(foo=1), on="foo")
    # calc distance in km between each suburb and each train station
     .assign(km=lambda dfa: dfa.apply(lambda r: 
                                      geopy.distance.geodesic(
                                          (r["LATITUDE"],r["LONGITUDE"]), 
                                          (r["lat"],r["lon"])).km, axis=1))
    # reduce number of columns to make it more digestable
     .loc[:,["postcode","suburb","STOP_ID","STOP_NAME","km"]]
    # sort so shortest distance station from a suburb is first
     .sort_values(["postcode","suburb","km"])
    # good practice
     .reset_index(drop=True)
)
# finally pick out stations nearest to suburb
# this can easily be joined back to source data frames as postcode and STOP_ID have been maintained
dfnearest = dfdist.groupby(["postcode","suburb"])\
    .agg({"STOP_ID":"first","STOP_NAME":"first","km":"first"}).reset_index()

print(dfnearest.to_string(index=False))
dfnearest

output output

postcode              suburb STOP_ID                                         STOP_NAME        km
    3000           MELBOURNE   19973  North Melbourne Railway Station (West Melbourne)  2.564586
    3002      EAST MELBOURNE   19974       Clifton Hill Railway Station (Clifton Hill)  3.177320
    3003      WEST MELBOURNE   19973  North Melbourne Railway Station (West Melbourne)  0.181463
    3005  WORLD TRADE CENTRE   19973  North Melbourne Railway Station (West Melbourne)  1.970909
    3006           SOUTHBANK   19973  North Melbourne Railway Station (West Melbourne)  2.705553

an approach to reducing size of tested combinations一种减少测试组合大小的方法

# pick nearer places,  based on lon/lat then all combinations
dfdist = (dfsub.assign(foo=1, latr=dfsub["lat"].round(1), lonr=dfsub["lon"].round(1))
          .merge(dfstat.assign(foo=1, latr=dfstat["LATITUDE"].round(1), lonr=dfstat["LONGITUDE"].round(1)), 
                 on=["foo","latr","lonr"])
    # calc distance in km between each suburb and each train station
     .assign(km=lambda dfa: dfa.apply(lambda r: 
                                      geopy.distance.geodesic(
                                          (r["LATITUDE"],r["LONGITUDE"]), 
                                          (r["lat"],r["lon"])).km, axis=1))
    # reduce number of columns to make it more digestable
     .loc[:,["postcode","suburb","STOP_ID","STOP_NAME","km"]]
    # sort so shortest distance station from a suburb is first
     .sort_values(["postcode","suburb","km"])
    # good practice
     .reset_index(drop=True)
)

You can use sklearn.neighbors.NearestNeighbors with a haversine distance.您可以使用具有半正弦距离的sklearn.neighbors.NearestNeighbors

import pandas as pd
dfstat = pd.DataFrame({'STOP_ID': ['19970', '19971', '19972', '19973', '19974'],
                       'STOP_NAME': ['Royal Park Railway Station (Parkville)',  'Flemington Bridge Railway Station (North Melbo...',  'Macaulay Railway Station (North Melbourne)',  'North Melbourne Railway Station (West Melbourne)',  'Clifton Hill Railway Station (Clifton Hill)'],
                       'LATITUDE': ['-37.781193', '-37.788140',  '-37.794267',  '-37.807419',  '-37.788657'],
                       'LONGITUDE': ['144.952301', '144.939323', '144.936166',  '144.942570',  '144.995417'],
                       'TICKETZONE': ['1', '1', '1', '1', '1'], 
                       'ROUTEUSSP': ['Upfield',  'Upfield',  'Upfield',  'Flemington,Sunbury,Upfield,Werribee,Williamsto...',  'Mernda,Hurstbridge'],
                       'geometry': ['POINT (144.95230 -37.78119)',  'POINT (144.93932 -37.78814)',  'POINT (144.93617 -37.79427)',  'POINT (144.94257 -37.80742)',  'POINT (144.99542 -37.78866)']})
dfsub = pd.DataFrame({'id': ['4901', '4902', '4903', '4904', '4905'],
                      'postcode': ['3000', '3002', '3003', '3005', '3006'],
                      'suburb': ['MELBOURNE',  'EAST MELBOURNE',  'WEST MELBOURNE',  'WORLD TRADE CENTRE',  'SOUTHBANK'],
                      'state': ['VIC', 'VIC', 'VIC', 'VIC', 'VIC'],
                      'lat': ['-37.814563', '-37.816640', '-37.806255', '-37.822262', '-37.823258'],
                      'lon': ['144.970267', '144.987811', '144.941123', '144.954856', '144.965926']})

Let's begin by finding the closest point in a dataframe to some random point, say -37.814563, 144.970267 .让我们首先在 dataframe 中找到离某个随机点最近的点,比如-37.814563, 144.970267

NN = NearestNeighbors(n_neighbors=1, metric='haversine')
NN.fit(dfstat[['LATITUDE', 'LONGITUDE']])
NN.kneighbors([[-37.814563, 144.970267]])

The output is (array([[2.55952637]]), array([[3]])) , the distance and the index of the closest point in the dataframe. output 为(array([[2.55952637]]), array([[3]])) ,dataframe中最近点的距离和索引。 The haversine distance in sklearn is in radius . sklearn 中的半正弦距离以radius为单位。 If you want to compute is in km, you can use haversine .如果要计算以 km 为单位,可以使用hasrsine

from haversine import haversine
NN = NearestNeighbors(n_neighbors=1, metric=haversine)
NN.fit(dfstat[['LATITUDE', 'LONGITUDE']])
NN.kneighbors([[-37.814563, 144.970267]])

The output (array([[2.55952637]]), array([[3]])) has the distance in km. output (array([[2.55952637]]), array([[3]]))的距离以公里为单位。

Now you can apply to all points in the dataframe and get closest stations with indices.现在您可以应用到 dataframe 中的所有点,并通过索引获取最近的站点。

indices = NN.kneighbors(dfsub[['lat', 'lon']])[1]
indices = [index[0] for index in indices]
distances = NN.kneighbors(dfsub[['lat', 'lon']])[0]
dfsub['closest_station'] = dfstat.iloc[indices]['STOP_NAME'].reset_index(drop=True)
dfsub['closest_station_distances'] = distances
print(dfsub)
id  postcode    suburb  state   lat lon closest_station closest_station_distances
0   4901    3000    MELBOURNE   VIC -37.814563  144.970267  North Melbourne Railway Station (West Melbourne)    2.559526
1   4902    3002    EAST MELBOURNE  VIC -37.816640  144.987811  Clifton Hill Railway Station (Clifton Hill) 3.182521
2   4903    3003    WEST MELBOURNE  VIC -37.806255  144.941123  North Melbourne Railway Station (West Melbourne)    0.181419
3   4904    3005    WORLD TRADE CENTRE  VIC -37.822262  144.954856  North Melbourne Railway Station (West Melbourne)    1.972010
4   4905    3006    SOUTHBANK   VIC -37.823258  144.965926  North Melbourne Railway Station (West Melbourne)    2.703926

Try this尝试这个

import pandas as pd
def ClosestStop(r):
    # Cartesin Distance: square root of (x2-x2)^2 + (y2-y1)^2
    distances = ((r['lat']-StationDf['LATITUDE'])**2 + (r['lon']-StationDf['LONGITUDE'])**2)**0.5
    
    # Stop with minimum Distance from the Suburb
    closestStationId = distances[distances == distances.min()].index.to_list()[0]
    return StationDf.loc[closestStationId, ['STOP_ID', 'STOP_NAME']]

StationDf = pd.read_excel("StationData.xlsx")
SuburbDf = pd.read_excel("SuburbData.xlsx")

SuburbDf[['ClosestStopId', 'ClosestStopName']] = SuburbDf.apply(ClosestStop, axis=1)
print(SuburbDf)

I would like to post an article that I found and tried myself and it worked while I was at the university.我想发表一篇我自己发现并尝试过的文章,它在我上大学时很有效。 You can use Google Distance Matrix Api .您可以使用谷歌距离矩阵 Api Rather than showing particular code, I would like to refer you to the article itself:我不想显示特定的代码,而是建议您参考文章本身:

https://medium.com/how-to-use-google-distance-matrix-api-in-python/how-to-use-google-distance-matrix-api-in-python-ef9cd895303c https://medium.com/how-to-use-google-distance-matrix-api-in-python/how-to-use-google-distance-matrix-api-in-python-ef9cd895303c

For a given data set organized in rows of latitude and longitude coordinates, you can calculate the distance between consecutive rows.对于按纬度和经度坐标行组织的给定数据集,您可以计算连续行之间的距离。 That will give you the actual distance between two different points.这将为您提供两个不同点之间的实际距离。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM