
How to calculate distance between locations from separate data frames in R

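I have looked at several answers but could not apply them to my problem. See:

Calculating the distance between points in different data frames

Calculating the number of points within a certain radius

Finding locations within a certain lat/long distance in R

Finding the number of points within a radius in R using lon and lat coordinates

Identifying points within a specified distance in R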

I have two data frames, loc and stop. For each stop, I want to find the distance to each loc.

My locations:

loc <- data.frame(station = c('Baker Street','Bank'),
                  lat = c(51.522236,51.5134047),
                  lng = c(-0.157080, -0.08905843),
                  postcode = c('NW1','EC3V')
                  )

My stops:

stop <- data.frame(station = c('Angel','Barbican','Barons Court','Bayswater'),
                   lat = c(51.53253,51.520865,51.490281,51.51224),
                   lng = c(-0.10579,-0.097758,-0.214340,-0.187569),
                   postcode = c('EC1V','EC1A', 'W14', 'W2'))

In the end, I would like something like this:

df <- data.frame(loc = c('Baker Street','Bank','Baker Street','Bank','Baker Street','Bank','Baker Street','Bank'), 
                 stop = c('Angel','Barbican','Barons Court','Bayswater','Angel','Barbican','Barons Court','Bayswater'), 
                 dist = c('x','x','x','x','x','x','x','x'), 
                 lat = c(51.53253,51.520865,51.490281,51.51224,51.53253,51.520865,51.490281,51.51224), 
                 lng = c(-0.10579,-0.097758,-0.214340,-0.187569,-0.10579,-0.097758,-0.214340,-0.187569),
                 postcode = c('EC1V','EC1A', 'W14', 'W2','EC1V','EC1A', 'W14', 'W2')
                 )

My dataset is relatively large, so I am looking for an efficient way to solve this.

Any ideas on how to achieve this?

This uses expand.grid() and merge() with some creative variable renaming. It is a bit manual, but because the operations are vectorized it is efficient.

library(dplyr)
df <- expand.grid(station = loc$station, stop = stop$station) %>%
  merge(loc, by = 'station') %>%
  rename(loc = station, lat1 = lat, lng1 = lng, station = stop) %>%
  select(-postcode) %>%
  merge(stop, by = 'station') %>%
  rename(stop = station, lat2 = lat, lng2 = lng)
#           stop          loc     lat1        lng1     lat2      lng2 postcode
# 1        Angel Baker Street 51.52224 -0.15708000 51.53253 -0.105790     EC1V
# 2        Angel         Bank 51.51340 -0.08905843 51.53253 -0.105790     EC1V
# 3     Barbican Baker Street 51.52224 -0.15708000 51.52087 -0.097758     EC1A
# 4     Barbican         Bank 51.51340 -0.08905843 51.52087 -0.097758     EC1A
# 5 Barons Court Baker Street 51.52224 -0.15708000 51.49028 -0.214340      W14
# 6 Barons Court         Bank 51.51340 -0.08905843 51.49028 -0.214340      W14
# 7    Bayswater Baker Street 51.52224 -0.15708000 51.51224 -0.187569       W2
# 8    Bayswater         Bank 51.51340 -0.08905843 51.51224 -0.187569       W2
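
For comparison, the same set of loc/stop pairs can be sketched in base R alone. merge() returns the Cartesian product of two data frames when by = NULL, so no renaming step is needed. The name combos and the suffixes _loc and _stop are illustrative choices here, not part of the answer above.

```r
# Example data copied from the question (postcode omitted for brevity)
loc <- data.frame(station = c("Baker Street", "Bank"),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843))
stop <- data.frame(station = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569))

# by = NULL asks merge() for every loc/stop combination (a cross join);
# column names shared by both inputs get the suffixes appended.
combos <- merge(loc, stop, by = NULL, suffixes = c("_loc", "_stop"))

nrow(combos)   # 2 locations x 4 stops = 8 rows
names(combos)
```

The resulting columns (lng_loc, lat_loc, lng_stop, lat_stop) can be passed to a distance function in the same way as lng1, lat1, lng2 and lat2.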

Then we can compute the distances with the Haversine formula, using geosphere::distHaversine() (inspired by Jacob).

df$dist_meters <- geosphere::distHaversine(select(df, lng1, lat1),
                                           select(df, lng2, lat2))
df %>%
  select(stop, loc, dist_meters)
#           stop          loc dist_meters
# 1        Angel Baker Street    3732.422
# 2        Angel         Bank    2423.989
# 3     Barbican Baker Street    4111.786
# 4     Barbican         Bank    1026.091
# 5 Barons Court Baker Street    5328.649
# 6 Barons Court         Bank    9054.998
# 7    Bayswater Baker Street    2387.231
# 8    Bayswater         Bank    6825.897

If you are curious how the Haversine formula works, here it is computed by hand:

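# Convert latitudes and the coordinate differences from degrees to radians
latrad1 <- df$lat1 * pi/180
latrad2 <- df$lat2 * pi/180
dlat <- (df$lat2 - df$lat1) * pi/180
dlng <- (df$lng2 - df$lng1) * pi/180
# Haversine term, then the central angle between the two points
a <- sin(dlat / 2)^2 + sin(dlng / 2)^2 * cos(latrad1) * cos(latrad2)
dist_rad <- 2 * atan2(sqrt(a), sqrt(1-a))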
df %>%
  mutate(dist_meters_byhand = dist_rad * 6378137) %>%
  select(stop, loc, dist_meters_geosphere = dist_meters, dist_meters_byhand)
#           stop          loc dist_meters_geosphere dist_meters_byhand
# 1        Angel Baker Street              3732.422           3732.422
# 2        Angel         Bank              2423.989           2423.989
# 3     Barbican Baker Street              4111.786           4111.786
# 4     Barbican         Bank              1026.091           1026.091
# 5 Barons Court Baker Street              5328.649           5328.649
# 6 Barons Court         Bank              9054.998           9054.998
# 7    Bayswater Baker Street              2387.231           2387.231
# 8    Bayswater         Bank              6825.897           6825.897
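
The hand calculation can also be wrapped in a small reusable function. This is a minimal sketch in base R. The name haversine_m and its argument names are hypothetical, and the default radius of 6378137 m is simply the value used in the calculation above.

```r
# Great-circle distance in meters via the Haversine formula.
# Inputs are in decimal degrees; r is the sphere radius in meters.
haversine_m <- function(lat1, lng1, lat2, lng2, r = 6378137) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dphi <- (lat2 - lat1) * to_rad      # latitude difference in radians
  dlambda <- (lng2 - lng1) * to_rad   # longitude difference in radians
  a <- sin(dphi / 2)^2 + cos(phi1) * cos(phi2) * sin(dlambda / 2)^2
  2 * r * atan2(sqrt(a), sqrt(1 - a))
}

# Baker Street to Angel, coordinates taken from the question
d <- haversine_m(51.522236, -0.157080, 51.53253, -0.10579)
round(d)  # roughly 3732 m, in line with the table above
```

Because every operation inside the function is vectorized, whole columns such as df$lat1 and df$lng1 can be passed in place of single numbers.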

Not as clever (or probably as fast) as @Ben's answer, but here is another way:

library(geosphere)

master_df <- data.frame()

for (i in seq_len(nrow(loc))) {
  this_loc <- loc$station[i]
  # geosphere expects coordinates in (longitude, latitude) order
  temp_df <- cbind(stop,
                   data.frame(loc = this_loc,
                              dist = distm(as.matrix(stop[, c('lng', 'lat')]),
                                           c(loc$lng[i], loc$lat[i]))))
  master_df <- rbind(master_df, temp_df)
}
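
For larger inputs, the loop can be replaced by a single distm() call on both coordinate sets, since distm() returns a matrix with one row per point in its first argument and one column per point in its second. The sketch below assumes the geosphere package is installed. The names dist_mat and long are illustrative, and fun = distHaversine is passed explicitly so the result does not depend on the package's default.

```r
library(geosphere)

# Example data copied from the question (postcode omitted for brevity)
loc <- data.frame(station = c("Baker Street", "Bank"),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843))
stop <- data.frame(station = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569))

# Rows are stops, columns are locations; coordinates go in (lng, lat) order
dist_mat <- distm(as.matrix(stop[, c("lng", "lat")]),
                  as.matrix(loc[, c("lng", "lat")]),
                  fun = distHaversine)
dimnames(dist_mat) <- list(stop$station, loc$station)

# Reshape the matrix to long format: one row per stop/loc pair
long <- as.data.frame(as.table(dist_mat), responseName = "dist")
names(long)[1:2] <- c("stop", "loc")
long
```

This avoids growing a data frame with rbind() inside a loop, which becomes slow as the number of locations increases.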

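distm() takes the distance function as its fun argument. Older geosphere releases default to the Haversine formula, while more recent ones default to distGeo; if accuracy matters, pass the function you want explicitly, for example fun = distHaversine or fun = distVincentyEllipsoid.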
