
How to calculate distance between locations from separate df's in R

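I have looked at several answers but have not been able to apply them to my problem. See:

Calculating the distance between points in different data frames

Calculating the number of points within a certain radius

Finding locations within a certain lat/long distance in R

Finding the number of points within a radius in R using lon and lat coordinates

Identifying points within a specified distance in R

I have two df's, loc and stop. For each stop I want to find the distance to each loc.

My locations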

loc <- data.frame(station = c('Baker Street','Bank'),
                  lat = c(51.522236,51.5134047),
                  lng = c(-0.157080, -0.08905843),
                  postcode = c('NW1','EC3V')
                  )

My stops

stop <- data.frame(station = c('Angel','Barbican','Barons Court','Bayswater'),
                   lat = c(51.53253,51.520865,51.490281,51.51224),
                   lng = c(-0.10579,-0.097758,-0.214340,-0.187569),
                   postcode = c('EC1V','EC1A', 'W14', 'W2'))

In the end, I would like something like this:

df <- data.frame(loc = rep(c('Baker Street','Bank'), each = 4),
                 stop = c('Angel','Barbican','Barons Court','Bayswater','Angel','Barbican','Barons Court','Bayswater'), 
                 dist = c('x','x','x','x','x','x','x','x'), 
                 lat = c(51.53253,51.520865,51.490281,51.51224,51.53253,51.520865,51.490281,51.51224), 
                 lng = c(-0.10579,-0.097758,-0.214340,-0.187569,-0.10579,-0.097758,-0.214340,-0.187569),
                 postcode = c('EC1V','EC1A', 'W14', 'W2','EC1V','EC1A', 'W14', 'W2')
                 )

My dataset is relatively large, so I am looking for an efficient way to solve this.

Any ideas on how to achieve this?

This uses expand.grid, merge, and some creative variable renaming. It is a bit manual, but because the operations are vectorized, it is efficient.

library(dplyr)
df <- expand.grid(station = loc$station, stop = stop$station) %>%
  merge(loc, by = 'station') %>%
  rename(loc = station, lat1 = lat, lng1 = lng, station = stop) %>%
  select(-postcode) %>%
  merge(stop, by = 'station') %>%
  rename(stop = station, lat2 = lat, lng2 = lng)
#           stop          loc     lat1        lng1     lat2      lng2 postcode
# 1        Angel Baker Street 51.52224 -0.15708000 51.53253 -0.105790     EC1V
# 2        Angel         Bank 51.51340 -0.08905843 51.53253 -0.105790     EC1V
# 3     Barbican Baker Street 51.52224 -0.15708000 51.52087 -0.097758     EC1A
# 4     Barbican         Bank 51.51340 -0.08905843 51.52087 -0.097758     EC1A
# 5 Barons Court Baker Street 51.52224 -0.15708000 51.49028 -0.214340      W14
# 6 Barons Court         Bank 51.51340 -0.08905843 51.49028 -0.214340      W14
# 7    Bayswater Baker Street 51.52224 -0.15708000 51.51224 -0.187569       W2
# 8    Bayswater         Bank 51.51340 -0.08905843 51.51224 -0.187569       W2
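As a possible alternative to the expand.grid/merge/rename chain, base R's merge() returns the Cartesian product of two data frames when called with by = NULL. The sketch below assumes the loc and stop data from the question; the suffixes "_loc" and "_stop" and the name pairs are illustrative choices, not part of the original answer.

```r
# Data from the question
loc <- data.frame(station = c("Baker Street", "Bank"),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843))
stop <- data.frame(station = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569),
                   postcode = c("EC1V", "EC1A", "W14", "W2"))

# by = NULL makes merge() return every (loc, stop) combination;
# column names shared by both inputs receive the given suffixes
pairs <- merge(loc, stop, by = NULL, suffixes = c("_loc", "_stop"))

names(pairs)
nrow(pairs)
```

This yields one row per loc/stop pair with columns such as lat_loc, lng_loc, lat_stop, and lng_stop, which can then be passed to a distance function.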

Then we can compute the distances with the Haversine formula using geosphere::distHaversine() (inspired by Jacob).

df$dist_meters <- geosphere::distHaversine(select(df, lng1, lat1),
                                           select(df, lng2, lat2))
df %>%
  select(stop, loc, dist_meters)
#           stop          loc dist_meters
# 1        Angel Baker Street    3732.422
# 2        Angel         Bank    2423.989
# 3     Barbican Baker Street    4111.786
# 4     Barbican         Bank    1026.091
# 5 Barons Court Baker Street    5328.649
# 6 Barons Court         Bank    9054.998
# 7    Bayswater Baker Street    2387.231
# 8    Bayswater         Bank    6825.897

If you are curious how the Haversine formula works, here is the same calculation by hand:

latrad1 <- df$lat1 * pi/180
latrad2 <- df$lat2 * pi/180
dlat <- (df$lat2 - df$lat1) * pi/180  # latitude difference in radians
dlng <- (df$lng2 - df$lng1) * pi/180  # longitude difference in radians
a <- sin(dlat / 2)^2 + sin(dlng / 2)^2 * cos(latrad1) * cos(latrad2)
dist_rad <- 2 * atan2(sqrt(a), sqrt(1-a))
df %>%
  mutate(dist_meters_byhand = dist_rad * 6378137) %>%
  select(stop, loc, dist_meters_geosphere = dist_meters, dist_meters_byhand)
#           stop          loc dist_meters_geosphere dist_meters_byhand
# 1        Angel Baker Street              3732.422           3732.422
# 2        Angel         Bank              2423.989           2423.989
# 3     Barbican Baker Street              4111.786           4111.786
# 4     Barbican         Bank              1026.091           1026.091
# 5 Barons Court Baker Street              5328.649           5328.649
# 6 Barons Court         Bank              9054.998           9054.998
# 7    Bayswater Baker Street              2387.231           2387.231
# 8    Bayswater         Bank              6825.897           6825.897
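The hand calculation can also be wrapped in a small reusable function. This is a minimal sketch in base R only; the name haversine_m and its argument names are hypothetical and not part of geosphere. The default radius of 6378137 m is the same value used in the calculation above.

```r
# Hypothetical helper (not part of geosphere): vectorized Haversine distance.
# Inputs are in decimal degrees; r is the sphere radius in meters.
haversine_m <- function(lng1, lat1, lng2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  # Guard against a exceeding 1 through floating-point rounding
  a <- pmin(a, 1)
  2 * r * atan2(sqrt(a), sqrt(1 - a))
}

# Baker Street to Angel, coordinates taken from the question
d <- haversine_m(-0.157080, 51.522236, -0.10579, 51.53253)
d
```

Because every operation inside the function is vectorized, it can be applied directly to whole columns, for example haversine_m(df$lng1, df$lat1, df$lng2, df$lat2).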

Not as clever (or probably as fast) as @Ben's answer, but here is another way:

library(geosphere)

master_df <- data.frame()

for (i in seq_len(nrow(loc))) {
  this_loc <- loc$station[i]
  # distm() expects coordinates in (longitude, latitude) order
  temp_df <- cbind(stop,
                   data.frame(loc = this_loc,
                              dist = distm(as.matrix(stop[, c('lng', 'lat')]),
                                           c(loc$lng[i], loc$lat[i]))))
  master_df <- rbind(master_df, temp_df)
}

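The geosphere package computes great-circle distances, which may be useful if you need accuracy. Older versions of distm() default to the Haversine formula; check ?distm for the default in your installed version, or pass fun = distHaversine explicitly.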
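The loop can probably be avoided, since distm() accepts two sets of points and returns the full distance matrix in one call. The sketch below assumes the geosphere package is installed and uses the data from the question; the names m and long, and the column name dist_meters, are illustrative. fun = distHaversine is passed explicitly so the result does not depend on the package's default.

```r
library(geosphere)

# Data from the question
loc <- data.frame(station = c("Baker Street", "Bank"),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843))
stop <- data.frame(station = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569))

# distm() expects (longitude, latitude) column order;
# rows of the result correspond to stops, columns to locs
m <- distm(as.matrix(stop[, c("lng", "lat")]),
           as.matrix(loc[, c("lng", "lat")]),
           fun = distHaversine)
dimnames(m) <- list(stop = stop$station, loc = loc$station)

# Reshape the matrix to one row per (stop, loc) pair
long <- as.data.frame(as.table(m), responseName = "dist_meters")
long
```

The matrix form is convenient for lookups such as m["Angel", "Baker Street"], while the long form matches the one-row-per-pair layout asked for in the question.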
