How to calculate distance between locations from separate data frames in R

I've already looked through several answers but have not been able to apply them to my problem. See:

Calculating the distance between points in different data frames

Calculating number of points within a certain radius

find locations within certain lat/lon distance in r

find number of points within a radius in R using lon and lat coordinates

Identify points within specified distance in R

I have two data frames, loc and stop. For each stop I want to find the distance to each loc.

My locations

loc <- data.frame(station = c('Baker Street','Bank'),
                  lat = c(51.522236,51.5134047),
                  lng = c(-0.157080, -0.08905843),
                  postcode = c('NW1','EC3V')
                  )

My stops

stop <- data.frame(station = c('Angel','Barbican','Barons Court','Bayswater'),
                   lat = c(51.53253,51.520865,51.490281,51.51224),
                   lng = c(-0.10579,-0.097758,-0.214340,-0.187569),
                   postcode = c('EC1V','EC1A', 'W14', 'W2'))

As a final result I would like something like this:

df <- data.frame(loc = c('Baker Street','Bank','Baker Street','Bank','Baker Street','Bank','Baker Street','Bank'), 
                 stop = c('Angel','Barbican','Barons Court','Bayswater','Angel','Barbican','Barons Court','Bayswater'), 
                 dist = c('x','x','x','x','x','x','x','x'), 
                 lat = c(51.53253,51.520865,51.490281,51.51224,51.53253,51.520865,51.490281,51.51224), 
                 lng = c(-0.10579,-0.097758,-0.214340,-0.187569,-0.10579,-0.097758,-0.214340,-0.187569),
                 postcode = c('EC1V','EC1A', 'W14', 'W2','EC1V','EC1A', 'W14', 'W2')
                 )

My dataset is relatively big so I'm looking for an efficient method to solve this problem.

Any ideas on how to achieve this?

This makes use of expand.grid, merge, and some creative variable renaming. It's a little heavy-handed, but it's pretty efficient since the operations are vectorized.

library(dplyr)
df <- expand.grid(station = loc$station, stop = stop$station) %>%
  merge(loc, by = 'station') %>%
  rename(loc = station, lat1 = lat, lng1 = lng, station = stop) %>%
  select(-postcode) %>%
  merge(stop, by = 'station') %>%
  rename(stop = station, lat2 = lat, lng2 = lng)
#           stop          loc     lat1        lng1     lat2      lng2 postcode
# 1        Angel Baker Street 51.52224 -0.15708000 51.53253 -0.105790     EC1V
# 2        Angel         Bank 51.51340 -0.08905843 51.53253 -0.105790     EC1V
# 3     Barbican Baker Street 51.52224 -0.15708000 51.52087 -0.097758     EC1A
# 4     Barbican         Bank 51.51340 -0.08905843 51.52087 -0.097758     EC1A
# 5 Barons Court Baker Street 51.52224 -0.15708000 51.49028 -0.214340      W14
# 6 Barons Court         Bank 51.51340 -0.08905843 51.49028 -0.214340      W14
# 7    Bayswater Baker Street 51.52224 -0.15708000 51.51224 -0.187569       W2
# 8    Bayswater         Bank 51.51340 -0.08905843 51.51224 -0.187569       W2
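As a possible shortcut for the expand.grid and merge chain above, base R's merge() can build every loc/stop pair in one call. This is a minimal sketch, not the answer's own method: it assumes by = NULL (which requests the Cartesian product) and uses the suffixes "_loc" and "_stop", which are names chosen here for illustration.

```r
loc <- data.frame(station = c("Baker Street", "Bank"),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843))
stop <- data.frame(station = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569),
                   postcode = c("EC1V", "EC1A", "W14", "W2"))

# by = NULL asks merge() for the Cartesian product of the two data frames.
# Column names present in both inputs get the suffixes appended, so no
# manual renaming step is needed.
pairs <- merge(loc, stop, by = NULL, suffixes = c("_loc", "_stop"))
names(pairs)
```

Each row of pairs then holds one location and one stop, so a vectorized distance function can be applied to the lng_loc, lat_loc, lng_stop and lat_stop columns directly.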

We can then use geosphere::distHaversine() (inspired by Jacob) to calculate the distances using the Haversine formula.

df$dist_meters <- geosphere::distHaversine(select(df, lng1, lat1),
                                           select(df, lng2, lat2))
df %>%
  select(stop, loc, dist_meters)
#           stop          loc dist_meters
# 1        Angel Baker Street    3732.422
# 2        Angel         Bank    2423.989
# 3     Barbican Baker Street    4111.786
# 4     Barbican         Bank    1026.091
# 5 Barons Court Baker Street    5328.649
# 6 Barons Court         Bank    9054.998
# 7    Bayswater Baker Street    2387.231
# 8    Bayswater         Bank    6825.897

And in case you're curious how the Haversine formula works, here it is by hand:

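# Latitudes in radians
latrad1 <- df$lat1 * pi/180
latrad2 <- df$lat2 * pi/180
# Differences in latitude and longitude, in radians
dlat <- (df$lat2 - df$lat1) * pi/180
dlng <- (df$lng2 - df$lng1) * pi/180
# Haversine term and the resulting central angle
a <- sin(dlat / 2)^2 + sin(dlng / 2)^2 * cos(latrad1) * cos(latrad2)
dist_rad <- 2 * atan2(sqrt(a), sqrt(1-a))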
df %>%
  mutate(dist_meters_byhand = dist_rad * 6378137) %>%
  select(stop, loc, dist_meters_geosphere = dist_meters, dist_meters_byhand)
#           stop          loc dist_meters_geosphere dist_meters_byhand
# 1        Angel Baker Street              3732.422           3732.422
# 2        Angel         Bank              2423.989           2423.989
# 3     Barbican Baker Street              4111.786           4111.786
# 4     Barbican         Bank              1026.091           1026.091
# 5 Barons Court Baker Street              5328.649           5328.649
# 6 Barons Court         Bank              9054.998           9054.998
# 7    Bayswater Baker Street              2387.231           2387.231
# 8    Bayswater         Bank              6825.897           6825.897
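The by-hand steps above can also be wrapped in a small function so the whole calculation runs without any packages. This is a sketch under stated assumptions: haversine_m() is a hypothetical helper name, and the radius 6378137 m is the same value used in the by-hand code above. The data are the question's stations, with postcodes left out for brevity.

```r
# Great-circle distance in meters between points given in decimal degrees.
# Vectorized: all four coordinate arguments may be vectors of equal length.
haversine_m <- function(lat1, lng1, lat2, lng2, radius = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius * atan2(sqrt(a), sqrt(1 - a))
}

loc <- data.frame(station = c("Baker Street", "Bank"),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843))
stop <- data.frame(station = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569))

# Index every (loc, stop) pair, then compute all distances in one call.
idx <- expand.grid(i = seq_len(nrow(loc)), j = seq_len(nrow(stop)))
result <- data.frame(
  loc  = loc$station[idx$i],
  stop = stop$station[idx$j],
  dist_meters = haversine_m(loc$lat[idx$i], loc$lng[idx$i],
                            stop$lat[idx$j], stop$lng[idx$j])
)
```

Working with row indices rather than merged columns keeps both input data frames untouched and avoids the renaming step.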

Not as clever (or probably as fast) as @Ben's but here's another way:

library(geosphere)

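master_df <- data.frame()

for (i in seq_len(nrow(loc))) {
  this_loc <- loc$station[i]
  # geosphere expects coordinates in (longitude, latitude) order
  dists <- distm(as.matrix(stop[, c('lng', 'lat')]), c(loc$lng[i], loc$lat[i]))
  temp_df <- cbind(stop, data.frame(loc = this_loc, dist = as.vector(dists)))
  master_df <- rbind(master_df, temp_df)
}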

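distm() accepts a fun argument that selects the distance function. Pass fun = distHaversine to use the Haversine formula explicitly, or an ellipsoidal method such as distGeo if higher accuracy is required.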
