I've already looked through several answers but have not been able to apply them to my problem. See:
Calculating the distance between points in different data frames
Calculating number of points within a certain radius
find locations within certain lat/lon distance in r
find number of points within a radius in R using lon and lat coordinates
Identify points within specified distance in R
I have two data frames, loc and stop. For each stop, I want to find the distance to each loc.
My locations
loc <- data.frame(station = c('Baker Street','Bank'),
lat = c(51.522236,51.5134047),
lng = c(-0.157080, -0.08905843),
postcode = c('NW1','EC3V')
)
My stops
stop <- data.frame(station = c('Angel','Barbican','Barons Court','Bayswater'),
lat = c(51.53253,51.520865,51.490281,51.51224),
lng = c(-0.10579,-0.097758,-0.214340,-0.187569),
postcode = c('EC1V','EC1A', 'W14', 'W2'))
As a final result I would like something like this:
df <- data.frame(loc = c('Baker Street','Bank','Baker Street','Bank','Baker Street','Bank','Baker Street','Bank'),
stop = c('Angel','Barbican','Barons Court','Bayswater','Angel','Barbican','Barons Court','Bayswater'),
dist = c('x','x','x','x','x','x','x','x'),
lat = c(51.53253,51.520865,51.490281,51.51224,51.53253,51.520865,51.490281,51.51224),
lng = c(-0.10579,-0.097758,-0.214340,-0.187569,-0.10579,-0.097758,-0.214340,-0.187569),
postcode = c('EC1V','EC1A', 'W14', 'W2','EC1V','EC1A', 'W14', 'W2')
)
My dataset is relatively big so I'm looking for an efficient method to solve this problem.
Any ideas on how to achieve this?
This makes use of expand.grid and merge, plus some creative variable renaming. It's a little man-handly, but it's pretty efficient since the operations are vectorized.
library(dplyr)
df <- expand.grid(station = loc$station, stop = stop$station) %>%
merge(loc, by = 'station') %>%
rename(loc = station, lat1 = lat, lng1 = lng, station = stop) %>%
select(-postcode) %>%
merge(stop, by = 'station') %>%
rename(stop = station, lat2 = lat, lng2 = lng)
# stop loc lat1 lng1 lat2 lng2 postcode
# 1 Angel Baker Street 51.52224 -0.15708000 51.53253 -0.105790 EC1V
# 2 Angel Bank 51.51340 -0.08905843 51.53253 -0.105790 EC1V
# 3 Barbican Baker Street 51.52224 -0.15708000 51.52087 -0.097758 EC1A
# 4 Barbican Bank 51.51340 -0.08905843 51.52087 -0.097758 EC1A
# 5 Barons Court Baker Street 51.52224 -0.15708000 51.49028 -0.214340 W14
# 6 Barons Court Bank 51.51340 -0.08905843 51.49028 -0.214340 W14
# 7 Bayswater Baker Street 51.52224 -0.15708000 51.51224 -0.187569 W2
# 8 Bayswater Bank 51.51340 -0.08905843 51.51224 -0.187569 W2
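For comparison, the same table of pairs can be built in base R alone. This is only a sketch, assuming the loc and stop data frames from the question. It relies on merge() returning the Cartesian product when by = NULL, and the helper names loc_side, stop_side and pairs are made up for illustration.

```r
# Data frames from the question
loc <- data.frame(station = c("Baker Street", "Bank"),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843),
                  postcode = c("NW1", "EC3V"))
stop <- data.frame(station = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569),
                   postcode = c("EC1V", "EC1A", "W14", "W2"))

# Rename up front so the two sets of coordinates cannot collide
loc_side  <- setNames(loc[, c("station", "lat", "lng")],
                      c("loc", "lat1", "lng1"))
stop_side <- setNames(stop[, c("station", "lat", "lng", "postcode")],
                      c("stop", "lat2", "lng2", "postcode"))

# by = NULL makes merge() return every loc/stop combination (a cross join)
pairs <- merge(loc_side, stop_side, by = NULL)
nrow(pairs)  # 2 locations x 4 stops = 8 rows
```

Renaming before the join avoids depending on merge()'s automatic .x/.y suffixes.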
We can then use geosphere::distHaversine() (inspired by Jacob) to calculate the distances using the Haversine formula.
df$dist_meters <- geosphere::distHaversine(select(df, lng1, lat1),
select(df, lng2, lat2))
df %>%
select(stop, loc, dist_meters)
# stop loc dist_meters
# 1 Angel Baker Street 3732.422
# 2 Angel Bank 2423.989
# 3 Barbican Baker Street 4111.786
# 4 Barbican Bank 1026.091
# 5 Barons Court Baker Street 5328.649
# 6 Barons Court Bank 9054.998
# 7 Bayswater Baker Street 2387.231
# 8 Bayswater Bank 6825.897
And in case you're curious how the Haversine formula works:
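# Convert the latitudes and the coordinate differences from degrees to radians
latrad1 <- df$lat1 * pi/180
latrad2 <- df$lat2 * pi/180
dlat <- (df$lat2 - df$lat1) * pi/180
dlng <- (df$lng2 - df$lng1) * pi/180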
a <- sin(dlat / 2)^2 + sin(dlng / 2)^2 * cos(latrad1) * cos(latrad2)
dist_rad <- 2 * atan2(sqrt(a), sqrt(1-a))
df %>%
mutate(dist_meters_byhand = dist_rad * 6378137) %>%
select(stop, loc, dist_meters_geosphere = dist_meters, dist_meters_byhand)
# stop loc dist_meters_geosphere dist_meters_byhand
# 1 Angel Baker Street 3732.422 3732.422
# 2 Angel Bank 2423.989 2423.989
# 3 Barbican Baker Street 4111.786 4111.786
# 4 Barbican Bank 1026.091 1026.091
# 5 Barons Court Baker Street 5328.649 5328.649
# 6 Barons Court Bank 9054.998 9054.998
# 7 Bayswater Baker Street 2387.231 2387.231
# 8 Bayswater Bank 6825.897 6825.897
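The same calculation can be wrapped in a small reusable function that needs no packages. This is a sketch rather than a replacement for geosphere: the name haversine_m and its argument order are made up here, and it assumes coordinates in decimal degrees and the same sphere radius (6378137 m) as the calculation above.

```r
# Great-circle distance in metres between points given in decimal degrees.
# r = 6378137 is the radius used in the by-hand calculation above.
haversine_m <- function(lat1, lng1, lat2, lng2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * r * atan2(sqrt(a), sqrt(1 - a))
}

# Baker Street to Angel; should be close to the 3732 m reported above
haversine_m(51.522236, -0.157080, 51.53253, -0.10579)

# Vectorized: Baker Street against several stops (Angel, Barbican) at once
haversine_m(51.522236, -0.157080,
            c(51.53253, 51.520865), c(-0.10579, -0.097758))
```

Because every step is an element-wise operation, the function accepts whole columns, e.g. the lat1/lng1/lat2/lng2 columns of a pairs table.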
Not as clever (or probably as fast) as @Ben's but here's another way:
library(geosphere)
master_df <- data.frame()
for (i in seq_len(nrow(loc))) {
  this_loc <- loc$station[i]
  # geosphere expects coordinates in (longitude, latitude) order
  dists <- distm(as.matrix(stop[, c("lng", "lat")]), c(loc$lng[i], loc$lat[i]))
  temp_df <- cbind(stop, data.frame(loc = this_loc, dist = as.vector(dists)))
  master_df <- rbind(master_df, temp_df)
}
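Note that distm() takes the distance function as its fun argument. As far as I know, older geosphere releases default to distHaversine and newer ones to the ellipsoidal distGeo, so set fun explicitly if the method or accuracy matters.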
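If only the distances are needed, the loop can be avoided by asking distm() for the whole matrix at once. This is a sketch under a few assumptions: the geosphere package is installed, the data are the loc and stop frames from the question, and fun = distHaversine is passed explicitly rather than relying on the package default. The names d and long and the long-format column names are illustrative.

```r
library(geosphere)

loc <- data.frame(station = c("Baker Street", "Bank"),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843))
stop <- data.frame(station = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569))

# Rows of d correspond to loc, columns to stop; coordinates go in as (lng, lat)
d <- distm(as.matrix(loc[, c("lng", "lat")]),
           as.matrix(stop[, c("lng", "lat")]),
           fun = distHaversine)

# as.vector() flattens column by column, so loc varies fastest
long <- data.frame(loc = rep(loc$station, times = nrow(stop)),
                   stop = rep(stop$station, each = nrow(loc)),
                   dist_meters = as.vector(d))
```

The remaining stop columns (lat, lng, postcode) can be attached afterwards with a single merge on the stop name.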