I have two CSV files containing the coordinates of locations (11 million rows with three columns: "lid", "lat", "lon") and facilities (50k rows with columns "fid", "lat", "lon"). For each location, I need to calculate the distance to the nearest facility.
I know how to do this using "st_distance" in R. However, "st_distance" is taking ages because it first calculates the full matrix of distances, and the two files are pretty large. I have tried breaking the locations file into smaller groups and using "future_map" across 3 cores, but it is still taking a lot more time than I expected. Is there a way to speed up the process?
Have you thought about using st_buffer first? This would limit the number of locations you need to search to find the closest one. For example, start with a radius of 10 miles and see if that captures all of the data. If that doesn't work, maybe try the findNeighbors() function. See the documentation: https://www.rdocumentation.org/packages/fractal/versions/2.0-4/topics/findNeighbors
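Another option along the same lines (limiting the search instead of building the full matrix) might be st_nearest_feature() from sf, which returns one index per location rather than a matrix. This is only a rough sketch: it assumes the sf package is installed and that the coordinates are WGS84 longitude/latitude, which the question does not state. The tiny data frames are made up and only reuse the column names from the question.

```r
library(sf)

# Made-up sample data using the column names from the question
locations <- data.frame(lid = 1:3,
                        lat = c(40.0, 41.8, 42.0),
                        lon = c(-75.0, -73.2, -73.0))
facilities <- data.frame(fid = 1:2,
                         lat = c(40.1, 41.9),
                         lon = c(-75.1, -73.1))

# Assumption: coordinates are WGS84 (EPSG:4326)
loc_sf <- st_as_sf(locations, coords = c("lon", "lat"), crs = 4326)
fac_sf <- st_as_sf(facilities, coords = c("lon", "lat"), crs = 4326)

# Row index (into fac_sf) of the nearest facility for each location
nearest_idx <- st_nearest_feature(loc_sf, fac_sf)

# Distance from each location to its own nearest facility only
locations$nearest_fid <- facilities$fid[nearest_idx]
locations$dist <- as.numeric(
  st_distance(loc_sf, fac_sf[nearest_idx, ], by_element = TRUE)
)
```

With by_element = TRUE, st_distance() computes one distance per pair of rows, so the result has as many entries as there are locations.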
In the future it would also be good if you provided a sample of your data.
I am sure there must be better ways of doing this, but this is how I would do it. Hope it is helpful.
library(tidyverse)
library(furrr)

MILLION_1 <- 10^6
K_50 <- 10^4 * 5

# dummy data --------------------------------------------------------------------
d_1m <-
  tibble(
    lid_1m = 1:MILLION_1,
    long_1m = abs(rnorm(MILLION_1) * 100),
    lat_1m = abs(rnorm(MILLION_1)) * 100
  )

d_50k <-
  tibble(
    lid_50k = 1:K_50,
    long_50k = abs(rnorm(K_50) * 100),
    lat_50k = abs(rnorm(K_50) * 100)
  )

# distance calculation for each facility ------------------------------------------
# `multiprocess` is deprecated in the future package; `multisession` replaces it
future::plan(multisession)

d_distance <-
  # take one row of facility: long, lat and id as an input
  future_pmap_dfr(d_50k, function(...) {
    d50_row <- tibble(...)
    # to calculate distance between one facility location and 1 million other locations
    d <- tidyr::crossing(d_1m, d50_row)
    d %>%
      mutate(
        # euclidean distance
        distance = sqrt((long_1m - long_50k)^2 + (lat_1m - lat_50k)^2)
      ) %>%
      # to get the location which is the closest to the facility
      # (note: this gives one nearest location per facility, not one nearest facility per location)
      filter(distance == min(distance))
  })
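The code above keeps, for each facility, the location closest to it. If what is needed is the other direction (for each location, the nearest facility, as in the question), a rough base R sketch could process the locations in chunks so that only a small slice of the distance matrix exists at any time. nearest_facility() and its chunk_size argument are hypothetical names made up for this illustration, it uses the same plain Euclidean distance as above, and it is still brute force, so a spatial index would probably be faster on 11 million rows.

```r
# Hypothetical helper: nearest facility for every location, chunk by chunk.
# `loc` needs columns lid, lon, lat; `fac` needs columns fid, lon, lat.
nearest_facility <- function(loc, fac, chunk_size = 200L) {
  n <- nrow(loc)
  nearest_fid <- fac$fid[rep(NA_integer_, n)]
  min_dist <- numeric(n)
  for (start in seq(1L, n, by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1L, n)
    # squared Euclidean distances: rows = locations in this chunk, cols = facilities
    d2 <- outer(loc$lon[idx], fac$lon, "-")^2 +
      outer(loc$lat[idx], fac$lat, "-")^2
    # column index of the closest facility in each row (first one on ties)
    best <- apply(d2, 1L, which.min)
    nearest_fid[idx] <- fac$fid[best]
    min_dist[idx] <- sqrt(d2[cbind(seq_along(idx), best)])
  }
  data.frame(lid = loc$lid, fid = nearest_fid, distance = min_dist)
}

# Small made-up example
loc <- data.frame(lid = 1:3, lon = c(0, 10, 6), lat = c(0, 10, 6))
fac <- data.frame(fid = 1:2, lon = c(1, 9), lat = c(1, 9))
res <- nearest_facility(loc, fac, chunk_size = 2L)
```

Each chunk is independent of the others, so the loop could also be handed to future_map over groups of rows; the chunk size mainly trades memory for the number of iterations.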