
Fastest way to calculate the minimum distance from a large set of locations to facilities in R/sf

I have two CSV files containing the coordinates of locations (11 million rows with three columns: "lid", "lat", "lon") and facilities (50k rows with the columns "fid", "lat", "lon"). For each location, I need to calculate the distance to the nearest facility.

I know how to do this using "st_distance" in R. However, "st_distance" is taking ages because it first calculates the full matrix of distances, and both files are pretty large. I have tried breaking the location file into smaller groups and using "future_map" across 3 cores, but it is still taking much longer than I expected. Is there a way to speed up the process?
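For reference, sf itself can avoid the full matrix: "st_nearest_feature()" returns only the index of the nearest facility for each location, and "st_distance(..., by_element = TRUE)" then computes a single distance per location. This is a minimal sketch, assuming sf >= 1.0 with s2 switched on (the default) and small randomly generated stand-ins for the two CSV files; the "locations" and "facilities" data frames and their values are made up.

```r
library(sf)

# made-up stand-ins for the two CSV files (same column names as in the question)
set.seed(42)
locations  <- data.frame(lid = 1:2000, lon = runif(2000, -5, 5), lat = runif(2000, 45, 55))
facilities <- data.frame(fid = 1:100,  lon = runif(100, -5, 5),  lat = runif(100, 45, 55))

# convert to sf points in WGS84 longitude/latitude
loc_sf <- st_as_sf(locations,  coords = c("lon", "lat"), crs = 4326)
fac_sf <- st_as_sf(facilities, coords = c("lon", "lat"), crs = 4326)

# index of the nearest facility for every location (no full distance matrix)
nearest <- st_nearest_feature(loc_sf, fac_sf)

# distance from each location to its own nearest facility only, in metres
locations$fid    <- facilities$fid[nearest]
locations$dist_m <- as.numeric(st_distance(loc_sf, fac_sf[nearest, ], by_element = TRUE))
```

The result keeps one row per location, so memory grows with the number of locations rather than with locations × facilities.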

Have you thought about using st_buffer first? This would limit the number of facilities you need to search to find the closest one for each location. For example, start with a radius of 10 miles and see whether that captures all of the data. If that doesn't work, maybe try the findNeighbors() function. See the documentation: https://www.rdocumentation.org/packages/fractal/versions/2.0-4/topics/findNeighbors
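The "search a limited radius first" idea can also be sketched without buffers. The hypothetical helper nearest_with_radius() below (not part of sf or fractal) sorts the locations by x, lets each facility update only the locations within "radius" of it along x, and falls back to a full scan for any location whose best match is farther than "radius". It assumes planar coordinates in the same units as "radius", so longitude/latitude would need projecting first; the data are random.

```r
# Hypothetical helper: nearest facility per location, searching only within `radius`
# first and falling back to a full scan where nothing close enough was found.
# Assumes planar (projected) coordinates in the same units as `radius`.
nearest_with_radius <- function(loc_x, loc_y, fac_x, fac_y, radius) {
  ord <- order(loc_x)                       # sort locations by x once
  sx <- loc_x[ord]
  sy <- loc_y[ord]
  best_d  <- rep(Inf, length(sx))
  best_id <- rep(NA_integer_, length(sx))

  for (j in seq_along(fac_x)) {
    # index range of locations whose x lies in [fac_x[j] - radius, fac_x[j] + radius]
    lo <- findInterval(fac_x[j] - radius, sx, left.open = TRUE) + 1L
    hi <- findInterval(fac_x[j] + radius, sx)
    if (lo > hi) next
    i <- lo:hi
    d <- sqrt((sx[i] - fac_x[j])^2 + (sy[i] - fac_y[j])^2)
    closer <- d < best_d[i]                 # running minimum per location
    best_d[i[closer]]  <- d[closer]
    best_id[i[closer]] <- j
  }

  # a match within `radius` is exact; everything else gets a full scan
  for (k in which(best_d > radius)) {
    d <- sqrt((fac_x - sx[k])^2 + (fac_y - sy[k])^2)
    best_id[k] <- which.min(d)
    best_d[k]  <- d[best_id[k]]
  }

  # undo the sort so results line up with the input order
  id <- integer(length(sx))
  dist <- numeric(length(sx))
  id[ord] <- best_id
  dist[ord] <- best_d
  list(id = id, dist = dist)
}

# usage on random planar data
set.seed(1)
loc_x <- runif(5000, 0, 100); loc_y <- runif(5000, 0, 100)
fac_x <- runif(200, 0, 100);  fac_y <- runif(200, 0, 100)
res <- nearest_with_radius(loc_x, loc_y, fac_x, fac_y, radius = 10)
```

Any match at or within "radius" is exact, because a closer facility would also have fallen inside the x-window; the radius only controls how many locations need the slow fallback.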

In the future it would also be good if you provided a sample of your data.

I am sure there must be better ways of doing this, but this is how I would do it. I hope it helps.

library(tidyverse)
library(furrr)


MILLION_1 <- 10^6
K_50 <- 10^4*5

# dummy data --------------------------------------------------------------------

d_1m <- 
  tibble(
    lid_1m = 1:MILLION_1,
    long_1m = abs(rnorm(MILLION_1)) * 100,
    lat_1m = abs(rnorm(MILLION_1)) * 100
  )


d_50k <- 
  tibble(
    lid_50k = 1:K_50,
    long_50k = abs(rnorm(K_50)) * 100,
    lat_50k = abs(rnorm(K_50)) * 100
  )


# distance calculation for each facility ------------------------------------------

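# "multiprocess" is deprecated in the future package; multisession works on every OS
future::plan(future::multisession, workers = 3)

d_distance <- 
  # take one row of facility: id, long and lat as an input
  future_pmap_dfr(d_50k, function(lid_50k, long_50k, lat_50k) {
    # distance between one facility and all 1 million locations;
    # mutate() with scalars avoids tidyr::crossing(), which also sorts and de-duplicates
    d_1m %>% 
      mutate(
        lid_50k = lid_50k,
        # euclidean distance (planar units, not metres on the globe)
        distance = sqrt((long_1m - long_50k)^2 + (lat_1m - lat_50k)^2)
      ) %>% 
      # keep the location closest to this facility; the closest facility
      # for each location needs the minimum taken the other way round
      filter(distance == min(distance))
  })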
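Because the code above already uses plain Euclidean distance, a k-d tree is another option for the direction the question asks about (the nearest facility for each location). This is a minimal sketch using nn2() from the RANN package, a swapped-in technique that the answer does not use; the matrices "locs" and "facs" are made-up planar stand-ins for d_1m and d_50k.

```r
library(RANN)

set.seed(7)
# made-up planar dummy data in the same spirit as d_1m / d_50k
locs <- cbind(x = runif(10000, 0, 100), y = runif(10000, 0, 100))
facs <- cbind(x = runif(500, 0, 100),   y = runif(500, 0, 100))

# k-d tree built on the facilities, queried once per location; k = 1 -> nearest only
nn <- nn2(data = facs, query = locs, k = 1)

nearest_fid  <- nn$nn.idx[, 1]    # row index into facs
nearest_dist <- nn$nn.dists[, 1]  # Euclidean distance in the same units as the input
```

With k = 1 the tree returns one index and one distance per location, so there is no loop over facilities at all.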

The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please include the site URL or the original address. For any questions, contact yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM