Return minimum distance between each row and each column of two long lat coordinates in two dataframes

I want to calculate the smallest geographical distance between each row of one data frame and all rows of another. DF1 holds a number of institutions and DF2 a number of events, like so:

#DF1 (institutions)
 DF1 <- data.frame(latitude=c(41.49532, 36.26906, 40.06599), 
 longitude=c(-98.77298, -101.40585, -80.72291))
 DF1$institution <- letters[seq( from = 1, to = nrow(DF1))] 

#DF2 (events)
 DF2 <- data.frame(latitude=c(32.05, 32.62, 30.23), longitude=c(-86.82,   
 -87.67, -88.02))
 DF2$ID <- seq_len(nrow(DF2))

I want to find the event closest to each institution in DF1 and add both that distance and the matching ID from DF2 to DF1. I know how to calculate a pairwise distance, but I cannot work out how to calculate all the distances from DF1[1,] to every row of DF2, return the smallest value, and then repeat that for the remaining rows.

This is what I tried (and failed).

  library(geosphere)

  #Define a function
   distanceCALC <- function(x, y) { distm(x = x, y = y, 
    fun = distHaversine)}

  #Define vector of events 
   DF2_vec <- DF2[, c('longitude', 'latitude')]

  #Define df to hold distances
   shrtdist <- data.frame()

Now, my attempt was to feed distanceCALC with row1 of DF1 and the vectorized events.

  #Loop through every row in DF1 and calculate all the distances from that institution to events 1, 2, 3. Append to DF1 smallest distance + DF2$ID.

  #This only gives me the pairwise distance
   for (i in nrow(DF1)){
    result  <- distanceCALC(DF1[i,c('longitude', 'latitude')], DF2_vec)
     }
  #Somehow take shortest distance for each row*column distance matrix
   shrtdist <- rbind(shrtdist, min(result[,], na.rm = T))

My guess is that the solution entails reshaping of the data and lapply. Also, the loop is very bad practice and much too slow given the number of observations.

Any help is greatly appreciated.

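Here's a simple way to approach this using the outer function. Note that it treats latitude and longitude as planar coordinates, so the result is a Euclidean distance in degrees rather than a geographic distance.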

squared_distance <- function(x, y ) (x - y)^2

lat <- outer(DF1$latitude, DF2$latitude, squared_distance)
long <- outer(DF1$longitude, DF2$longitude, squared_distance)

pairwise_dist <- sqrt(lat + long)

rownames(pairwise_dist) <- DF1$institution
colnames(pairwise_dist) <- DF2$ID

pairwise_dist

This gives you a matrix of the distances between each institution (rows) and each event (columns). To get the distance and the event into DF1, we can do

DF1$min_dist <- apply(pairwise_dist, 1, min)
DF1$min_inst <- apply(pairwise_dist, 1, which.min)

Note that the second line only works here because the events are labeled by number, so the column position returned by which.min is the same as the event ID. If your real data doesn't have that handy feature, we need to do

DF1$min_inst <- colnames(pairwise_dist)[apply(pairwise_dist, 1, which.min)]
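Putting the planar-distance steps together, here is a minimal self-contained sketch using the sample data from the question. The column names min_event and min_dist and the helper vector nearest are only illustrative choices, not anything the question requires. It looks the ID up by position in DF2, so it does not depend on the events being numbered 1, 2, 3.

```r
# Sample data from the question
DF1 <- data.frame(latitude  = c(41.49532, 36.26906, 40.06599),
                  longitude = c(-98.77298, -101.40585, -80.72291))
DF1$institution <- letters[seq_len(nrow(DF1))]

DF2 <- data.frame(latitude  = c(32.05, 32.62, 30.23),
                  longitude = c(-86.82, -87.67, -88.02))
DF2$ID <- seq_len(nrow(DF2))

# Planar (Euclidean) distance in degrees: institutions in rows, events in columns
squared_distance <- function(x, y) (x - y)^2
pairwise_dist <- sqrt(outer(DF1$latitude,  DF2$latitude,  squared_distance) +
                      outer(DF1$longitude, DF2$longitude, squared_distance))

# Column position of the nearest event for each institution
nearest <- apply(pairwise_dist, 1, which.min)

# Look the ID up by position, and pull the matching distance via matrix indexing
DF1$min_event <- DF2$ID[nearest]
DF1$min_dist  <- pairwise_dist[cbind(seq_len(nrow(DF1)), nearest)]

DF1
```

Because the whole distance matrix is built with outer, no explicit loop over the rows of DF1 is needed. For very large inputs the matrix itself (rows of DF1 times rows of DF2) may become the memory bottleneck.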

Update using alternative distance function

I haven't tested this, but I think this should work. Again, the output will be a matrix.

gcd.hf <- function(DF1, DF2) {
  to_rad <- pi / 180 # the coordinates are in degrees; sin() and cos() expect radians
  lat1 <- DF1$latitude * to_rad
  lat2 <- DF2$latitude * to_rad

  sin2.long <- sin(outer(DF1$longitude, DF2$longitude, "-") * to_rad / 2)^2
  sin2.lat  <- sin(outer(lat1, lat2, "-") / 2)^2
  cos.lat   <- outer(cos(lat1), cos(lat2), "*")

  a <- sin2.lat + cos.lat * sin2.long # haversine term; we do this cell-wise
  cir <- 2 * asin(pmin(1, sqrt(a))) # central angle in radians. I never assign anything to "c" since that's concatenate; rename this variable as appropriate.
  cir * 6371 # multiply by the mean Earth radius to get kilometres
}

pairwise_dist <- gcd.hf(DF1, DF2)
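For reference, here is a self-contained base R sketch of the same idea with a haversine distance, attaching the nearest event and its distance in kilometres to DF1. The function name haversine_km, its radius_km argument, and the column names nearest_event and dist_km are hypothetical names chosen for this example. It assumes the coordinates are in decimal degrees and uses a spherical Earth with a mean radius of 6371 km.

```r
# Sample data from the question
DF1 <- data.frame(latitude  = c(41.49532, 36.26906, 40.06599),
                  longitude = c(-98.77298, -101.40585, -80.72291))
DF1$institution <- letters[seq_len(nrow(DF1))]

DF2 <- data.frame(latitude  = c(32.05, 32.62, 30.23),
                  longitude = c(-86.82, -87.67, -88.02))
DF2$ID <- seq_len(nrow(DF2))

# Hypothetical helper: haversine distance matrix (km) between the rows of a and b.
# Assumes columns named latitude and longitude, in decimal degrees.
haversine_km <- function(a, b, radius_km = 6371) {
  rad    <- pi / 180
  lat_a  <- a$latitude * rad
  lat_b  <- b$latitude * rad
  d_lat  <- outer(lat_a, lat_b, "-")
  d_long <- outer(a$longitude * rad, b$longitude * rad, "-")
  h <- sin(d_lat / 2)^2 + outer(cos(lat_a), cos(lat_b)) * sin(d_long / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(h)))  # pmin guards against rounding above 1
}

dist_km <- haversine_km(DF1, DF2)

# Nearest event per institution, plus the corresponding distance
nearest <- apply(dist_km, 1, which.min)
DF1$nearest_event <- DF2$ID[nearest]
DF1$dist_km       <- dist_km[cbind(seq_len(nrow(DF1)), nearest)]

DF1
```

Since this distance accounts for longitude lines converging away from the equator, it may pick a different nearest event than the planar version for some rows. The geosphere approach from the question, distm with fun = distHaversine applied to the full longitude/latitude columns of both data frames, should give a comparable matrix in metres.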

