
How to vectorize a for loop for coordinates calculation in R?

I am trying to count how many points fall within 50 km of each center point, but this is currently done in a for loop. Is it possible to vectorize this? Below is a reproducible snippet. Thank you.

library(geosphere)

set.seed(1)  # make the random example reproducible
# Columns are (longitude, latitude), the order geosphere expects
centers <- as.data.frame(matrix(rnorm(10, mean = 40, sd = .5), ncol = 2, byrow = TRUE))
points <- matrix(rnorm(100, mean = 40, sd = 1), ncol = 2, byrow = TRUE)

for (i in seq_len(nrow(centers))) {
  # Count the points that lie within 50 km of center i
  centers[i, 3] <- sum(geosphere::distHaversine(points, centers[i, 1:2]) / 1000 < 50,
                       na.rm = TRUE)
}
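For reference, here is a minimal sketch of a fully vectorized version in base R: compute every point-to-center distance in one matrix, then count with a single colSums() call. It assumes the columns are (longitude, latitude) in degrees and uses the haversine formula with a radius of 6378137 m (geosphere's default for distHaversine). The helper haversine_matrix is hypothetical, not part of geosphere; geosphere's own distm() also returns a point-by-center distance matrix.

```r
# Hypothetical helper: all pairwise haversine distances (metres) between
# rows of `p` (points) and rows of `q` (centers).
# Both are two-column matrices of (longitude, latitude) in degrees.
haversine_matrix <- function(p, q, r = 6378137) {
  to_rad <- pi / 180
  lon1 <- p[, 1] * to_rad; lat1 <- p[, 2] * to_rad
  lon2 <- q[, 1] * to_rad; lat2 <- q[, 2] * to_rad
  # outer() evaluates each term for every (point, center) pair at once
  dlat <- outer(lat1, lat2, "-")
  dlon <- outer(lon1, lon2, "-")
  a <- sin(dlat / 2)^2 + outer(cos(lat1), cos(lat2)) * sin(dlon / 2)^2
  # pmin() keeps the dim of its first argument; clamp guards rounding > 1
  2 * r * asin(pmin(sqrt(a), 1))
}

set.seed(1)
centers <- as.data.frame(matrix(rnorm(10, mean = 40, sd = .5), ncol = 2, byrow = TRUE))
points  <- matrix(rnorm(100, mean = 40, sd = 1), ncol = 2, byrow = TRUE)

# One distance matrix: rows = points, columns = centers ...
d <- haversine_matrix(points, as.matrix(centers[, 1:2]))
# ... then one colSums() call counts the points within 50 km of each center
centers$n_within_50km <- colSums(d / 1000 < 50, na.rm = TRUE)
centers
```

The trade-off is memory: the distance matrix has nrow(points) x nrow(centers) entries, which is fine here but may not be for very large inputs.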

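I don't think you can fully vectorise this with distHaversine, because it pairs the rows of its two arguments (here, every point against a single center) rather than computing every point-center combination. You can replace the for loop with sapply and see whether performance improves, although sapply still calls the function once per center.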

library(geosphere)

# Use only the coordinate columns: distHaversine expects (lon, lat) pairs
centers$total <- sapply(seq_len(nrow(centers)), function(i) {
  sum(distHaversine(points, centers[i, 1:2]) / 1000 < 50, na.rm = TRUE)
})
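A variant of the same per-center loop is vapply(), which declares the return type so a call that does not return exactly one integer fails immediately. This sketch swaps in a hypothetical base-R helper, dist_to_center, as a stand-in for distHaversine(points, center) so it runs without geosphere; it assumes (longitude, latitude) columns and a 6378137 m radius.

```r
# Hypothetical stand-in for geosphere::distHaversine(points, center):
# distances (metres) from every row of `p` to one (lon, lat) `center`.
dist_to_center <- function(p, center, r = 6378137) {
  rad <- pi / 180
  dlat <- (p[, 2] - center[2]) * rad
  dlon <- (p[, 1] - center[1]) * rad
  a <- sin(dlat / 2)^2 + cos(p[, 2] * rad) * cos(center[2] * rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(sqrt(a), 1))
}

set.seed(1)
centers <- as.data.frame(matrix(rnorm(10, mean = 40, sd = .5), ncol = 2, byrow = TRUE))
points  <- matrix(rnorm(100, mean = 40, sd = 1), ncol = 2, byrow = TRUE)
ctr <- as.matrix(centers[, 1:2])

# Still one call per center, but each call must return a single integer
# (sum() of a logical vector is an integer)
centers$total <- vapply(seq_len(nrow(ctr)), function(i) {
  sum(dist_to_center(points, ctr[i, ]) / 1000 < 50, na.rm = TRUE)
}, integer(1))
```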

You can use split with row and sapply, followed by colSums:

library(geosphere)
centers$res <- colSums(
  sapply(split(as.matrix(centers[, 1:2]), row(centers)[, 1:2]), 
         distHaversine, p1 = points) / 1000 < 50, na.rm = TRUE)
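To see what split(..., row(...)) hands to sapply, here is a small illustration on a made-up 3 x 2 matrix: row() gives the row number of every element, and split() groups the column-major values by that number, so each list element is one (lon, lat) pair.

```r
m <- matrix(c(10, 20, 30,    # longitudes (column 1)
              1,  2,  3),    # latitudes  (column 2)
            ncol = 2)
row(m)                       # same shape as m; each cell holds its row number
rows <- split(m, row(m))     # list with one element per row, named "1", "2", "3"
str(rows)
rows[["2"]]                  # the second row as a (lon, lat) pair: 20 2
```

sapply() then calls distHaversine(p1 = points, p2 = <one pair>) for each element and binds the resulting distance vectors into a points-by-centers matrix, which is why colSums() gives one count per center.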

It gives the same result as the original loop:

# compute the old result to compare with
for(i in 1:dim(centers)[1])
  centers[i,4] <- sum(geosphere::distHaversine(points, 
                                               centers[i,c(1:2)]) /
                        1000 < 50, na.rm = TRUE)

# gives the same
all.equal(centers$res, centers[, 4])
#R> [1] TRUE

A possible alternative is:

dists <- tapply(as.matrix(centers[, 1:2]), row(centers[, 1:2]), 
                distHaversine, p1 = points)
centers$res <- colSums(simplify2array(dists) / 1000 < 50, na.rm = TRUE)
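Here tapply() returns a one-dimensional list array (one distance vector per center), and simplify2array() binds those equal-length vectors into a matrix with one column per center. A toy illustration on a made-up matrix, with cumsum() standing in for distHaversine as a function that returns a vector per group:

```r
m <- matrix(c(10, 20, 30,
              1,  2,  3), ncol = 2)
# cumsum() stands in for a function that returns a vector per row
res <- tapply(m, row(m), cumsum)
is.list(res)                 # a list array, one element per row of m
mat <- simplify2array(res)   # equal-length vectors become matrix columns
dim(mat)                     # 2 rows (values per group) x 3 columns (groups)
```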

Or you can use an anonymous function. This is like Ronak Shah's answer, but with tapply:

centers$res <- c(tapply(
  as.matrix(centers[, 1:2]), row(centers[, 1:2]), function(x)
    sum(distHaversine(points, x) / 1000 < 50, na.rm = TRUE)))
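The outer c() in the snippet above is presumably there because tapply() with a scalar-returning function gives a one-dimensional array rather than a plain vector; c() drops the dim attribute. A toy check on a made-up matrix, with a simple threshold count in place of the distance test:

```r
m <- matrix(c(10, 20, 30,
              1,  2,  3), ncol = 2)
# Count values above 15 in each row: one scalar per group
arr <- tapply(m, row(m), function(x) sum(x > 15))
dim(arr)      # a one-dimensional array of length 3
v <- c(arr)   # c() strips the dim attribute, leaving a plain vector
```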

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.