
R: Faster way of computing large distance matrix

I am computing a distance matrix between a large number of locations (5,000) on a sphere, using the Haversine distance function.

Here is my code:

require(geosphere)
x <- rnorm(5000)
y <- rnorm(5000)
xy1 <- cbind(x, y)

I compute the distance matrix with:

system.time(
  outer(1:nrow(xy1), 1:nrow(xy1),
        function(i, j) distHaversine(xy1[i, 1:2], xy1[j, 1:2]))
)

This takes a long time to run. Any suggestions on how to reduce the time needed for this job? Thanks.

Try the built-in function in the geosphere package?

z <- distm( xy1 )

The default distance function for distm() - which computes a distance matrix between a set of points - is the Haversine formula ("distHaversine"), but you can specify another one via the fun argument.

On my 2.6GHz Core i7 rMBP this takes about 5 seconds for 5,000 points.
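As a minimal sketch of the fun argument (the subset size and use of distGeo as the alternative metric are illustrative choices, not from the original post):

```r
library(geosphere)

set.seed(1)
xy1 <- cbind(lon = rnorm(5000), lat = rnorm(5000))

# Default is Haversine; pass fun to pick another metric,
# e.g. the more accurate (but slower) distGeo:
d_hav <- distm(xy1[1:10, ])                 # 10 x 10 Haversine matrix
d_geo <- distm(xy1[1:10, ], fun = distGeo)  # same pairs, geodesic distances
```

Both calls return a symmetric matrix with zeros on the diagonal, one row and column per input point.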

Below is a solution using the spatialrisk package. The key functions in this package are written in C++ (via Rcpp) and are therefore very fast.

library(geosphere)
library(spatialrisk)
library(data.table)

x <- rnorm(5000)
y <- rnorm(5000)
xy1 <- data.table(x, y)

# Cross join the table with itself to get all point pairs
coordinates_dt <- optiRum::CJ.dt(xy1, xy1)

system.time({
  z <- distm( xy1 )
})
# user  system elapsed 
# 14.163   3.700  19.072 

system.time({
  distances_m <- coordinates_dt[, dist_m := spatialrisk::haversine(y, x, i.y, i.x)]
})
# user  system elapsed 
# 2.027   0.848   2.913 
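If optiRum is not available, the same cross join can be sketched with plain data.table using a dummy join key (the key column name k and the small 100-point table are my own illustrative choices, not from the original answer):

```r
library(data.table)
library(spatialrisk)

set.seed(1)
xy1 <- data.table(x = rnorm(100), y = rnorm(100))

# Cross join via a constant key; allow.cartesian permits the n^2 result
a <- copy(xy1)[, k := 1L]
pairs <- a[a, on = "k", allow.cartesian = TRUE]
pairs[, k := NULL]

# Columns x, y come from one copy, i.x, i.y from the other;
# spatialrisk::haversine takes (lat_from, lon_from, lat_to, lon_to)
pairs[, dist_m := spatialrisk::haversine(y, x, i.y, i.x)]
```

For 5,000 points the cross join holds 25 million rows, so memory usage is worth keeping in mind with this approach.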

