I am computing a distance matrix between a large number of locations (5,000) on a sphere, using the Haversine distance function.
Here is my code:
library(geosphere)
x <- rnorm(5000)  # longitudes (degrees)
y <- rnorm(5000)  # latitudes (degrees)
xy1 <- cbind(x, y)
I time the computation of the distance matrix with:
# outer() passes all 5000 x 5000 index pairs to the function in a single call
system.time(outer(1:nrow(xy1), 1:nrow(xy1),
                  function(i, j) distHaversine(xy1[i, 1:2], xy1[j, 1:2])))
This takes a long time to run. Any suggestions on how to reduce the running time? Thanks.
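For reference, the Haversine distance can also be vectorized in base R by applying outer() to the coordinate vectors rather than to row indices. This is a minimal sketch, assuming column 1 holds longitude and column 2 latitude in degrees, and an Earth radius of 6378137 m (the default radius of geosphere's distHaversine); haversine_matrix() is a hypothetical helper name, not a package function.

```r
# Pairwise Haversine (great-circle) distances in base R, no explicit loops.
# Assumptions: column 1 = longitude, column 2 = latitude (degrees);
# r = 6378137 m, the default radius of geosphere::distHaversine.
# haversine_matrix() is a hypothetical helper, not a package function.
haversine_matrix <- function(lonlat, r = 6378137) {
  rad <- pi / 180
  lon <- lonlat[, 1] * rad
  lat <- lonlat[, 2] * rad
  # n x n matrices of pairwise coordinate differences
  dlat <- outer(lat, lat, "-")
  dlon <- outer(lon, lon, "-")
  # a = sin^2(dlat / 2) + cos(lat1) * cos(lat2) * sin^2(dlon / 2)
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  # Clamp floating-point overshoot so asin() stays defined
  a[a > 1] <- 1
  2 * r * asin(sqrt(a))
}

# Usage on a small random sample; 5,000 points work the same way
xy_small <- cbind(x = rnorm(100), y = rnorm(100))
d <- haversine_matrix(xy_small)
dim(d)  # 100 100
```

Each intermediate here is an n x n double matrix, so with 5,000 points every one of them takes about 200 MB (5000^2 x 8 bytes); peak memory is several times that.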
Try the built-in distm() function in the geosphere package:
z <- distm(xy1)
The default distance function for distm(), which calculates a distance matrix between a set of points, is the Haversine formula (distHaversine), but you can specify another one with the fun argument.
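A quick sketch of the fun argument, assuming geosphere is installed (the sample size and object names are illustrative): the default call and an explicit fun = distHaversine should give the same matrix, and swapping in distGeo, geosphere's ellipsoid-based distance, is a one-argument change.

```r
library(geosphere)
# Illustrative data: column 1 = longitude, column 2 = latitude, in degrees
xy_small <- cbind(lon = rnorm(500), lat = rnorm(500))
# Default distance function is distHaversine (spherical Earth)
z_default <- distm(xy_small)
# The same function passed explicitly via `fun`
z_hav <- distm(xy_small, fun = distHaversine)
# Another geosphere distance function: distGeo (WGS84 ellipsoid by default)
z_geo <- distm(xy_small, fun = distGeo)
# All three are 500 x 500 symmetric matrices of distances in metres
```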
On my 2.6GHz Core i7 rMBP this takes about 5 seconds for 5,000 points.
Below is a solution using the spatialrisk package. Its key functions are written in C++ (via Rcpp) and are therefore fast.
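library(geosphere)
library(spatialrisk)
library(data.table)
x <- rnorm(5000)  # longitudes (degrees)
y <- rnorm(5000)  # latitudes (degrees)
xy1 <- data.table(x, y)
# Cross join xy1 with itself: one row per pair of points (25 million rows).
# Requires the optiRum package to be installed.
coordinates_dt <- optiRum::CJ.dt(xy1, xy1)
system.time({
  z <- distm(xy1)
})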
# user system elapsed
# 14.163 3.700 19.072
system.time({
  # haversine(lat_from, lon_from, lat_to, lon_to) adds distances in metres by reference
  distances_m <- coordinates_dt[, dist_m := spatialrisk::haversine(y, x, i.y, i.x)]
})
# user system elapsed
# 2.027 0.848 2.913
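The spatialrisk approach returns the distances in long format (one row per pair of points) rather than as the n x n matrix that distm() produces. If a matrix is needed, one option is to cross join row indices with data.table::CJ() instead of optiRum::CJ.dt() and scatter the distances into a matrix. This is a sketch assuming the data.table and spatialrisk packages; the from/to columns and the pairs and dist_mat objects are hypothetical names.

```r
library(data.table)
library(spatialrisk)
n <- 500                 # small n for illustration; 5000 gives 25 million pairs
lon <- rnorm(n)          # longitudes (degrees)
lat <- rnorm(n)          # latitudes (degrees)
# All (from, to) index pairs; `from`/`to` are illustrative column names
pairs <- CJ(from = seq_len(n), to = seq_len(n))
# spatialrisk::haversine(lat_from, lon_from, lat_to, lon_to) returns metres
pairs[, dist_m := haversine(lat[from], lon[from], lat[to], lon[to])]
# Scatter the long-format distances into an n x n matrix
dist_mat <- matrix(0, nrow = n, ncol = n)
dist_mat[cbind(pairs$from, pairs$to)] <- pairs$dist_m
```

Indexing a matrix with a two-column matrix of (row, column) pairs writes all distances in one vectorized assignment, so no loop over the n^2 pairs is needed.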