Distance matrix with geosphere: avoiding repeated calculations

Question

I want to compute the distances between all points in a very large matrix using distm from geosphere.

Here is a minimal example:

library(geosphere)
library(data.table)

coords <- data.table(coordX=c(1,2,5,9), coordY=c(2,2,0,1))
distances <- distm(coords, coords, fun = distGeo)

The issue is that, because of the nature of the distances I am computing, distm returns a symmetric matrix, so more than half of the distance calculations could be avoided:

structure(c(0, 111252.129800202, 497091.059564718, 897081.91986428, 
111252.129800202, 0, 400487.621661164, 786770.053508848, 497091.059564718, 
400487.621661164, 0, 458780.072878927, 897081.91986428, 786770.053508848, 
458780.072878927, 0), .Dim = c(4L, 4L))

Could you help me find a more efficient way to compute all these distances without calculating each one twice?

Answer 1

You can build the set of index pairs without repetition (with the gtools package) and then compute the distance for each of those pairs. Here is the code:

library(gtools)
library(geosphere)
library(data.table)

coords <- data.table(coordX = c(1, 2, 5, 9), coordY = c(2, 2, 0, 1))
pairs <- combinations(n = nrow(coords), r = 2, v = seq_len(nrow(coords)), repeats.allowed = FALSE)

# One distGeo distance per unordered pair (i < j)
distances <- apply(pairs, 1, function(x) {
    distm(coords[x[1], ], coords[x[2], ], fun = distGeo)
})

# Construct the distance matrix: index by (i, j) and by (j, i) so that
# each distance lands in its own cell on both sides of the diagonal
dist_mat <- matrix(0, nrow = nrow(coords), ncol = nrow(coords))
dist_mat[pairs] <- distances
dist_mat[pairs[, 2:1]] <- distances

print(dist_mat)

The results:

         [,1]     [,2]     [,3]     [,4]
[1,]      0.0 111252.1 497091.1 897081.9
[2,] 111252.1      0.0 400487.6 786770.1
[3,] 497091.1 400487.6      0.0 458780.1
[4,] 897081.9 786770.1 458780.1      0.0
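
A point worth checking when rebuilding a matrix from a vector of pairwise values: R fills upper.tri() positions column by column, while the pairs come out row by row as (1,2), (1,3), (1,4), (2,3), ..., so the two orders do not line up. The sketch below uses only base R and made-up stand-in values (12 for the pair (1,2), 23 for (2,3), and so on; these are not distances) to show the difference between the two ways of filling.

```r
n <- 4
pairs <- t(combn(n, 2))                 # rows: (1,2) (1,3) (1,4) (2,3) (2,4) (3,4)
vals  <- pairs[, 1] * 10 + pairs[, 2]   # 12 13 14 23 24 34, stand-ins for distances

# Filling the upper triangle directly follows column-major order,
# so some values end up in the wrong cells
wrong <- matrix(0, n, n)
wrong[upper.tri(wrong)] <- vals

# Indexing with the two-column pair matrix addresses each cell explicitly
right <- matrix(0, n, n)
right[pairs] <- vals
right[pairs[, 2:1]] <- vals

wrong[2, 3]         # 14: the value that belongs to the pair (1,4)
right[2, 3]         # 23
isSymmetric(right)  # TRUE
```

Indexing by the pair matrix does not depend on the order in which the pairs were generated, which is why it is the safer choice here.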

Answer 2

If you want to compute all pairwise distances for a set of points x, it is better to use distm(x) rather than distm(x, x). The distm function returns the same symmetric matrix in both cases, but when you pass it a single argument it knows that the matrix is symmetric, so it skips the redundant computations.

You can time it.

library("geosphere")

n <- 500
xy <- matrix(runif(n*2, -90, 90), n, 2)

system.time( replicate(100, distm(xy, xy) ) )
#  user  system elapsed 
# 61.44    0.23   62.79 
system.time( replicate(100, distm(xy) ) )
#  user  system elapsed 
# 36.27    0.39   38.05

You can also look at the R code for geosphere::distm to check that it treats the two cases differently.
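
As a quick sanity check, the two calling forms can be compared directly. This is a minimal sketch, assuming geosphere is installed; the seed and the number of points are arbitrary choices made for illustration.

```r
library(geosphere)

set.seed(1)
xy <- matrix(runif(20, -90, 90), ncol = 2)   # 10 random lon/lat points

one_arg <- distm(xy)       # single-argument form
two_arg <- distm(xy, xy)   # two-argument form

# Compare the results up to floating-point tolerance
all.equal(one_arg, two_arg)
isSymmetric(one_arg)

# Printing the function displays its R source
print(distm)
```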

Aside: a quick Google search turns up parallelDist: Parallel Distance Matrix Computation on CRAN. Geodesic distance is one of its options.

Answer 3

Using combn() from base R might be slightly simpler, and probably faster, than loading additional packages. Also, distm() calls distGeo() internally here, so calling the latter directly should be even faster.

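coords <- as.data.frame(coords)  # convert the data.table to a plain data.frame
idx <- combn(nrow(coords), 2)    # 2 x 6 matrix, one column per pair (i < j)
# Pass both endpoints explicitly so that each distance belongs to exactly one pair
cbind(t(idx), geosphere::distGeo(coords[idx[1, ], ], coords[idx[2, ], ]))
#      [,1] [,2]     [,3]
# [1,]    1    2 111252.1
# [2,]    1    3 497091.1
# [3,]    1    4 897081.9
# [4,]    2    3 400487.6
# [5,]    2    4 786770.1
# [6,]    3    4 458780.1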

We can check this with a benchmark.

Unit: microseconds
    expr     min      lq     mean  median       uq     max neval cld
   distm 555.690 575.846 597.7672 582.352 596.1295 904.718   100   b
 distGeo 426.335 434.372 450.0196 441.516 451.8490 609.524   100  a

Looks good.
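
If the full symmetric matrix is needed rather than the list of pairs, one option is to wrap the pairwise vector in a base R "dist" object, whose storage order (lower triangle, column by column) matches the order in which combn() generates the pairs. This is a sketch under that assumption, using the coordinates from the question; the names idx, d and full are illustrative only.

```r
library(geosphere)

coords <- data.frame(coordX = c(1, 2, 5, 9), coordY = c(2, 2, 0, 1))
n <- nrow(coords)

idx <- combn(n, 2)   # one column per pair (i < j)
d <- distGeo(coords[idx[1, ], ], coords[idx[2, ], ])   # one distance per pair

# A "dist" object stores the lower triangle column by column,
# which is the same order as the pairs in idx
d_obj <- structure(d, Size = n, class = "dist")
full <- as.matrix(d_obj)

# Compare with the matrix distm builds from the same points
all.equal(unname(full), distm(coords, fun = distGeo))
```

Keeping the compact "dist" form also stores only n(n-1)/2 values, which can matter for a very large set of points.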

3 answers

Answer 1
2 2019-03-03 17:50:20

Answer 2
2 accepted 2019-03-03 17:59:16

Answer 3
2 2019-03-03 18:04:22
