Calculate matrix of cumulative distances in R

I need an efficient way to calculate a matrix of distances between a series of points. The catch is that you can only get from point 'i' to point 'k' by passing through every point 'j' in between. For example, imagine an island with 5 beaches: you want the distance between every pair of beaches measured along the shoreline, because you cannot cut across the island. Travel can be in either direction, clockwise or counter-clockwise.

Below are some example data. (Note: you will need to install the package 'geosphere' to use the 'distm' function, which calculates the distance between GPS coordinates along the surface of the Earth.)

library("geosphere")

longitude = c(-119.003, -119.067, -119.121, -119.089, -119.003)
latitude = c(33.503, 33.539, 33.485, 33.413, 33.440)
long.lat.mat = as.matrix(cbind(longitude, latitude))

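# Use "distm" to calculate direct site-to-site distances along the Earth's surface.
# distm returns metres, so divide by 1000 to get km. ("euclid" here just means the direct distance.)
euclid.dist.mat = distm(long.lat.mat) / 1000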

# Create an empty matrix of alongshore distances (from "rows" to "columns")
alongshore.dist.mat = matrix(ncol=dim(long.lat.mat)[1], nrow=dim(long.lat.mat)[1], data=NA)   

# Diagonal is zero. Adjacent sites are the same as Euclidean distance
diag(alongshore.dist.mat) = 0   
diag(alongshore.dist.mat[,-1]) = diag(euclid.dist.mat[,-1])
alongshore.dist.mat[1,dim(long.lat.mat)[1]] = euclid.dist.mat[1,dim(long.lat.mat)[1]]
alongshore.dist.mat[lower.tri(alongshore.dist.mat)] = t(alongshore.dist.mat)[lower.tri(t(alongshore.dist.mat))]

# > alongshore.dist.mat
#           [,1]      [,2]      [,3]      [,4]      [,5]
# [1,] 0.0000000 7.1650632        NA        NA 7.0131279
# [2,] 7.1650632 0.0000000 7.8265783        NA        NA
# [3,]        NA 7.8265783 0.0000000 8.5483605        NA
# [4,]        NA        NA 8.5483605 0.0000000 8.5365807
# [5,] 7.0131279        NA        NA 8.5365807 0.0000000

Now, how do I fill in the remaining cells? As an example:

alongshore.dist.mat[1,3] = 7.1650632 + 7.8265783   # = 14.991642

...representing site 1 -> site 2 -> site 3. By contrast:

alongshore.dist.mat[3,1] = 8.5483605 + 8.5365807 + 7.0131279   # = 24.098069

...representing site 3 -> site 4 -> site 5 -> site 1.

I suspect that the "cumsum" function can be used efficiently here, but I am not sure exactly how to set it up. I am hoping for a solution that avoids for-loops, since my real data contain dozens of points.

You could first build an edge list, a data frame with one row for each pair of adjacent locations:

dists <- expand.grid(x=1:5, y=1:5)
dists$weight <- alongshore.dist.mat[as.matrix(dists)]
dists <- subset(dists, x != y & !is.na(weight))
dists
#    x y   weight
# 2  2 1 7.165063
# 5  5 1 7.013128
# 6  1 2 7.165063
# 8  3 2 7.826578
# 12 2 3 7.826578
# 14 4 3 8.548360
# 18 3 4 8.548360
# 20 5 4 8.536581
# 21 1 5 7.013128
# 24 4 5 8.536581

Now you can build a graph and compute the all-pairs shortest paths:

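library(igraph)
# Build a graph from the edge list; the "weight" column becomes the edge weight.
# (Recent igraph releases name these functions graph_from_data_frame() and distances().)
g <- graph.data.frame(dists, vertices=data.frame(x=1:5))
# Weighted shortest-path distance between every pair of sites
shortest.paths(g)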
#           1         2         3         4         5
# 1  0.000000  7.165063 14.991642 15.549709  7.013128
# 2  7.165063  0.000000  7.826578 16.374939 14.178191
# 3 14.991642  7.826578  0.000000  8.548360 17.084941
# 4 15.549709 16.374939  8.548360  0.000000  8.536581
# 5  7.013128 14.178191 17.084941  8.536581  0.000000
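
Note that the shortest-path matrix above is symmetric: its [3,1] entry is 14.99, not the one-way 24.098 (site 3 -> 4 -> 5 -> 1) given in the question. If one-way distances around the loop are what is wanted, a base-R sketch with cumsum might look like the following. The names step, pos, total, fwd and either.way are made up for illustration, and the segment lengths are copied from the matrix printed in the question rather than recomputed with distm.

```r
# Lengths of consecutive shoreline segments, copied from the question's output:
# 1->2, 2->3, 3->4, 4->5, and 5->1 (closing the loop). All names are illustrative.
step <- c(7.1650632, 7.8265783, 8.5483605, 8.5365807, 7.0131279)
n <- length(step)

# Position of each site along the shore, measured from site 1 in the 1->2->3... direction
pos <- cumsum(c(0, step[-n]))
total <- sum(step)  # length of the whole loop

# One-way distance from the row site to the column site, always travelling 1->2->...->5->1
fwd <- outer(pos, pos, function(from, to) (to - from) %% total)

# Shorter of the two directions, for comparison with the shortest-path result
either.way <- pmin(fwd, t(fwd))

round(fwd[1, 3], 3)  # site 1 -> 2 -> 3:       14.992
round(fwd[3, 1], 3)  # site 3 -> 4 -> 5 -> 1:  24.098
```

This assumes the sites form a single closed loop and are listed in shoreline order. For anything more general, the graph approach is probably the safer choice.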
