Calculating the shortest distance between UTM points in two data sets in R

Question

I'm trying to find the shortest distance between schools and a coastline. The schools are all in easting and northing format; the coastline is made up of points, also in easting and northing format.

I've solved this with a loop that goes through each school and a second loop inside it that compares the school's location with every coastline point. This is incredibly slow, as I have 40000 schools and 180000 map points, and I know you should never use loops in R! I've tried to put the following together:

Test data:

schools <- structure(list(URN = c(100000L, 100008L, 100009L, 100010L,  100011L, 100012L), Easting = c(533498L, 530238L, 524888L, 529912L, 528706L,  528386L), Northing = c(181201L, 182761L, 185067L, 184835L, 186594L,  185209L)), .Names = c("URN", "Easting", "Northing"), row.names = c(NA,  6L), class = "data.frame")

coastline <- structure(list(Easting = c(219588.203816721, 219623.335092579,  219625.861360502, 219661.118975722, 219664.898582579, 219700.155464073 ), Northing = c(607325.869617586, 607324.434359255, 607386.276450707,  607384.83630279, 607477.377010103, 607475.937159766)), .Names = c("Easting", "Northing"), row.names = c(NA, 6L), class = "data.frame")

The code:

for (sch in schools$URN){

  minimumDistance <- 500000

  SEasting <- schools %>% filter(URN == sch) %$% Easting
  SNorthing <- schools %>% filter(URN == sch) %$% Northing

  mindisance <- coastline %>% mutate(distance = 
             min(sqrt((SEasting - Easting)^2 +
                (SNorthing - Northing)^2))) %$% distance

  print(paste(sch, "minDistance = ", mindisance))
}

But I get a result for each coastline point:

[1] "100000 minDistance =  529243.315102678" "100000 minDistance =  529243.315102678"
[3] "100000 minDistance =  529243.315102678" "100000 minDistance =  529243.315102678"
[5] "100000 minDistance =  529243.315102678" "100000 minDistance =  529243.315102678"

What I'd like is:

100000 minDistance = 529243.315102678

Any idea what I'm doing wrong?

Answer 1

Switch mutate to summarise:

library(dplyr)     # filter(), summarise(), %>%
library(magrittr)  # %$% exposition pipe

for (sch in schools$URN){

  minimumDistance <- 500000

  SEasting <- schools %>% filter(URN == sch) %$% Easting
  SNorthing <- schools %>% filter(URN == sch) %$% Northing

  # summarise() collapses the coastline to one row holding the minimum distance
  mindisance <- coastline %>%
    summarise(distance = min(sqrt((SEasting - Easting)^2 +
                                  (SNorthing - Northing)^2))) %$% distance

  print(paste(sch, "minDistance = ", mindisance))
}

[1] "100000 minDistance =  529243.315102678"
[1] "100008 minDistance =  526056.631790224"
[1] "100009 minDistance =  521044.965922041"
[1] "100010 minDistance =  524191.165239584"
[1] "100011 minDistance =  522059.567618869"
[1] "100012 minDistance =  522987.402491719"

summarise is used to return a single value such as mean, sum or, in this case, min. mutate is used to change each individual value in a column and then return the whole column. In the original code min() still produced one number, but mutate recycled it across all six coastline rows, so paste() built six identical strings. I think that explains why the original code was repeating itself on the print command.
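The difference is easy to see on a toy data frame. This is only a sketch, assuming dplyr is installed; the data frame `pts` and its values are made up for illustration and are not part of the question's data.

```r
library(dplyr)

# Hypothetical toy data, not from the question
pts <- data.frame(x = c(3, 1, 2))

# mutate() keeps every row; the single min() value is recycled down the column
mutated <- pts %>% mutate(smallest = min(x))

# summarise() collapses the data frame to one row
summarised <- pts %>% summarise(smallest = min(x))

nrow(mutated)     # 3 rows, each with smallest == 1
nrow(summarised)  # 1 row
```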

To avoid the for loop altogether, you could:

distances<-sapply(1:nrow(schools), function(x)
    with(schools[x,], min(sqrt((coastline$Easting-Easting)^2+  
                          (coastline$Northing-Northing)^2))))

paste(schools$URN, "minDistance = ", distances)
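For small inputs, the same kind of result can also be computed by building the full pairwise distance matrix. This is a base R sketch rather than part of the original answer; the function name `min_coast_distance` and the toy coordinates are made up for illustration.

```r
# Hypothetical helper, not from the original answer
min_coast_distance <- function(schools, coastline) {
  # outer() gives an nrow(schools) x nrow(coastline) matrix of coordinate differences
  dE <- outer(schools$Easting,  coastline$Easting,  "-")
  dN <- outer(schools$Northing, coastline$Northing, "-")
  # row-wise minimum of the Euclidean distances
  apply(sqrt(dE^2 + dN^2), 1, min)
}

# Made-up coordinates chosen so the distances are easy to check by hand
toy_schools   <- data.frame(URN = c(1L, 2L), Easting = c(0, 10), Northing = c(0, 0))
toy_coastline <- data.frame(Easting = c(3, 10), Northing = c(4, 2))

toy_distances <- min_coast_distance(toy_schools, toy_coastline)
paste(toy_schools$URN, "minDistance = ", toy_distances)
```

The matrix has one entry per school-coastline pair, so at the question's full size (40000 schools by 180000 points, 7.2 billion entries) it is unlikely to fit in memory on a typical machine. There the schools would have to be processed in chunks, or the sapply() version used instead.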

I suspect this is fast. Let's test it on a larger data set:

set.seed(400)
URN<-10000:19999
Easting1<-sample.int(533498, 10000)
Northing1<-sample.int(180000, 10000)
schools<-data.frame(URN, Easting = Easting1, Northing = Northing1)

Easting2<-sample.int(533498, 10000)
Northing2<-sample.int(180000, 10000)
coastline<-data.frame(Easting = Easting2, Northing = Northing2)

f1 <- function() {
  for (sch in schools$URN){

    minimumDistance <- 500000

    SEasting <- schools %>% filter(URN == sch) %$% Easting
    SNorthing <- schools %>% filter(URN == sch) %$% Northing

    mindisance <- coastline %>%
      summarise(distance = min(sqrt((SEasting - Easting)^2 +
                                    (SNorthing - Northing)^2))) %$% distance

    print(paste(sch, "minDistance = ", mindisance))
  }
}

f2 <- function() {
  distances <- sapply(1:nrow(schools), function(x)
    with(schools[x,], min(sqrt((coastline$Easting - Easting)^2 +
                               (coastline$Northing - Northing)^2))))

  paste(schools$URN, "minDistance = ", distances)
}

library(microbenchmark)
microbenchmark(f1(), f2(), times = 10)
##this takes a while to run

Unit: seconds
expr       min        lq     mean    median        uq       max neval
f1() 20.013022 20.387663 20.53804 20.625776 20.735973 20.763166    10
f2()  2.932491  2.971101  2.99707  3.004892  3.031679  3.044733    10

The sapply() method is ~6.8 times faster.
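Both functions still compare every school with every coastline point. A different technique, not used in the original answer, is a nearest-neighbour search on a k-d tree, which can skip most of those comparisons. The sketch below assumes the RANN package is installed and uses its nn2() function; the toy coordinates are made up, and no timings are claimed for it.

```r
library(RANN)  # assumption: RANN is installed; provides nn2()

# Made-up coordinates, not from the question
toy_schools   <- data.frame(URN = c(1L, 2L), Easting = c(0, 10), Northing = c(0, 0))
toy_coastline <- data.frame(Easting = c(3, 10), Northing = c(4, 2))

# For each school (query), find the single nearest coastline point (data)
nearest <- nn2(data  = as.matrix(toy_coastline[, c("Easting", "Northing")]),
               query = as.matrix(toy_schools[, c("Easting", "Northing")]),
               k = 1)

# nn.dists holds Euclidean distances, one row per school
toy_schools$minDistance <- nearest$nn.dists[, 1]
toy_schools
```

Because nn2() returns the index of the nearest point in nn.idx as well, this approach would also tell you which coastline point is closest, should that be needed.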

Answer 1
1 accepted 2016-06-12 20:50:12
