Working out the shortest distance between UTM points in two datasets in R
I'm trying to find the shortest distance between schools and a coastline. The schools are all in easting and northing format, and the coastline is made up of points, also in easting and northing format.
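For reference, the quantity being computed for each school s is its minimum Euclidean distance to the set C of coastline points. This treats easting and northing as planar coordinates in metres on the same grid (an assumption that holds when both datasets use the same projection). Because the square root is monotonic, the minimum can equally be taken over the squared distances first:

$$
d_{\min}(s) = \min_{c \in C} \sqrt{(E_s - E_c)^2 + (N_s - N_c)^2} = \sqrt{\min_{c \in C}\left[(E_s - E_c)^2 + (N_s - N_c)^2\right]}
$$

where E and N denote easting and northing.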
I've solved this with a loop that goes through each school and, inside it, another loop that compares the school's location with every coastline point. This is incredibly slow, as I have 40,000 schools and 180,000 map points, and I know you should never use loops in R!
I've tried to put the following together:
Test data:
schools <- structure(list(URN = c(100000L, 100008L, 100009L, 100010L, 100011L, 100012L), Easting = c(533498L, 530238L, 524888L, 529912L, 528706L, 528386L), Northing = c(181201L, 182761L, 185067L, 184835L, 186594L, 185209L)), .Names = c("URN", "Easting", "Northing"), row.names = c(NA, 6L), class = "data.frame")
coastline <- structure(list(Easting = c(219588.203816721, 219623.335092579, 219625.861360502, 219661.118975722, 219664.898582579, 219700.155464073 ), Northing = c(607325.869617586, 607324.434359255, 607386.276450707, 607384.83630279, 607477.377010103, 607475.937159766)), .Names = c("Easting", "Northing"), row.names = c(NA, 6L), class = "data.frame")
The code:
library(dplyr)     # filter(), mutate(), %>%
library(magrittr)  # %$% exposition pipe
for (sch in schools$URN){
  minimumDistance <- 500000
  SEasting <- schools %>% filter(URN == sch) %$% Easting
  SNorthing <- schools %>% filter(URN == sch) %$% Northing
  mindisance <- coastline %>% mutate(distance =
    min(sqrt((SEasting - Easting)^2 +
             (SNorthing - Northing)^2))) %$% distance
  print(paste(sch, "minDistance = ", mindisance))
}
But I get a result for each coastline point:
[1] "100000 minDistance = 529243.315102678" "100000 minDistance = 529243.315102678"
[3] "100000 minDistance = 529243.315102678" "100000 minDistance = 529243.315102678"
[5] "100000 minDistance = 529243.315102678" "100000 minDistance = 529243.315102678"
What I'd like is:
100000 minDistance = 529243.315102678
Any idea what I'm doing wrong?
Switch mutate() to summarise():
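library(dplyr)     # filter(), summarise(), %>%
library(magrittr)  # %$% exposition pipe
for (sch in schools$URN){
  minimumDistance <- 500000
  SEasting <- schools %>% filter(URN == sch) %$% Easting
  SNorthing <- schools %>% filter(URN == sch) %$% Northing
  # summarise() collapses the coastline to a single row holding the minimum.
  # Keep %$% on the same line: a line starting with %$% is a syntax error.
  mindisance <- coastline %>% summarise(distance =
    min(sqrt((SEasting - Easting)^2 +
             (SNorthing - Northing)^2))) %$% distance
  print(paste(sch, "minDistance = ", mindisance))
}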
[1] "100000 minDistance = 529243.315102678"
[1] "100008 minDistance = 526056.631790224"
[1] "100009 minDistance = 521044.965922041"
[1] "100010 minDistance = 524191.165239584"
[1] "100011 minDistance = 522059.567618869"
[1] "100012 minDistance = 522987.402491719"
summarise() is used to return a single value (one row per group), such as mean, sum or, in this case, min. mutate() is used to change each individual value in a column and then return the whole column. In the original code, mutate() stored the single min() value in a new distance column, which R recycles to every row of coastline, so %$% distance returned one identical value per coastline point and paste() built a string for each of them. That explains why the original code was repeating itself in the print() output.
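The repetition comes from R's recycling rule: min() returns one number, and assigning a length-one value to a data frame column repeats it for every row. The base R sketch below uses made-up toy points (hypothetical, not the question's data) to mimic what mutate() did and to contrast it with keeping the single summary value, which is what summarise() returns.

```r
# Toy data (hypothetical): three coastline points and one school, in metres
coast <- data.frame(Easting = c(1, 2, 3), Northing = c(0, 0, 0))
school_e <- 10
school_n <- 0

# min() collapses the three distances (9, 8, 7) to a single number
d_min <- min(sqrt((school_e - coast$Easting)^2 + (school_n - coast$Northing)^2))

# mutate()-like: store the scalar as a column; R recycles it to every row
mutated <- coast
mutated$distance <- d_min
print(mutated$distance)  # [1] 7 7 7  (one copy per coastline point)

# summarise()-like: keep only the single summary value
print(d_min)             # [1] 7
```

Passing mutated$distance to paste() would produce three identical strings, which mirrors the repeated lines in the question's output.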
To avoid the for loop altogether, you could:
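# For each school row x, compute its distance to every coastline point and keep the minimum.
# Inside with(), Easting and Northing refer to school x; coastline columns are named explicitly.
distances <- sapply(1:nrow(schools), function(x)
  with(schools[x, ], min(sqrt((coastline$Easting - Easting)^2 +
                              (coastline$Northing - Northing)^2))))
paste(schools$URN, "minDistance = ", distances)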
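A slightly tighter variant of the same sapply() idea, offered as a sketch rather than a tested replacement: vapply() declares that each call returns exactly one number, and because the square root is monotonic, the code takes the minimum of the squared distances and calls sqrt() once per school instead of once per coastline point. The helper name min_coast_dist() is hypothetical; the data are the question's test data.

```r
# Question's test data
schools <- data.frame(
  URN = c(100000L, 100008L, 100009L, 100010L, 100011L, 100012L),
  Easting = c(533498L, 530238L, 524888L, 529912L, 528706L, 528386L),
  Northing = c(181201L, 182761L, 185067L, 184835L, 186594L, 185209L)
)
coastline <- data.frame(
  Easting = c(219588.203816721, 219623.335092579, 219625.861360502,
              219661.118975722, 219664.898582579, 219700.155464073),
  Northing = c(607325.869617586, 607324.434359255, 607386.276450707,
               607384.83630279, 607477.377010103, 607475.937159766)
)

# Hypothetical helper: distance from each school to its nearest coastline point
min_coast_dist <- function(schools, coastline) {
  ce <- coastline$Easting
  cn <- coastline$Northing
  vapply(seq_len(nrow(schools)), function(i) {
    # squared distances from school i to every coastline point (vectorised)
    d2 <- (ce - schools$Easting[i])^2 + (cn - schools$Northing[i])^2
    sqrt(min(d2))  # sqrt is monotonic, so apply it once after min()
  }, numeric(1))
}

distances <- min_coast_dist(schools, coastline)
print(paste(schools$URN, "minDistance = ", distances))
```

On the six test schools this should reproduce the values from the summarise() version. The gain from skipping the per-point sqrt() is probably modest, since the subtraction and squaring still touch every coastline point for every school.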
I suspect this is faster. Let's test it on a larger data set:
set.seed(400)
URN<-10000:19999
Easting1<-sample.int(533498, 10000)
Northing1<-sample.int(180000, 10000)
schools<-data.frame(URN, Easting = Easting1, Northing = Northing1)
Easting2<-sample.int(533498, 10000)
Northing2<-sample.int(180000, 10000)
coastline<-data.frame(Easting = Easting2, Northing = Northing2)
f1 <- function() {
  for (sch in schools$URN){
    minimumDistance <- 500000
    SEasting <- schools %>% filter(URN == sch) %$% Easting
    SNorthing <- schools %>% filter(URN == sch) %$% Northing
    mindisance <- coastline %>% summarise(distance =
      min(sqrt((SEasting - Easting)^2 +
               (SNorthing - Northing)^2))) %$% distance
    print(paste(sch, "minDistance = ", mindisance))
  }
}
f2<- function(){
distances<-sapply(1:nrow(schools), function(x)
with(schools[x,], min(sqrt((coastline$Easting-Easting)^2+
(coastline$Northing-Northing)^2))))
paste(schools$URN, "minDistance = ", distances)
}
library(microbenchmark)
microbenchmark(f1(), f2(), times = 10)
##this takes a while to run
Unit: seconds
expr min lq mean median uq max neval
f1() 20.013022 20.387663 20.53804 20.625776 20.735973 20.763166 10
f2() 2.932491 2.971101 2.99707 3.004892 3.031679 3.044733 10
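The sapply() method is ~6.8 times faster. Note that f1() also prints every result while f2() only builds the strings, so some of that gap is console output rather than the distance calculation itself.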