R: calculate distance based on latitude-longitude from two data frames
I am trying to replace values in one data frame with values from another data frame, based on a condition.
Both data frames contain latitude, longitude and height, but one of them is shorter. I want to take each point from the shorter data frame (5103 rows), find the point closest to it in latitude and longitude (by calculating the distance) in the second one (188426 rows), and then replace the height value in the longer data frame with the height from the shorter one.
In the code below, the first data frame is topo.rams and the second is topo.msg. The final goal is to replace the height in topo.msg with the height values from topo.rams.
topo.rams:
longitud,latitud,tempc,u,v,w,relhum,speed,topo
-1.7107, 38.1464, 18.2412, -6.1744, -0.3708, 0.0000, 58.6447, 6.3584,460.5908
-1.7107, 38.1734, 18.5915, -5.7757, -0.3165, 0.0000, 61.8492, 5.9840,416.0403
topo.msg:
height,longitud,latitud
448.0, 1.70, 38.14
402.0, 1.70, 38.18
and the desired output (topo.msg modified):
height,longitud,latitud
460.5908, 1.70, 38.14
416.0403, 1.70, 38.18
and the code used:
# read the data
topo.msg=read.csv("MSG_DEM.txt",sep=",",header=FALSE)
colnames(topo.msg) <- c("topoMSG","longitud","latitud")
topo.rams=read.csv("topografia-rams.txt",sep=",",header=TRUE)
# number of stations to process
puntos.rams=dim(topo.rams)[1]
puntos.msg=dim(topo.msg)[1]
# Locate the MSG point closest to the station.
# The distance is computed from the lat-lon coordinates
topo.temp=data.frame()
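for(i in 1:puntos.rams)
{
  # distance from RAMS point i to each MSG point;
  # Inf marks MSG points outside the 0.5 degree window
  distancia=rep(Inf,puntos.msg)
  for(j in 1:puntos.msg)
  {
    dlon<-topo.rams$longitud[i]-topo.msg$longitud[j]
    dlat<-topo.rams$latitud[i]-topo.msg$latitud[j]
    if ( abs(dlon) < 0.5 && abs(dlat) < 0.5) {
      distancia[j]=sqrt(dlon*dlon+dlat*dlat)
    }
  }
  # index of the closest MSG point
  indexj=which.min(distancia)
  # overwrite its height with the RAMS height
  topo.msg$topoMSG[indexj] = topo.rams$topo[i]
}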
This code seems to run, but it takes a very long time. I have also tried to create a distance matrix with the geosphere package, following the post "Geographic distance between 2 lists of lat/lon coordinates", but R complains that it cannot allocate 3.6 Gb.
How can I address this issue? I would like to either optimize the loop or use a distance matrix. There must be a cleaner, more efficient way to calculate the distances.
Thanks in advance.
Following the comment by Patric, I switched from the nested loop to a matrix/vector computation. The code now runs, and it is simpler and more efficient.
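for(i in 1:puntos.rams)
{
  # distance (in degrees) from RAMS point i to every MSG point at once
  dlon<-topo.rams$longitud[i]-topo.msg$longitud
  dlat<-topo.rams$latitud[i]-topo.msg$latitud
  distancia<-sqrt(dlon*dlon+dlat*dlat)
  # index of the closest MSG point
  indexj=which.min(distancia)
  # overwrite its height with the RAMS height
  topo.msg$topoMSG[indexj] = topo.rams$topo[i]
}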
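The loop above measures distance in degrees, which treats a degree of longitude as equal to a degree of latitude. Near 38°N a degree of longitude is only about 0.79 times as long, so the nearest point in degrees is not always the nearest point on the ground. A minimal base-R sketch of a great-circle (haversine) alternative follows. The function name `haversine_km`, the 6371 km mean Earth radius and the small `rams`/`msg` data frames (built from the sample rows in the question) are assumptions for illustration, not part of the original code.

```r
# Great-circle (haversine) distance in km between one point and a vector of
# points. All coordinates are in decimal degrees.
# Assumption: spherical Earth with mean radius 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Stand-ins for topo.rams and topo.msg, using the sample rows as posted
rams <- data.frame(longitud = c(-1.7107, -1.7107),
                   latitud  = c(38.1464, 38.1734),
                   topo     = c(460.5908, 416.0403))
msg <- data.frame(topoMSG  = c(448.0, 402.0),
                  longitud = c(1.70, 1.70),
                  latitud  = c(38.14, 38.18))

# Same structure as the vectorized loop: one pass per RAMS point,
# distances to all MSG points computed at once
for (i in seq_len(nrow(rams))) {
  d <- haversine_km(rams$longitud[i], rams$latitud[i],
                    msg$longitud, msg$latitud)
  msg$topoMSG[which.min(d)] <- rams$topo[i]
}
```

Because only one row of distances exists at any time, memory use stays proportional to the size of the longer data frame rather than to the full distance matrix.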
There is probably a more elegant way to do this calculation. I would appreciate any input.
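One possibly tidier form, offered as a sketch rather than a definitive answer: collect the nearest index for every point of the shorter data frame first, then do a single vectorized assignment. The square root can be dropped because it does not change which distance is smallest. The names `rams`, `msg` and `nearest` are hypothetical, and the two small data frames reuse the sample rows from the question.

```r
# Stand-ins for topo.rams and topo.msg, using the sample rows as posted
rams <- data.frame(longitud = c(-1.7107, -1.7107),
                   latitud  = c(38.1464, 38.1734),
                   topo     = c(460.5908, 416.0403))
msg <- data.frame(topoMSG  = c(448.0, 402.0),
                  longitud = c(1.70, 1.70),
                  latitud  = c(38.14, 38.18))

# For each RAMS point, the row index of the closest MSG point.
# Squared distance in degrees is enough to rank candidates.
nearest <- vapply(seq_len(nrow(rams)), function(i) {
  which.min((rams$longitud[i] - msg$longitud)^2 +
            (rams$latitud[i]  - msg$latitud)^2)
}, integer(1))

# Single assignment: overwrite the MSG heights at those rows
msg$topoMSG[nearest] <- rams$topo
```

If several RAMS points share the same nearest MSG point, the last one wins, which matches the behaviour of the loop. Keeping `nearest` as a separate vector also makes it easy to inspect which rows were matched before overwriting anything.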