Using the geosphere distm function on a data.table to calculate distances

I've created a data.table that has 6 columns. Each row compares two locations: Location 1 and Location 2. I'm trying to use the distm function to calculate the distance between the two locations on each row, creating a 7th column. The distm function in the geosphere package requires two separate vectors, one for each lat/long pair, to calculate against. My code below does not work, so I'm trying to figure out how to provide vectors to the function.

LOC_1_ID LOC1_LAT_CORD LOC1_LONG_CORD LOC_2_ID LOC2_LAT_CORD LOC2_LONG_CORD
 1       35.68440        -80.48090        70624    34.86752   -82.46632
 6       35.49770        -80.62870        70624    34.86752   -82.46632
10       35.66042        -80.50053        70624    34.86752   -82.46632

Assuming res holds the data.table, the code below does not work.

 res[,DISTANCE := distm(c(LOC1_LAT_CORD, LOC1_LONG_CORD),c(LOC2_LAT_CORD, LOC2_LONG_CORD), fun=distHaversine)*0.000621371]

If I pull out each vector, the function works fine.

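# Column names match the table above (LOC_1_ID, LOC_2_ID).
# geosphere expects longitude first, then latitude.
loc1 <- res[LOC_1_ID == 1, .(LOC1_LONG_CORD, LOC1_LAT_CORD)]
loc2 <- res[LOC_2_ID == 70624, .(LOC2_LONG_CORD, LOC2_LAT_CORD)]
distm(loc1, loc2, fun = distHaversine)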

Really, my question is how to apply a function to selected columns within a data.table when that function requires vectors as parameters.

The distm function generates a distance matrix of a set of points. Are you sure this is the function you want if you're just comparing the points on each row and adding one column?

It sounds like you actually want either distHaversine or distGeo.

library(data.table)
library(geosphere)

dt <- read.table(text = "LOC_1_ID LOC1_LAT_CORD LOC1_LONG_CORD LOC_2_ID LOC2_LAT_CORD LOC2_LONG_CORD
1       35.68440        -80.48090        70624    34.86752   -82.46632
6       35.49770        -80.62870        70624    34.86752   -82.46632
10       35.66042        -80.50053        70624    34.86752   -82.46632", header = TRUE)

setDT(dt)
dt[, distance_hav := distHaversine(matrix(c(LOC1_LONG_CORD, LOC1_LAT_CORD), ncol = 2),
                                   matrix(c(LOC2_LONG_CORD, LOC2_LAT_CORD), ncol = 2))]

#     LOC_1_ID LOC1_LAT_CORD LOC1_LONG_CORD LOC_2_ID LOC2_LAT_CORD LOC2_LONG_CORD distance_hav
# 1:        1      35.68440      -80.48090    70624      34.86752      -82.46632     202046.3
# 2:        6      35.49770      -80.62870    70624      34.86752      -82.46632     181310.0
# 3:       10      35.66042      -80.50053    70624      34.86752      -82.46632     199282.1
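
As for why the original call likely failed: inside a data.table's j expression each column is a plain vector, so c(lat, lon) concatenates two columns into one long vector instead of building one point per row. A small base R sketch of the difference, using the question's three LOC1 rows (the variable names lat, lon, flat and pts are just for illustration):

```r
# Coordinates taken from the three rows in the question (LOC1 columns).
lat <- c(35.68440, 35.49770, 35.66042)
lon <- c(-80.48090, -80.62870, -80.50053)

# c() concatenates: the result is a single vector of length 6,
# which no longer says which latitude belongs to which longitude.
flat <- c(lat, lon)
length(flat)   # 6

# cbind() (or matrix(c(lon, lat), ncol = 2)) keeps one point per row.
# geosphere expects longitude in the first column, latitude in the second.
pts <- cbind(lon, lat)
dim(pts)       # 3 2
```

A two-column matrix like pts is the shape that distHaversine and distGeo compare row against row, which is why the matrix(..., ncol = 2) calls above work.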

Update: This answer gives a more efficient version of distHaversine for use in data.table.
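
For illustration only, a vectorised haversine can also be written in base R. This is a minimal sketch, not the code from the answer referred to above; the function name haversine_m is made up here, and the default radius of 6378137 m is chosen to match distHaversine's default. The miles factor is the one used in the question.

```r
# Hypothetical helper: great-circle distance in metres between paired points.
# All arguments are numeric vectors of equal length, in decimal degrees.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) just above 1.
  2 * r * asin(pmin(1, sqrt(a)))
}

# The three rows from the question.
lat1 <- c(35.68440, 35.49770, 35.66042)
lon1 <- c(-80.48090, -80.62870, -80.50053)
lat2 <- rep(34.86752, 3)
lon2 <- rep(-82.46632, 3)

d_m <- haversine_m(lon1, lat1, lon2, lat2)  # metres
d_miles <- d_m * 0.000621371                # miles, factor from the question
```

Because the function takes plain vectors, it should drop straight into a data.table call such as dt[, distance_hav := haversine_m(LOC1_LONG_CORD, LOC1_LAT_CORD, LOC2_LONG_CORD, LOC2_LAT_CORD)], and the results should agree closely with the distance_hav column shown above.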
