Calculating geographic distance between two rows in data.table
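My question is essentially the same as this one: calculating the distance between two rows in a data.table. However, I am looking for an answer that uses data.table syntax rather than a for loop.
I have a data.table like this: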
Lat Lon Time Bus
52.21808 20.96675 2018-04-20 21:27:26 3
52.25882 20.89850 2018-04-20 21:27:23 8
52.24347 21.08460 2018-04-20 21:27:27 1
52.21935 20.97186 2018-04-20 21:28:31 3
52.25808 20.89790 2018-04-20 21:28:32 8
52.24541 21.08522 2018-04-20 21:28:36 1
I want to calculate the distance between two consecutive points, grouped by Bus, using for example distGeo from the geosphere package. So something like:
d[,distance:=distGeo(c(Lon, Lat), ???????),by=Bus]
Edit: I get somewhat useful results with
d[,distance:=distGeo(cbind(Lon, Lat)),by=Bus]
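but it is not quite right: there is a warning that one item per group has to be recycled. Is there a way to get NA in the first or last row of each bus?
Edit 2: It looks like I have it.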
d[,distance:=c(distGeo(cbind(Lon, Lat)),NA) ,by=Bus]
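The length mismatch behind that edit can be checked directly. The sketch below rebuilds the sample coordinates by hand (the Time column is left out for brevity, which is an assumption of this example) and compares the group size with the number of distances distGeo returns when it is given a single matrix of points.

```r
library(data.table)
library(geosphere)

# Sample coordinates from the question; Time is omitted here for brevity
d <- data.table(
  Lat = c(52.21808, 52.25882, 52.24347, 52.21935, 52.25808, 52.24541),
  Lon = c(20.96675, 20.89850, 21.08460, 20.97186, 20.89790, 21.08522),
  Bus = c(3L, 8L, 1L, 3L, 8L, 1L)
)

# With only one matrix of points, distGeo returns the distance between
# consecutive rows, so each group yields .N - 1 values
lens <- d[, .(n_rows = .N, n_dist = length(distGeo(cbind(Lon, Lat)))), by = Bus]
print(lens)

# Appending one NA per group makes the result length equal to .N,
# leaving NA in the last row of each bus
d[, distance := c(distGeo(cbind(Lon, Lat)), NA_real_), by = Bus]
print(d)
```

Putting the NA first instead, as in c(NA, distGeo(...)), would attach each distance to the later point of the pair rather than the earlier one.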
Create two new columns by shifting the Lat/Lon rows up by one position within each bus:
setorder(dt, Bus)
dt[, `:=`(Lat_to = shift(Lat, type = "lead"),
Lon_to = shift(Lon, type = "lead")),
by = Bus]
Then use this function, which I wrote for this answer (it is a more efficient, data.table-style haversine calculation):
dtHaversine <- function(lat_from, lon_from, lat_to, lon_to, r = 6378137){
radians <- pi/180
lat_to <- lat_to * radians
lat_from <- lat_from * radians
lon_to <- lon_to * radians
lon_from <- lon_from * radians
dLat <- (lat_to - lat_from)
dLon <- (lon_to - lon_from)
a <- (sin(dLat/2)^2) + (cos(lat_from) * cos(lat_to)) * (sin(dLon/2)^2)
return(2 * atan2(sqrt(a), sqrt(1 - a)) * r)
}
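A quick sanity check of the function may help before applying it to real data. The sketch below repeats the definition so that it runs on its own, then compares one degree of longitude along the equator with the arc length r * pi / 180, and confirms that NA inputs give an NA distance. The test points are chosen for illustration only.

```r
# Copy of the function above, repeated so this check is self-contained
dtHaversine <- function(lat_from, lon_from, lat_to, lon_to, r = 6378137){
  radians <- pi/180
  lat_to <- lat_to * radians
  lat_from <- lat_from * radians
  lon_to <- lon_to * radians
  lon_from <- lon_from * radians
  dLat <- (lat_to - lat_from)
  dLon <- (lon_to - lon_from)
  a <- (sin(dLat/2)^2) + (cos(lat_from) * cos(lat_to)) * (sin(dLon/2)^2)
  return(2 * atan2(sqrt(a), sqrt(1 - a)) * r)
}

# On the equator the haversine distance for one degree of longitude
# reduces to the arc length r * (pi / 180)
one_degree <- dtHaversine(0, 0, 0, 1)
print(all.equal(one_degree, 6378137 * pi / 180))

# NA coordinates propagate to an NA distance, which is what the last
# row of each group relies on after the shift
print(dtHaversine(52.24541, 21.08522, NA, NA))
```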
Apply it:
dt[, dist := dtHaversine(Lat, Lon, Lat_to, Lon_to)]
dt
# Lat Lon Date Time Bus Lat_to Lon_to dist
# 1: 52.24347 21.08460 2018-04-20 21:27:27 1 52.24541 21.08522 220.05566
# 2: 52.24541 21.08522 2018-04-20 21:28:36 1 NA NA NA
# 3: 52.21808 20.96675 2018-04-20 21:27:26 3 52.21935 20.97186 376.08498
# 4: 52.21935 20.97186 2018-04-20 21:28:31 3 NA NA NA
# 5: 52.25882 20.89850 2018-04-20 21:27:23 8 52.25808 20.89790 91.96366
# 6: 52.25808 20.89790 2018-04-20 21:28:32 8 NA NA NA
Data:
library(data.table)
dt <- fread(
'Lat Lon Date Time Bus
52.21808 20.96675 2018-04-20 21:27:26 3
52.25882 20.89850 2018-04-20 21:27:23 8
52.24347 21.08460 2018-04-20 21:27:27 1
52.21935 20.97186 2018-04-20 21:28:31 3
52.25808 20.89790 2018-04-20 21:28:32 8
52.24541 21.08522 2018-04-20 21:28:36 1')
Example on 1 million rows:
set.seed(123)
dt <- data.table(Lat = sample(-90:90, 1e6, replace = TRUE),
Lon = sample(-90:90, 1e6, replace = TRUE),
Bus = rep(1:5e5, 2))
setorder(dt, Bus)
system.time({
dt[, `:=`(Lat_to = shift(Lat, type = "lead"),
Lon_to = shift(Lon, type = "lead")),
by = Bus]
dt[, dist := dtHaversine(Lat, Lon, Lat_to, Lon_to)]
})
# user system elapsed
# 7.985 0.033 8.020
Here is a solution using the package gmt:
require(data.table)
require(gmt)
set.seed(123)
some_latlon <- data.table(id = sample(x = 1:2, size = 10, replace = TRUE),
xfrom = runif(n = 10, min = 3, max = 6),
yfrom = runif(n = 10, min = 52, max = 54))
setkey(some_latlon, id)
some_latlon[, xto := c(xfrom[-1], NA), by = id]
some_latlon[, yto := c(yfrom[-1], NA), by = id]
some_latlon[, dist := geodist(Nfrom = yfrom, Efrom = xfrom,
Nto = yto, Eto = xto, units = "km"), by = id]
Of course, you can easily remove the columns xto and yto afterwards. HTH
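Dropping the helper columns can be done by reference in one step. The sketch below uses a small made-up table (the values are illustrative, not taken from the answer) and builds the lead columns with shift, which gives the same result as c(xfrom[-1], NA).

```r
library(data.table)

# Small illustrative table; the values are made up for this sketch
x <- data.table(id    = c(1L, 1L, 2L, 2L),
                xfrom = c(3.1, 4.2, 5.3, 5.9),
                yfrom = c(52.1, 53.2, 53.9, 52.7))

# Lead columns per id, equivalent to c(xfrom[-1], NA) within each group
x[, `:=`(xto = shift(xfrom, type = "lead"),
         yto = shift(yfrom, type = "lead")), by = id]

# ... compute the distance here ...

# Remove both helper columns by reference
x[, c("xto", "yto") := NULL]
print(names(x))
```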
geodist::geodist will work too, and it is faster than geosphere::distHaversine.
require(data.table)
require(microbenchmark)
d =
fread(
'
Lat,Lon,Time,Bus
52.21808,20.96675,2018-04-20 21:27:26,3
52.25882,20.89850,2018-04-20 21:27:23,8
52.24347,21.08460,2018-04-20 21:27:27,1
52.21935,20.97186,2018-04-20 21:28:31,3
52.25808,20.89790,2018-04-20 21:28:32,8
52.24541,21.08522,2018-04-20 21:28:36,1
')
setorder(d, Bus, Time)
microbenchmark(
d[, dist_geodist := geodist::geodist(cbind(Lat, Lon),
measure='haversine', sequential = TRUE) , by = Bus]
,
d[,dist_geosphere := geosphere::distHaversine(cbind(Lon, Lat) ) , by=Bus]
)
Unit: microseconds
expr                                                                                                             min      lq        mean      median    uq       max      neval cld
d[, `:=`(dist_geodist, geodist::geodist(cbind(Lat, Lon), measure = "haversine", sequential = TRUE)), by = Bus]  861.937  868.7585  910.8999  875.4555  920.138  1463.567 100   a
d[, `:=`(dist_geosphere, geosphere::distHaversine(cbind(Lon, Lat))), by = Bus]                                  1005.890 1016.2335 1065.2952 1028.3775 1070.428 1738.151 100   b
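Both calls in the benchmark return one value fewer than the number of rows in each group. That goes unnoticed here only because each bus has exactly two rows, so the single distance is recycled onto both. A possible way to get one value per row from geodist is its pad argument; the sketch below assumes the installed version of geodist supports pad = TRUE together with sequential = TRUE, and names the matrix columns lon and lat explicitly.

```r
library(data.table)
library(geodist)

# Sample coordinates from the question; Time is omitted here for brevity
d <- data.table(
  Lat = c(52.21808, 52.25882, 52.24347, 52.21935, 52.25808, 52.24541),
  Lon = c(20.96675, 20.89850, 21.08460, 20.97186, 20.89790, 21.08522),
  Bus = c(3L, 8L, 1L, 3L, 8L, 1L)
)
setorder(d, Bus)

# Assumption: pad = TRUE prepends an NA so that the sequential result
# has as many values as the group has rows
d[, dist := geodist(cbind(lon = Lon, lat = Lat),
                    sequential = TRUE, pad = TRUE,
                    measure = "haversine"),
  by = Bus]
print(d)
```

Note that the padding goes in the first row of each bus here, whereas the earlier answers leave NA in the last row.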