
Loop over data.table rows with a condition

I have a data.table that holds ids and locations. For example, here it is with two rows (it has column and row names; I don't know if that matters):

locations<-data.table(c(11,12),c(-159.58,0.2),c(21.901,22.221))
colnames(locations)<-c("id","location_lon","location_lat")
rownames(locations)<-c("1","2")

I then want to iterate over the rows and compare each one to another point (given as lat, lon). In a for loop it works:

for (i in 1:nrow(locations)) {
    loc <- locations[i,]
    dist <- gdist(-159.5801, 21.901, loc$location_lon, loc$location_lat, units="m")
    if(dist <= 50) {
        return (loc)
    }
    return (NULL)
}

and returns:

   id location_lon location_lat
1: 11      -159.58       21.901

but I want to use apply. The following code fails to run:

dists <- apply(locations,1,function(x) if (50 - gdist(-159.5801, 21.901, x$location_lon, x$location_lat, units="m")>=0) x else NULL)

with the error "$ operator is invalid for atomic vectors". Referencing the columns by position (x[2], x[3]) isn't enough to fix this; I get

Error in if (radius - gdist(lon, lat, x[2], x[3], units = "m") >= 0) x else NULL : 
missing value where TRUE/FALSE needed 

This is because the data.table is converted to a matrix, and the coordinates are treated as text instead of numbers. Is there a way to overcome this? The solution needs to be efficient (I want to run this check for >1,000,000 different coordinates). Changing the data structure of the locations table is possible if needed.
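The coercion described above can be reproduced in isolation. The following is only a minimal sketch, assuming the table from the question plus a hypothetical character column named label (not in the original data) to trigger the text conversion; apply() hands each row of the resulting matrix to the function as a plain atomic vector, which is why $ fails while name-based indexing works.

```r
library(data.table)

locations <- data.table(
  id           = c(11, 12),
  location_lon = c(-159.58, 0.2),
  location_lat = c(21.901, 22.221)
)

# apply() first converts its input with as.matrix(); with only numeric
# columns the matrix stays numeric.
m <- as.matrix(locations)
is.numeric(m)

# Each row passed to the function is a named atomic vector, so `$` fails ...
row1 <- m[1, ]
msg <- tryCatch(row1$location_lon, error = function(e) conditionMessage(e))

# ... but indexing by name works.
row1[["location_lon"]]

# Assumption: a hypothetical character column "label". With any non-numeric
# column present, the whole matrix is coerced to character.
locations_chr <- copy(locations)[, label := c("a", "b")]
m_chr <- as.matrix(locations_chr)
is.character(m_chr)
```

If a table does contain text columns, selecting only the numeric columns before calling apply() keeps the matrix numeric, although a row-wise apply() is unlikely to be the fastest option for a million points.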

No loops are required; just use data.table as intended. If all you want to see are the rows that are within 50 meters of the desired location, all you have to do is

locations[, if (gdist(-159.58, 21.901, location_lon, location_lat, units="m") <= 50) .SD, id]
##    id location_lon location_lat
## 1: 11      -159.58       21.901

Here we group by the id column within the locations data set itself and check whether each id is within 50 meters of -159.58, 21.901. If so, we return .SD, which is basically the subset of the data for that specific id.
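Since the question mentions more than 1,000,000 coordinates, a fully vectorized filter may be worth trying instead of evaluating one group per id. The sketch below is not the method from the answer: it swaps gdist for a hypothetical helper, haversine_m, which uses the haversine formula on a sphere of radius 6,371,000 m, so distances can differ slightly from gdist's ellipsoidal results.

```r
library(data.table)

# Hypothetical helper (not part of Imap or data.table): great-circle distance
# in meters via the haversine formula, vectorized over lon2/lat2.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

locations <- data.table(
  id           = c(11, 12),
  location_lon = c(-159.58, 0.2),
  location_lat = c(21.901, 22.221)
)

# Filter in `i`: the distance is computed for all rows in a single call,
# and only rows within 50 meters of the reference point are kept.
near <- locations[haversine_m(-159.5801, 21.901, location_lon, location_lat) <= 50]
print(near)
```

Here the logical vector in i selects rows directly, so no grouping by id is needed; whether this is faster in practice depends on the data and should be measured.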


As a side note, data.table doesn't have row.names, so there is no need to specify them.
