Loop over data.table rows with a condition
I have a data.table that holds ids and locations. For example, here it is with two rows in it (it has column and row names; I don't know if that matters):
library(data.table)
library(Imap)  # assumption: gdist() used below comes from the Imap package
locations<-data.table(c(11,12),c(-159.58,0.2),c(21.901,22.221))
colnames(locations)<-c("id","location_lon","location_lat")
rownames(locations)<-c("1","2")
I then want to iterate over the rows and compare them to another point (given as lat, lon). In a for loop it works:
for (i in 1:nrow(locations)) {
loc <- locations[i,]
dist <- gdist(-159.5801, 21.901, loc$location_lon, loc$location_lat, units="m")
if(dist <= 50) {
return (loc)
}
return (NULL)
}
and returns:
   id location_lon location_lat
1: 11      -159.58       21.901
but I want to use apply. The following code fails to run:
dists <- apply(locations,1,function(x) if (50 - gdist(-159.5801, 21.901, x$location_lon, x$location_lat, units="m")>=0) x else NULL)
with a "$ operator is invalid for atomic vectors" error. Changing to reference by position (x[2], x[3]) isn't enough to fix this; I get
Error in if (radius - gdist(lon, lat, x[2], x[3], units = "m") >= 0) x else NULL :
missing value where TRUE/FALSE needed
This is because the data.table is converted to a matrix, and the coordinates are treated as text instead of numbers. Is there a way to overcome this? The solution needs to be efficient (I want to run this check for more than 1,000,000 different coordinates). Changing the data structure of the locations table is possible if needed.
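The coercion described above can be reproduced in isolation. The sketch below is only an illustration: it assumes a table with a character id column (the sample table in the question is all numeric, so the real data presumably differs), and the id values are made up.

```r
library(data.table)

# Hypothetical table with a character id column, to reproduce the coercion
locations_chr <- data.table(id = c("a11", "b12"),
                            location_lon = c(-159.58, 0.2),
                            location_lat = c(21.901, 22.221))

# apply() converts its input with as.matrix() first; a matrix holds a single
# type, so mixed character/numeric columns all become character
m <- as.matrix(locations_chr)
typeof(m)

# Each row reaches the function as an atomic character vector,
# which is why `$` fails and why the coordinates are no longer numeric
row_classes <- apply(locations_chr, 1, function(x) class(x))
row_classes
```

Converting inside the function with as.numeric(x[2]) would get past the error, but it still makes one function call per row, which is unlikely to scale well to a million rows.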
No loops are required, just use data.table as intended. If all you want to see are the rows that are within 50 meters of the desired location, all you have to do is
locations[, if (gdist(-159.58, 21.901, location_lon, location_lat, units="m") <= 50) .SD, id]
## id location_lon location_lat
## 1: 11 -159.58 21.901
Here we are iterating by the id column within the locations data set itself and checking whether each id is within 50 meters of -159.58, 21.901. If so, we call .SD, which is basically the data set itself for that specific id.
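Grouping by id evaluates the condition once per id, so with a million unique ids it makes a million distance calls. If the distance function accepts whole columns, the same filter can be written as a single vectorized subset. The sketch below is an assumption-laden variant, not the answer's method: it swaps gdist for a hypothetical haversine_m helper (spherical Earth, radius 6,371,000 m) so that it runs without extra packages, and its distances may differ slightly from gdist's.

```r
library(data.table)

locations <- data.table(id = c(11, 12),
                        location_lon = c(-159.58, 0.2),
                        location_lat = c(21.901, 22.221))

# Hypothetical stand-in for gdist(): vectorized haversine distance in meters,
# assuming a spherical Earth of radius r
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# One call over the whole columns; rows where the condition is TRUE are kept
near <- locations[haversine_m(-159.58, 21.901, location_lon, location_lat) <= 50]
near
```

This form also does not depend on the ids being unique, because the condition is evaluated per row rather than per group.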
As a side note, data.table doesn't have row.names, so there is no need to specify them.
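If the row labels carry information, one option is to keep them as an ordinary column. A minimal sketch, assuming the data starts out as a data.frame with made-up row names:

```r
library(data.table)

# Hypothetical data.frame whose row names carry meaning
df <- data.frame(id = c(11, 12), row.names = c("first", "second"))

# keep.rownames = TRUE stores the row names in a regular column named "rn"
dt <- as.data.table(df, keep.rownames = TRUE)
names(dt)
```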